Datawhale X 李宏毅苹果书 AI Summer Camp: Study Notes 2


1.2 Linear Models

A modification to a model usually comes from an understanding of the problem itself, that is, from domain knowledge.

Multiply the input feature $x$ by a weight and add a bias to get the prediction; a model of this form is called a linear model.
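A minimal Python sketch of this (the weight and bias values are hypothetical, not from the book):

```python
# Linear model: prediction = weight * feature + bias
def linear_model(x, w, b):
    return w * x + b

# e.g. predicting today's value from yesterday's, with made-up w and b
print(linear_model(4800, 0.97, 100))  # 4756.0
```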

1.2.1 Piecewise Linear Curves

Linear models clearly have strong limitations; this kind of limitation, which comes from the model itself, is called model bias, and it keeps the model from fitting the true relationship.

The red line, a piecewise linear curve, can be viewed as a constant plus a set of blue functions.

Pasted image 20240829101759.png

Constructing the red curve

// reminds me of the unit step function and the gate function from circuit theory

If the relationship between $x$ and $y$ is a continuous curve, then as long as enough blue functions are summed, the piecewise linear curve can approximate any continuous curve.
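A rough numpy sketch of this idea; the `hard_sigmoid` helper and its parameters are my own illustration, not the book's notation:

```python
import numpy as np

def hard_sigmoid(x, c, b, w):
    """A 'blue' function: 0 on the left, a linear ramp, then a plateau at height c."""
    return c * np.clip(b + w * x, 0.0, 1.0)

x = np.linspace(0.0, 1.0, 6)
# a constant plus a few blue functions gives a piecewise linear (red) curve
y = 0.5 + hard_sigmoid(x, c=1.0, b=-0.5, w=2.0) + hard_sigmoid(x, c=-0.5, b=0.5, w=1.0)
print(y)
```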

How do we represent the blue functions?

The Sigmoid function:

$y=c\,\frac{1}{1+e^{-(b+wx_{1})}}$
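In code, this scaled sigmoid might look like the sketch below (function and argument names are mine):

```python
import numpy as np

def scaled_sigmoid(x1, c, b, w):
    # c sets the height, b the horizontal shift, w the slope
    return c / (1.0 + np.exp(-(b + w * x1)))

print(scaled_sigmoid(np.array([-5.0, 0.0, 5.0]), c=2.0, b=0.0, w=1.0))
# approaches 0 on the left and c on the right
```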

Pasted image 20240829184740.png

Using Sigmoid to approximate Hard Sigmoid

Pasted image 20240829185141.png

Using Hard Sigmoids to compose the red function

$y=b+\sum_{i}c_{i}\,\mathrm{sigmoid}(b_{i}+w_{i}x_{1})$
$y=b+\sum_{i}c_{i}\,\mathrm{sigmoid}\Big(b_{i}+\sum_{j}w_{ij}x_{j}\Big)$
$r_{1}=b_{1}+w_{11}x_{1}+w_{12}x_{2}+w_{13}x_{3}$
...

Using linear algebra, this can be written compactly:

$r=b+Wx$

Pasted image 20240830105503.png

$y=b+c^{T}\sigma(b+Wx)$

// $c$ and the second $b$ are vectors
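A numpy sketch of this vectorized model (shapes chosen arbitrarily for the demo):

```python
import numpy as np

def sigmoid(r):
    return 1.0 / (1.0 + np.exp(-r))

def model(x, W, b_vec, c, b):
    # y = b + c^T sigma(b_vec + W x); b_vec and c are vectors, b and y scalars
    r = b_vec + W @ x   # r = b + Wx
    a = sigmoid(r)      # a = sigma(r)
    return b + c @ a    # y = b + c^T a

rng = np.random.default_rng(0)
x = rng.normal(size=3)       # 3 input features
W = rng.normal(size=(4, 3))  # 4 sigmoid units
print(model(x, W, rng.normal(size=4), rng.normal(size=4), b=0.1))
```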

Use a vector $\theta$ to represent all unknown parameters.

How do we find $\theta^{*}=\arg\min_{\theta} L$?

$g=\nabla L(\theta^{0})$
$\theta^{1}\leftarrow\theta^{0}-\eta g$
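A toy sketch of these two steps (the quadratic loss is made up for illustration):

```python
def gradient_descent(theta, grad_fn, eta=0.1, steps=50):
    # repeat: g = grad L(theta); theta <- theta - eta * g
    for _ in range(steps):
        g = grad_fn(theta)
        theta = theta - eta * g
    return theta

# toy loss L(theta) = (theta - 3)^2, so grad L = 2 * (theta - 3); minimum at 3
print(gradient_descent(0.0, lambda t: 2.0 * (t - 3.0)))  # close to 3.0
```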

Pasted image 20240830112452.png

Update vs. epoch: each gradient step on one batch is an update; one full pass over all batches is an epoch.
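A sketch of the distinction (the data and batch size are invented):

```python
import numpy as np

data = np.arange(100)  # 100 training examples
batch_size = 10
updates = 0

for epoch in range(3):                        # one epoch = one pass over the data
    np.random.shuffle(data)
    for i in range(0, len(data), batch_size):
        batch = data[i:i + batch_size]
        # compute the gradient on `batch` and take one step -> one update
        updates += 1

print(updates)  # 3 epochs x 10 batches = 30 updates
```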

Sigmoid -> ReLU (Rectified Linear Unit)

Pasted image 20240830113010.png

$y=b+\sum_{2i}c_{i}\max\Big(0,\;b_{i}+\sum_{j}w_{ij}x_{j}\Big)$
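A quick numpy check of why the sum runs over $2i$ units: two ReLUs compose one Hard Sigmoid (the offsets here are my choice):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.linspace(-2.0, 2.0, 9)
# difference of two shifted ReLUs: 0 for x < -0.5, a linear ramp, then flat at 1
hard_sigmoid = relu(x + 0.5) - relu(x - 0.5)
print(hard_sigmoid)
```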

Another way to improve performance: add more layers.

Pasted image 20240830113555.png

$a'=\sigma(b'+W'a)\leftarrow a=\sigma(b+Wx)$
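A two-layer sketch of this composition (layer sizes are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)
W, b = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 4)), rng.normal(size=4)

a = sigmoid(b + W @ x)     # first layer:  a  = sigma(b  + W  x)
a2 = sigmoid(b2 + W2 @ a)  # second layer: a' = sigma(b' + W' a)
print(a2)
```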

This is called a neural network, and its newer name is deep learning.

Pasted image 20240830114134.png

Deep vs. fat: more on this later.

Overfitting can result from too many layers.

After-class thoughts

Mr. Lee is more humorous than most of my teachers at HUST; I will continue to watch his ML course videos.

I can imagine the shock that neural networks brought to people in the last century.

Also, for AlexNet, see Li Mu's introduction video and [[../../../read AI papers with Li Mu/read AI papers with Li Mu catalog|my local notes]].