Neural Networks

Well, to be honest, I didn't really get this lecture o(TヘTo), so just treat these as my notes.

Neuron Model


$$a_1^{(2)}=h\left(\Theta_{10}^{(1)}x_0 + \Theta_{11}^{(1)}x_1 + \Theta_{12}^{(1)}x_2 + \Theta_{13}^{(1)}x_3\right)$$
$$a_2^{(2)}=h\left(\Theta_{20}^{(1)}x_0 + \Theta_{21}^{(1)}x_1 + \Theta_{22}^{(1)}x_2 + \Theta_{23}^{(1)}x_3\right)$$
$$a_3^{(2)}=h\left(\Theta_{30}^{(1)}x_0 + \Theta_{31}^{(1)}x_1 + \Theta_{32}^{(1)}x_2 + \Theta_{33}^{(1)}x_3\right)$$
$$h_\Theta(x)=a_1^{(3)}=h\left(\Theta_{10}^{(2)}a_0^{(2)} + \Theta_{11}^{(2)}a_1^{(2)} + \Theta_{12}^{(2)}a_2^{(2)} + \Theta_{13}^{(2)}a_3^{(2)}\right)$$

The superscript (2) denotes the second layer.

We can now let $z_1^{(2)} = \Theta_{10}^{(1)}x_0 + \cdots$, or in vector form $z^{(2)} = \Theta^{(1)}a^{(1)}$ (with $a^{(1)} = x$).

Then $a_1^{(2)} = h\left(z_1^{(2)}\right)$.

% Predict the label of each row of X using Theta1 and Theta2
X = [ones(m, 1) X];            % add the bias unit x0 = 1

a2 = sigmoid(X * Theta1');     % hidden-layer activations
a2 = [ones(m, 1) a2];          % add the bias unit a0 = 1

h = sigmoid(a2 * Theta2');     % output layer: one probability per class

[~, p] = max(h, [], 2);        % predicted class = index of the largest output
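
The code above assumes a sigmoid helper is available on the path (Theta1, Theta2 and m come from the surrounding exercise code). A minimal sketch of that helper, saved as sigmoid.m, in case it is not already defined:

function g = sigmoid(z)
    % element-wise logistic function; works for scalars, vectors and matrices
    g = 1 ./ (1 + exp(-z));
end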

Perceptron Examples

$$\hat{y}=\begin{cases} 1 & h \ge 0.5 \\ 0 & h < 0.5 \end{cases}$$

Logical AND

$$h_\theta(x)=\operatorname{sigmoid}(x_1+x_2-1.5)$$

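A quick way to check that these weights really implement AND is to evaluate them on all four inputs and apply the 0.5 threshold from above; a minimal sketch, assuming the sigmoid helper sketched earlier:

x = [0 0; 0 1; 1 0; 1 1];              % all four input combinations, one per row
h = sigmoid(x(:,1) + x(:,2) - 1.5);    % h_theta(x) with the AND weights
yhat = double(h >= 0.5)                % prints [0; 0; 0; 1]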

Logical OR

$$h_\theta(x)=\operatorname{sigmoid}(x_1+x_2-0.5)$$


Logical NOT

$$h_\theta(x)=\operatorname{sigmoid}(-x_1+0.5)$$


Logical XOR

$$y=x_1\oplus x_2 = (\overline{x}_1 \cap x_2) \cup (x_1 \cap \overline{x}_2)$$

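Chaining the gates gives XOR with one hidden layer: two hidden units compute $\overline{x}_1 \cap x_2$ and $x_1 \cap \overline{x}_2$, and an OR unit combines them. A minimal sketch using the same weight patterns as above, with an added scale factor (my own assumption) so the hidden activations are pushed close to 0/1 before they feed the OR unit; with the unscaled weights the intermediate values of roughly 0.4/0.6 would blur the final result:

x  = [0 0; 0 1; 1 0; 1 1];                   % all four inputs, one per row
s  = 10;                                     % assumed scale factor, pushes activations towards 0/1
a1 = sigmoid(s * (-x(:,1) + x(:,2) - 0.5));  % hidden unit: (NOT x1) AND x2
a2 = sigmoid(s * ( x(:,1) - x(:,2) - 0.5));  % hidden unit: x1 AND (NOT x2)
h  = sigmoid(s * (a1 + a2 - 0.5));           % output unit: OR of the two hidden units
yhat = double(h >= 0.5)                      % prints [0; 1; 1; 0], i.e. x1 XOR x2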

Cost Function


Cost function for logistic regression:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^m \left( y^{(i)} \log h_\theta\left(x^{(i)}\right) + \left(1-y^{(i)}\right) \log\left(1 - h_\theta\left(x^{(i)}\right)\right) \right) + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$$

Cost function for a neural network:

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^m \sum_{k=1}^K \left( y^{(i)}_k \log\left(h_\Theta\left(x^{(i)}\right)\right)_k + \left(1-y^{(i)}_k\right)\log\left(1-\left(h_\Theta\left(x^{(i)}\right)\right)_k\right) \right) + \frac{\lambda}{2m} \sum_{l=1}^{L-1}\sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \left(\Theta^{(l)}_{ji}\right)^2$$

Here $K$ is the number of output units (classes), $L$ is the number of layers, and $s_l$ is the number of units in layer $l$; the regularization term sums the weights from layer 1 through layer $L-1$, skipping the bias terms.

X = [ones(m, 1) X];              % add the bias unit x0 = 1

a1 = X;                          % layer 1: the input

z2 = a1 * Theta1';
a2 = [ones(m, 1) sigmoid(z2)];   % layer 2: hidden activations plus bias unit

z3 = a2 * Theta2';
a3 = sigmoid(z3);                % layer 3: the output (the network has 3 layers in total)

I = eye(num_labels);
Y = I(y,:);                      % one-hot encode the labels

% regularization term (bias columns are excluded)
re = (sum(sum(Theta1(:, 2:end).^2)) + ...
     sum(sum(Theta2(:, 2:end).^2))) ...
     * lambda / 2 / m;

% cost function
J = -sum(sum(Y .* log(a3) + (1 - Y) ...
    .* log(1 - a3))) / m + re;

The BP Algorithm

BP is short for backpropagation.

How is the error at the final layer attributed to the output of each individual node?

It is split backward, layer by layer, in proportion to the weights.


Sigmoid Gradient

$$\operatorname{sigmoid}(z) = g(z) = \frac{1}{1+e^{-z}}$$
$$\mathrm{grad} = g'(z) = g(z)\left(1-g(z)\right)$$
g = sigmoid(z) .* (1 - sigmoid(z));   % element-wise, works for vectors and matrices

Gradients

Given $z^{(i+1)}=\Theta^{(i)} a^{(i)}$,

define the error terms as

$$\delta^{(4)} = \frac{\partial J}{\partial a^{(4)}} \cdot \frac{\partial a^{(4)}}{\partial z^{(4)}}$$
$$\delta^{(3)} = \frac{\partial J}{\partial a^{(4)}} \cdot \frac{\partial a^{(4)}}{\partial z^{(4)}} \cdot \frac{\partial z^{(4)}}{\partial a^{(3)}} \cdot \frac{\partial a^{(3)}}{\partial z^{(3)}}$$

Then the gradient is

$$\frac{\partial J}{\partial \Theta^{(2)}} = \frac{\partial J}{\partial a^{(4)}} \cdot \frac{\partial a^{(4)}}{\partial z^{(4)}} \cdot \frac{\partial z^{(4)}}{\partial a^{(3)}} \cdot \frac{\partial a^{(3)}}{\partial z^{(3)}} \cdot \frac{\partial z^{(3)}}{\partial \Theta^{(2)}}$$

Gradient descent then updates $\Theta$ with:

$$\Theta_{ij}^{(2)}(t+1)=\Theta_{ij}^{(2)}(t)-\alpha\,\frac{\partial J}{\partial \Theta_{ij}^{(2)}}$$
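
Written out in Octave, one update step might look like the sketch below; alpha is a hypothetical learning rate, and Theta1_grad / Theta2_grad are the gradient matrices computed at the end of these notes (in practice an off-the-shelf optimizer usually drives this loop):

alpha = 0.1;                              % hypothetical learning rate
Theta1 = Theta1 - alpha * Theta1_grad;    % update every weight of layer 1 simultaneously
Theta2 = Theta2 - alpha * Theta2_grad;    % update every weight of layer 2 simultaneously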

From the cross-entropy cost and the sigmoid activation we know

$$\frac{\partial J}{\partial a^{(4)}} = \frac{a^{(4)} - y}{a^{(4)}\left(1-a^{(4)}\right)}$$
$$\frac{\partial a^{(4)}}{\partial z^{(4)}} = a^{(4)}\left(1-a^{(4)}\right)$$

Therefore

$$\color{red}{\delta^{(4)} = \frac{\partial J}{\partial a^{(4)}} \cdot \frac{\partial a^{(4)}}{\partial z^{(4)}} = a^{(4)} - y}$$

where $a^{(i)} = h\left(z^{(i)}\right)$.

Similarly, since $z^{(4)}=\Theta^{(3)}a^{(3)}$, we have

$$\frac{\partial z^{(4)}}{\partial a^{(3)}} = \Theta^{(3)}$$

Then

$$\color{red}{\delta^{(3)} = \frac{\partial J}{\partial a^{(4)}} \cdot \frac{\partial a^{(4)}}{\partial z^{(4)}} \cdot \frac{\partial z^{(4)}}{\partial a^{(3)}} \cdot \frac{\partial a^{(3)}}{\partial z^{(3)}} = \delta^{(4)} \cdot \Theta^{(3)} \cdot h'\left(z^{(3)}\right)}$$

Similarly,

$$\color{red}{\delta^{(2)} = \delta^{(3)} \cdot \Theta^{(2)} \cdot h'\left(z^{(2)}\right)}$$
delta3 = a3 - Y;                                 % output-layer error (only valid because the network has 3 layers)
dt = delta3 * Theta2;
delta2 = dt(:, 2:end) .* sigmoidGradient(z2);    % drop the bias column, then multiply by h'(z2)

The gradients are

$$\frac{\partial J}{\partial \Theta^{(3)}} = \delta^{(4)} \cdot a^{(3)}$$
$$\frac{\partial J}{\partial \Theta^{(2)}} = \delta^{(3)} \cdot a^{(2)}$$
$$\frac{\partial J}{\partial \Theta^{(1)}} = \delta^{(2)} \cdot a^{(1)}$$

Gradient Accumulation

$$\Delta^{(l)}_{ij} = \Delta^{(l)}_{ij} + \delta^{(l+1)}_j a^{(l)}_i$$
Delta1 = delta2' * a1;    % accumulate the Theta1 gradients over all examples
Delta2 = delta3' * a2;    % accumulate the Theta2 gradients over all examples

Regularization

$$\frac{\partial J}{\partial\Theta^{(l)}_{ij}}= \begin{cases} \frac{1}{m}\Delta^{(l)}_{ij}+\frac{\lambda}{m}\Theta^{(l)}_{ij} & j > 0 \\ \frac{1}{m}\Delta^{(l)}_{ij} & j = 0 \end{cases}$$
% regularized gradients; the bias column (j = 0) is not regularized
Theta1_grad = Delta1 / m + lambda * ...
        [zeros(hidden_layer_size, 1) Theta1(:, 2:end)] / m;

Theta2_grad = Delta2 / m + lambda * ...
        [zeros(num_labels, 1) Theta2(:, 2:end)] / m;

% unroll the gradients into a single vector
grad = [Theta1_grad(:) ; Theta2_grad(:)];
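
This unrolled vector is normally returned from the cost function together with J and handed to an optimizer. A minimal sketch, assuming a handle costFunc(p) that returns [J, grad] for the unrolled parameters p, the fmincg helper shipped with the course exercises, and an input_layer_size variable for the number of input features:

% hypothetical training call: costFunc, fmincg and input_layer_size are assumed
initial_nn_params = [Theta1(:) ; Theta2(:)];              % unroll the initial weights
options = optimset('MaxIter', 50);
[nn_params, cost] = fmincg(costFunc, initial_nn_params, options);

% roll the learned vector back into the two weight matrices
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, input_layer_size + 1);
Theta2 = reshape(nn_params((hidden_layer_size * (input_layer_size + 1) + 1):end), ...
                 num_labels, hidden_layer_size + 1);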