Backpropagation



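[Figure: image-20250415160954245.png]

As the figure above shows, a change in a neuron's parameters changes the value of the loss function. For the first neuron of the output layer, therefore, define: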

$$w_1^{n+1} = (w_1, w_2, w_3, w_4), \quad b_1^{n+1} = b_1$$

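Since $J$ is a scalar, its derivative with respect to a vector is the vector of its derivatives with respect to each element of that vector: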

$$dw_1^{n+1} = \frac{\partial J}{\partial w_1^{n+1}} = \left[ \frac{\partial J}{\partial w_1} \quad \frac{\partial J}{\partial w_2} \quad \frac{\partial J}{\partial w_3} \quad \frac{\partial J}{\partial w_4} \right]$$

$$db_1^{n+1} = \frac{\partial J}{\partial b_1}$$
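The element-by-element definition can be checked numerically. The following is a minimal sketch, not the method used in this post: `toy_loss` is a hypothetical scalar function standing in for $J$, and `numerical_gradient` is an illustrative helper that perturbs one element of the weight vector at a time.

```python
def numerical_gradient(f, w, eps=1e-6):
    """Central-difference estimate of [dJ/dw_1, ..., dJ/dw_k] for a scalar f."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps   # perturb only the i-th element
        w_minus[i] -= eps
        grad.append((f(w_plus) - f(w_minus)) / (2 * eps))
    return grad


def toy_loss(w):
    # Hypothetical stand-in for J: J(w) = w1^2 + 2*w2 + w3*w4
    return w[0] ** 2 + 2 * w[1] + w[2] * w[3]


w = [1.0, 2.0, 3.0, 4.0]
dw = numerical_gradient(toy_loss, w)
# Analytic gradient for comparison: [2*w1, 2, w4, w3] = [2, 2, 4, 3]
```

The result has one entry per weight, which is the same shape as $dw_1^{n+1}$ above.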

As the figure above shows, a change in the weights $w_1^{n+1}$ of the first neuron in layer $n+1$ propagates through the following chain:

$$w_1^{n+1} \to z_1^{n+1} \to (a_1^{n+1}, a_2^{n+1}, a_3^{n+1}) \to J$$

Therefore:

$$dw_1^{n+1} = \frac{\partial J}{\partial w_1^{n+1}} = \frac{\partial J}{\partial a_1^{n+1}} \frac{\partial a_1^{n+1}}{\partial z_1^{n+1}} \frac{\partial z_1^{n+1}}{\partial w_1^{n+1}} + \frac{\partial J}{\partial a_2^{n+1}} \frac{\partial a_2^{n+1}}{\partial z_1^{n+1}} \frac{\partial z_1^{n+1}}{\partial w_1^{n+1}} + \frac{\partial J}{\partial a_3^{n+1}} \frac{\partial a_3^{n+1}}{\partial z_1^{n+1}} \frac{\partial z_1^{n+1}}{\partial w_1^{n+1}}$$
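The structure of this sum (one product per path from $z$ to $J$) can be illustrated with scalars. This is a sketch under stated assumptions: the functions chosen for $z$, $a_1$, $a_2$, $a_3$ and $J$ are hypothetical stand-ins, not the network in this post. They only reproduce the shape of the dependency $w \to z \to (a_1, a_2, a_3) \to J$.

```python
import math


def loss_from_w(w):
    z = 3.0 * w                                    # w -> z
    a1, a2, a3 = z ** 2, math.sin(z), math.exp(z)  # z -> (a1, a2, a3)
    return 1.0 * a1 + 2.0 * a2 + 3.0 * a3          # (a1, a2, a3) -> J


def chain_rule_gradient(w):
    z = 3.0 * w
    dz_dw = 3.0
    dJ_da = (1.0, 2.0, 3.0)                      # dJ/da_k for k = 1, 2, 3
    da_dz = (2.0 * z, math.cos(z), math.exp(z))  # da_k/dz for k = 1, 2, 3
    # One term per path through a_k, summed, as in the formula above
    return sum(dj * da * dz_dw for dj, da in zip(dJ_da, da_dz))


w = 0.2
eps = 1e-6
numeric = (loss_from_w(w + eps) - loss_from_w(w - eps)) / (2 * eps)
analytic = chain_rule_gradient(w)
```

`numeric` and `analytic` should agree to within the finite-difference error, which suggests that summing over all three paths accounts for the full effect of $w$ on $J$.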

Define:

$$A = \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \\ a_5 & a_6 \end{bmatrix} \quad Z = \begin{bmatrix} z_1 & z_2 \\ z_3 & z_4 \\ z_5 & z_6 \end{bmatrix} \quad Y = \begin{bmatrix} y_1 & y_2 \\ y_3 & y_4 \\ y_5 & y_6 \end{bmatrix}$$

$$z_1^{n+1} = (z_1, z_2), \quad a_1^{n+1} = (a_1, a_2)$$

$$z_2^{n+1} = (z_3, z_4), \quad a_2^{n+1} = (a_3, a_4)$$

$$z_3^{n+1} = (z_5, z_6), \quad a_3^{n+1} = (a_5, a_6)$$

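The first column of $Z$ holds the linear outputs of the 3 neurons for the first sample. Each row holds the outputs of one neuron for the two samples.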
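The row and column convention is easy to mix up, so here is a small sketch of the layout. The entries are kept as symbol names, and the variable names are illustrative only.

```python
# Rows index neurons, columns index samples, matching the matrix Z above
Z = [["z1", "z2"],
     ["z3", "z4"],
     ["z5", "z6"]]

first_neuron_outputs = Z[0]                   # one neuron, both samples
first_sample_outputs = [row[0] for row in Z]  # one sample, all three neurons

print(first_neuron_outputs)  # ['z1', 'z2']
print(first_sample_outputs)  # ['z1', 'z3', 'z5']
```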

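Read horizontally, the row vector $\left[ \frac{\partial J}{\partial a_1} \quad \frac{\partial J}{\partial a_2} \right]$ is $\frac{\partial J}{\partial a_1^{n+1}}$: the derivatives of $J$ with respect to the outputs of the same neuron for the two samples, that is, with respect to $a_1^{n+1}$.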

Compute each term of $dw_1^{n+1}$ separately:

$$\frac{\partial J}{\partial a_1^{n+1}} \frac{\partial a_1^{n+1}}{\partial z_1^{n+1}} \frac{\partial z_1^{n+1}}{\partial w_1^{n+1}} = \left[ \frac{\partial J}{\partial a_1} \quad \frac{\partial J}{\partial a_2} \right] \begin{bmatrix} \frac{\partial a_1}{\partial z_1} & \frac{\partial a_1}{\partial z_2} \\ \frac{\partial a_2}{\partial z_1} & \frac{\partial a_2}{\partial z_2} \end{bmatrix} \begin{bmatrix} \frac{\partial z_1}{\partial w_1} & \frac{\partial z_1}{\partial w_2} & \frac{\partial z_1}{\partial w_3} & \frac{\partial z_1}{\partial w_4} \\ \frac{\partial z_2}{\partial w_1} & \frac{\partial z_2}{\partial w_2} & \frac{\partial z_2}{\partial w_3} & \frac{\partial z_2}{\partial w_4} \end{bmatrix}$$

$$\frac{\partial J}{\partial a_2^{n+1}} \frac{\partial a_2^{n+1}}{\partial z_1^{n+1}} \frac{\partial z_1^{n+1}}{\partial w_1^{n+1}} = \left[ \frac{\partial J}{\partial a_3} \quad \frac{\partial J}{\partial a_4} \right] \begin{bmatrix} \frac{\partial a_3}{\partial z_1} & \frac{\partial a_3}{\partial z_2} \\ \frac{\partial a_4}{\partial z_1} & \frac{\partial a_4}{\partial z_2} \end{bmatrix} \begin{bmatrix} \frac{\partial z_1}{\partial w_1} & \frac{\partial z_1}{\partial w_2} & \frac{\partial z_1}{\partial w_3} & \frac{\partial z_1}{\partial w_4} \\ \frac{\partial z_2}{\partial w_1} & \frac{\partial z_2}{\partial w_2} & \frac{\partial z_2}{\partial w_3} & \frac{\partial z_2}{\partial w_4} \end{bmatrix}$$

$$\frac{\partial J}{\partial a_3^{n+1}} \frac{\partial a_3^{n+1}}{\partial z_1^{n+1}} \frac{\partial z_1^{n+1}}{\partial w_1^{n+1}} = \left[ \frac{\partial J}{\partial a_5} \quad \frac{\partial J}{\partial a_6} \right] \begin{bmatrix} \frac{\partial a_5}{\partial z_1} & \frac{\partial a_5}{\partial z_2} \\ \frac{\partial a_6}{\partial z_1} & \frac{\partial a_6}{\partial z_2} \end{bmatrix} \begin{bmatrix} \frac{\partial z_1}{\partial w_1} & \frac{\partial z_1}{\partial w_2} & \frac{\partial z_1}{\partial w_3} & \frac{\partial z_1}{\partial w_4} \\ \frac{\partial z_2}{\partial w_1} & \frac{\partial z_2}{\partial w_2} & \frac{\partial z_2}{\partial w_3} & \frac{\partial z_2}{\partial w_4} \end{bmatrix}$$
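The shapes in each term can be checked with a small sketch. It assumes that the last factor is the full 2×4 Jacobian of $(z_1, z_2)$ with respect to $(w_1, \dots, w_4)$, and that the middle factor is diagonal because the activation of one sample does not depend on the linear output of another sample. All numbers and the `matmul` helper are hypothetical and serve only to show the shapes.

```python
def matmul(P, Q):
    """Multiply two matrices given as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]


# Hypothetical values, chosen only to illustrate the shapes
dJ_da = [[0.5, -1.0]]                # 1x2: dJ/da for one neuron, two samples
da_dz = [[0.2, 0.0],
         [0.0, 0.1]]                 # 2x2: zero across different samples
dz_dw = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0]]       # 2x4: one row per sample

term = matmul(matmul(dJ_da, da_dz), dz_dw)  # (1x2)(2x2)(2x4) -> 1x4
```

The product is a 1×4 row, the same shape as $dw_1^{n+1}$, so the three terms can be added element by element.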

The loss $J$ is the cross-entropy summed over both samples:

$$J = -\left( y_1 \log a_1 + y_3 \log a_3 + y_5 \log a_5 + y_2 \log a_2 + y_4 \log a_4 + y_6 \log a_6 \right)$$

$$a_1 = \frac{e^{z_1}}{e^{z_1} + e^{z_3} + e^{z_5}} \quad a_3 = \frac{e^{z_3}}{e^{z_1} + e^{z_3} + e^{z_5}} \quad a_5 = \frac{e^{z_5}}{e^{z_1} + e^{z_3} + e^{z_5}}$$
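To make the softmax and the loss concrete, the sketch below applies softmax down each column of $Z$ (one column per sample) and evaluates $J$. It makes several assumptions: the standard cross-entropy convention with a leading minus sign, the same softmax for the second sample (over $z_2, z_4, z_6$), one-hot labels, and hypothetical values for `Z` and `Y`. It then compares a finite-difference estimate of $\frac{\partial J}{\partial z_1}$ with $a_1 - y_1$, the well-known closed form for softmax combined with cross-entropy, which is not derived in the text above.

```python
import math


def softmax_columns(Z):
    """Apply softmax down each column (one column per sample)."""
    n_rows, n_cols = len(Z), len(Z[0])
    A = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        col = [Z[i][j] for i in range(n_rows)]
        m = max(col)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in col]
        total = sum(exps)
        for i in range(n_rows):
            A[i][j] = exps[i] / total
    return A


def cross_entropy(A, Y):
    """J = -sum(y * log a) over all neurons and samples (sign assumed)."""
    return -sum(y * math.log(a)
                for row_a, row_y in zip(A, Y)
                for a, y in zip(row_a, row_y))


# Hypothetical linear outputs and one-hot labels (3 neurons x 2 samples)
Z = [[1.0, 0.5], [2.0, -1.0], [0.1, 0.3]]
Y = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]

A = softmax_columns(Z)
J = cross_entropy(A, Y)

# Central-difference estimate of dJ/dz1 (row 0, column 0 of Z)
eps = 1e-6
Z_plus = [row[:] for row in Z]
Z_minus = [row[:] for row in Z]
Z_plus[0][0] += eps
Z_minus[0][0] -= eps
numeric = (cross_entropy(softmax_columns(Z_plus), Y)
           - cross_entropy(softmax_columns(Z_minus), Y)) / (2 * eps)
analytic = A[0][0] - Y[0][0]  # a1 - y1
```

Under these assumptions `numeric` and `analytic` should match closely. Note that $a_1$ depends only on the first sample's outputs $z_1, z_3, z_5$, so perturbing $z_1$ leaves the second column of `A` unchanged.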
