PyTorch Autograd Mechanism (2)


torch.autograd.backward


When the following operation is performed, the error "RuntimeError: grad can be implicitly created only for scalar outputs" is raised.

fig1: the code that triggers the error
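The original screenshot is not reproduced here; below is a minimal sketch consistent with the matrices used later in Eqs. (4)–(6), assuming a and b are float tensors created with requires_grad=True:

```python
import torch

a = torch.tensor([[1., 2.],
                  [3., 4.]], requires_grad=True)
b = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]], requires_grad=True)
c = a @ b        # c is a 2x3 matrix, i.e. a non-scalar output
c.backward()     # RuntimeError: grad can be implicitly created only for scalar outputs
```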


When it is changed to the following:

fig2: the code with a gradient argument passed to backward()
no error is raised.
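A sketch of the working call under the same assumptions, passing an all-ones matrix of the same shape as c:

```python
import torch

a = torch.tensor([[1., 2.],
                  [3., 4.]], requires_grad=True)
b = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]], requires_grad=True)
c = a @ b
c.backward(torch.ones_like(c))   # supply the gradient w.r.t. c explicitly; no error
print(a.grad)   # tensor([[ 6., 15.],
                #         [ 6., 15.]])
print(b.grad)   # tensor([[4., 4., 4.],
                #         [6., 6., 6.]])
```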


Suppose {\bf c}, {\bf a}, {\bf b} are matrices of size m\times n, m\times k, and k\times n respectively, and that {\bf c}^{m\times n} = {\bf a}^{m \times k} \times {\bf b}^{k \times n}, i.e.:

\begin{bmatrix}
    c_{1,1} & \cdots & c _{1,n} \\
    \vdots & \ddots & \vdots \\
    c_{m,1} & \cdots & c _{m,n}
\end{bmatrix}
= 
\begin{bmatrix}
    a_{1,1} & \cdots & a _{1,k} \\
    \vdots & \ddots & \vdots \\
    a_{m,1} & \cdots & a _{m,k}
\end{bmatrix}
\times
\begin{bmatrix}
    b_{1,1} & \cdots & b _{1,n} \\
    \vdots & \ddots & \vdots \\
    b_{k,1} & \cdots & b _{k,n}
\end{bmatrix}
\tag{1}

PyTorch's automatic differentiation proceeds as follows:

\frac{\partial{c _{1,1}}}{\partial{\bf b}^{k\times n}} = 
    \begin{bmatrix}
        \frac{\partial{c _{1,1}}}{\partial b _{1,1}} & \cdots & \frac{\partial{c _{1,1}}}{\partial b _{1,n}} \\
        \vdots & \ddots & \vdots \\
        \frac{\partial{c _{1,1}}}{\partial b _{k,1}} & \cdots & \frac{\partial{c _{1,1}}}{\partial b _{k,n}}
    \end{bmatrix} 
    \tag{2}
\nabla_{{\bf b}^{k\times n}} {\bf c} = 
\sum _{i=1}^{m} \sum _{j=1}^{n} \frac{\partial{c _{i,j}}} {\partial{\bf b}^{k\times n}}\tag{3}
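The sum in Eq. (3) can be checked directly with autograd: computing \partial c_{i,j} / \partial {\bf b} one element at a time (via a one-hot gradient) and adding the results gives the same tensor that c.backward(torch.ones_like(c)) accumulates into b.grad. A sketch with illustrative shapes m, k, n (the loop is only for demonstration, not how PyTorch computes it internally):

```python
import torch

m, k, n = 2, 2, 3
a = torch.randn(m, k, requires_grad=True)
b = torch.randn(k, n, requires_grad=True)
c = a @ b

# Eq. (3): accumulate d(c_ij)/d(b) over every element of c.
grad_b_sum = torch.zeros_like(b)
for i in range(m):
    for j in range(n):
        one_hot = torch.zeros_like(c)
        one_hot[i, j] = 1.0
        # gradient of the single element c[i, j] w.r.t. b, as in Eq. (2)
        (g,) = torch.autograd.grad(c, b, grad_outputs=one_hot, retain_graph=True)
        grad_b_sum += g

# What c.backward(torch.ones_like(c)) accumulates into b.grad
c.backward(torch.ones_like(c))
print(torch.allclose(grad_b_sum, b.grad))   # True
```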

Returning to the code at the beginning, we have

{\bf a}=
\begin{bmatrix}
a _{1,1} & a _{1,2} \\
a _{2,1} & a _{2,2}
\end{bmatrix}
=
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix},\tag{4}
{\bf b}=
\begin{bmatrix}
b _{1,1} & b _{1,2} & b _{1,3} \\
b _{2,1} & b _{2,2} & b _{2,3}
\end{bmatrix}
=
\begin{bmatrix}
1 & 2 & 3 \\
4 & 5 & 6
\end{bmatrix},\tag{5}
{\bf c}=
\begin{bmatrix}
c _{1,1} & c _{1,2} & c _{1,3} \\
c _{2,1} & c _{2,2} & c _{2,3}
\end{bmatrix}\tag{6}

Expanding the matrix product in Eq. (1), we have:

c _{1,1} = a _{1,1} b _{1,1} + a _{1,2} b _{2,1},\tag{7}
c _{1,2} = a _{1,1}b _{1,2} + a _{1,2}b _{2,2},\tag{8}
c_{1,3} = a_{1,1}b_{1,3} + a_{1,2}b_{2,3},\tag{9}
c_{2,1} = a_{2,1}b_{1,1} + a_{2,2}b_{2,1},\tag{10}
c_{2,2} = a_{2,1}b_{1,2} + a_{2,2}b_{2,2},\tag{11}
c_{2,3} = a_{2,1}b_{1,3} + a_{2,2}b_{2,3},\tag{12}

So, following the definition in Eq. (2), we have:

\frac{\partial{c_{1,1}}}{\partial{\bf b}} = 
\begin{bmatrix}
a_{1,1} & 0 & 0\\
a_{1,2} & 0 & 0
\end{bmatrix},\tag{13}
\frac{\partial{c_{1,2}}}{\partial{\bf b}} = 
\begin{bmatrix}
0 & a_{1,1} & 0\\
0 & a_{1,2} & 0
\end{bmatrix},\tag{14}
\frac{\partial{c_{1,3}}}{\partial{\bf b}} = 
\begin{bmatrix}
0 & 0 & a_{1,1}\\
0 & 0 & a_{1,2}
\end{bmatrix},\tag{15}
\frac{\partial{c_{2,1}}}{\partial{\bf b}} = 
\begin{bmatrix}
a_{2,1} & 0 & 0\\
a_{2,2} & 0 & 0
\end{bmatrix},\tag{16}
\frac{\partial{c_{2,2}}}{\partial{\bf b}} = 
\begin{bmatrix}
0 & a_{2,1} & 0\\
0 & a_{2,2} & 0
\end{bmatrix},\tag{17}
\frac{\partial{c_{2,3}}}{\partial{\bf b}} = 
\begin{bmatrix}
0 & 0 & a_{2,1}\\
0 & 0 & a_{2,2}
\end{bmatrix},\tag{18}
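Each of these per-element derivative matrices can be verified with torch.autograd.grad, since every c_{i,j} is a scalar. A sketch using the values from Eqs. (4) and (5):

```python
import torch

a = torch.tensor([[1., 2.],
                  [3., 4.]], requires_grad=True)
b = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]], requires_grad=True)
c = a @ b

# d(c_11)/d(b): only the first column of b contributes, matching Eq. (13)
(g,) = torch.autograd.grad(c[0, 0], b, retain_graph=True)
print(g)
# tensor([[1., 0., 0.],
#         [2., 0., 0.]])   i.e. [[a_11, 0, 0], [a_12, 0, 0]]
```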

Adding Eqs. (13)–(18) together gives b.grad, and a.grad is obtained in the same way. In c.backward(torch.ones_like(c)), the argument to backward() is a matrix of the same size as {\bf c}^{m\times n} filled with ones, and the value at each position of that matrix is the coefficient applied to the corresponding matrix in Eqs. (13)–(18). That is:

\mathtt{b.grad} = 1\times(13) + 1\times(14) + 1\times(15) + 1\times(16) + 1\times(17) + 1\times(18) = 
\begin{bmatrix}
4 & 4 & 4 \\
6 & 6 & 6
\end{bmatrix}

Correspondingly, the argument to c.backward() may be any matrix with the same shape as {\bf c}^{m\times n}; the resulting gradient is then the sum of each element's gradient multiplied by the coefficient at the corresponding position.
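To illustrate, a sketch with an arbitrary weight matrix (the values of w below are made up for demonstration): passing it to backward() yields the correspondingly weighted sum of Eqs. (13)–(18), which for a matrix product equals {\bf a}^{\top} {\bf w}.

```python
import torch

a = torch.tensor([[1., 2.],
                  [3., 4.]], requires_grad=True)
b = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]], requires_grad=True)
c = a @ b

# Arbitrary weights of the same shape as c (illustrative values)
w = torch.tensor([[1., 0., 2.],
                  [0., 3., 1.]])
c.backward(gradient=w)

# b.grad is now the weighted sum: sum_ij w_ij * d(c_ij)/d(b) = a^T @ w
print(b.grad)
print(torch.allclose(b.grad, a.t() @ w))   # True
```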