Neural Network Fundamentals: Matrix Derivatives


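This article does not go deep into theoretical derivations; it is enough to know how to apply the results in practice. Depending on whether the independent and dependent variables are scalars, vectors, or matrices, matrix differentiation has the following nine possible cases: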

| Independent \ Dependent | Scalar $y$ | Vector $\mathbf y$ | Matrix $Y$ |
| --- | --- | --- | --- |
| Scalar $x$ | $\frac{dy}{dx}$ | $\frac{d\mathbf y}{dx}$ | $\frac{dY}{dx}$ |
| Vector $\mathbf x$ | $\frac{dy}{d\mathbf x}$ | $\frac{d\mathbf y}{d\mathbf x}$ | $\frac{dY}{d\mathbf x}$ |
| Matrix $X$ | $\frac{dy}{dX}$ | $\frac{d\mathbf y}{dX}$ | $\frac{dY}{dX}$ |

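Each case is discussed below.

First-Order Derivatives

Derivative of a Scalar with Respect to a Scalar

This is the simplest case. We assume the reader is already familiar with it and do not elaborate.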

Derivative of a Vector with Respect to a Scalar

Derivative of a Row Vector with Respect to a Scalar

Let the row vector $\mathbf y=(y_1,y_2,\cdots,y_n)$. Then

$$\frac{\partial\mathbf y}{\partial x}=\begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_n}{\partial x} \end{bmatrix}$$

Derivative of a Column Vector with Respect to a Scalar

Let the column vector $\mathbf y=\begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}$. Then

$$\frac{\partial\mathbf y}{\partial x}=\begin{bmatrix} \frac{\partial y_1}{\partial x}\\ \frac{\partial y_2}{\partial x}\\ \vdots\\ \frac{\partial y_n}{\partial x} \end{bmatrix}$$

Derivative of a Matrix with Respect to a Scalar

Let $Y=\begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1n}\\ y_{21} & y_{22} & \cdots & y_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ y_{m1} & y_{m2} & \cdots & y_{mn} \end{bmatrix}$. Then

$$\frac{\partial Y}{\partial x}=\begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x}\\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}$$
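Differentiating a vector or a matrix with respect to a scalar is done entry by entry, so the result has the same shape as the original. The following is a minimal numerical sketch of that idea using central differences and only the Python standard library. The example matrix (a 2x2 rotation matrix) and the helper names `rotation` and `d_matrix_d_scalar` are illustrative assumptions, not part of the original article.

```python
import math

def rotation(x):
    """A 2x2 matrix Y(x) whose entries all depend on the scalar x."""
    return [[math.cos(x), -math.sin(x)],
            [math.sin(x),  math.cos(x)]]

def d_matrix_d_scalar(Y, x, h=1e-6):
    """Approximate dY/dx entry by entry with central differences."""
    plus, minus = Y(x + h), Y(x - h)
    return [[(p - m) / (2 * h) for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(plus, minus)]

dY = d_matrix_d_scalar(rotation, 0.5)
# dY has the same 2x2 shape as Y. Analytically it equals
# [[-sin x, -cos x], [cos x, -sin x]] evaluated at x = 0.5.
```

The same function also works for a row or column vector stored as a one-row or one-column nested list.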

Derivative of a Scalar with Respect to a Vector

Derivative of a Scalar with Respect to a Row Vector

Let $\mathbf x=(x_1,x_2,\cdots,x_m)$. Then

$$\frac{\partial y}{\partial\mathbf x}=\begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_m} \end{bmatrix}$$

Derivative of a Scalar with Respect to a Column Vector

Let $\mathbf x=\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_m \end{bmatrix}$. Then

$$\frac{\partial y}{\partial\mathbf x}=\begin{bmatrix} \frac{\partial y}{\partial x_1}\\ \frac{\partial y}{\partial x_2}\\ \vdots\\ \frac{\partial y}{\partial x_m} \end{bmatrix}$$
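In both orientations the result collects the $m$ partial derivatives of $y$ and takes the shape of $\mathbf x$. As a rough numerical check, the sketch below approximates those partial derivatives with central differences. The sample function and the names `sample_scalar` and `gradient` are assumptions made for illustration.

```python
def sample_scalar(x):
    """Scalar function y = x1^2 + 3*x1*x2 of a vector x = (x1, x2)."""
    return x[0] ** 2 + 3 * x[0] * x[1]

def gradient(f, x, h=1e-6):
    """Approximate (dy/dx_1, ..., dy/dx_m) with central differences."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h      # perturb only the i-th component
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

g = gradient(sample_scalar, [1.0, 2.0])
# Analytically: dy/dx1 = 2*x1 + 3*x2 = 8 and dy/dx2 = 3*x1 = 3.
```

The list `g` carries no orientation; whether it is read as a row or a column follows the orientation chosen for $\mathbf x$.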

Derivative of a Scalar with Respect to a Matrix

Let $X=\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n}\\ x_{21} & x_{22} & \cdots & x_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{bmatrix}$. Then

$$\frac{\partial y}{\partial X}=\begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1n}}\\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2n}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y}{\partial x_{m1}} & \frac{\partial y}{\partial x_{m2}} & \cdots & \frac{\partial y}{\partial x_{mn}} \end{bmatrix}$$
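As a small worked example (the function is chosen here for illustration and does not come from the original article), take a $2\times 2$ matrix $X$ and let $y$ be the sum of the squares of its entries. Each entry of the result is the partial derivative of $y$ with respect to the entry of $X$ in the same position, so the result has the same shape as $X$:

$$y=\sum_{i=1}^{2}\sum_{j=1}^{2}x_{ij}^2,\qquad \frac{\partial y}{\partial X}=\begin{bmatrix} 2x_{11} & 2x_{12}\\ 2x_{21} & 2x_{22} \end{bmatrix}=2X$$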

Derivative of a Vector with Respect to a Vector

Derivative of a Column Vector with Respect to a Row Vector (Jacobian Matrix)

Let $\mathbf y=\begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}$ and $\mathbf x=(x_1,x_2,\cdots,x_m)$. Then

$$\frac{\partial\mathbf y}{\partial\mathbf x}=\begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_m}\\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_m}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_n}{\partial x_1} & \frac{\partial y_n}{\partial x_2} & \cdots & \frac{\partial y_n}{\partial x_m} \end{bmatrix}$$

This $n\times m$ matrix is also called the Jacobian matrix.
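To make the layout concrete, here is a minimal sketch that approximates the Jacobian numerically, with row $i$ corresponding to $y_i$ and column $j$ to $x_j$. The polar-to-Cartesian map and the names `polar_to_cartesian` and `jacobian` are illustrative assumptions; in practice an automatic differentiation library would normally be used instead of finite differences.

```python
import math

def polar_to_cartesian(x):
    """Map x = (r, theta) to y = (r*cos(theta), r*sin(theta))."""
    r, theta = x
    return [r * math.cos(theta), r * math.sin(theta)]

def jacobian(F, x, h=1e-6):
    """Return J with J[i][j] approximating d y_i / d x_j."""
    n, m = len(F(x)), len(x)
    J = [[0.0] * m for _ in range(n)]
    for j in range(m):
        up, down = list(x), list(x)
        up[j] += h
        down[j] -= h
        F_up, F_down = F(up), F(down)
        for i in range(n):
            J[i][j] = (F_up[i] - F_down[i]) / (2 * h)
    return J

J = jacobian(polar_to_cartesian, [2.0, math.pi / 6])
# Analytically: [[cos(theta), -r*sin(theta)],
#                [sin(theta),  r*cos(theta)]] at r = 2, theta = pi/6.
```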

Derivative of a Row Vector with Respect to a Column Vector

Let $\mathbf y=(y_1,y_2,\cdots,y_n)$ and $\mathbf x=\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_m \end{bmatrix}$. Then

$$\frac{\partial\mathbf y}{\partial\mathbf x}=\begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_1}\\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_n}{\partial x_2}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_1}{\partial x_m} & \frac{\partial y_2}{\partial x_m} & \cdots & \frac{\partial y_n}{\partial x_m} \end{bmatrix}$$
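Here row $i$ corresponds to $x_i$ and column $j$ to $y_j$, so the result is an $m\times n$ matrix, the transpose of the Jacobian layout. As an illustration with a function chosen for this example, let $\mathbf y=(x_1x_2,\;x_1+x_2)$ and $\mathbf x=\begin{bmatrix} x_1\\ x_2 \end{bmatrix}$. Then

$$\frac{\partial\mathbf y}{\partial\mathbf x}=\begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1}\\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} \end{bmatrix}=\begin{bmatrix} x_2 & 1\\ x_1 & 1 \end{bmatrix}$$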

Derivative of a Row Vector with Respect to a Row Vector

Let $\mathbf y=(y_1,y_2,\cdots,y_n)$ and $\mathbf x=(x_1,x_2,\cdots,x_m)$. Then

$$\frac{\partial\mathbf y}{\partial\mathbf x}=\begin{bmatrix} \frac{\partial\mathbf{y}}{\partial x_1} & \frac{\partial\mathbf{y}}{\partial x_2} & \cdots & \frac{\partial\mathbf{y}}{\partial x_m} \end{bmatrix}$$

where each $\frac{\partial\mathbf{y}}{\partial x_i}$ is defined as in the case of a row vector differentiated with respect to a scalar.

Derivative of a Column Vector with Respect to a Column Vector

Let $\mathbf y=\begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}$ and $\mathbf x=\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_m \end{bmatrix}$. Then

$$\frac{\partial\mathbf y}{\partial\mathbf x}=\begin{bmatrix} \frac{\partial\mathbf{y}}{\partial x_1}\\ \frac{\partial\mathbf{y}}{\partial x_2}\\ \vdots\\ \frac{\partial\mathbf{y}}{\partial x_m} \end{bmatrix}$$

where each $\frac{\partial\mathbf{y}}{\partial x_i}$ is defined as in the case of a column vector differentiated with respect to a scalar.
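When both vectors have the same orientation, the blocks are concatenated along that orientation, giving a $1\times mn$ row or an $mn\times 1$ column. As an illustration with a function chosen for this example, take $y_1=x_1x_2$ and $y_2=x_1+x_2$, so $n=m=2$. With both vectors written as rows,

$$\frac{\partial\mathbf y}{\partial\mathbf x}=\begin{bmatrix} \frac{\partial\mathbf y}{\partial x_1} & \frac{\partial\mathbf y}{\partial x_2} \end{bmatrix}=\begin{bmatrix} x_2 & 1 & x_1 & 1 \end{bmatrix}$$

and with both written as columns,

$$\frac{\partial\mathbf y}{\partial\mathbf x}=\begin{bmatrix} \frac{\partial\mathbf y}{\partial x_1}\\ \frac{\partial\mathbf y}{\partial x_2} \end{bmatrix}=\begin{bmatrix} x_2\\ 1\\ x_1\\ 1 \end{bmatrix}$$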

Derivative of a Matrix with Respect to a Vector

Derivative of a Matrix with Respect to a Row Vector

Let $Y=\begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1n}\\ y_{21} & y_{22} & \cdots & y_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ y_{m1} & y_{m2} & \cdots & y_{mn} \end{bmatrix}$ and $\mathbf x=(x_1,x_2,\cdots,x_c)$. Then

$$\frac{\partial Y}{\partial\mathbf x}=\begin{bmatrix} \frac{\partial Y}{\partial x_1} & \frac{\partial Y}{\partial x_2} & \cdots & \frac{\partial Y}{\partial x_c} \end{bmatrix}$$

where each $\frac{\partial Y}{\partial x_i}$ is defined as in the case of a matrix differentiated with respect to a scalar.

Derivative of a Matrix with Respect to a Column Vector

Let $Y=\begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1n}\\ y_{21} & y_{22} & \cdots & y_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ y_{m1} & y_{m2} & \cdots & y_{mn} \end{bmatrix}$ and $\mathbf x=\begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_r \end{bmatrix}$. Then

$$\frac{\partial Y}{\partial\mathbf x}=\begin{bmatrix} \frac{\partial Y}{\partial x_1}\\ \frac{\partial Y}{\partial x_2}\\ \vdots\\ \frac{\partial Y}{\partial x_r} \end{bmatrix}$$

where each $\frac{\partial Y}{\partial x_i}$ is defined as in the case of a matrix differentiated with respect to a scalar.
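Each block has the shape of $Y$, so the result is $m\times cn$ for a row vector and $rm\times n$ for a column vector. As an illustration with a matrix chosen for this example, let $Y=\begin{bmatrix} x_1 & x_1x_2\\ x_2^2 & 1 \end{bmatrix}$, so that

$$\frac{\partial Y}{\partial x_1}=\begin{bmatrix} 1 & x_2\\ 0 & 0 \end{bmatrix},\qquad \frac{\partial Y}{\partial x_2}=\begin{bmatrix} 0 & x_1\\ 2x_2 & 0 \end{bmatrix}$$

For the row vector $\mathbf x=(x_1,x_2)$ the blocks sit side by side, and for the column vector they are stacked:

$$\begin{bmatrix} 1 & x_2 & 0 & x_1\\ 0 & 0 & 2x_2 & 0 \end{bmatrix}\qquad\text{and}\qquad\begin{bmatrix} 1 & x_2\\ 0 & 0\\ 0 & x_1\\ 2x_2 & 0 \end{bmatrix}$$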

Derivative of a Vector with Respect to a Matrix

Derivative of a Row Vector with Respect to a Matrix

Let $\mathbf y=(y_1,y_2,\cdots,y_n)$ and $X=\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1c}\\ x_{21} & x_{22} & \cdots & x_{2c}\\ \vdots & \vdots & \ddots & \vdots\\ x_{r1} & x_{r2} & \cdots & x_{rc} \end{bmatrix}$. Then

$$\frac{\partial\mathbf y}{\partial X}=\begin{bmatrix} \frac{\partial\mathbf y}{\partial x_{11}} & \frac{\partial\mathbf y}{\partial x_{12}} & \cdots & \frac{\partial\mathbf y}{\partial x_{1c}}\\ \frac{\partial\mathbf y}{\partial x_{21}} & \frac{\partial\mathbf y}{\partial x_{22}} & \cdots & \frac{\partial\mathbf y}{\partial x_{2c}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial\mathbf y}{\partial x_{r1}} & \frac{\partial\mathbf y}{\partial x_{r2}} & \cdots & \frac{\partial\mathbf y}{\partial x_{rc}} \end{bmatrix}$$

where each $\frac{\partial\mathbf y}{\partial x_{ij}}$ is defined as in the case of a row vector differentiated with respect to a scalar.

Derivative of a Column Vector with Respect to a Matrix

Let $\mathbf y=\begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix}$ and $X=\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1c}\\ x_{21} & x_{22} & \cdots & x_{2c}\\ \vdots & \vdots & \ddots & \vdots\\ x_{r1} & x_{r2} & \cdots & x_{rc} \end{bmatrix}$. Then

$$\frac{\partial\mathbf y}{\partial X}=\begin{bmatrix} \frac{\partial\mathbf y}{\partial x_{11}} & \frac{\partial\mathbf y}{\partial x_{12}} & \cdots & \frac{\partial\mathbf y}{\partial x_{1c}}\\ \frac{\partial\mathbf y}{\partial x_{21}} & \frac{\partial\mathbf y}{\partial x_{22}} & \cdots & \frac{\partial\mathbf y}{\partial x_{2c}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial\mathbf y}{\partial x_{r1}} & \frac{\partial\mathbf y}{\partial x_{r2}} & \cdots & \frac{\partial\mathbf y}{\partial x_{rc}} \end{bmatrix}$$

where each $\frac{\partial\mathbf y}{\partial x_{ij}}$ is defined as in the case of a column vector differentiated with respect to a scalar.
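The two formulas look identical, but the blocks differ in shape: each block is $1\times n$ for a row vector, giving an $r\times cn$ result, and $n\times 1$ for a column vector, giving an $rn\times c$ result. As an illustration with a function chosen for this example, let $X$ be $2\times 2$ and let the row vector be $\mathbf y=(x_{11}+x_{22},\;x_{12}x_{21})$. Then

$$\frac{\partial\mathbf y}{\partial X}=\begin{bmatrix} \frac{\partial\mathbf y}{\partial x_{11}} & \frac{\partial\mathbf y}{\partial x_{12}}\\ \frac{\partial\mathbf y}{\partial x_{21}} & \frac{\partial\mathbf y}{\partial x_{22}} \end{bmatrix}=\begin{bmatrix} 1 & 0 & 0 & x_{21}\\ 0 & x_{12} & 1 & 0 \end{bmatrix}$$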

Derivative of a Matrix with Respect to a Matrix

Let $Y=\begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1n}\\ y_{21} & y_{22} & \cdots & y_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ y_{m1} & y_{m2} & \cdots & y_{mn} \end{bmatrix}$ and $X=\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1c}\\ x_{21} & x_{22} & \cdots & x_{2c}\\ \vdots & \vdots & \ddots & \vdots\\ x_{r1} & x_{r2} & \cdots & x_{rc} \end{bmatrix}$. Then

$$\frac{\partial Y}{\partial X}=\begin{bmatrix} \frac{\partial Y}{\partial x_{11}} & \frac{\partial Y}{\partial x_{12}} & \cdots & \frac{\partial Y}{\partial x_{1c}}\\ \frac{\partial Y}{\partial x_{21}} & \frac{\partial Y}{\partial x_{22}} & \cdots & \frac{\partial Y}{\partial x_{2c}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial Y}{\partial x_{r1}} & \frac{\partial Y}{\partial x_{r2}} & \cdots & \frac{\partial Y}{\partial x_{rc}} \end{bmatrix}$$

where each $\frac{\partial Y}{\partial x_{ij}}$ is defined as in the case of a matrix differentiated with respect to a scalar.
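Under this block layout the result is an $rm\times cn$ matrix whose block $(i,j)$ is the $m\times n$ matrix $\frac{\partial Y}{\partial x_{ij}}$. The sketch below assembles that block matrix numerically with central differences. The example function $Y=XX$ and the names `matrix_square` and `d_matrix_d_matrix` are assumptions made for illustration.

```python
def matrix_square(X):
    """Y = X X for a square matrix stored as nested lists."""
    k = len(X)
    return [[sum(X[p][s] * X[s][q] for s in range(k)) for q in range(k)]
            for p in range(k)]

def d_matrix_d_matrix(F, X, h=1e-6):
    """Assemble dY/dX as an (r*m) x (c*n) block matrix.

    Block (i, j) holds dY/dx_ij, so entry [i*m + p][j*n + q]
    approximates d y_pq / d x_ij.
    """
    r, c = len(X), len(X[0])
    Y0 = F(X)
    m, n = len(Y0), len(Y0[0])
    D = [[0.0] * (c * n) for _ in range(r * m)]
    for i in range(r):
        for j in range(c):
            up = [row[:] for row in X]
            down = [row[:] for row in X]
            up[i][j] += h      # perturb only the entry x_ij
            down[i][j] -= h
            Y_up, Y_down = F(up), F(down)
            for p in range(m):
                for q in range(n):
                    D[i * m + p][j * n + q] = (Y_up[p][q] - Y_down[p][q]) / (2 * h)
    return D

D = d_matrix_d_matrix(matrix_square, [[1.0, 2.0], [3.0, 4.0]])
# D is 4x4. Its top-left 2x2 block approximates dY/dx_11, which is
# [[2*x11, x12], [x21, 0]] = [[2, 2], [3, 0]] for this X.
```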

Second-Order Derivatives

A second-order derivative can be viewed as the derivative of a first-order derivative and derived accordingly. This section introduces one particular second-order result, the Hessian matrix.

Hessian Matrix

Let $y$ be a scalar and $\mathbf{x}=(x_1,x_2,\cdots,x_m)$ a row vector. The Hessian matrix is defined as the second derivative of $y$ with respect to $\mathbf{x}$ (the row vector $\mathbf{x}$ is transposed for the second differentiation), that is,

$$\begin{aligned} H(y)&=\frac{\partial^2 y}{\partial\mathbf{x}^2}\\ &=\begin{bmatrix} \frac{\partial^2 y}{\partial x_1^2} & \frac{\partial^2 y}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 y}{\partial x_1 \partial x_m}\\ \frac{\partial^2 y}{\partial x_2 \partial x_1} & \frac{\partial^2 y}{\partial x_2^2} & \cdots & \frac{\partial^2 y}{\partial x_2 \partial x_m}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial^2 y}{\partial x_m \partial x_1} & \frac{\partial^2 y}{\partial x_m \partial x_2} & \cdots & \frac{\partial^2 y}{\partial x_m^2} \end{bmatrix} \end{aligned}$$
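As a rough numerical check of this $m\times m$ layout, the sketch below approximates each entry $\frac{\partial^2 y}{\partial x_i\partial x_j}$ with a four-point central difference. The sample function and the names `sample_cubic` and `hessian` are illustrative assumptions and do not come from the original article.

```python
def sample_cubic(x):
    """Scalar function y = x1^2 * x2 + x2^3."""
    return x[0] ** 2 * x[1] + x[1] ** 3

def hessian(f, x, h=1e-4):
    """Return H with H[i][j] approximating d^2 y / (dx_i dx_j)."""
    m = len(x)
    H = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            def shifted(si, sj):
                # Evaluate f with x_i moved by si*h and x_j moved by sj*h.
                z = list(x)
                z[i] += si * h
                z[j] += sj * h
                return f(z)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

H = hessian(sample_cubic, [1.0, 2.0])
# Analytically: [[2*x2, 2*x1], [2*x1, 6*x2]] = [[4, 2], [2, 12]].
```

For a function with continuous second partial derivatives the mixed partials agree, so the Hessian is symmetric, which the off-diagonal entries of `H` reflect up to rounding error.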