Forward and Reverse Automatic Differentiation of Composite Functions

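Forward and Reverse Automatic Differentiation: The Math

First, a quick review of the differentiation rules from calculus.

Review of Differentiation Rules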

Product Rule

$$
f(x) = u(x) \times v(x)
$$

$$
\begin{aligned}
\frac{dy}{dx} &= \frac{du}{dx} \times v + \frac{dv}{dx} \times u \\
f'(x) &= u'v + v'u
\end{aligned}
$$

For example:

$$
\begin{aligned}
f(x) &= (3x-5) \times (4x+7) \\
u &= 3x-5 \quad v = 4x+7 \\
u' &= 3 \quad v' = 4 \\
f'(x) &= 3(4x+7) + 4(3x-5) \\
&= 12x + 21 + 12x - 20 \\
&= 24x + 1
\end{aligned}
$$

Quotient Rule

$$
f(x) = \frac{u(x)}{v(x)}
$$

$$
\begin{aligned}
f'(x) &= \frac{u'v - v'u}{v^2} \\
\frac{dy}{dx} &= \frac{\frac{du}{dx}v - \frac{dv}{dx}u}{v^2}
\end{aligned}
$$

For example:

$$
\begin{aligned}
f(x) &= \frac{3x-5}{4x+7} \\
u &= 3x-5 \quad v = 4x+7 \\
u' &= 3 \quad v' = 4 \\
f'(x) &= \frac{3(4x+7) - 4(3x-5)}{(4x+7)^2} \\
&= \frac{12x + 21 - 12x + 20}{(4x+7)^2} \\
&= \frac{41}{(4x+7)^2}
\end{aligned}
$$
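The two worked results above can be sanity-checked numerically. The following is a minimal sketch, not part of the original derivation: it compares the derived formulas with a central finite difference. The helper name `central_diff`, the step size, and the test point `x0` are all assumptions chosen for illustration.

```python
def central_diff(f, x, h=1e-6):
    """Approximate f'(x) with a central finite difference (illustrative helper)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def product(x):
    return (3 * x - 5) * (4 * x + 7)

def product_prime(x):
    # Result of the product rule example: f'(x) = 24x + 1
    return 24 * x + 1

def quotient(x):
    return (3 * x - 5) / (4 * x + 7)

def quotient_prime(x):
    # Result of the quotient rule example: f'(x) = 41 / (4x + 7)^2
    return 41 / (4 * x + 7) ** 2

x0 = 2.0  # arbitrary test point (assumption); any x with 4x + 7 != 0 works
print("product :", product_prime(x0), central_diff(product, x0))
print("quotient:", quotient_prime(x0), central_diff(quotient, x0))
```

The two numbers on each printed line should agree to several decimal places; a mismatch would point to an algebra slip in the hand derivation.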

Derivatives of sin and cos

$$
\begin{aligned}
y &= \sin(x) \\
\frac{dy}{dx} &= \cos(x)
\end{aligned}
$$

$$
\begin{aligned}
y &= \cos(x) \\
\frac{dy}{dx} &= -\sin(x)
\end{aligned}
$$

Chain Rule (Single-Variable Composite Functions)

$$
y = f(u) \quad u = g(x)
$$

$$
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}
$$

For example:

$$
\begin{aligned}
y &= (2x+4)^3 \\
y &= u^3 \text{ and } u = 2x+4 \\
\frac{dy}{du} &= 3u^2 \quad \frac{du}{dx} = 2 \\
\frac{dy}{dx} &= 3u^2 \times 2 = 2 \times 3(2x+4)^2 \\
&= 6(2x+4)^2
\end{aligned}
$$

Multivariable Chain Rule (Case 1)

$$
\begin{aligned}
z &= f(x,y) \\
x &= g(t) \\
y &= h(t)
\end{aligned}
$$

$$
\frac{dz}{dt} = \frac{\partial f}{\partial x} \frac{dx}{dt} + \frac{\partial f}{\partial y} \frac{dy}{dt}
$$

Multivariable Chain Rule (Case 2)

$$
\begin{aligned}
z &= f(x,y) \\
x &= g(s,t) \\
y &= h(s,t)
\end{aligned}
$$

$$
\frac{\partial z}{\partial s} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial s} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial s} \quad \frac{\partial z}{\partial t} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial t} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial t}
$$

When computing $\frac{\partial z}{\partial s}$, we hold $t$ fixed and take the ordinary derivative of $z$ with respect to $s$, which is exactly the multivariable chain rule (Case 1). The same applies to $\frac{\partial z}{\partial t}$, with $s$ held fixed.
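To make Case 2 concrete, here is a small sketch with hypothetical functions that do not come from the original text: $f(x,y) = x^2 y$, $g(s,t) = st$, $h(s,t) = s + t$. It evaluates the two chain-rule sums term by term and compares them with finite differences of the composed function.

```python
def f(x, y):
    return x ** 2 * y      # hypothetical outer function

def g(s, t):
    return s * t           # hypothetical x = g(s, t)

def h(s, t):
    return s + t           # hypothetical y = h(s, t)

def chain_rule_partials(s, t):
    """dz/ds and dz/dt assembled from the Case 2 formulas."""
    x, y = g(s, t), h(s, t)
    dz_dx, dz_dy = 2 * x * y, x ** 2   # partials of f
    dx_ds, dx_dt = t, s                # partials of g
    dy_ds, dy_dt = 1.0, 1.0            # partials of h
    dz_ds = dz_dx * dx_ds + dz_dy * dy_ds
    dz_dt = dz_dx * dx_dt + dz_dy * dy_dt
    return dz_ds, dz_dt

def numeric_partials(s, t, eps=1e-6):
    """Central differences of the composed function z(s, t)."""
    z = lambda s_, t_: f(g(s_, t_), h(s_, t_))
    dz_ds = (z(s + eps, t) - z(s - eps, t)) / (2 * eps)
    dz_dt = (z(s, t + eps) - z(s, t - eps)) / (2 * eps)
    return dz_ds, dz_dt

print(chain_rule_partials(1.5, -0.5))
print(numeric_partials(1.5, -0.5))
```

Holding `t` fixed while perturbing `s` in `numeric_partials` mirrors the "hold $t$ fixed" reading of the partial derivative.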

Multivariable Chain Rule (General Form)

$$
\begin{aligned}
u &= f(x_1, x_2, \ldots, x_n) \\
x_k &= g_k(t_1, t_2, \ldots, t_m) \qquad \text{for } 1 \leq k \leq n
\end{aligned}
$$

$$
\frac{\partial u}{\partial t_i} = \frac{\partial u}{\partial x_1} \frac{\partial x_1}{\partial t_i} + \frac{\partial u}{\partial x_2} \frac{\partial x_2}{\partial t_i} + \cdots + \frac{\partial u}{\partial x_n} \frac{\partial x_n}{\partial t_i} \qquad \text{for } 1 \leq i \leq m
$$

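Composite Functions, Partial Derivatives, the Chain Rule, and Forward and Reverse Automatic Differentiation

Evaluation Order in Forward and Reverse Mode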

For the composite function:

$$
\begin{aligned}
y &= f(g(h(x))) = f(g(h(w_0))) = f(g(w_1)) = f(w_2) = w_3 \\
w_0 &= x \\
w_1 &= h(w_0) \\
w_2 &= g(w_1) \\
w_3 &= f(w_2) = y
\end{aligned}
$$

the chain rule gives:

$$
\frac{\partial y}{\partial x} = \frac{\partial y}{\partial w_2} \frac{\partial w_2}{\partial w_1} \frac{\partial w_1}{\partial x} = \frac{\partial f(w_2)}{\partial w_2} \frac{\partial g(w_1)}{\partial w_1} \frac{\partial h(w_0)}{\partial x}
$$

Evaluation order:

  • Forward differentiation computes $\partial w_1 / \partial x$ first, then $\partial w_2 / \partial w_1$, and finally $\partial y / \partial w_2$.
  • Reverse differentiation computes $\partial y / \partial w_2$ first, then $\partial w_2 / \partial w_1$, and finally $\partial w_1 / \partial x$.
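The two orderings can be sketched in a few lines. The concrete stages below are assumptions picked for illustration ($h(w) = w^2$, $g(w) = \sin w$, $f(w) = e^w$); the original text leaves $f$, $g$, $h$ abstract. Both functions multiply the same three local derivatives and differ only in the order in which they accumulate them.

```python
import math

# Each stage is (function, derivative of that function); hypothetical choices.
stages = [
    (lambda w: w * w, lambda w: 2 * w),  # h: w1 = h(w0)
    (math.sin, math.cos),                # g: w2 = g(w1)
    (math.exp, math.exp),                # f: w3 = f(w2)
]

def forward_mode(x):
    """Accumulate dw_k/dx from the input side while evaluating."""
    w, dw_dx = x, 1.0
    for fn, dfn in stages:
        dw_dx = dfn(w) * dw_dx  # local derivative times the tangent so far
        w = fn(w)
    return w, dw_dx

def reverse_mode(x):
    """Evaluate first, then accumulate dy/dw_k from the output side."""
    stage_inputs = []
    w = x
    for fn, _ in stages:
        stage_inputs.append(w)  # remember each stage's input for the backward sweep
        w = fn(w)
    dy_dw = 1.0
    for (_, dfn), w_in in zip(reversed(stages), reversed(stage_inputs)):
        dy_dw = dy_dw * dfn(w_in)
    return w, dy_dw

print(forward_mode(0.7))
print(reverse_mode(0.7))
```

Note that the reverse sweep needs the stored stage inputs, which is why reverse mode keeps intermediate values from the evaluation pass.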

Forward Differentiation

For the composite function:

$$
\begin{aligned}
r &= ? \\
s &= ? \\
t &= ? \\
x &= g(r,s,t) \\
y &= h(r,s,t) \\
z &= i(r,s,t) \\
u &= f(x,y,z)
\end{aligned}
$$

forward differentiation computes:

$$
\begin{aligned}
\frac{\partial r}{\partial v} &= ? \\
\frac{\partial s}{\partial v} &= ? \\
\frac{\partial t}{\partial v} &= ? \\
\\
\frac{\partial x}{\partial v} &= \frac{\partial x}{\partial r}\frac{\partial r}{\partial v} + \frac{\partial x}{\partial s}\frac{\partial s}{\partial v} + \frac{\partial x}{\partial t}\frac{\partial t}{\partial v} \\
\frac{\partial y}{\partial v} &= \frac{\partial y}{\partial r}\frac{\partial r}{\partial v} + \frac{\partial y}{\partial s}\frac{\partial s}{\partial v} + \frac{\partial y}{\partial t}\frac{\partial t}{\partial v} \\
\frac{\partial z}{\partial v} &= \frac{\partial z}{\partial r}\frac{\partial r}{\partial v} + \frac{\partial z}{\partial s}\frac{\partial s}{\partial v} + \frac{\partial z}{\partial t}\frac{\partial t}{\partial v} \\
\\
\frac{\partial u}{\partial v} &= \frac{\partial u}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial u}{\partial y}\frac{\partial y}{\partial v} + \frac{\partial u}{\partial z}\frac{\partial z}{\partial v}
\end{aligned}
$$

v=rv=r,即将rr作为独立变量并将sstt固定时,可得

rv=1sv=0tv=0ur=uxxr+uyyr+uzzr\begin{aligned} \frac{\partial r}{\partial v} &= 1 \\ \frac{\partial s}{\partial v} &= 0 \\ \frac{\partial t}{\partial v} &= 0 \\ \frac{\partial u}{\partial r}&=\frac{\partial u}{\partial x} \frac{\partial x}{\partial r}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial r}+\frac{\partial u}{\partial z} \frac{\partial z}{\partial r} \end{aligned}

v=sv=s,即将ss作为独立变量并将rrtt固定时,可得

rv=0sv=1tv=0us=uxxs+uyys+uzzs\begin{aligned} \frac{\partial r}{\partial v} &= 0 \\ \frac{\partial s}{\partial v} &= 1 \\ \frac{\partial t}{\partial v} &= 0 \\ \frac{\partial u}{\partial s}&=\frac{\partial u}{\partial x} \frac{\partial x}{\partial s}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial s}+\frac{\partial u}{\partial z} \frac{\partial z}{\partial s} \end{aligned}

v=tv=t,即将tt作为独立变量并将ssrr固定时,可得

rv=0sv=0tv=1ut=uxxt+uyyt+uzzt\begin{aligned} \frac{\partial r}{\partial v} &= 0 \\ \frac{\partial s}{\partial v} &= 0 \\ \frac{\partial t}{\partial v} &= 1 \\ \frac{\partial u}{\partial t}&=\frac{\partial u}{\partial x} \frac{\partial x}{\partial t}+\frac{\partial u}{\partial y} \frac{\partial y}{\partial t}+\frac{\partial u}{\partial z} \frac{\partial z}{\partial t} \end{aligned}
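The seeding scheme above can be sketched in code. The concrete functions are assumptions made up for this illustration ($x = rs$, $y = s + t$, $z = rt$, $u = xy + z$); the text itself leaves $g$, $h$, $i$, $f$ unspecified. The `seed` argument plays the role of $(\partial r/\partial v, \partial s/\partial v, \partial t/\partial v)$.

```python
def forward_pass(r, s, t, seed):
    """Return u and du/dv for the seed (dr/dv, ds/dv, dt/dv)."""
    dr, ds, dt = seed
    # Values of the intermediate variables (hypothetical g, h, i)
    x, y, z = r * s, s + t, r * t
    # Tangents: each line is one chain-rule sum over (r, s, t)
    dx = s * dr + r * ds + 0.0 * dt
    dy = 0.0 * dr + 1.0 * ds + 1.0 * dt
    dz = t * dr + 0.0 * ds + r * dt
    # Output value and tangent (hypothetical f): chain-rule sum over (x, y, z)
    u = x * y + z
    du = y * dx + x * dy + 1.0 * dz
    return u, du

point = (2.0, 3.0, 5.0)  # arbitrary sample values for r, s, t
seeds = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]  # v = r, v = s, v = t
gradient = [forward_pass(*point, seed)[1] for seed in seeds]
print(gradient)  # [29.0, 22.0, 8.0]
```

Each one-hot seed yields one partial derivative of $u$, so the three inputs require three passes.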

Reverse Differentiation

For the composite function:

$$
\begin{aligned}
u_1 &= r(x_1, x_2) \\
u_2 &= q(x_1, x_2) \\
y_1 &= f(u_1, u_2) \\
y_2 &= g(u_1, u_2) \\
y_3 &= h(u_1, u_2)
\end{aligned}
$$

reverse differentiation computes:

$$
\begin{aligned}
\frac{\partial s}{\partial y_1} &= ? \\
\frac{\partial s}{\partial y_2} &= ? \\
\frac{\partial s}{\partial y_3} &= ? \\
\\
\frac{\partial s}{\partial u_1} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_1} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial u_1} + \frac{\partial s}{\partial y_3}\frac{\partial y_3}{\partial u_1} \\
\frac{\partial s}{\partial u_2} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_2} + \frac{\partial s}{\partial y_2}\frac{\partial y_2}{\partial u_2} + \frac{\partial s}{\partial y_3}\frac{\partial y_3}{\partial u_2} \\
\\
\frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_1} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_1} \\
\frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_2} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_2}
\end{aligned}
$$

Here $s$ can be thought of as some function $s = \mathrm{function}(y_1, y_2, y_3)$ of the outputs.

When $s=y_1$, that is, when $y_1$ is treated as the independent variable and $y_2$ and $y_3$ are held fixed, we get

$$
\begin{aligned}
\frac{\partial s}{\partial y_1} &= 1 \\
\frac{\partial s}{\partial y_2} &= 0 \\
\frac{\partial s}{\partial y_3} &= 0 \\
\\
\frac{\partial s}{\partial u_1} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_1} \\
\frac{\partial s}{\partial u_2} &= \frac{\partial s}{\partial y_1}\frac{\partial y_1}{\partial u_2} \\
\\
\frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_1} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_1} \\
\frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial u_1}\frac{\partial u_1}{\partial x_2} + \frac{\partial s}{\partial u_2}\frac{\partial u_2}{\partial x_2}
\end{aligned}
$$
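A sketch of the same backward sweep in code, under assumed concrete functions that are not in the original ($u_1 = x_1 x_2$, $u_2 = x_1 + x_2$, $y_1 = u_1 + u_2$, $y_2 = u_1 u_2$, $y_3 = u_1 - u_2$). The `seed` argument holds $(\partial s/\partial y_1, \partial s/\partial y_2, \partial s/\partial y_3)$, and one call returns a full row of partial derivatives with respect to both inputs.

```python
def backward_pass(x1, x2, seed):
    """Return (ds/dx1, ds/dx2) for the seed (ds/dy1, ds/dy2, ds/dy3)."""
    s_y1, s_y2, s_y3 = seed
    # Evaluation pass: intermediate values are needed by the backward sweep
    u1, u2 = x1 * x2, x1 + x2
    # Backward sweep, outputs to intermediates (sum over y1, y2, y3)
    s_u1 = s_y1 * 1.0 + s_y2 * u2 + s_y3 * 1.0
    s_u2 = s_y1 * 1.0 + s_y2 * u1 + s_y3 * (-1.0)
    # Backward sweep, intermediates to inputs (sum over u1, u2)
    s_x1 = s_u1 * x2 + s_u2 * 1.0
    s_x2 = s_u1 * x1 + s_u2 * 1.0
    return s_x1, s_x2

x1, x2 = 2.0, 3.0  # arbitrary sample inputs
print(backward_pass(x1, x2, (1.0, 0.0, 0.0)))  # s = y1 -> (4.0, 3.0)
print(backward_pass(x1, x2, (0.0, 1.0, 0.0)))  # s = y2 -> (21.0, 16.0)
print(backward_pass(x1, x2, (0.0, 0.0, 1.0)))  # s = y3 -> (2.0, 1.0)
```

Three outputs mean three backward passes, regardless of how many inputs there are.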

Automatic Differentiation by Example

Example

Suppose there are 2 input variables ($x_1$, $x_2$) and 2 output variables ($y_1$, $y_2$):

$$
\begin{aligned}
m_1 &= x_1 \cdot x_2 + \sin(x_1) \\
m_2 &= 4x_1 + 2x_2 + \cos(x_2) \\
y_1 &= m_1 + m_2 \\
y_2 &= m_1 \cdot m_2
\end{aligned} \tag{1}
$$

That is:

$$
\begin{aligned}
y_1 &= x_1 \cdot x_2 + \sin(x_1) + 4x_1 + 2x_2 + \cos(x_2) \\
y_2 &= (x_1 \cdot x_2 + \sin(x_1)) \cdot (4x_1 + 2x_2 + \cos(x_2))
\end{aligned}
$$

The partial derivatives are:

$$
\begin{aligned}
\frac{\partial y_1}{\partial x_1} &= x_2 + \cos(x_1) + 4 \\
\frac{\partial y_1}{\partial x_2} &= x_1 + 2 - \sin(x_2) \\
\frac{\partial y_2}{\partial x_1} &= (x_2 + \cos(x_1)) \cdot m_2 + m_1 \cdot 4 \\
\frac{\partial y_2}{\partial x_2} &= x_1 \cdot m_2 + m_1 \cdot (2 - \sin(x_2))
\end{aligned}
$$
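As a reference point for the automatic-differentiation passes later on, the closed-form partial derivatives of the example can be checked against finite differences. This is a sketch with assumed helper names (`outputs`, `analytic_jacobian`, `numeric_jacobian`); the entry for $\partial y_2 / \partial x_2$ follows from the same product rule as $\partial y_2 / \partial x_1$.

```python
import math

def outputs(x1, x2):
    """Evaluate (y1, y2) from formula (1)."""
    m1 = x1 * x2 + math.sin(x1)
    m2 = 4 * x1 + 2 * x2 + math.cos(x2)
    return m1 + m2, m1 * m2

def analytic_jacobian(x1, x2):
    """Rows are outputs (y1, y2); columns are inputs (x1, x2)."""
    m1 = x1 * x2 + math.sin(x1)
    m2 = 4 * x1 + 2 * x2 + math.cos(x2)
    dm1_dx1, dm1_dx2 = x2 + math.cos(x1), x1
    dm2_dx1, dm2_dx2 = 4.0, 2.0 - math.sin(x2)
    return [
        [dm1_dx1 + dm2_dx1, dm1_dx2 + dm2_dx2],                            # y1 = m1 + m2
        [dm1_dx1 * m2 + m1 * dm2_dx1, dm1_dx2 * m2 + m1 * dm2_dx2],        # y2 = m1 * m2
    ]

def numeric_jacobian(x1, x2, eps=1e-6):
    """Central finite differences, one input perturbed at a time."""
    plus1, minus1 = outputs(x1 + eps, x2), outputs(x1 - eps, x2)
    plus2, minus2 = outputs(x1, x2 + eps), outputs(x1, x2 - eps)
    return [
        [(plus1[k] - minus1[k]) / (2 * eps), (plus2[k] - minus2[k]) / (2 * eps)]
        for k in range(2)
    ]

print(analytic_jacobian(1.0, 2.0))
print(numeric_jacobian(1.0, 2.0))
```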

Next, we use this example to show how forward and reverse automatic differentiation work.

Forward Automatic Differentiation

We will use the following form of the chain rule:

$$
\begin{aligned}
\frac{\partial w}{\partial t} &= \sum_i \left(\frac{\partial w}{\partial u_i} \cdot \frac{\partial u_i}{\partial t}\right) \\
&= \frac{\partial w}{\partial u_1} \cdot \frac{\partial u_1}{\partial t} + \frac{\partial w}{\partial u_2} \cdot \frac{\partial u_2}{\partial t} + \cdots
\end{aligned}
$$

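where:

  • $w$ is the output
    • in the example, $y_1$ or $y_2$
  • $u_i$ are the variables that directly affect $w$
    • in the example, $m_1$ and $m_2$ for the outputs; the same rule is applied at every intermediate step, e.g. with $a$ and $b$ when $w$ is $m_1$
  • $t$ is an input variable yet to be chosen
    • in the example, one of $x_1$ or $x_2$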

Before computing anything, we first break formula (1) down into simple operator evaluations:

$$
\begin{aligned}
x_1 &= ? \\
x_2 &= ? \\
\\
a &= x_1 \cdot x_2 \\
b &= \sin(x_1) \\
\\
c &= 4x_1 + 2x_2 \\
d &= \cos(x_2) \\
\\
m_1 &= a + b \\
m_2 &= c + d \\
\\
y_1 &= m_1 + m_2 \\
y_2 &= m_1 \cdot m_2
\end{aligned} \tag{2}
$$
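The decomposition in (2) maps one-to-one onto straight-line code. A minimal sketch, assuming the hypothetical function name `evaluate_trace` and arbitrary sample inputs:

```python
import math

def evaluate_trace(x1, x2):
    """Evaluate formula (2) one primitive operation at a time."""
    a = x1 * x2
    b = math.sin(x1)
    c = 4 * x1 + 2 * x2
    d = math.cos(x2)
    m1 = a + b
    m2 = c + d
    y1 = m1 + m2
    y2 = m1 * m2
    return {"a": a, "b": b, "c": c, "d": d, "m1": m1, "m2": m2, "y1": y1, "y2": y2}

trace = evaluate_trace(1.0, 2.0)  # sample inputs chosen arbitrarily
for name, value in trace.items():
    print(f"{name} = {value:.6f}")
```

Every line has a derivative that is known in closed form, which is what lets automatic differentiation process the trace mechanically.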

Now we differentiate each line with respect to the yet-to-be-chosen variable $t$:

$$
\begin{aligned}
\frac{\partial x_1}{\partial t} &= ? \\
\frac{\partial x_2}{\partial t} &= ? \\
\\
\frac{\partial a}{\partial t} &= x_2 \frac{\partial x_1}{\partial t} + x_1 \frac{\partial x_2}{\partial t} \\
\frac{\partial b}{\partial t} &= \cos(x_1) \frac{\partial x_1}{\partial t} \\
\\
\frac{\partial c}{\partial t} &= 4 \frac{\partial x_1}{\partial t} + 2 \frac{\partial x_2}{\partial t} \\
\frac{\partial d}{\partial t} &= -\sin(x_2) \frac{\partial x_2}{\partial t} \\
\\
\frac{\partial m_1}{\partial t} &= \frac{\partial a}{\partial t} + \frac{\partial b}{\partial t} \\
\frac{\partial m_2}{\partial t} &= \frac{\partial c}{\partial t} + \frac{\partial d}{\partial t} \\
\\
\frac{\partial y_1}{\partial t} &= \frac{\partial m_1}{\partial t} + \frac{\partial m_2}{\partial t} \\
\frac{\partial y_2}{\partial t} &= \frac{\partial m_1}{\partial t} \cdot m_2 + \frac{\partial m_2}{\partial t} \cdot m_1
\end{aligned}
$$

Earlier we said that $t$ was yet to be chosen; now is the time to choose it:

  • Substituting $t=x_1$ into the formulas above gives $\frac{\partial x_1}{\partial t} = 1$ and $\frac{\partial x_2}{\partial t} = 0$, from which we can compute $\frac{\partial y_1}{\partial x_1}$ and $\frac{\partial y_2}{\partial x_1}$:

$$
\begin{aligned}
\frac{\partial x_1}{\partial t} &= 1 \\
\frac{\partial x_2}{\partial t} &= 0 \\
\\
\frac{\partial a}{\partial t} &= x_2 \frac{\partial x_1}{\partial t} + x_1 \frac{\partial x_2}{\partial t} = x_2 \\
\frac{\partial b}{\partial t} &= \cos(x_1) \frac{\partial x_1}{\partial t} = \cos(x_1) \\
\\
\frac{\partial c}{\partial t} &= 4 \frac{\partial x_1}{\partial t} + 2 \frac{\partial x_2}{\partial t} = 4 \\
\frac{\partial d}{\partial t} &= -\sin(x_2) \frac{\partial x_2}{\partial t} = 0 \\
\\
\frac{\partial m_1}{\partial t} &= \frac{\partial a}{\partial t} + \frac{\partial b}{\partial t} = x_2 + \cos(x_1) \\
\frac{\partial m_2}{\partial t} &= \frac{\partial c}{\partial t} + \frac{\partial d}{\partial t} = 4 \\
\\
\frac{\partial y_1}{\partial t} &= \frac{\partial m_1}{\partial t} + \frac{\partial m_2}{\partial t} = x_2 + \cos(x_1) + 4 \\
\frac{\partial y_2}{\partial t} &= \frac{\partial m_1}{\partial t} \cdot m_2 + \frac{\partial m_2}{\partial t} \cdot m_1 = (x_2 + \cos(x_1)) \cdot m_2 + 4 \cdot m_1
\end{aligned}
$$

  • Substituting $t=x_2$ into the formulas above gives $\frac{\partial x_1}{\partial t} = 0$ and $\frac{\partial x_2}{\partial t} = 1$, from which we can compute $\frac{\partial y_1}{\partial x_2}$ and $\frac{\partial y_2}{\partial x_2}$.
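The two substitutions can be written as one function that carries each value together with its derivative with respect to $t$. This is a hand-written sketch of the formulas above, not a general library; the name `forward_ad` and the parameters `dx1`, `dx2` (standing for $\partial x_1/\partial t$ and $\partial x_2/\partial t$) are assumptions.

```python
import math

def forward_ad(x1, x2, dx1, dx2):
    """One forward pass: returns (y1, y2) and (dy1/dt, dy2/dt)."""
    a, da = x1 * x2, x2 * dx1 + x1 * dx2
    b, db = math.sin(x1), math.cos(x1) * dx1
    c, dc = 4 * x1 + 2 * x2, 4 * dx1 + 2 * dx2
    d, dd = math.cos(x2), -math.sin(x2) * dx2
    m1, dm1 = a + b, da + db
    m2, dm2 = c + d, dc + dd
    y1, dy1 = m1 + m2, dm1 + dm2
    y2, dy2 = m1 * m2, dm1 * m2 + dm2 * m1
    return (y1, y2), (dy1, dy2)

x1, x2 = 1.0, 2.0  # arbitrary sample inputs
_, column_x1 = forward_ad(x1, x2, 1.0, 0.0)  # t = x1: (dy1/dx1, dy2/dx1)
_, column_x2 = forward_ad(x1, x2, 0.0, 1.0)  # t = x2: (dy1/dx2, dy2/dx2)
print(column_x1)
print(column_x2)
```

Each call produces the derivatives of all outputs with respect to a single input, that is, one column of the Jacobian.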

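We can conclude:

  • With $n$ input variables (2 in this example), the formulas above must be evaluated $n$ times, once per input.
  • If the input to a neural network is a 1280 x 720 image and the output is 51 floating-point numbers, forward differentiation needs 921,600 passes (1280 × 720, counting one input value per pixel).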
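The "one pass per input" cost shows up directly when forward mode is implemented with dual numbers, a common way to automate the tangent bookkeeping. The sketch below is an assumption-laden illustration, not a method from the original text: the `Dual` class, the `sin`/`cos` wrappers, and `jacobian_forward` are hypothetical names, and only addition and multiplication are supported.

```python
import math
from dataclasses import dataclass

@dataclass
class Dual:
    """A value paired with its derivative with respect to the chosen input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + v'u
        return Dual(self.val * other.val, self.dot * other.val + other.dot * self.val)
    __rmul__ = __mul__

def _lift(v):
    """Treat plain numbers as constants (zero derivative)."""
    return v if isinstance(v, Dual) else Dual(float(v))

def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def cos(x):
    return Dual(math.cos(x.val), -math.sin(x.val) * x.dot)

def example(x1, x2):
    m1 = x1 * x2 + sin(x1)
    m2 = 4 * x1 + 2 * x2 + cos(x2)
    return [m1 + m2, m1 * m2]

def jacobian_forward(fn, point):
    """Build the Jacobian column by column: one forward pass per input."""
    columns = []
    for i in range(len(point)):
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(point)]
        columns.append([out.dot for out in fn(*args)])
    # Transpose so that J[k][i] = d y_k / d x_i
    return [list(row) for row in zip(*columns)]

print(jacobian_forward(example, [1.0, 2.0]))
```

The loop in `jacobian_forward` runs `len(point)` times, which is the source of the 921,600 passes for an image-sized input.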

Reverse Automatic Differentiation

We will use the following form of the chain rule:

$$
\begin{aligned}
\frac{\partial s}{\partial u} &= \sum_i \left(\frac{\partial w_i}{\partial u} \cdot \frac{\partial s}{\partial w_i}\right) \\
&= \frac{\partial w_1}{\partial u} \cdot \frac{\partial s}{\partial w_1} + \frac{\partial w_2}{\partial u} \cdot \frac{\partial s}{\partial w_2} + \cdots
\end{aligned}
$$

where:

  • $u$ is an input variable
  • $w_i$ are the output variables that depend on $u$
  • $s$ is a variable yet to be chosen

Recall the decomposition into simple operators (2):

$$
\begin{aligned}
x_1 &= ? \\
x_2 &= ? \\
\\
a &= x_1 \cdot x_2 \\
b &= \sin(x_1) \\
\\
c &= 4x_1 + 2x_2 \\
d &= \cos(x_2) \\
\\
m_1 &= a + b \\
m_2 &= c + d \\
\\
y_1 &= m_1 + m_2 \\
y_2 &= m_1 \cdot m_2
\end{aligned} \tag{2}
$$

Now compute the reverse derivatives:

$$
\begin{aligned}
\frac{\partial s}{\partial y_1} &= ? \\
\frac{\partial s}{\partial y_2} &= ? \\
\\
\frac{\partial s}{\partial m_1} &= \frac{\partial s}{\partial y_1} \frac{\partial y_1}{\partial m_1} + \frac{\partial s}{\partial y_2} \frac{\partial y_2}{\partial m_1} \\
\frac{\partial s}{\partial m_2} &= \frac{\partial s}{\partial y_1} \frac{\partial y_1}{\partial m_2} + \frac{\partial s}{\partial y_2} \frac{\partial y_2}{\partial m_2} \\
\\
\frac{\partial s}{\partial a} &= \frac{\partial s}{\partial m_1} \frac{\partial m_1}{\partial a} \\
\frac{\partial s}{\partial b} &= \frac{\partial s}{\partial m_1} \frac{\partial m_1}{\partial b} \\
\frac{\partial s}{\partial c} &= \frac{\partial s}{\partial m_2} \frac{\partial m_2}{\partial c} \\
\frac{\partial s}{\partial d} &= \frac{\partial s}{\partial m_2} \frac{\partial m_2}{\partial d} \\
\\
\frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial a} \frac{\partial a}{\partial x_1} + \frac{\partial s}{\partial b} \frac{\partial b}{\partial x_1} + \frac{\partial s}{\partial c} \frac{\partial c}{\partial x_1} \\
\frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial a} \frac{\partial a}{\partial x_2} + \frac{\partial s}{\partial c} \frac{\partial c}{\partial x_2} + \frac{\partial s}{\partial d} \frac{\partial d}{\partial x_2}
\end{aligned}
$$

s=y1s=y_1时:

sy1=1sy2=0sm1=sy1y1m1+sy2y2m1=1sm2=sy1y1m2+sy2y2m2=1sa=sm1m1a=1sb=sm1m1b=1sc=sm2m2c=1sd=sm2m2d=1sx1=saax1+sbbx1+sccx1=1x2+1cos(x1)+14=x2+cos(x1)+4sx2=saax2+sccx2+sddx2=1x1+12+1(sin(x2))=x1+2sin(x2)\begin{aligned} \frac{\partial s}{\partial y_1} &= 1 \\ \frac{\partial s}{\partial y_2} &= 0 \\ \\ \frac{\partial s}{\partial m_1} &= \frac{\partial s}{\partial y_1} \frac{\partial y_1}{\partial m_1} + \frac{\partial s}{\partial y_2} \frac{\partial y_2}{\partial m_1} = 1 \\ \frac{\partial s}{\partial m_2} &= \frac{\partial s}{\partial y_1} \frac{\partial y_1}{\partial m_2} + \frac{\partial s}{\partial y_2} \frac{\partial y_2}{\partial m_2} = 1 \\ \\ \frac{\partial s}{\partial a} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial a} = 1 \\ \frac{\partial s}{\partial b} &= \frac{\partial s}{\partial m_1}\frac{\partial m_1}{\partial b} = 1 \\ \frac{\partial s}{\partial c} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial c} = 1 \\ \frac{\partial s}{\partial d} &= \frac{\partial s}{\partial m_2}\frac{\partial m_2}{\partial d} = 1 \\ \\ \frac{\partial s}{\partial x_1} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_1} + \frac{\partial s}{\partial b}\frac{\partial b}{\partial x_1} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_1} = 1 \cdot x_2 + 1 \cdot \cos(x_1) + 1 \cdot 4 = x_2 + \cos(x_1) + 4 \\ \frac{\partial s}{\partial x_2} &= \frac{\partial s}{\partial a}\frac{\partial a}{\partial x_2} + \frac{\partial s}{\partial c}\frac{\partial c}{\partial x_2} + \frac{\partial s}{\partial d}\frac{\partial d}{\partial x_2} = 1 \cdot x_1 + 1 \cdot 2 + 1 \cdot (-\sin(x_2)) = x_1 + 2 -\sin(x_2) \end{aligned}
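The backward sweep above translates line by line into code. This is a hand-written sketch of those formulas; the name `reverse_ad` and the seed parameters `s_y1`, `s_y2` (standing for $\partial s/\partial y_1$ and $\partial s/\partial y_2$) are assumptions.

```python
import math

def reverse_ad(x1, x2, s_y1, s_y2):
    """One backward pass: returns (ds/dx1, ds/dx2) for the given output seed."""
    # Evaluation pass: the backward sweep needs m1 and m2
    a, b = x1 * x2, math.sin(x1)
    c, d = 4 * x1 + 2 * x2, math.cos(x2)
    m1, m2 = a + b, c + d
    # Backward sweep, from the outputs towards the inputs
    s_m1 = s_y1 * 1.0 + s_y2 * m2   # y1 = m1 + m2, y2 = m1 * m2
    s_m2 = s_y1 * 1.0 + s_y2 * m1
    s_a, s_b = s_m1, s_m1           # m1 = a + b
    s_c, s_d = s_m2, s_m2           # m2 = c + d
    s_x1 = s_a * x2 + s_b * math.cos(x1) + s_c * 4.0
    s_x2 = s_a * x1 + s_c * 2.0 + s_d * (-math.sin(x2))
    return s_x1, s_x2

x1, x2 = 1.0, 2.0  # arbitrary sample inputs
print(reverse_ad(x1, x2, 1.0, 0.0))  # s = y1: (dy1/dx1, dy1/dx2)
print(reverse_ad(x1, x2, 0.0, 1.0))  # s = y2: (dy2/dx1, dy2/dx2)
```

Each call produces the derivatives of a single output with respect to all inputs, that is, one row of the Jacobian.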

The case $s=y_2$ is computed in the same way.

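We can conclude:

  • With $n$ output variables (2 in this example), the formulas above must be evaluated $n$ times, once per output.
  • If the input to a neural network is a 1280 x 720 image and the output is 51 floating-point numbers, reverse differentiation needs only 51 passes.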
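The "one pass per output" cost can be seen in a small graph-recording implementation of reverse mode. The sketch below is illustrative only and makes several assumptions: the `Var` class, the `sin`/`cos` wrappers, `backward`, and `jacobian_reverse` are hypothetical names, only addition and multiplication are supported, and the graph is rebuilt for every output to keep the code short.

```python
import math

class Var:
    """A graph node; `grad` accumulates ds/d(this node) during the backward sweep."""
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # tuple of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))
    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        return Var(self.val * other.val, ((self, other.val), (other, self.val)))
    __rmul__ = __mul__

def _lift(v):
    """Treat plain numbers as constant nodes."""
    return v if isinstance(v, Var) else Var(float(v))

def sin(x):
    return Var(math.sin(x.val), ((x, math.cos(x.val)),))

def cos(x):
    return Var(math.cos(x.val), ((x, -math.sin(x.val)),))

def backward(output):
    """Seed ds/d(output) = 1 and push gradients back in reverse topological order."""
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # a node is appended after everything it depends on
    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad

def example(x1, x2):
    m1 = x1 * x2 + sin(x1)
    m2 = 4 * x1 + 2 * x2 + cos(x2)
    return [m1 + m2, m1 * m2]

def jacobian_reverse(fn, point, num_outputs):
    """Build the Jacobian row by row: one backward pass per output."""
    rows = []
    for k in range(num_outputs):
        inputs = [Var(v) for v in point]  # fresh graph so gradients start at zero
        backward(fn(*inputs)[k])
        rows.append([x.grad for x in inputs])
    return rows

print(jacobian_reverse(example, [1.0, 2.0], num_outputs=2))
```

The loop in `jacobian_reverse` runs once per output, so its cost scales with the 51 outputs rather than with the number of input pixels.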