PyTorch Autograd Mechanism (1)


torch.autograd.backward

1. Jacobian matrix

In vector calculus, the Jacobian matrix is the matrix of all first-order partial derivatives of a vector-valued function. When the matrix is square, both the matrix and its determinant are referred to as the Jacobian in the literature.

Suppose {\bf f}:{\Bbb R}^{n} \to {\Bbb R}^{m} is a function which takes as input the vector {\bf x} \in {\Bbb R}^{n} and produces as output the vector {\bf f}({\bf x}) \in {\Bbb R}^{m}. Then the Jacobian matrix {\bf J} of {\bf f} is an m \times n matrix, usually defined and arranged as follows:

{\bf J}= 
    \begin{bmatrix}
        \frac{\partial {\bf f}}{\partial x_1} & \cdots & \frac{\partial {\bf f}}{\partial x_n}
    \end{bmatrix}=
    \begin{bmatrix}
        \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\
        \vdots & \ddots & \vdots \\
        \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n}
    \end{bmatrix}

or, component-wise:

{\bf J}_{ij} = \frac{\partial f_i}{\partial x_j}
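As a sanity check on this definition, the sketch below computes a Jacobian with torch.autograd.functional.jacobian (available in recent PyTorch releases); the function f here is hypothetical, chosen only for illustration:

import torch
from torch.autograd.functional import jacobian

# f: R^2 -> R^2, a hypothetical example function
def f(x):
    return torch.stack([x[0] ** 2 * x[1],
                        5 * x[0] + torch.sin(x[1])])

x = torch.tensor([1., 2.])
J = jacobian(f, x)   # shape (2, 2), with J[i, j] = d f_i / d x_j
# symbolically J = [[2*x1*x2, x1**2], [5, cos(x2)]], i.e. [[4, 1], [5, cos(2)]] at x = (1, 2)
print(J)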

2. Examples

Suppose:

x \to y \to z, \\
    y=f(x), z=g(y), \\
    f: {\Bbb R}^{a} \to {\Bbb R}^{b}, \\
    g: {\Bbb R}^{b} \to {\Bbb R}^{c}

Then:

{\bf J_{x \to y}}=
    \begin{bmatrix}
        \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_a} \\
        \vdots & \ddots & \vdots \\
        \frac{\partial f_b}{\partial x_1} & \cdots & \frac{\partial f_b}{\partial x_a}
    \end{bmatrix},
    {\bf J_{y \to z}}=
    \begin{bmatrix}
        \frac{\partial g_1}{\partial y_1} & \cdots & \frac{\partial g_1}{\partial y_b} \\
        \vdots & \ddots & \vdots \\
        \frac{\partial g_c}{\partial y_1} & \cdots & \frac{\partial g_c}{\partial y_b}
    \end{bmatrix}
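The reason for setting up the composition x \to y \to z is the chain rule: the Jacobian of the composite map is the matrix product {\bf J_{x \to z}} = {\bf J_{y \to z}} \, {\bf J_{x \to y}}, and reverse-mode autograd exploits exactly this relationship, one vector-Jacobian product at a time. A minimal sketch checking the identity numerically, using the f from the example below and a hypothetical g chosen only for illustration:

import torch
from torch.autograd.functional import jacobian

def f(x):
    # same f as in the concrete example below
    return torch.stack([2 * x[0] + x[1] ** 2,
                        x[0] ** 2 + 2 * x[1] ** 3])

def g(y):
    # hypothetical g: R^2 -> R^2, only for illustration
    return torch.stack([y[0] * y[1], y[0] + y[1]])

x = torch.tensor([1., 2.])
y = f(x)

J_xy = jacobian(f, x)                  # J_{x -> y}
J_yz = jacobian(g, y)                  # J_{y -> z}, evaluated at y = f(x)
J_xz = jacobian(lambda t: g(f(t)), x)  # Jacobian of the composition

print(torch.allclose(J_xz, J_yz @ J_xy))   # chain rule: should print True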

For a concrete example, take:

{\bf x} 
    =
    \begin{bmatrix}
        x_1, & x_2
    \end{bmatrix}
    =
    \begin{bmatrix}
        1, & 2
    \end{bmatrix},
    {\bf y}
    =
    \begin{bmatrix}
    2x_1 + x_2^2, & x_1^2 + 2x_2^3
    \end{bmatrix}

Therefore:

{\bf J_{x \to y}}
    =
    \begin{bmatrix}
        2, & 2x_2 \\
        2x_1, & 6x_2^2
    \end{bmatrix}
    =
    \begin{bmatrix}
    2, & 4 \\
    2, & 24
    \end{bmatrix}

If we call y.backward(torch.tensor([[k_1, k_2]])), the result accumulated in x.grad is k_1 * [2, 2x_2] + k_2 * [2x_1, 6x_2^2], i.e. the vector-Jacobian product [k_1, k_2] \cdot {\bf J_{x \to y}}.

>>> import torch

>>> x = torch.tensor([[1., 2.]], requires_grad=True)
>>> y = torch.zeros(1, 2)
>>> y[0, 0] = 2 * x[0, 0] + x[0, 1] ** 2
>>> y[0, 1] = x[0, 0] ** 2 + 2 * x[0, 1] ** 3
>>> print(x, y)
tensor([[1., 2.]], requires_grad=True) tensor([[ 6., 17.]], grad_fn=<CopySlices>)

>>> y.backward(torch.tensor([[1., 0.]]), retain_graph=True)
>>> print(x.grad)
tensor([[2., 4.]])

>>> x.grad.zero_()
>>> y.backward(torch.tensor([[0., 1.]]), retain_graph=True)
>>> print(x.grad)
tensor([[ 2., 24.]])

>>> x.grad.zero_()
>>> y.backward(torch.tensor([[1., 2.]]), retain_graph=True)
>>> print(x.grad)
tensor([[ 6., 52.]])

>>> x.grad.zero_()
>>> y.backward(torch.tensor([[2., 1.]]), retain_graph=True)
>>> print(x.grad)
tensor([[ 6., 32.]])

That is:

  • (k_1, k_2) = (1, 0) \longrightarrow 1 * [2, 4] + 0 * [2, 24] = [2, 4]
  • (k_1, k_2) = (0, 1) \longrightarrow 0 * [2, 4] + 1 * [2, 24] = [2, 24]
  • (k_1, k_2) = (1, 2) \longrightarrow 1 * [2, 4] + 2 * [2, 24] = [6, 52]
  • (k_1, k_2) = (2, 1) \longrightarrow 2 * [2, 4] + 1 * [2, 24] = [6, 32]
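The same four results can be reproduced directly as vector-Jacobian products: with {\bf J_{x \to y}} evaluated at x = (1, 2), calling y.backward(v) leaves v \cdot {\bf J_{x \to y}} in x.grad. A minimal sketch checking the four cases above:

import torch

# J_{x -> y} evaluated at x = (1, 2), taken from the worked example above
J = torch.tensor([[2., 4.],
                  [2., 24.]])

for v in ([1., 0.], [0., 1.], [1., 2.], [2., 1.]):
    print(torch.tensor(v) @ J)   # matches x.grad from the session above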