Learning PyTorch with Examples

Note
This is one of our older PyTorch tutorials. You can find our latest beginner content in the "Learn the Basics" series.

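This tutorial introduces the fundamental concepts of PyTorch through self-contained, runnable examples.

At its core, PyTorch provides two main features:

An n-dimensional Tensor, similar to a NumPy array but able to run on GPUs
Automatic differentiation for building and training neural networks

We use the problem of fitting \(y=\sin(x)\) with a third-order polynomial as the running example. The network has four learnable parameters and is trained with gradient descent to fit the sampled data by minimizing the Euclidean distance between the network output and the true output.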

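Note
You can browse the individual examples at the end of this page.

Before running the examples in this tutorial, make sure that the torch and numpy packages are installed.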

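1. Tensors

Warm-up: numpy

Before introducing PyTorch, we first implement the network using numpy.

NumPy provides an n-dimensional array object and many functions for manipulating these arrays. It is a generic framework for scientific computing; it knows nothing about computation graphs, deep learning, or gradients. Even so, we can use numpy to fit a third-order polynomial to the sine function by manually implementing the forward and backward passes.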

# -*- coding: utf-8 -*-
import numpy as np
import math

# Create input and output data (evenly spaced samples of sin(x), not random values)
x = np.linspace(-math.pi, math.pi, 2000)
y = np.sin(x)

# Randomly initialize weights
a = np.random.randn()
b = np.random.randn()
c = np.random.randn()
d = np.random.randn()

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    # y = a + b x + c x^2 + d x^3
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = np.square(y_pred - y).sum()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of the loss with respect to a, b, c, d
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a} + {b} x + {c} x^2 + {d} x^3')
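
The gradient expressions in the backward pass above follow from applying the chain rule to the sum-of-squares loss. As a short sketch, writing \(\hat{y}_i\) for the entries of `y_pred` and \(L\) for `loss` (notation introduced here for illustration only):

```latex
\begin{aligned}
L &= \sum_i (\hat{y}_i - y_i)^2, \qquad
\hat{y}_i = a + b x_i + c x_i^2 + d x_i^3 \\
\frac{\partial L}{\partial \hat{y}_i} &= 2(\hat{y}_i - y_i) \\
\frac{\partial L}{\partial a} &= \sum_i 2(\hat{y}_i - y_i), &
\frac{\partial L}{\partial b} &= \sum_i 2(\hat{y}_i - y_i)\, x_i \\
\frac{\partial L}{\partial c} &= \sum_i 2(\hat{y}_i - y_i)\, x_i^2, &
\frac{\partial L}{\partial d} &= \sum_i 2(\hat{y}_i - y_i)\, x_i^3
\end{aligned}
```

Here `grad_y_pred` corresponds to \(\partial L / \partial \hat{y}_i\), and each of `grad_a` through `grad_d` multiplies it by the matching power of \(x_i\) before summing.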

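PyTorch: Tensors

NumPy is a great framework, but it cannot use GPUs to accelerate its numerical computations. For modern deep neural networks, GPUs often provide speedups of 50x or more, so NumPy alone is not enough for modern deep learning.

Here we introduce the most fundamental PyTorch concept: the Tensor. A PyTorch Tensor:

Is conceptually identical to a numpy array
Can keep track of a computational graph and gradients
Can run on a GPU to accelerate numeric computations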
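
To make the relationship with numpy concrete, here is a minimal sketch of moving data between the two libraries and onto a device. The variable names are illustrative, and the device selection simply falls back to the CPU when CUDA is not available.

```python
import numpy as np
import torch

# A tensor created with from_numpy shares memory with the source array on CPU.
arr = np.linspace(0.0, 1.0, 5)
t = torch.from_numpy(arr)      # dtype float64, inherited from the array
t[0] = 42.0                    # the change is visible through `arr` as well
print(arr[0])

# Most numpy-style functions have a direct tensor counterpart.
same = np.allclose(np.sin(arr), torch.sin(t).numpy())
print(same)

# Moving a tensor to a GPU is explicit; fall back to CPU if CUDA is absent.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
t_dev = t.to(device)
print(t_dev.device.type)
```

Because the memory is shared only on the CPU, a tensor that has been moved to a GPU is an independent copy.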

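Below we use PyTorch Tensors to fit the same third-order polynomial to the sine function, still implementing the forward and backward passes manually: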

# -*- coding: utf-8 -*-
import torch
import math

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0") # Uncomment this to run on GPU

# Create input and output tensors
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of the loss with respect to a, b, c, d
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
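
Hand-written gradients are easy to get wrong, so it can be worth checking them numerically. The sketch below compares the analytic expressions used in the loop above against central finite differences. The parameter values, the step size `eps`, and the helper `loss_fn` are assumptions chosen for illustration, and float64 is used so that the comparison is not dominated by rounding error.

```python
import math
import torch

x = torch.linspace(-math.pi, math.pi, 200, dtype=torch.float64)
y = torch.sin(x)
params = torch.tensor([0.1, -0.2, 0.3, -0.4], dtype=torch.float64)  # a, b, c, d

def loss_fn(p):
    """Sum-of-squares loss of the cubic polynomial with coefficients p."""
    a, b, c, d = p
    y_pred = a + b * x + c * x ** 2 + d * x ** 3
    return (y_pred - y).pow(2).sum()

# Analytic gradients, written the same way as in the training loop.
a, b, c, d = params
grad_y_pred = 2.0 * ((a + b * x + c * x ** 2 + d * x ** 3) - y)
analytic = torch.stack([
    grad_y_pred.sum(),
    (grad_y_pred * x).sum(),
    (grad_y_pred * x ** 2).sum(),
    (grad_y_pred * x ** 3).sum(),
])

# Central finite differences, perturbing one parameter at a time.
eps = 1e-6
numeric = torch.zeros(4, dtype=torch.float64)
for i in range(4):
    step = torch.zeros(4, dtype=torch.float64)
    step[i] = eps
    numeric[i] = (loss_fn(params + step) - loss_fn(params - step)) / (2 * eps)

print(torch.allclose(analytic, numeric, rtol=1e-5))
```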

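2. Autograd

PyTorch: Tensors and autograd

In the examples above, we implemented both the forward and backward passes by hand. That is manageable for a small network, but quickly becomes tedious for large, complex ones.

Fortunately, we can use automatic differentiation to automate the backward pass. The autograd package in PyTorch provides exactly this functionality:

The forward pass of your network defines a computational graph
Nodes in the graph are Tensors, and edges are functions that produce output Tensors from input Tensors
Backpropagating through this graph then lets you compute gradients automatically

In practice it is simple to use:

If a Tensor x has requires_grad=True
then, after backward() is called on some scalar, x.grad holds the gradient of that scalar with respect to x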
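
As a minimal sketch of this behavior, consider a single scalar input. The function and the value 3.0 are chosen purely for illustration.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)

# The forward pass records the operations that produce y from x.
y = x ** 2 + 2 * x
print(y.grad_fn is not None)   # y is the result of a tracked operation

# The backward pass fills in x.grad with dy/dx = 2x + 2.
y.backward()
print(x.grad)
```

At x = 3 the derivative 2x + 2 evaluates to 8, which is the value stored in `x.grad`.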

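Below we use PyTorch Tensors and autograd to fit a third-order polynomial without writing the backward pass by hand. Note that the code in this example fits exp(x) on [-1, 1] rather than sin(x):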

import torch
import math

# Prefer an available accelerator (CUDA, MPS, XPU, ...); otherwise fall back to the CPU
dtype = torch.float
device = torch.accelerator.current_accelerator().type if torch.accelerator.is_available() else "cpu"
print(f"Using {device} device")
torch.set_default_device(device)

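# Create input and output tensors; by default they do not require gradients
x = torch.linspace(-1, 1, 2000, dtype=dtype)
y = torch.exp(x)

# Create the weight tensors; requires_grad=True tells autograd to track gradients for them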
a = torch.randn((), dtype=dtype, requires_grad=True)
b = torch.randn((), dtype=dtype, requires_grad=True)
c = torch.randn((), dtype=dtype, requires_grad=True)
d = torch.randn((), dtype=dtype, requires_grad=True)

initial_loss = 1.
learning_rate = 1e-5
for t in range(5000):
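    # Forward pass: compute predicted y using operations on Tensors
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute the loss; it is a scalar Tensor, and loss.item() returns its Python value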
    loss = (y_pred - y).pow(2).sum()

    if t == 0:
        initial_loss = loss.item()

    if t % 100 == 99:
        print(f'Iteration t = {t:4d}  loss(t)/loss(0) = {round(loss.item()/initial_loss, 6):10.6f}  a = {a.item():10.6f}  b = {b.item():10.6f}  c = {c.item():10.6f}  d = {d.item():10.6f}')

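    # Backward pass via autograd: computes the gradient of the loss with
    # respect to every Tensor that has requires_grad=True
    loss.backward()

    # Update the weights manually; torch.no_grad() keeps the update itself
    # out of the autograd graph
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Reset the gradients after updating the weights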
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
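
The loop above resets each `.grad` to `None` after every update because `backward()` adds to an existing gradient instead of overwriting it. A small sketch with a single illustrative weight `w` shows the effect:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(w * 3).backward()
first = w.grad.item()          # gradient of 3w with respect to w

(w * 3).backward()             # no reset: the new gradient is added to the old one
accumulated = w.grad.item()

w.grad = None                  # reset, as in the training loop above
(w * 3).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)
```

Without the reset, each iteration would step along the sum of all gradients seen so far rather than the current one.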

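PyTorch: Defining new autograd functions

Under the hood, each primitive autograd operator is really two functions that operate on Tensors:

forward: computes output Tensors from input Tensors
backward: receives the gradient of some scalar with respect to the output Tensors and computes the gradient of that scalar with respect to the input Tensors

In PyTorch we can easily define our own autograd operator by subclassing torch.autograd.Function and implementing the forward and backward functions.

In this example we define the model as \[ y = a + b P_3(c + dx) \] where \(P_3(x)=\frac{1}{2}\left(5x^3-3x\right)\) is the Legendre polynomial of degree three. We write a custom autograd function to compute \(P_3\):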

import torch
import math

class LegendrePolynomial3(torch.autograd.Function):
    """
    A custom autograd Function: subclass torch.autograd.Function and
    implement the forward and backward passes, which operate on Tensors.
    """
    @staticmethod
    def forward(ctx, input):
        """
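        Forward pass: receives the input Tensor and returns the output Tensor.
        ctx is a context object used to stash information for the backward pass.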
        """
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
        """
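        Backward pass: receives the gradient of the loss with respect to the
        output and returns the gradient of the loss with respect to the input.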
        """
        input, = ctx.saved_tensors
        return grad_output * 1.5 * (5 * input ** 2 - 1)

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")

x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Initialize the parameters to fixed values rather than randomly
a = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
b = torch.full((), -1.0, device=device, dtype=dtype, requires_grad=True)
c = torch.full((), 0.0, device=device, dtype=dtype, requires_grad=True)
d = torch.full((), 0.3, device=device, dtype=dtype, requires_grad=True)

learning_rate = 5e-6
for t in range(2000):
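    # Alias for applying the custom Function
    P3 = LegendrePolynomial3.apply

    # Forward pass: compute predicted y using our custom autograd operation
    y_pred = a + b * P3(c + d * x)

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Backward pass via autograd
    loss.backward()

    # Update the parameters using gradient descent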
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} * P3({c.item()} + {d.item()} x)')
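
A hand-written backward is only useful if it matches the forward. One way to check this is `torch.autograd.gradcheck`, which compares the custom backward against finite differences. The sketch below repeats the class so that it runs on its own; the input size and tolerances are arbitrary choices for illustration.

```python
import torch

class LegendrePolynomial3(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return 0.5 * (5 * input ** 3 - 3 * input)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        # d/dx [0.5 * (5x^3 - 3x)] = 1.5 * (5x^2 - 1)
        return grad_output * 1.5 * (5 * input ** 2 - 1)

# gradcheck works best with double-precision inputs that require gradients.
inp = torch.randn(8, dtype=torch.double, requires_grad=True)
ok = torch.autograd.gradcheck(LegendrePolynomial3.apply, (inp,), eps=1e-6, atol=1e-4)
print(ok)
```

If the analytic and numerical gradients disagree, `gradcheck` raises an error describing the mismatch instead of returning.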

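3. nn module

Computational graphs and autograd are a very powerful paradigm, but for large neural networks raw autograd can be too low-level. When building networks we usually think in terms of layers, some of which have learnable parameters that are optimized during training.

The nn package in PyTorch provides this higher-level abstraction:

It defines a set of Modules, each roughly equivalent to a neural network layer
A Module receives input Tensors and computes output Tensors
A Module may hold internal state, such as Tensors containing learnable parameters
The package also defines commonly used loss functions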
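
As a small sketch of these ideas, the snippet below inspects the parameters held by a single `nn.Linear` Module and applies one of the built-in loss functions. The batch size of 5 and the all-zero tensors are placeholders chosen for illustration.

```python
import torch

layer = torch.nn.Linear(3, 1)   # 3 input features, 1 output feature

# The Module registers its learnable parameters and exposes them by name.
names = []
for name, param in layer.named_parameters():
    names.append(name)
    print(name, tuple(param.shape), param.requires_grad)

# Calling the Module runs its forward computation on a batch of inputs.
out = layer(torch.zeros(5, 3))
print(out.shape)

# Loss functions are Modules too; this one sums the squared errors.
loss = torch.nn.MSELoss(reduction='sum')(out, torch.zeros(5, 1))
print(loss.dim())
```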

PyTorch: nn

Below we use the nn package to implement the polynomial model:

# -*- coding: utf-8 -*-
import torch
import math

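# Create input and output tensors
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Build the input features (x, x^2, x^3)
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)  # shape (2000, 3)

# Define the model with nn.Sequential: a Linear layer, then Flatten to turn
# the (2000, 1) output into a 1-D tensor that matches the shape of y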
model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

# Loss function: Mean Squared Error, summed over elements rather than averaged
loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for t in range(2000):
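    # Forward pass: compute predicted y by passing the features to the model
    y_pred = model(xx)

    # Compute and print loss
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    # Zero the gradients before running the backward pass
    model.zero_grad()

    # Backward pass: compute the gradient of the loss with respect to all parameters
    loss.backward()

    # Update the weights using gradient descent
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad

# Access the first layer of the model, the Linear layer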
linear_layer = model[0]

print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
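
The feature construction `x.unsqueeze(-1).pow(p)` and the trailing `Flatten(0, 1)` both rely on tensor shapes, which are easier to see on a tiny input. The three-element `x` below is an illustrative stand-in for the 2000 samples used above.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
p = torch.tensor([1, 2, 3])

col = x.unsqueeze(-1)        # shape (3, 1): one row per sample
xx = col.pow(p)              # (3, 1) broadcast against (3,) gives (3, 3)
print(xx)                    # row i holds x_i, x_i^2, x_i^3

out = torch.nn.Linear(3, 1)(xx)        # shape (3, 1)
flat = torch.nn.Flatten(0, 1)(out)     # shape (3,), matching a 1-D target
print(out.shape, flat.shape)
```

Flattening matters because comparing a (N, 1) prediction with a (N,) target would broadcast to (N, N) inside the loss instead of pairing the elements one to one.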

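PyTorch: optim

Up to this point we have updated the model's weights by hand. That is fine for plain stochastic gradient descent, but in practice networks are often trained with more sophisticated optimizers such as AdaGrad, RMSProp, and Adam.

The optim package in PyTorch abstracts the idea of an optimization algorithm and provides implementations of the commonly used ones. Below we train the model with the RMSprop optimizer: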

# -*- coding: utf-8 -*-
import torch
import math

x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)
loss_fn = torch.nn.MSELoss(reduction='sum')

# Create the optimizer (RMSprop) and tell it which parameters to update
learning_rate = 1e-3
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

for t in range(2000):
    y_pred = model(xx)
    loss = loss_fn(y_pred, y)

    if t % 100 == 99:
        print(t, loss.item())

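    # Zero the gradients of the parameters the optimizer will update
    optimizer.zero_grad()

    # Backward pass: compute the gradient of the loss with respect to the parameters
    loss.backward()

    # Let the optimizer update the parameters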
    optimizer.step()

linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
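
To see what `optimizer.step()` replaces, the sketch below performs one update by hand and the same update through `torch.optim.SGD`. Plain SGD is used here instead of RMSprop only because its rule is the one written out manually in the earlier examples; the weights, data, and learning rate are arbitrary illustrative values.

```python
import torch

torch.manual_seed(0)
w_manual = torch.randn(3, requires_grad=True)
w_optim = w_manual.detach().clone().requires_grad_(True)
data = torch.tensor([1.0, 2.0, 3.0])
lr = 0.1

# One gradient descent step written by hand.
loss = (w_manual * data).sum().pow(2)
loss.backward()
with torch.no_grad():
    w_manual -= lr * w_manual.grad

# The same step delegated to the optimizer.
optimizer = torch.optim.SGD([w_optim], lr=lr)
optimizer.zero_grad()
loss = (w_optim * data).sum().pow(2)
loss.backward()
optimizer.step()

print(torch.allclose(w_manual, w_optim))
```

Other optimizers keep the same `zero_grad()`, `backward()`, `step()` pattern and differ only in how `step()` turns gradients into updates.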

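PyTorch: Custom nn Modules

If you need a model more complex than a sequence of existing Modules, you can define your own Module:

Subclass nn.Module
Implement a forward function that receives input Tensors and produces output Tensors

Below we implement the third-order polynomial as a custom Module: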

# -*- coding: utf-8 -*-
import torch
import math

class Polynomial3(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Register the four coefficients as nn.Parameter so the module tracks them
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        # Forward pass: evaluate the polynomial on the input Tensor
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

    def string(self):
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3'

x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

model = Polynomial3()

criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)

for t in range(2000):
    y_pred = model(x)
    loss = criterion(y_pred, y)

    if t % 100 == 99:
        print(t, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')
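
Wrapping each coefficient in `nn.Parameter` is what makes `model.parameters()` return it to the optimizer. The sketch below repeats the module with one extra plain-tensor attribute, `scale`, which is hypothetical and added only to show the contrast.

```python
import torch

class Polynomial3(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))
        # Hypothetical attribute: a plain tensor is NOT registered as a parameter.
        self.scale = torch.ones(())

    def forward(self, x):
        return self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3

model = Polynomial3()
names = [name for name, _ in model.named_parameters()]
count = sum(p.numel() for p in model.parameters())
print(names, count)
```

Only `a`, `b`, `c`, and `d` appear in the list, so an optimizer built from `model.parameters()` would never update `scale`.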

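PyTorch: Control Flow + Weight Sharing

As an example of dynamic graphs and weight sharing, we implement an unusual model:

On each forward pass it randomly evaluates a polynomial of order 3, 4, or 5
The 4th- and 5th-order terms reuse the same weight e

Because the graph is built dynamically, forward can be written with ordinary Python control flow (for/if):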

# -*- coding: utf-8 -*-
import random
import torch
import math

class DynamicNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Parameter(torch.randn(()))
        self.b = torch.nn.Parameter(torch.randn(()))
        self.c = torch.nn.Parameter(torch.randn(()))
        self.d = torch.nn.Parameter(torch.randn(()))
        self.e = torch.nn.Parameter(torch.randn(()))

    def forward(self, x):
        y = self.a + self.b * x + self.c * x ** 2 + self.d * x ** 3
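        # random.randint(4, 6) returns 4, 5, or 6, so this loop adds no extra
        # term, x^4 only, or both x^4 and x^5, reusing the weight e each time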
        for exp in range(4, random.randint(4, 6)):
            y += self.e * x ** exp
        return y

    def string(self):
        return f'y = {self.a.item()} + {self.b.item()} x + {self.c.item()} x^2 + {self.d.item()} x^3 + {self.e.item()} x^4 ? + {self.e.item()} x^5 ?'

x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

model = DynamicNet()

criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-8, momentum=0.9)

for t in range(30000):
    y_pred = model(x)
    loss = criterion(y_pred, y)

    if t % 2000 == 1999:
        print(t, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f'Result: {model.string()}')
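
The loop bound in `forward` is easy to misread, because `random.randint` includes both endpoints while `range` excludes its upper bound. The sketch below enumerates the three possible outcomes directly; `degrees` is an illustrative name.

```python
# The three values that random.randint(4, 6) can return.
degrees = {}
for upper in (4, 5, 6):
    extra = list(range(4, upper))        # exponents that reuse the weight e
    degrees[upper] = max([3] + extra)    # resulting polynomial order
    print(upper, extra, degrees[upper])
```

So a draw of 4 leaves the cubic unchanged, 5 adds the x^4 term, and 6 adds both x^4 and x^5.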

Examples

You can browse all of the examples here:

Tensors

Warm-up: numpy
PyTorch: Tensors

Autograd

PyTorch: Tensors and autograd
PyTorch: Defining New autograd Functions

nn module

PyTorch: nn
PyTorch: optim
PyTorch: Custom nn Modules
PyTorch: Control Flow + Weight Sharing
