Neural Network Construction Basics


1. Defining and Using the nn.Module Class

1.1 Modular Neural Network Design

PyTorch implements object-oriented neural network design through torch.nn.Module; every layer and model should inherit from this base class. Its core advantages include:

  • Automatic parameter tracking: registered parameters are collected automatically and their requires_grad state is managed for you
  • Convenient device migration: .to(device) moves all parameters and buffers in a single call
  • Model saving/loading: state_dict() provides standardized parameter storage (all three points are demonstrated in a short sketch after the class diagram below)
1.1.1 Custom Module Example
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size) 
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        out = self.fc1(x)  # shape: (batch, hidden_size)
        out = self.relu(out)
        out = self.fc2(out) # shape: (batch, num_classes)
        return out

model = NeuralNet(784, 512, 10)
print(model)
classDiagram
    Module <|-- NeuralNet
    class Module {
        +parameters()
        +to(device)
        +train()
        +eval()
    }
    class NeuralNet {
        +fc1: Linear
        +relu: ReLU
        +fc2: Linear
        +forward(x)
    }
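The three advantages listed in Section 1.1 can be made concrete with the model defined above. A minimal sketch (the device string and the file name neural_net.pth are placeholders chosen for illustration):

for name, p in model.named_parameters():           # parameter tracking: every registered parameter is visible
    print(name, tuple(p.shape), p.requires_grad)   # requires_grad is True by default

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)                           # one call moves all parameters and buffers

torch.save(model.state_dict(), 'neural_net.pth')   # standardized parameter storage
model.load_state_dict(torch.load('neural_net.pth', map_location=device))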

1.2 Controlling the Forward Pass

# Example of a more complex forward pass
class ResidualBlock(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv2 = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(in_channels)
    
    def forward(self, x):
        residual = x
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += residual  # residual (skip) connection
        return F.relu(out)
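A quick sanity check of the block (the tensor sizes below are arbitrary and chosen only for illustration). The residual addition requires the output of bn2 to have the same shape as the input, which holds here because both convolutions keep the channel count and, with padding=1, the spatial size:

block = ResidualBlock(in_channels=64)
x = torch.randn(8, 64, 32, 32)   # (batch, channels, H, W)
y = block(x)
print(y.shape)                   # torch.Size([8, 64, 32, 32]) -- shape preserved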

2. Fully Connected, Convolutional, and Pooling Layers in Detail

2.1 Fully Connected Layer (Linear Layer)

2.1.1 Mathematical Principle

For an input $X \in \mathbb{R}^{batch \times in\_features}$, weight $W \in \mathbb{R}^{out\_features \times in\_features}$ (nn.Linear stores the weight with shape (out_features, in_features)), and bias $b \in \mathbb{R}^{out\_features}$:

$$Y = XW^T + b$$

2.1.2 Code Example
fc = nn.Linear(in_features=784, out_features=256)
x = torch.randn(32, 784)  # batch=32
y = fc(x)                # shape: (32, 256)
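As a check that nn.Linear really computes $Y = XW^T + b$, a minimal sketch comparing the layer output with the explicit matrix expression:

manual = x @ fc.weight.T + fc.bias            # explicit X W^T + b
print(torch.allclose(y, manual, atol=1e-6))   # True, up to floating-point tolerance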

2.2 Convolutional Layer

2.2.1 Parameter Breakdown
conv = nn.Conv2d(
    in_channels=3,    # number of input channels (e.g. 3 for RGB images)
    out_channels=64,  # number of filters, i.e. output channels
    kernel_size=3,    # 3x3 convolution kernel
    stride=1,         # step size of the sliding window
    padding=1,        # zero-padding added to each border
    dilation=1,       # spacing between kernel elements (1 = standard convolution)
    groups=1          # 1 = standard convolution; >1 = grouped convolution
)
x = torch.randn(16, 3, 32, 32)  # (batch, channels, H, W)
y = conv(x)  # shape: (16, 64, 32, 32)
2.2.2 Output Size Calculation

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1}{stride} + 1 \right\rfloor$$

graph TD
    A[Input 3@32x32] --> B[Conv kernels 64@3x3]
    B --> C[Output 64@32x32]
    style A fill:#9f9,stroke:#333
    style C fill:#f99,stroke:#333
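Plugging the parameters of the conv layer above into this formula (H_in = 32, padding = 1, dilation = 1, kernel_size = 3, stride = 1) reproduces the shape PyTorch reports; a minimal check:

H_in, padding, dilation, kernel_size, stride = 32, 1, 1, 3, 1
H_out = (H_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
print(H_out)      # 32
print(y.shape)    # torch.Size([16, 64, 32, 32]) -- matches (batch, out_channels, H_out, W_out)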

2.3 Pooling Layer

2.3.1 Max Pooling Example
pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.randn(16, 64, 32, 32)
y = pool(x)  # shape: (16, 64, 16, 16)
2.3.2 Average Pooling Formula

$$output[N, c, i, j] = \frac{1}{kH \times kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input[N, c, stride[0] \times i + m, stride[1] \times j + n]$$
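To make the formula concrete, the following minimal sketch (with an arbitrary 4x4 input) checks one output element of nn.AvgPool2d against the explicit window average:

avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
x = torch.randn(1, 1, 4, 4)
y = avg_pool(x)
# output[0, 0, 0, 0] is the mean of the top-left 2x2 window of the input
print(torch.allclose(y[0, 0, 0, 0], x[0, 0, :2, :2].mean()))  # True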


3. Choosing Activation and Loss Functions

3.1 Comparison of Common Activation Functions

| Function | Formula | Advantages | Disadvantages |
| --- | --- | --- | --- |
| ReLU | $f(x) = \max(0, x)$ | Computationally efficient, alleviates vanishing gradients | Dying-neuron problem |
| Sigmoid | $\frac{1}{1+e^{-x}}$ | Output in the (0, 1) range | Severe vanishing gradients |
| Tanh | $\frac{e^x - e^{-x}}{e^x + e^{-x}}$ | Zero-centered output in (-1, 1) | Vanishing gradients |
| LeakyReLU | $\max(0.01x, x)$ | Alleviates dying neurons | Negative slope needs tuning |
3.1.1 Visualizing Activation Functions
import matplotlib.pyplot as plt

x = torch.linspace(-5, 5, 100)
functions = {
    'ReLU': nn.ReLU(),
    'Sigmoid': nn.Sigmoid(),
    'Tanh': nn.Tanh(),
    'LeakyReLU': nn.LeakyReLU(0.1)
}

plt.figure(figsize=(12, 6))
for i, (name, func) in enumerate(functions.items()):
    plt.subplot(2, 2, i+1)
    plt.plot(x.numpy(), func(x).numpy())
    plt.title(name)
plt.tight_layout()
plt.show()

3.2 Loss Function Selection Guide

3.2.1 Classification Tasks
# Binary classification
loss_fn = nn.BCEWithLogitsLoss()  # applies the sigmoid internally

# Multi-class classification
loss_fn = nn.CrossEntropyLoss()  # expects raw logits; no softmax needed

# Example computation
outputs = model(inputs)          # shape: (batch, classes)
loss = loss_fn(outputs, targets) # targets: (batch,) class indices
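A self-contained check (with arbitrary sizes) that nn.CrossEntropyLoss really expects raw logits: it is equivalent to applying log-softmax followed by the negative log-likelihood loss:

logits = torch.randn(4, 10)            # raw scores, no softmax applied
targets = torch.randint(0, 10, (4,))   # class indices
ce = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, manual))      # True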
3.2.2 Regression Tasks
loss_fn = nn.MSELoss()       # mean squared error
loss_fn = nn.L1Loss()        # mean absolute error
loss_fn = nn.SmoothL1Loss()  # Huber-style loss
3.2.3 Loss Function Formulas

Cross-entropy loss: $\mathcal{L} = -\sum_{i=1}^{N} y_i \log(\hat{y}_i)$

Huber loss:

$$\mathcal{L} = \begin{cases} 0.5(x - y)^2 & \text{if } |x - y| < \delta \\ \delta\left(|x - y| - 0.5\delta\right) & \text{otherwise} \end{cases}$$
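A quick check of the piecewise definition: nn.SmoothL1Loss with its default beta=1.0 matches the Huber formula above with δ = 1, so an explicit elementwise implementation should agree (the tensor values are arbitrary):

x = torch.tensor([0.2, 1.5, -3.0])
y = torch.tensor([0.0, 0.0,  0.0])
diff = (x - y).abs()
delta = 1.0
manual = torch.where(diff < delta, 0.5 * diff ** 2, delta * (diff - 0.5 * delta)).mean()
print(torch.allclose(nn.SmoothL1Loss()(x, y), manual))  # True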

Appendix: Best Practices for Building Neural Networks

Parameter Initialization

# He (Kaiming) initialization, used with ReLU layers
for layer in model.modules():
    if isinstance(layer, nn.Conv2d):
        nn.init.kaiming_normal_(layer.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(layer, nn.BatchNorm2d):
        nn.init.constant_(layer.weight, 1)
        nn.init.constant_(layer.bias, 0)
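The same logic can also be packaged as a function and applied with Module.apply, which visits every submodule recursively; this is just a restructuring of the loop above (init_weights is a name chosen for illustration):

def init_weights(m):
    # called once for every submodule of the model
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

model.apply(init_weights)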

Model Architecture Visualization

graph TD
    A[Input 1x28x28] --> B[Conv2d 32@3x3]
    B --> C[ReLU]
    C --> D[MaxPool 2x2]
    D --> E[Conv2d 64@3x3]
    E --> F[ReLU]
    F --> G[Flatten]
    G --> H[Linear 128]
    H --> I[Dropout 0.5]
    I --> J[Output 10]
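A minimal nn.Sequential sketch of the architecture in the diagram, assuming padding=1 for both convolutions (the diagram does not specify padding), which keeps the spatial size before pooling and gives a flattened feature size of 64 x 14 x 14:

mnist_model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 1x28x28 -> 32x28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x28x28 -> 32x14x14
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 32x14x14 -> 64x14x14
    nn.ReLU(),
    nn.Flatten(),                                 # -> 64 * 14 * 14 = 12544
    nn.Linear(64 * 14 * 14, 128),
    nn.Dropout(0.5),
    nn.Linear(128, 10)                            # 10 output classes
)
print(mnist_model(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])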

Note: The code in this chapter was tested with PyTorch 2.1 + CUDA 11.8. It is recommended to pair it with a visualization tool such as TensorBoard to observe network behavior in real time. The next chapter dives into data loading and preprocessing techniques! 🚀