Fundamentals of Building Neural Networks
1. Defining and Using the nn.Module Class
1.1 Modular Neural Network Design
PyTorch implements object-oriented neural network design through torch.nn.Module; every layer and model should inherit from this base class. Its core advantages include (a short demonstration follows the list):
- Automatic parameter tracking: requires_grad state is managed for you
- Convenient device migration: .to(device) moves all parameters in one call
- Model saving/loading: state_dict() provides standardized parameter storage
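A minimal sketch of these three conveniences on a plain nn.Linear layer (the device string and file name here are illustrative only):

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)

# Parameter tracking: weight and bias are registered automatically
for name, p in layer.named_parameters():
    print(name, p.shape, p.requires_grad)        # requires_grad=True by default

# Device migration: one call moves every registered parameter
device = "cuda" if torch.cuda.is_available() else "cpu"
layer = layer.to(device)

# Saving / loading: state_dict() is an ordinary dict of tensors
torch.save(layer.state_dict(), "layer.pt")       # illustrative file name
layer.load_state_dict(torch.load("layer.pt"))
```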
1.1.1 Custom Module Example
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out = self.fc1(x)    # shape: (batch, hidden_size)
        out = self.relu(out)
        out = self.fc2(out)  # shape: (batch, num_classes)
        return out

model = NeuralNet(784, 512, 10)
print(model)
classDiagram
    Module <|-- NeuralNet
    class Module {
        +parameters()
        +to(device)
        +train()
        +eval()
    }
    class NeuralNet {
        +fc1: Linear
        +relu: ReLU
        +fc2: Linear
        +forward(x)
    }
1.2 Controlling the Forward Pass
# Example of a more involved forward pass
class ResidualBlock(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv2 = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(in_channels)

    def forward(self, x):
        residual = x
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += residual  # residual (skip) connection
        return F.relu(out)
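A quick usage check (a sketch assuming the ResidualBlock defined above): the block preserves the input shape, which is what makes the element-wise skip connection possible:

```python
block = ResidualBlock(64)
x = torch.randn(8, 64, 16, 16)   # (batch, channels, H, W)
y = block(x)
print(y.shape)                   # torch.Size([8, 64, 16, 16]) -- unchanged
```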
2. Fully Connected, Convolutional, and Pooling Layers in Detail
2.1 Fully Connected (Linear) Layer
2.1.1 Mathematical Principle
For an input $x \in \mathbb{R}^{\text{in\_features}}$, weight matrix $W \in \mathbb{R}^{\text{out\_features} \times \text{in\_features}}$, and bias $b \in \mathbb{R}^{\text{out\_features}}$, the layer computes:

$$y = x W^{T} + b$$
2.1.2 Code Implementation
fc = nn.Linear(in_features=784, out_features=256)
x = torch.randn(32, 784) # batch=32
y = fc(x) # shape: (32, 256)
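To connect the code back to the formula, the same result can be computed by hand from the parameters nn.Linear registered (a small verification sketch using the fc and x defined above):

```python
y_manual = x @ fc.weight.T + fc.bias               # y = xW^T + b
print(torch.allclose(y_manual, fc(x), atol=1e-6))  # True
```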
2.2 Convolutional Layer
2.2.1 Parameter Breakdown
conv = nn.Conv2d(
    in_channels=3,    # input channels (e.g. RGB)
    out_channels=64,  # number of filters / output channels
    kernel_size=3,    # 3x3 kernel
    stride=1,         # step of the sliding window
    padding=1,        # zero-padding added to each border
    dilation=1,       # spacing between kernel elements
    groups=1          # 1 = standard (ungrouped) convolution
)
x = torch.randn(16, 3, 32, 32)  # (batch, channels, H, W)
y = conv(x)                     # shape: (16, 64, 32, 32)
2.2.2 Output Size Calculation
graph TD
    A[Input 3@32x32] --> B[Conv kernel 64@3x3]
    B --> C[Output 64@32x32]
    style A fill:#9f9,stroke:#333
    style C fill:#f99,stroke:#333
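The spatial output size of nn.Conv2d follows the standard formula (per dimension):

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \cdot \text{padding} - \text{dilation} \cdot (\text{kernel\_size} - 1) - 1}{\text{stride}} \right\rfloor + 1$$

Plugging in the values from the diagram ($H_{in}=32$, padding $=1$, dilation $=1$, kernel_size $=3$, stride $=1$) gives $\lfloor(32 + 2 - 2 - 1)/1\rfloor + 1 = 32$, matching the 32x32 output above.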
2.3 Pooling Layer
2.3.1 Max Pooling Example
pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.randn(16, 64, 32, 32)
y = pool(x) # shape: (16, 64, 16, 16)
2.3.2 Average Pooling Formula
For a $k \times k$ window with stride $s$, average pooling outputs the mean of each window:

$$y_{i,j} = \frac{1}{k^{2}} \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} x_{\,i \cdot s + m,\; j \cdot s + n}$$
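A tiny sketch contrasting average and max pooling on the same input (nn.AvgPool2d mirrors the MaxPool2d call above; the numbers are only for illustration):

```python
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)

x = torch.tensor([[[[1., 2.],
                    [3., 4.]]]])   # shape (1, 1, 2, 2): a single 2x2 window
print(avg_pool(x))                 # tensor([[[[2.5000]]]]) -- mean of the window
print(max_pool(x))                 # tensor([[[[4.]]]])     -- max of the window
```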
3. Choosing Activation and Loss Functions
3.1 Comparison of Common Activation Functions
| Function | Formula | Pros | Cons |
|---|---|---|---|
| ReLU | $f(x) = \max(0, x)$ | Cheap to compute, mitigates vanishing gradients | Dying-neuron problem |
| Sigmoid | $f(x) = \frac{1}{1 + e^{-x}}$ | Output in the (0, 1) range | Severe vanishing gradients |
| Tanh | $f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ | Zero-centered output in (-1, 1) | Vanishing gradients |
| LeakyReLU | $f(x) = \max(\alpha x, x)$ | Mitigates dying neurons | Slope $\alpha$ needs tuning |
3.1.1 Activation Function Visualization
import matplotlib.pyplot as plt

x = torch.linspace(-5, 5, 100)
functions = {
    'ReLU': nn.ReLU(),
    'Sigmoid': nn.Sigmoid(),
    'Tanh': nn.Tanh(),
    'LeakyReLU': nn.LeakyReLU(0.1)
}

plt.figure(figsize=(12, 6))
for i, (name, func) in enumerate(functions.items()):
    plt.subplot(2, 2, i + 1)
    plt.plot(x.numpy(), func(x).numpy())
    plt.title(name)
plt.tight_layout()
plt.show()
3.2 Loss Function Selection Guide
3.2.1 Classification Tasks
# Binary classification
loss_fn = nn.BCEWithLogitsLoss()   # applies sigmoid internally
# Multi-class classification
loss_fn = nn.CrossEntropyLoss()    # expects raw logits, no softmax needed
# Example computation
outputs = model(inputs)            # shape: (batch, classes)
loss = loss_fn(outputs, targets)   # targets: (batch,) class indices
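A self-contained sketch with dummy tensors, to make the expected shapes and dtypes concrete (the sizes here are arbitrary):

```python
# Multi-class: logits of shape (batch, classes) and integer class indices
logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
print(nn.CrossEntropyLoss()(logits, targets))          # scalar loss

# Binary: raw logits and float 0/1 targets with matching shape
bin_logits = torch.randn(4)
bin_targets = torch.randint(0, 2, (4,)).float()
print(nn.BCEWithLogitsLoss()(bin_logits, bin_targets))
```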
3.2.2 Regression Tasks
loss_fn = nn.MSELoss()       # mean squared error
loss_fn = nn.L1Loss()        # mean absolute error
loss_fn = nn.SmoothL1Loss()  # Huber-style loss
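The three losses react very differently to outliers; a quick comparison on the same prediction/target pair (illustrative numbers only):

```python
pred   = torch.tensor([0.0, 0.0, 0.0, 10.0])   # last element is a large error
target = torch.zeros(4)

print(nn.MSELoss()(pred, target))        # 25.0   -- squares the large error
print(nn.L1Loss()(pred, target))         # 2.5    -- linear in the error
print(nn.SmoothL1Loss()(pred, target))   # 2.375  -- quadratic near 0, linear beyond beta=1
```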
3.2.3 Loss Function Formulas
Cross-entropy loss (for one sample with logits $x$ and true class $c$, as computed by nn.CrossEntropyLoss):

$$L = -\log\frac{\exp(x_{c})}{\sum_{j}\exp(x_{j})}$$

Huber loss (with threshold $\delta$, where $e = \hat{y} - y$ is the prediction error):

$$L_{\delta}(e) = \begin{cases} \frac{1}{2}e^{2} & \text{if } |e| \le \delta \\ \delta\left(|e| - \frac{1}{2}\delta\right) & \text{otherwise} \end{cases}$$
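A quick numerical check that nn.SmoothL1Loss (whose default beta=1 makes it coincide with the Huber form above for $\delta = 1$) matches the piecewise formula:

```python
e = torch.tensor([0.5, 2.0])                     # one small and one large error
huber = torch.where(e.abs() <= 1.0,
                    0.5 * e ** 2,                # quadratic branch
                    e.abs() - 0.5)               # linear branch
print(huber.mean())                              # 0.8125
print(nn.SmoothL1Loss()(e, torch.zeros_like(e))) # 0.8125 as well
```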
Appendix: Best Practices for Building Neural Networks
Parameter Initialization Methods
# He (Kaiming) initialization, paired with ReLU activations
for layer in model.modules():
    if isinstance(layer, nn.Conv2d):
        nn.init.kaiming_normal_(layer.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(layer, nn.BatchNorm2d):
        nn.init.constant_(layer.weight, 1)
        nn.init.constant_(layer.bias, 0)
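The same logic is often packaged into a function and applied with Module.apply, which visits every submodule recursively; a sketch of that idiom (the nn.Linear branch is an addition for completeness, not part of the loop above):

```python
def init_weights(m):
    # called once per submodule by model.apply()
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)
    elif isinstance(m, nn.Linear):   # assumed extension: initialize fully connected layers too
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')
        nn.init.constant_(m.bias, 0)

model.apply(init_weights)
```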
Model Architecture Visualization
graph TD
    A[Input 1x28x28] --> B[Conv2d 32@3x3]
    B --> C[ReLU]
    C --> D[MaxPool 2x2]
    D --> E[Conv2d 64@3x3]
    E --> F[ReLU]
    F --> G[Flatten]
    G --> H[Linear 128]
    H --> I[Dropout 0.5]
    I --> J[Output 10]
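The diagram can be written almost line-for-line as an nn.Sequential. A sketch assuming padding=1 convolutions so the feature maps stay 28x28 and 14x14 (the padding choice and the final Linear(128, 10) are assumptions; the diagram does not spell them out):

```python
cnn = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 1x28x28  -> 32x28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x28x28 -> 32x14x14
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 32x14x14 -> 64x14x14
    nn.ReLU(),
    nn.Flatten(),                                 # 64*14*14 = 12544 features
    nn.Linear(64 * 14 * 14, 128),
    nn.Dropout(0.5),
    nn.Linear(128, 10),                           # 10 output classes
)
print(cnn(torch.randn(1, 1, 28, 28)).shape)       # torch.Size([1, 10])
```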
Note: the code in this chapter was tested under PyTorch 2.1 + CUDA 11.8; pairing it with a visualization tool such as TensorBoard to watch the network's behavior in real time is recommended. The next chapter dives into data loading and preprocessing techniques! 🚀