Neural Network Construction Basics


1. Defining and Using the nn.Module Class

1.1 Modular Neural Network Design

PyTorch implements object-oriented neural network design through torch.nn.Module; every layer and model should inherit from this base class. Its core advantages include:

  • Automatic parameter tracking: registered parameters are collected automatically and their requires_grad state is managed for you
  • Convenient device migration: .to(device) moves all parameters and buffers in a single call
  • Model saving/loading: state_dict() provides standardized parameter storage (all three points are demonstrated in a short sketch after the class diagram below)
1.1.1 Custom Module Example
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size) 
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)
    
    def forward(self, x):
        out = self.fc1(x)  # shape: (batch, hidden_size)
        out = self.relu(out)
        out = self.fc2(out) # shape: (batch, num_classes)
        return out

model = NeuralNet(784, 512, 10)
print(model)
classDiagram
    Module <|-- NeuralNet
    class Module {
        +parameters()
        +to(device)
        +train()
        +eval()
    }
    class NeuralNet {
        +fc1: Linear
        +relu: ReLU
        +fc2: Linear
        +forward(x)
    }
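The three advantages listed in Section 1.1 can be made concrete with the model defined above. A minimal sketch (the device string and the file name neural_net.pth are placeholders chosen for illustration):

for name, p in model.named_parameters():           # parameter tracking: every registered parameter is visible
    print(name, tuple(p.shape), p.requires_grad)   # requires_grad is True by default

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)                           # one call moves all parameters and buffers

torch.save(model.state_dict(), 'neural_net.pth')   # standardized parameter storage
model.load_state_dict(torch.load('neural_net.pth', map_location=device))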

1.2 Controlling the Forward Pass

# Example of a more complex forward pass
class ResidualBlock(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.conv2 = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(in_channels)
    
    def forward(self, x):
        residual = x
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += residual  # residual (skip) connection
        return F.relu(out)
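A quick sanity check of the block (the tensor sizes below are arbitrary and chosen only for illustration). The residual addition requires the output of bn2 to have the same shape as the input, which holds here because both convolutions keep the channel count and, with padding=1, the spatial size:

block = ResidualBlock(in_channels=64)
x = torch.randn(8, 64, 32, 32)   # (batch, channels, H, W)
y = block(x)
print(y.shape)                   # torch.Size([8, 64, 32, 32]) -- shape preserved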

2. Fully Connected, Convolutional, and Pooling Layers in Detail

2.1 Fully Connected Layer (Linear Layer)

2.1.1 Mathematical Principle

For an input $X \in \mathbb{R}^{batch \times in\_features}$, weight $W \in \mathbb{R}^{out\_features \times in\_features}$ (nn.Linear stores the weight with shape (out_features, in_features)), and bias $b \in \mathbb{R}^{out\_features}$:

$$Y = XW^T + b$$

2.1.2 Code Example
fc = nn.Linear(in_features=784, out_features=256)
x = torch.randn(32, 784)  # batch=32
y = fc(x)                # shape: (32, 256)
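As a check that nn.Linear really computes $Y = XW^T + b$, a minimal sketch comparing the layer output with the explicit matrix expression:

manual = x @ fc.weight.T + fc.bias            # explicit X W^T + b
print(torch.allclose(y, manual, atol=1e-6))   # True, up to floating-point tolerance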

2.2 Convolutional Layer

2.2.1 Parameter Breakdown
conv = nn.Conv2d(
    in_channels=3,    # number of input channels (e.g. 3 for RGB images)
    out_channels=64,  # number of filters, i.e. output channels
    kernel_size=3,    # 3x3 convolution kernel
    stride=1,         # step size of the sliding window
    padding=1,        # zero-padding added to each border
    dilation=1,       # spacing between kernel elements (1 = standard convolution)
    groups=1          # 1 = standard convolution; >1 = grouped convolution
)
x = torch.randn(16, 3, 32, 32)  # (batch, channels, H, W)
y = conv(x)  # shape: (16, 64, 32, 32)
2.2.2 Output Size Calculation

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1}{stride} + 1 \right\rfloor$$

graph TD
    A[Input 3@32x32] --> B[Conv kernels 64@3x3]
    B --> C[Output 64@32x32]
    style A fill:#9f9,stroke:#333
    style C fill:#f99,stroke:#333
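Plugging the parameters of the conv layer above into this formula (H_in = 32, padding = 1, dilation = 1, kernel_size = 3, stride = 1) reproduces the shape PyTorch reports; a minimal check:

H_in, padding, dilation, kernel_size, stride = 32, 1, 1, 3, 1
H_out = (H_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
print(H_out)      # 32
print(y.shape)    # torch.Size([16, 64, 32, 32]) -- matches (batch, out_channels, H_out, W_out)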

2.3 Pooling Layer

2.3.1 Max Pooling Example
pool = nn.MaxPool2d(kernel_size=2, stride=2)
x = torch.randn(16, 64, 32, 32)
y = pool(x)  # shape: (16, 64, 16, 16)
2.3.2 Average Pooling Formula

$$output[N, c, i, j] = \frac{1}{kH \times kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input[N, c, stride[0] \times i + m, stride[1] \times j + n]$$
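To make the formula concrete, the following minimal sketch (with an arbitrary 4x4 input) checks one output element of nn.AvgPool2d against the explicit window average:

avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
x = torch.randn(1, 1, 4, 4)
y = avg_pool(x)
# output[0, 0, 0, 0] is the mean of the top-left 2x2 window of the input
print(torch.allclose(y[0, 0, 0, 0], x[0, 0, :2, :2].mean()))  # True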


3. Choosing Activation and Loss Functions

3.1 Comparison of Common Activation Functions

| Function | Formula | Advantages | Disadvantages |
| --- | --- | --- | --- |
| ReLU | $f(x) = \max(0, x)$ | Computationally efficient, alleviates vanishing gradients | Dying-neuron problem |
| Sigmoid | $\frac{1}{1+e^{-x}}$ | Output in the (0, 1) range | Severe vanishing gradients |
| Tanh | $\frac{e^x - e^{-x}}{e^x + e^{-x}}$ | Zero-centered output in (-1, 1) | Vanishing gradients |
| LeakyReLU | $\max(0.01x, x)$ | Alleviates dying neurons | Negative slope needs tuning |
3.1.1 Visualizing Activation Functions
import matplotlib.pyplot as plt

x = torch.linspace(-5, 5, 100)
functions = {
    'ReLU': nn.ReLU(),
    'Sigmoid': nn.Sigmoid(),
    'Tanh': nn.Tanh(),
    'LeakyReLU': nn.LeakyReLU(0.1)
}

plt.figure(figsize=(12, 6))
for i, (name, func) in enumerate(functions.items()):
    plt.subplot(2, 2, i+1)
    plt.plot(x.numpy(), func(x).numpy())
    plt.title(name)
plt.tight_layout()
plt.show()

3.2 Loss Function Selection Guide

3.2.1 Classification Tasks
# Binary classification
loss_fn = nn.BCEWithLogitsLoss()  # applies the sigmoid internally

# Multi-class classification
loss_fn = nn.CrossEntropyLoss()  # expects raw logits; no softmax needed

# Example computation
outputs = model(inputs)          # shape: (batch, classes)
loss = loss_fn(outputs, targets) # targets: (batch,) class indices
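A self-contained check (with arbitrary sizes) that nn.CrossEntropyLoss really expects raw logits: it is equivalent to applying log-softmax followed by the negative log-likelihood loss:

logits = torch.randn(4, 10)            # raw scores, no softmax applied
targets = torch.randint(0, 10, (4,))   # class indices
ce = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, manual))      # True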
3.2.2 Regression Tasks
loss_fn = nn.MSELoss()       # mean squared error
loss_fn = nn.L1Loss()        # mean absolute error
loss_fn = nn.SmoothL1Loss()  # Huber-style loss
3.2.3 Loss Function Formulas

Cross-entropy loss: $\mathcal{L} = -\sum_{i=1}^{N} y_i \log(\hat{y}_i)$

Huber loss:

$$\mathcal{L} = \begin{cases} 0.5(x - y)^2 & \text{if } |x - y| < \delta \\ \delta\left(|x - y| - 0.5\delta\right) & \text{otherwise} \end{cases}$$
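A quick check of the piecewise definition: nn.SmoothL1Loss with its default beta=1.0 matches the Huber formula above with δ = 1, so an explicit elementwise implementation should agree (the tensor values are arbitrary):

x = torch.tensor([0.2, 1.5, -3.0])
y = torch.tensor([0.0, 0.0,  0.0])
diff = (x - y).abs()
delta = 1.0
manual = torch.where(diff < delta, 0.5 * diff ** 2, delta * (diff - 0.5 * delta)).mean()
print(torch.allclose(nn.SmoothL1Loss()(x, y), manual))  # True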

Appendix: Best Practices for Building Neural Networks

Parameter Initialization

# He (Kaiming) initialization, used with ReLU layers
for layer in model.modules():
    if isinstance(layer, nn.Conv2d):
        nn.init.kaiming_normal_(layer.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(layer, nn.BatchNorm2d):
        nn.init.constant_(layer.weight, 1)
        nn.init.constant_(layer.bias, 0)
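The same logic can also be packaged as a function and applied with Module.apply, which visits every submodule recursively; this is just a restructuring of the loop above (init_weights is a name chosen for illustration):

def init_weights(m):
    # called once for every submodule of the model
    if isinstance(m, nn.Conv2d):
        nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.constant_(m.weight, 1)
        nn.init.constant_(m.bias, 0)

model.apply(init_weights)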

Model Architecture Visualization

graph TD
    A[Input 1x28x28] --> B[Conv2d 32@3x3]
    B --> C[ReLU]
    C --> D[MaxPool 2x2]
    D --> E[Conv2d 64@3x3]
    E --> F[ReLU]
    F --> G[Flatten]
    G --> H[Linear 128]
    H --> I[Dropout 0.5]
    I --> J[Output 10]
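A minimal nn.Sequential sketch of the architecture in the diagram, assuming padding=1 for both convolutions (the diagram does not specify padding), which keeps the spatial size before pooling and gives a flattened feature size of 64 x 14 x 14:

mnist_model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 1x28x28 -> 32x28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 32x28x28 -> 32x14x14
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 32x14x14 -> 64x14x14
    nn.ReLU(),
    nn.Flatten(),                                 # -> 64 * 14 * 14 = 12544
    nn.Linear(64 * 14 * 14, 128),
    nn.Dropout(0.5),
    nn.Linear(128, 10)                            # 10 output classes
)
print(mnist_model(torch.randn(1, 1, 28, 28)).shape)  # torch.Size([1, 10])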

Note: The code in this chapter was tested with PyTorch 2.1 + CUDA 11.8. It is recommended to pair it with a visualization tool such as TensorBoard to observe network behavior in real time. The next chapter dives into data loading and preprocessing techniques! 🚀