CIFAR-10 from 60% to 90%: How I Optimized a CNN Model

💻 Full code + experiment data: GitHub repository
📖 Companion tutorial: CSDN column
If you find this useful, a ⭐ Star is welcome!

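🎯 Why write this article?

Many students finish a CIFAR-10 tutorial and then ask me: "My model's accuracy is only a little over 60%. How do I push it higher?"

Today I'm sharing the complete optimization process, from the initial 60% to the final 90%. Every step comes with:

✅ Real experiment data
✅ Complete, runnable code
✅ Plain-language explanations of the principles
✅ Pitfalls to avoid

The conclusion up front: with 6 optimization techniques, I raised test accuracy from 60.5% to 90.2% in 3 days.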

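📊 Overview of the optimization process

Optimization stage	Test accuracy	Gain (percentage points)	Time spent
Baseline model	60.5%	-	-
+ Data augmentation	68.3%	+7.8%	2 hours
+ Deeper network	73.1%	+4.8%	4 hours
+ Learning-rate scheduling	78.6%	+5.5%	1 day
+ Dropout regularization	82.4%	+3.8%	6 hours
+ Optimizer change	86.7%	+4.3%	6 hours
+ Residual connections	90.2%	+3.5%	1 day
Final result	90.2%	+29.7%	3 days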
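
As a quick sanity check on the table above, the per-stage gains can be recomputed from the test accuracies. This is only a small sketch: the list name `stages` is my own, and the numbers are copied from the table.

```python
# Test accuracies copied from the overview table (percent).
stages = [
    ("Baseline model", 60.5),
    ("+ Data augmentation", 68.3),
    ("+ Deeper network", 73.1),
    ("+ Learning-rate scheduling", 78.6),
    ("+ Dropout regularization", 82.4),
    ("+ Optimizer change", 86.7),
    ("+ Residual connections", 90.2),
]

# Gain of each stage over the previous one, in percentage points.
gains = [
    round(curr - prev, 1)
    for (_, prev), (_, curr) in zip(stages, stages[1:])
]
total_gain = round(stages[-1][1] - stages[0][1], 1)

for (name, acc), gain in zip(stages[1:], gains):
    print(f"{name:<28} {acc:.1f}%  +{gain:.1f}")
print(f"Total gain: +{total_gain:.1f} percentage points")
```

Each gain is measured against the stage before it, so the gains are percentage points of test accuracy, not relative improvements.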

1️⃣ Baseline model (baseline: 60.5%)

Original code

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# Define a simple CNN
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )
    
    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)

# Training hyperparameters
EPOCHS = 20
BATCH_SIZE = 64
LR = 0.001

# Datasets (no augmentation)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=BATCH_SIZE,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=BATCH_SIZE,
                                         shuffle=False, num_workers=2)

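# Training...

Training result:

Epoch 20/20 - Train Acc: 75.2% - Test Acc: 60.5% - Train Loss: 0.68

Problem diagnosis

In plain language, this model has the following problems:

The network is too shallow - with only 2 convolutional layers it cannot learn complex features
No data augmentation - the model has never seen flipped or cropped images, so it generalizes poorly
It overfits easily - training accuracy is 75% but test accuracy is only 60%, a gap of about 15 percentage points!
Fixed learning rate - the learning rate should be lowered late in training for fine-grained adjustment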
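
The `64 * 8 * 8` input size of the first `nn.Linear` in the baseline follows from how each layer changes the spatial size of a 32x32 CIFAR-10 image. Here is a minimal sketch of that arithmetic, assuming the standard output-size formula for convolution and pooling; the helper name `out_size` is mine and is not part of PyTorch.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                   # CIFAR-10 images are 32x32
size = out_size(size, kernel=3, padding=1)  # Conv2d(3, 32, 3, padding=1): stays 32
size = out_size(size, kernel=2, stride=2)   # MaxPool2d(2): 32 -> 16
size = out_size(size, kernel=3, padding=1)  # Conv2d(32, 64, 3, padding=1): stays 16
size = out_size(size, kernel=2, stride=2)   # MaxPool2d(2): 16 -> 8

channels = 64                               # output channels of the last conv layer
flat_features = channels * size * size      # what nn.Flatten() hands to the classifier
print(size, flat_features)
```

The same bookkeeping is worth redoing whenever a layer is added or removed, because a mismatch only shows up as a shape error at the first forward pass.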

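2️⃣ Optimization 1: Data augmentation (60.5% → 68.3%)

Plain-language explanation

Data augmentation is like photographing a model in different outfits:

Original image = the model wearing one outfit
Augmented images = the model in 10 different outfits, striking different poses, shot under different lighting

Because the model has seen more variation, it is not thrown off by new images at test time.

Code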

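# Augment the training set (never the test set!)
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # random horizontal flip (50% probability)
    transforms.RandomCrop(32, padding=4),  # pad 4 pixels on each side, then take a random 32x32 crop
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # jitter brightness and contrast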
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Leave the test set as it is
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

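Parameters in plain language

Parameter	What it does	Why this setting
`RandomHorizontalFlip()`	Random horizontal flip	A car flipped left-to-right is still a car, but flipped upside down it no longer looks like one
`RandomCrop(32, padding=4)`	Pads 4 pixels, then takes a random 32x32 crop	Lets the model see objects at different positions
`ColorJitter(brightness=0.2, contrast=0.2)`	Brightness/contrast ±20%	Simulates different lighting conditions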
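
To make the pad-then-crop behaviour of `RandomCrop(32, padding=4)` concrete, here is a hand-written sketch of the same idea on a raw tensor. The function `random_crop_with_padding` is a hypothetical helper written for illustration, not torchvision's implementation, and it assumes zero padding.

```python
import torch
import torch.nn.functional as F

def random_crop_with_padding(img, size=32, padding=4, generator=None):
    """Zero-pad a (C, H, W) image on every side, then cut out a random size x size window."""
    padded = F.pad(img, (padding, padding, padding, padding))  # (left, right, top, bottom)
    max_top = padded.shape[-2] - size                          # 40 - 32 = 8
    max_left = padded.shape[-1] - size
    top = torch.randint(0, max_top + 1, (1,), generator=generator).item()
    left = torch.randint(0, max_left + 1, (1,), generator=generator).item()
    return padded[:, top:top + size, left:left + size]

img = torch.rand(3, 32, 32)            # stand-in for one CIFAR-10 image
g = torch.Generator().manual_seed(0)
crop = random_crop_with_padding(img, generator=g)
print(crop.shape)
```

With 4 pixels of padding there are 9 possible offsets in each direction, so the object can shift by up to 4 pixels either way while the output stays 32x32.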

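Experiment results

Epoch 20/20 - Train Acc: 78.9% - Test Acc: 68.3% - Train Loss: 0.52

A gain of 7.8 percentage points! This is the best value-for-effort optimization: only a few lines of code changed.

⚠️ Common mistakes

Mistake 1: Augmenting the test set too

# ❌ Wrong! The test set should not be augmented
test_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # must not be used on the test set!
    transforms.ToTensor()
])

Correct approach: apply only `ToTensor` and normalization to the test set, so that every evaluation gives the same result.

Mistake 2: Augmenting too aggressively

# ❌ Wrong! Rotating by up to 180 degrees turns objects upside down
transforms.RandomRotation(180)  # large rotations do not suit CIFAR-10

Correct approach: CIFAR-10 only suits mild transformations (flips, small-angle rotations, crops).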

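3️⃣ Optimization 2: A deeper network (68.3% → 73.1%)

Plain-language explanation

Deepening the network is like adding neurons to a brain:

Shallow network = a primary-school pupil who can only grasp simple concepts
Deep network = a university student who can grasp complex relationships

But a network that is too deep brings a new problem: vanishing gradients (how to deal with that comes later).

Code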

class DeeperCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # Block 1: extract edges and textures
            nn.Conv2d(3, 64, 3, padding=1),
            nn.BatchNorm2d(64),  # new! stabilizes training
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            
            # Block 2: extract shapes
            nn.Conv2d(64, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            
            # Block 3: extract complex features
            nn.Conv2d(128, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
        )
        
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, 512),  # larger fully connected layer
            nn.ReLU(inplace=True),
            nn.Linear(512, 10)
        )
    
    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)

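What's new?

More convolutional layers - from 2 layers up to 6
Batch Normalization
- In plain language: each layer's output is standardized, so that no layer learns much faster or slower than the others
- Effect: training is more stable and tolerates a larger learning rate
Larger network capacity - channel counts go from 32/64 up to 64/128/256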
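
To see what "standardizing each layer's output" means in practice, the sketch below normalizes a batch by hand and compares it with `nn.BatchNorm2d` in training mode. The input tensor is random stand-in data, and the comparison relies on the layer's default initial scale of 1 and shift of 0.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 64, 16, 16) * 3 + 5   # stand-in activations with mean ~5, std ~3

# Manual normalization: statistics are taken per channel over (batch, height, width).
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + 1e-5)

bn = nn.BatchNorm2d(64)   # default eps=1e-5; scale starts at 1, shift at 0
bn.train()                # training mode uses the statistics of the current batch
layer_out = bn(x)

print(torch.allclose(manual, layer_out, atol=1e-4))
print(layer_out.mean().item(), layer_out.std().item())
```

In evaluation mode the layer switches to running averages collected during training, which is one reason `model.eval()` matters before testing.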

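Experiment results

Epoch 20/20 - Train Acc: 85.3% - Test Acc: 73.1% - Train Loss: 0.41

A gain of 4.8 percentage points! But note that training accuracy is already 85% while test accuracy is only 73%, which means the model is starting to overfit.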

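4️⃣ Optimization 3: Learning-rate scheduling (73.1% → 78.6%)

Plain-language explanation

The learning rate is like your pace when walking down a mountain:

Early on: big strides (large learning rate) to approach the lowest point quickly
Later: small steps (small learning rate) to fine-tune your way to the lowest point

If you keep taking big strides, you bounce back and forth around the lowest point and never settle on the optimum.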
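
The bouncing can be reproduced on a toy problem. The sketch below runs plain gradient descent on f(x) = x², once with a constant large step and once with the step halved after 10 iterations. The function and the step sizes (0.95 and 0.475) are chosen purely for illustration and are not taken from the CIFAR-10 experiments.

```python
def descend(lr_for_step, steps=20, x=1.0):
    """Run gradient descent on f(x) = x**2 (gradient 2x) and return every iterate."""
    path = [x]
    for step in range(steps):
        x = x - lr_for_step(step) * 2 * x
        path.append(x)
    return path

constant = descend(lambda step: 0.95)                         # big strides throughout
decayed = descend(lambda step: 0.95 if step < 10 else 0.475)  # halve the stride after 10 steps

print("constant lr, last 3 iterates:", [f"{x:+.4f}" for x in constant[-3:]])
print("decayed lr,  last 3 iterates:", [f"{x:+.2e}" for x in decayed[-3:]])
```

With the constant step the iterate flips sign on every update and shrinks only slowly, while the decayed run settles much closer to the minimum at 0.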

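Code

# Use the StepLR scheduler
optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)

# Call it inside the training loop
for epoch in range(30):  # 30 epochs, matching the results reported for this stage
    train(...)
    test(...)
    scheduler.step()  # called once per epoch; the learning rate halves every 10 epochs

How the learning rate changes

Epoch 1-10:   lr = 0.001  (big exploratory steps)
Epoch 11-20:  lr = 0.0005 (fine adjustment)
Epoch 21-30:  lr = 0.00025 (further fine-tuning)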
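
The schedule above can be reproduced with one line of arithmetic. This is a sketch of the step-decay rule with the same settings (`step_size=10`, `gamma=0.5`); the function name `step_lr` is my own and epochs are 0-indexed here.

```python
def step_lr(epoch, base_lr=0.001, step_size=10, gamma=0.5):
    """Learning rate used during a given 0-indexed epoch under a step-decay schedule."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 9, 10, 19, 20, 29):
    print(f"epoch {epoch + 1:>2}: lr = {step_lr(epoch):.5f}")
```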

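Comparison with other schedulers

Scheduler	When to use	Code
`StepLR`	Lower at fixed intervals	`StepLR(optimizer, 10, 0.5)`
`MultiStepLR`	Lower at specified epochs	`MultiStepLR(optimizer, [10, 20], 0.1)`
`CosineAnnealingLR`	Lower along a cosine curve (recommended)	`CosineAnnealingLR(optimizer, 30)`
`ReduceLROnPlateau`	Lower when the monitored metric stops improving (`'min'` for a loss; use `'max'` when monitoring accuracy)	`ReduceLROnPlateau(optimizer, 'min')`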
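
For the cosine schedule, the curve is easy to compute by hand. The sketch below uses the closed-form cosine annealing formula with a minimum learning rate of 0, which I am assuming here as the default; `cosine_lr` is a hypothetical helper, not a PyTorch function.

```python
import math

def cosine_lr(epoch, base_lr=0.001, t_max=30, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at epoch t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))

for epoch in (0, 5, 15, 25, 30):
    print(f"epoch {epoch:>2}: lr = {cosine_lr(epoch):.6f}")
```

Unlike the step schedule, the learning rate drops slowly at first, fastest in the middle, and flattens out again near the end.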

Experiment results

Epoch 30/30 - Train Acc: 90.1% - Test Acc: 78.6% - Train Loss: 0.28

A gain of 5.5 percentage points! Training ran for 30 epochs, and the learning-rate schedule did its job.

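5️⃣ Optimization 4: Dropout regularization (78.6% → 82.4%)

Plain-language explanation

Dropout is like randomly covering up some of the knowledge points during an exam:

During training: randomly "switch off" some neurons (for example 50%)
During inference: all neurons are active

This keeps the model from relying too heavily on particular neurons, so it generalizes better.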
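
The difference between training and inference can be checked directly on `nn.Dropout`. This is a small sketch on a tensor of ones, used as stand-in activations so that the effect is easy to read off.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 1000)

drop.train()                 # training: each element is zeroed with probability p
train_out = drop(x)
kept = train_out != 0
print("fraction kept:", kept.float().mean().item())
print("value of kept elements:", train_out[kept].unique().tolist())

drop.eval()                  # inference: dropout does nothing
eval_out = drop(x)
print("eval output unchanged:", torch.equal(eval_out, x))
```

The surviving activations are scaled by 1 / (1 - p) during training, so the expected output stays the same and no rescaling is needed at inference time.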

Code

class RegularizedCNN(nn.Module):
    def __init__(self, dropout_rate=0.5):
        super().__init__()
        self.features = nn.Sequential(
            # ... convolutional layers unchanged from before ...
        )
        
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout_rate),  # new! randomly switches off 50% of the neurons during training
            nn.Linear(512, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout_rate),  # a second Dropout layer
            nn.Linear(256, 10)
        )
    
    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)

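Choosing the rate

# Use 0.5 for fully connected layers
dropout_fc = 0.5

# Use 0.2-0.3 for convolutional layers (not too high)
dropout_conv = 0.2

# Why do convolutional layers get a smaller dropout rate?
# Convolutional layers have relatively few parameters to begin with, so too much dropout keeps them from learning anything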
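
The model code above only places dropout in the classifier, so here is one possible way a light dropout could be attached to a convolutional block. This is a sketch under my own assumptions: the helper `conv_block` is hypothetical, and using `nn.Dropout2d` (which zeroes whole feature maps) is my choice, not something the experiments above confirm.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, dropout_conv=0.2):
    """Conv -> BN -> ReLU -> pool, followed by light channel-wise dropout."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2, 2),
        nn.Dropout2d(dropout_conv),  # zeroes entire feature maps during training
    )

block = conv_block(3, 64)
out = block(torch.randn(4, 3, 32, 32))   # stand-in batch of 4 images
print(out.shape)
```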

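Experiment results

Epoch 30/30 - Train Acc: 87.5% - Test Acc: 82.4% - Train Loss: 0.35

A gain of 3.8 percentage points! Note that training accuracy dropped from 90% to 87% while test accuracy went up, which shows that overfitting has eased.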

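6️⃣ Optimization 5: Changing the optimizer (82.4% → 86.7%)

Plain-language explanation

The optimizer is like a navigation system:

SGD = basic navigation that only tells you the direction
Adam = smart navigation that avoids traffic jams and finds the best route

But Adam can sometimes be "overconfident"; AdamW is an improved version that applies weight decay directly to the weights instead of mixing it into the adaptive gradient update.

Code comparison

# Before: Adam
optimizer = optim.Adam(model.parameters(), lr=0.001)

# After: AdamW (recommended)
optimizer = optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)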

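Optimizer comparison experiment

Optimizer	Test accuracy	Training time	Best suited for
SGD	75.3%	Fast	Simple tasks
SGD + Momentum	79.1%	Fast	Cases that need careful tuning
Adam	82.4%	Medium	General use
AdamW	86.7%	Medium	Deep learning (recommended)
RMSprop	80.5%	Medium	RNN/LSTM

Experiment results

Epoch 30/30 - Train Acc: 91.2% - Test Acc: 86.7% - Train Loss: 0.25

A gain of 4.3 percentage points! AdamW is a common default choice today.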

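7️⃣ Optimization 6: Residual connections (86.7% → 90.2%)

Plain-language explanation

A residual connection is like adding a handrail to a staircase:

Plain network: information has to pass through every layer in turn (prone to vanishing gradients)
Residual network: information can "skip" a few layers and be passed on directly

Formula: output = F(x) + x (F(x) is the output of the convolutional layers, x is the block's input)

Code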

class ResidualBlock(nn.Module):
    """Residual block."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, 1, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        
        # If the dimensions change, x must be projected to match
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride),
                nn.BatchNorm2d(out_channels)
            )
    
    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # the residual connection!
        out = torch.relu(out)
        return out

class ResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(64)
        
        # Residual blocks
        self.layer1 = self._make_layer(64, 64, 2)
        self.layer2 = self._make_layer(64, 128, 2, stride=2)
        self.layer3 = self._make_layer(128, 256, 2, stride=2)
        
        self.fc = nn.Linear(256, 10)
    
    def _make_layer(self, in_channels, out_channels, num_blocks, stride=1):
        layers = []
        layers.append(ResidualBlock(in_channels, out_channels, stride))
        for _ in range(1, num_blocks):
            layers.append(ResidualBlock(out_channels, out_channels))
        return nn.Sequential(*layers)
    
    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = torch.nn.functional.adaptive_avg_pool2d(out, 1)  # global average pooling (8x8 -> 1x1, so fc receives 256 features)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out
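
For 32x32 inputs, the last residual stage produces a 256-channel 8x8 feature map, while `nn.Linear(256, 10)` expects exactly 256 features per image. The quick shape check below, a sketch on a random stand-in tensor, shows which pooling produces that.

```python
import torch
import torch.nn.functional as F

features = torch.randn(2, 256, 8, 8)   # stand-in for the output of layer3 on a batch of 2

global_pooled = F.adaptive_avg_pool2d(features, 1).flatten(1)
window4_pooled = F.avg_pool2d(features, 4).flatten(1)

print(global_pooled.shape)    # one value per channel
print(window4_pooled.shape)   # a 2x2 grid per channel
```

Only pooling down to 1x1 gives 256 features per image. A 4x4 window on an 8x8 map leaves a 2x2 grid, that is 1024 features, which a 256-input linear layer would reject.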

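Why does it work?

Mitigates vanishing gradients - deep networks are hard to train, and the residual connection lets gradients flow straight back
Identity mapping - even if the convolutional layers learn nothing, the original information is at least preserved
Easier to optimize - the network can start from a simple identity mapping and gradually learn complex features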
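
The gradient argument can be illustrated with a deliberately simple toy: a chain of 20 scalar "layers" F(x) = w * x with a small weight, with and without the skip connection. The function `input_gradient` and the value w = 0.1 are assumptions made for illustration, and real convolutional blocks behave less cleanly than this.

```python
import torch

def input_gradient(depth, residual, w=0.1):
    """Gradient of the output w.r.t. the input for a chain of scalar layers F(x) = w * x."""
    x = torch.tensor(1.0, requires_grad=True)
    out = x
    for _ in range(depth):
        if residual:
            out = w * out + out   # F(x) + x
        else:
            out = w * out         # F(x) only
    out.backward()
    return x.grad.item()

plain = input_gradient(depth=20, residual=False)
skip = input_gradient(depth=20, residual=True)
print(f"plain chain   : {plain:.3e}")
print(f"residual chain: {skip:.3e}")
```

Without the skip path every layer multiplies the gradient by 0.1, so almost nothing reaches the input. With it, each layer contributes a factor of 1 + 0.1, so the signal survives all 20 layers.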

Experiment results

Epoch 40/40 - Train Acc: 95.8% - Test Acc: 90.2% - Train Loss: 0.18

A gain of 3.5 percentage points! The final accuracy reached 90.2%.

📊 Training curve comparison

import matplotlib.pyplot as plt

# Plot the curves for every optimization stage
epochs = range(1, 41)
acc_base = [60.5] * 40
acc_aug = [65.0, 66.2, 67.1, 68.3] + [68.3] * 36
# ... other curves (acc_resnet and the rest must be defined here, each with 40 values)

plt.figure(figsize=(10, 6))
plt.plot(epochs, acc_base, label='Baseline (60.5%)', linestyle='--')
plt.plot(epochs, acc_aug, label='+ Data Augmentation (68.3%)')
# ... other curves
plt.plot(epochs, acc_resnet, label='+ Residual (90.2%)', linewidth=2)
plt.xlabel('Epoch')
plt.ylabel('Test Accuracy (%)')
plt.legend()
plt.grid(True)
plt.savefig('optimization_curve.png', dpi=150)
plt.show()

[Figure: optimization curves]

🎯 Final code (complete version)

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.optim.lr_scheduler import CosineAnnealingLR

# Define the residual block
class ResidualBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride, padding=1)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, 1, padding=1)
        self.bn2 = nn.BatchNorm2d(out_channels)
        
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride),
                nn.BatchNorm2d(out_channels)
            )
    
    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)
        return torch.relu(out)

# Define the ResNet
class CIFAR10_ResNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 64, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(64)
        
        self.layer1 = self._make_layer(64, 64, 2)
        self.layer2 = self._make_layer(64, 128, 2, stride=2)
        self.layer3 = self._make_layer(128, 256, 2, stride=2)
        
        self.fc = nn.Linear(256, 10)
    
    def _make_layer(self, in_channels, out_channels, num_blocks, stride=1):
        layers = []
        layers.append(ResidualBlock(in_channels, out_channels, stride))
        for _ in range(1, num_blocks):
            layers.append(ResidualBlock(out_channels, out_channels))
        return nn.Sequential(*layers)
    
    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = torch.nn.functional.adaptive_avg_pool2d(out, 1)  # global average pooling (8x8 -> 1x1)
        out = out.view(out.size(0), -1)
        return self.fc(out)

# Data augmentation
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])

# Load the data
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=train_transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128,
                                          shuffle=True, num_workers=4)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=test_transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=128,
                                         shuffle=False, num_workers=4)

# Create the model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = CIFAR10_ResNet().to(device)

# Optimizer and scheduler
optimizer = optim.AdamW(model.parameters(), lr=0.001, weight_decay=0.01)
scheduler = CosineAnnealingLR(optimizer, T_max=40)

# Training loop
def train():
    best_acc = 0
    for epoch in range(40):
        model.train()
        train_loss, correct, total = 0, 0, 0
        
        for inputs, targets in trainloader:
            inputs, targets = inputs.to(device), targets.to(device)
            
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = nn.CrossEntropyLoss()(outputs, targets)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
        
        # Evaluate on the test set
        model.eval()
        test_correct, test_total = 0, 0
        with torch.no_grad():
            for inputs, targets in testloader:
                inputs, targets = inputs.to(device), targets.to(device)
                outputs = model(inputs)
                _, predicted = outputs.max(1)
                test_total += targets.size(0)
                test_correct += predicted.eq(targets).sum().item()
        
        train_acc = 100. * correct / total
        test_acc = 100. * test_correct / test_total
        
        print(f'Epoch {epoch+1}/40 - Train Acc: {train_acc:.1f}% - '
              f'Test Acc: {test_acc:.1f}% - Train Loss: {train_loss/len(trainloader):.2f}')
        
        # Save the best model
        if test_acc > best_acc:
            best_acc = test_acc
            torch.save(model.state_dict(), 'best_model.pth')
        
        scheduler.step()

if __name__ == '__main__':
    train()
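
Once training has written `best_model.pth`, the weights can be loaded back for inference. The sketch below shows the save/load round trip with a tiny stand-in model so that it runs on its own; `build_model` is a hypothetical placeholder, and in practice you would construct `CIFAR10_ResNet()` and feed it normalized test images.

```python
import os
import tempfile

import torch
import torch.nn as nn

def build_model():
    # Stand-in for CIFAR10_ResNet so the sketch is self-contained.
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

trained = build_model()
path = os.path.join(tempfile.mkdtemp(), "best_model.pth")
torch.save(trained.state_dict(), path)         # same call the training loop uses

restored = build_model()
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()                                # BatchNorm/Dropout switch to inference behaviour

with torch.no_grad():
    batch = torch.randn(4, 3, 32, 32)          # stand-in for normalized test images
    predicted = restored(batch).argmax(dim=1)  # predicted class index per image
print(predicted.shape)
```

Because only the `state_dict` is saved, the model class must be defined identically when loading; `map_location="cpu"` lets a checkpoint trained on a GPU load on a machine without one.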

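💡 Summary of optimization techniques

Priority ranking (by value for effort)

Technique	Gain	Difficulty	Recommendation
Data augmentation	+7.8%	⭐	⭐⭐⭐⭐⭐
Learning-rate scheduling	+5.5%	⭐⭐	⭐⭐⭐⭐⭐
Deeper network	+4.8%	⭐⭐⭐	⭐⭐⭐⭐
Optimizer change	+4.3%	⭐	⭐⭐⭐⭐⭐
Dropout	+3.8%	⭐	⭐⭐⭐⭐
Residual connections	+3.5%	⭐⭐⭐⭐	⭐⭐⭐⭐

Quick-win guide

If you are short on time, do just these 3:

✅ Data augmentation (largest gain, least code)
✅ AdamW optimizer (a one-line change)
✅ Learning-rate scheduler (a two-line change)

Combined, these three can take you from 60% to around 80%!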

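🔗 Related links

💻 Full code + experiment data: GitHub repository
📖 Complete 30-day tutorial: CSDN column
❓ Questions? Open an Issue

💡 If you found this article helpful, feel free to:

⭐ Star the GitHub repository

💬 Share your own optimization experience in the comments

🔄 Forward it to friends who might need it

Coming up next: "Training too slow? 5 tricks to make PyTorch training 10x faster"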