Today I continue consolidating CNNs and the functions used for network training.
- 🍨 This post is a learning-record entry for the 🔗365天深度学习训练营
- 🍖 Original author: K同学啊
1. Experiment Environment
- Language: Python 3.8
- Editor: Jupyter Notebook
- Deep learning framework: PyTorch
- Everything is written in an .ipynb notebook, which makes it easy to inspect intermediate results.
Check whether a GPU is available:
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
# cuda:0
2. Dataset Preparation
The experiment uses the dataset wrappers provided by torchvision; the test code is below. The pitfall I hit this time: mind the capitalization of torch.utils.data.DataLoader; if you write it in lowercase it will not be recognized.
import torchvision
from torch.utils.data import DataLoader
train_ds = torchvision.datasets.CIFAR10('data',
                                        train=True,
                                        transform=torchvision.transforms.ToTensor(),  # convert the images to Tensor
                                        download=True)
test_ds = torchvision.datasets.CIFAR10('data',
                                       train=False,
                                       transform=torchvision.transforms.ToTensor(),  # convert the images to Tensor
                                       download=True)
batch_size = 32
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
test_dl = DataLoader(test_ds, batch_size=batch_size, shuffle=False)
data, label = next(iter(train_dl))
print(data.shape)
print(label.shape)
# torch.Size([32, 3, 32, 32])
# torch.Size([32])
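To take a quick look at the data itself, a minimal sketch (assuming matplotlib is installed; train_ds.classes is the torchvision attribute holding the 10 CIFAR-10 class names):
import matplotlib.pyplot as plt

print(train_ds.classes)              # the 10 class names, e.g. 'airplane', 'automobile', ...
img, lbl = train_ds[0]               # one (C, H, W) tensor in [0, 1] and its integer label
plt.imshow(img.permute(1, 2, 0))     # ToTensor gives CHW; imshow expects HWC
plt.title(train_ds.classes[lbl])
plt.show()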
3. Building the Network
Rules of thumb for processing image data:
- Follow each convolution with a pooling layer.
- Before the classification head, flatten the shape with tensor.view or torch.flatten (see the small example after this list).
- Add an activation function after convolutions or MLP layers; ReLU is the common choice, used as F.relu(data).
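A quick check (shapes assumed from the network below) that tensor.view and torch.flatten give the same flattened result:
import torch

x = torch.randn(32, 64, 6, 6)               # e.g. the output of the last pooling layer
a = x.view(x.shape[0], -1)                  # [32, 2304]
b = torch.flatten(x, start_dim=1)           # [32, 2304], keeps the batch dimension
print(a.shape, b.shape, torch.equal(a, b))  # both torch.Size([32, 2304]), True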
Convolution layer:
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
Pooling layer:
torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
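For reference, Conv2d and MaxPool2d share the same spatial-size formula; the helper below is just a convenience sketch, not part of the original code:
def conv_out_size(h_in, kernel_size, stride=1, padding=0, dilation=1):
    # Output-size formula from the PyTorch docs, shared by nn.Conv2d and nn.MaxPool2d
    return (h_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

print(conv_out_size(32, 3))             # 30 -> conv1 on a 32x32 CIFAR-10 image
print(conv_out_size(30, 2, stride=2))   # 15 -> pool1 (MaxPool2d(2) defaults stride to kernel_size)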
If you don't feel like working out the intermediate convolution dimensions by hand, just test and print the intermediate results:
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self, num_class) -> None:
        super().__init__()
        self.num_class = num_class
        self.hidden_dimension = 256
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, dilation=1)
        self.pool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, dilation=1)
        self.pool2 = nn.MaxPool2d(2)
        self.MLP = nn.Linear(64, num_class)

    def forward(self, data):
        # only run the first convolution, just to check the intermediate shape
        return self.conv1(data)
net = CNN(num_class=10)
output = net(data)
print(output.shape)
# torch.Size([32, 32, 30, 30])
The network structure built by hand; repeated blocks could later be tidied up with nn.Sequential() (a sketch of that refactor follows the test below).
import torch.nn.functional as F
class CNN(nn.Module):
    def __init__(self, num_class) -> None:
        super().__init__()
        self.num_class = num_class
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, dilation=1)
        self.pool1 = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, dilation=1)
        self.pool2 = nn.MaxPool2d(2)
        self.dropout = nn.Dropout(0.1)
        self.MLP = nn.Linear(2304, self.num_class)  # 64 channels * 6 * 6 after two conv+pool stages

    def forward(self, data):
        data = self.pool1(F.relu(self.conv1(data)))
        data = self.pool2(F.relu(self.conv2(data)))
        b, _, _, _ = data.shape
        data = data.view(b, -1)   # flatten everything except the batch dimension
        data = self.MLP(data)
        data = self.dropout(data)
        return data
net = CNN(num_class=10).to(device)
output = net(data.to(device))  # move the batch to the same device as the model
print(output.shape)
# torch.Size([32, 10])
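As suggested above, the repeated conv + ReLU + pool blocks can be grouped with nn.Sequential(). A minimal sketch of that refactor (CNNSeq is a made-up name; it is intended to match the hand-written class above):
class CNNSeq(nn.Module):
    def __init__(self, num_class):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                    # replaces the manual view(b, -1)
            nn.Linear(2304, num_class),
            nn.Dropout(0.1),
        )

    def forward(self, data):
        return self.classifier(self.features(data))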
4. Training and Test Functions
Before training, the hyperparameters have to be defined; the essential ones are the learning rate, the optimizer, and the loss function.
learning_rate = 0.001
optimizer = Adam(net.parameters(), lr=learning_rate)
loss = CrossEntropyLoss()
epochs = 30
Pitfalls I have hit:
- When defining the optimizer, pass model.parameters() as an argument to its constructor.
- Mind the device: before training, call model.to(device) and move each batch with data.to(device).
- How to check which device a tensor or a model lives on?
  - print(next(model.parameters()).device)
  - print(data.device)
- model.train() puts Batch Normalization and Dropout into training mode; model.eval() switches them to evaluation mode (Dropout is disabled and BatchNorm uses its running statistics).
- nn.CrossEntropyLoss() does not need a softmax beforehand: pass it the raw network output of shape [batch_size, num_class] and the labels of shape [batch].
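A tiny check with random data, just to illustrate that nn.CrossEntropyLoss works directly on raw logits:
import torch
import torch.nn as nn

logits = torch.randn(32, 10)           # raw network output of shape [batch_size, num_class]
labels = torch.randint(0, 10, (32,))   # integer class labels of shape [batch]
criterion = nn.CrossEntropyLoss()
print(criterion(logits, labels))       # a scalar loss; softmax is applied internally (log-softmax + NLL)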
I am being lazy here and reusing the train and test functions I wrote before.
- train takes the dataloader, model, loss, and optimizer.
- test takes the dataloader, model, and loss.
- Both return the epoch's accuracy and loss, which are collected into lists; later I may add a validation set plus TensorBoard or wandb visualization.
def train(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)   # size of the training set: 50,000 images for CIFAR-10
    num_batches = len(dataloader)    # number of batches: 1,563 (50,000 / 32 = 1,562.5, rounded up)
    train_loss, train_acc = 0, 0     # accumulators for loss and number of correct predictions

    for X, y in dataloader:          # fetch a batch of images and labels
        X, y = X.to(device), y.to(device)

        # Compute the prediction error
        pred = model(X)              # network output
        loss = loss_fn(pred, y)      # difference between the prediction and the ground truth

        # Backpropagation
        optimizer.zero_grad()        # reset the gradients
        loss.backward()              # backpropagate
        optimizer.step()             # update the parameters

        # Record accuracy and loss
        train_acc += (pred.argmax(1) == y).type(torch.float).sum().item()
        train_loss += loss.item()

    train_acc /= size
    train_loss /= num_batches
    return train_acc, train_loss
def test(dataloader, model, loss_fn):
    size = len(dataloader.dataset)   # size of the test set: 10,000 images for CIFAR-10
    num_batches = len(dataloader)    # number of batches: 313 (10,000 / 32 = 312.5, rounded up)
    test_loss, test_acc = 0, 0

    # No gradients are needed for evaluation, which saves memory and compute
    with torch.no_grad():
        for imgs, target in dataloader:
            imgs, target = imgs.to(device), target.to(device)

            # Compute the loss
            target_pred = model(imgs)
            loss = loss_fn(target_pred, target)

            test_loss += loss.item()
            test_acc += (target_pred.argmax(1) == target).type(torch.float).sum().item()

    test_acc /= size
    test_loss /= num_batches
    return test_acc, test_loss
The biggest pitfall
When defining the model, I created an nn.Linear inside the forward function, so that module was never moved to the GPU when calling model.to(device).
Fix: define all network modules only in the model's __init__ method.
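A minimal before/after sketch of that pitfall (BadHead and GoodHead are made-up names, just for illustration):
import torch.nn as nn

# Anti-pattern: a layer created inside forward() is not registered as a submodule,
# so model.to(device) never moves it and the optimizer never sees its parameters.
class BadHead(nn.Module):
    def forward(self, x):
        fc = nn.Linear(x.shape[-1], 10)   # created on the CPU on every call
        return fc(x)

# Fix: declare every submodule in __init__ so .to(device) and .parameters() pick it up.
class GoodHead(nn.Module):
    def __init__(self, in_features, num_class):
        super().__init__()
        self.fc = nn.Linear(in_features, num_class)

    def forward(self, x):
        return self.fc(x)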
5. Main Loop Test
from torch.optim import Adam
from torch.nn import CrossEntropyLoss

learning_rate = 0.001
optimizer = Adam(net.parameters(), lr=learning_rate)
loss = CrossEntropyLoss()
epochs = 30

train_acc = []
train_loss = []
test_loss = []
test_acc = []

for i in range(epochs):
    net.train()
    acc_train, loss_train = train(train_dl, net, loss, optimizer)
    net.eval()
    acc_test, loss_test = test(test_dl, net, loss)

    # collect per-epoch results for later inspection or plotting
    train_acc.append(acc_train)
    train_loss.append(loss_train)
    test_acc.append(acc_test)
    test_loss.append(loss_test)

    template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}')
    print(template.format(i + 1, acc_train * 100, loss_train, acc_test * 100, loss_test))
print("done")
Run results: