Machine Learning Mastery PyTorch Tutorials (Part 1)

Source: Machine Learning Mastery

License: CC BY-NC-SA 4.0

Building an Image Classifier with a Single-Layer Neural Network in PyTorch

machinelearningmastery.com/building-an-image-classifier-with-a-single-layer-neural-network-in-pytorch/

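A single-layer neural network, also known as a single-layer perceptron, is the simplest type of neural network. It consists of only one layer of neurons, which are connected to the input layer and the output layer. In the case of an image classifier, the input layer is the image and the output layer is the class label.

To build an image classifier with a single-layer neural network in PyTorch, you first need to prepare your data. This typically involves loading the images and labels into a PyTorch data loader and then splitting the data into training and validation sets. Once the data is ready, you can define your neural network.

Next, you can use PyTorch's built-in functions to train the network on your training data and evaluate its performance on your validation data. You also need to pick an optimizer, such as stochastic gradient descent (SGD), and a loss function, such as cross-entropy loss.

Note that a single-layer neural network may not be ideal for every task, but it works well as a simple classifier, and it helps you understand the inner workings of neural networks and debug them.

So, let's build our image classifier. In the process you will learn:

How to use and preprocess built-in datasets in PyTorch.
How to build and train a custom neural network in PyTorch.
How to build an image classifier in PyTorch step by step.
How to make predictions with the trained model in PyTorch.

Let's get started.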

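Overview

This tutorial is in three parts; they are

Preparing the Dataset
Building the Model Architecture
Training the Model

Preparing the Dataset

In this tutorial, you will use the CIFAR-10 dataset. It is a dataset for image classification consisting of 60,000 color images of 32×32 pixels in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. The classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. CIFAR-10 is widely used in machine learning and computer vision research because it is relatively small and simple, yet challenging enough to call for deep learning approaches. The dataset can be imported conveniently through the torchvision library.

Here is how you do that.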

import torch
import torchvision
import torchvision.transforms as transforms

# import the CIFAR-10 dataset
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transforms.ToTensor())
test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transforms.ToTensor())

If you have never downloaded the dataset before, this code prints where the images are downloaded from:

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
  0%|          | 0/170498071 [00:00<?, ?it/s]
Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified

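You specified the root directory where the dataset should be downloaded, and set train=True to import the training set and train=False to import the test set. The download=True argument downloads the dataset into the root directory if it is not already there.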

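Building the Neural Network Model

Define a simple neural network, SimpleNet, that inherits from torch.nn.Module. The network has two fully connected (fc) layers, fc1 and fc2, defined in the __init__ method. The first fully connected layer, fc1, takes the flattened image as input and has 100 hidden neurons. Similarly, the second fully connected layer, fc2, has 100 input neurons and num_classes output neurons. The num_classes parameter defaults to 10 because there are 10 classes.

The forward method defines the forward pass of the network, in which the input x is processed by the layers defined in __init__. The method first reshapes the input tensor x into the required shape using the view method. The input then passes through the first fully connected layer and its ReLU activation, then through the second fully connected layer, which returns the output tensor.

Here is the code for everything described above.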

import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(32*32*3, 100) # Fully connected layer with 100 hidden neurons
        self.fc2 = nn.Linear(100, num_classes) # Fully connected layer with num_classes outputs

    def forward(self, x):
        x = x.view(-1, 32*32*3) # reshape the input tensor
        x = self.fc1(x)
        x = torch.relu(x)
        x = self.fc2(x)
        return x

Now, instantiate the model object.

# Instantiate the model
model = SimpleNet()
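
As a quick sanity check on the architecture described above, the sketch below redefines the same SimpleNet, passes a random batch shaped like CIFAR-10 images through it, and counts the trainable parameters. The random batch and the names dummy and n_params are illustrative assumptions, not part of the original tutorial.

```python
import torch
import torch.nn as nn

class SimpleNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(32*32*3, 100)      # 3072 inputs -> 100 hidden neurons
        self.fc2 = nn.Linear(100, num_classes)  # 100 hidden neurons -> class scores

    def forward(self, x):
        x = x.view(-1, 32*32*3)                 # flatten each image to a vector
        x = torch.relu(self.fc1(x))
        return self.fc2(x)

model = SimpleNet()

# Hypothetical batch of 4 random "images" with the CIFAR-10 shape (3, 32, 32)
dummy = torch.randn(4, 3, 32, 32)
out = model(dummy)
print(out.shape)  # torch.Size([4, 10])

# fc1: 3072*100 weights + 100 biases; fc2: 100*10 weights + 10 biases
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 308310
```

One score per class comes out for every image, which is the shape the cross-entropy loss expects later on.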

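Training the Model

You will create two instances of PyTorch's DataLoader class, one for training and one for testing. In train_loader, you set the batch size to 64 and shuffle the training data randomly by setting shuffle=True.

The test_loader is set up in the same way, except that you do not need to shuffle.

Then, you will define the cross-entropy loss function and the Adam optimizer to train the model. You set the learning rate of the optimizer to 0.001.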

# Load the data into PyTorch DataLoader
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=64, shuffle=False)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
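
Note that nn.CrossEntropyLoss expects raw scores (logits), which is why SimpleNet has no softmax at its output. The following is a minimal sketch, using made-up logits and labels, that checks the built-in loss against a manual log-softmax followed by the negative log-likelihood of the true class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(5, 10)               # hypothetical raw outputs for 5 samples
labels = torch.tensor([3, 8, 8, 0, 6])    # hypothetical class indices

# Built-in loss applied directly to the logits
loss_builtin = nn.CrossEntropyLoss()(logits, labels)

# Manual equivalent: log-softmax, then pick the log-probability of the true class
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(5), labels].mean()

print(torch.allclose(loss_builtin, loss_manual))  # True
```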

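Finally, set up a training loop to train the model for a number of epochs. You will define some empty lists to store the values of the loss and accuracy metrics.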

# train the model
num_epochs = 20
train_loss_history = []
train_acc_history = []
val_loss_history = []
val_acc_history = []

# Loop through the number of epochs
for epoch in range(num_epochs):
    train_loss = 0.0
    train_acc = 0.0
    val_loss = 0.0
    val_acc = 0.0

    # set model to train mode
    model.train()
    # iterate over the training data
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        #compute the loss
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # increment the running loss and accuracy
        train_loss += loss.item()
        train_acc += (outputs.argmax(1) == labels).sum().item()

    # calculate the average training loss and accuracy
    train_loss /= len(train_loader)
    train_loss_history.append(train_loss)
    train_acc /= len(train_loader.dataset)
    train_acc_history.append(train_acc)

    # set the model to evaluation mode
    model.eval()
    with torch.no_grad():
        for inputs, labels in test_loader:
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item()
            val_acc += (outputs.argmax(1) == labels).sum().item()

    # calculate the average validation loss and accuracy
    val_loss /= len(test_loader)
    val_loss_history.append(val_loss)
    val_acc /= len(test_loader.dataset)
    val_acc_history.append(val_acc)

    print(f'Epoch {epoch+1}/{num_epochs}, train loss: {train_loss:.4f}, train acc: {train_acc:.4f}, val loss: {val_loss:.4f}, val acc: {val_acc:.4f}')

Running this loop prints the following:

Epoch 1/20, train loss: 1.8757, train acc: 0.3292, val loss: 1.7515, val acc: 0.3807
Epoch 2/20, train loss: 1.7254, train acc: 0.3862, val loss: 1.6850, val acc: 0.4008
Epoch 3/20, train loss: 1.6548, train acc: 0.4124, val loss: 1.6692, val acc: 0.3987
Epoch 4/20, train loss: 1.6150, train acc: 0.4268, val loss: 1.6052, val acc: 0.4265
Epoch 5/20, train loss: 1.5874, train acc: 0.4343, val loss: 1.5803, val acc: 0.4384
Epoch 6/20, train loss: 1.5598, train acc: 0.4424, val loss: 1.5928, val acc: 0.4315
Epoch 7/20, train loss: 1.5424, train acc: 0.4506, val loss: 1.5489, val acc: 0.4514
Epoch 8/20, train loss: 1.5310, train acc: 0.4568, val loss: 1.5566, val acc: 0.4454
Epoch 9/20, train loss: 1.5116, train acc: 0.4626, val loss: 1.5501, val acc: 0.4442
Epoch 10/20, train loss: 1.5005, train acc: 0.4677, val loss: 1.5282, val acc: 0.4598
Epoch 11/20, train loss: 1.4911, train acc: 0.4702, val loss: 1.5310, val acc: 0.4629
Epoch 12/20, train loss: 1.4804, train acc: 0.4756, val loss: 1.5555, val acc: 0.4457
Epoch 13/20, train loss: 1.4743, train acc: 0.4762, val loss: 1.5207, val acc: 0.4629
Epoch 14/20, train loss: 1.4658, train acc: 0.4792, val loss: 1.5177, val acc: 0.4570
Epoch 15/20, train loss: 1.4608, train acc: 0.4819, val loss: 1.5529, val acc: 0.4527
Epoch 16/20, train loss: 1.4539, train acc: 0.4832, val loss: 1.5066, val acc: 0.4645
Epoch 17/20, train loss: 1.4486, train acc: 0.4863, val loss: 1.4874, val acc: 0.4727
Epoch 18/20, train loss: 1.4503, train acc: 0.4866, val loss: 1.5318, val acc: 0.4575
Epoch 19/20, train loss: 1.4383, train acc: 0.4910, val loss: 1.5065, val acc: 0.4673
Epoch 20/20, train loss: 1.4348, train acc: 0.4897, val loss: 1.5127, val acc: 0.4679

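As you can see, this classifier was trained for only 20 epochs and reached a validation accuracy of about 47%. Training for more epochs may improve the accuracy further, although the validation accuracy is already levelling off. The model has only one hidden layer with 100 neurons; if you add more layers, the accuracy may improve significantly.

Now, plot the loss and accuracy metrics to see how they look.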

import matplotlib.pyplot as plt

# Plot the training and validation loss
plt.plot(train_loss_history, label='train loss')
plt.plot(val_loss_history, label='val loss')
plt.legend()
plt.show()

# Plot the training and validation accuracy
plt.plot(train_acc_history, label='train acc')
plt.plot(val_acc_history, label='val acc')
plt.legend()
plt.show()

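[Figure: training and validation loss curves]

[Figure: training and validation accuracy curves]

Here is how you can compare the model's predictions against the true labels.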

import numpy as np

# get the first batch of validation data
inputs, labels = next(iter(test_loader))

# make predictions (no gradients are needed for inference)
model.eval()
with torch.no_grad():
    outputs = model(inputs)
_, predicted = torch.max(outputs, 1)

# display the images and their labels
# ToTensor() already scales pixels to [0, 1], so no unnormalization is needed
img_grid = torchvision.utils.make_grid(inputs)
npimg = img_grid.numpy()
plt.imshow(np.transpose(npimg, (1, 2, 0)))
plt.show()

print('True Labels: ', labels)
print('Predicted Labels: ', predicted)

The printed labels are as follows:

True Labels:  tensor([3, 8, 8, 0, 6, 6, 1, 6, 3, 1, 0, 9, 5, 7, 9, 8, 5, 7, 8, 6, 7, 0, 4, 9,
        5, 2, 4, 0, 9, 6, 6, 5, 4, 5, 9, 2, 4, 1, 9, 5, 4, 6, 5, 6, 0, 9, 3, 9,
        7, 6, 9, 8, 0, 3, 8, 8, 7, 7, 4, 6, 7, 3, 6, 3])
Predicted Labels:  tensor([3, 9, 8, 8, 4, 6, 3, 6, 2, 1, 8, 9, 6, 7, 1, 8, 5, 3, 8, 6, 9, 2, 0, 9,
        4, 6, 6, 2, 9, 6, 6, 4, 3, 3, 9, 1, 6, 9, 9, 5, 0, 6, 7, 6, 0, 9, 3, 8,
        4, 6, 9, 4, 6, 3, 8, 8, 5, 8, 8, 2, 7, 3, 6, 9])
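
The integer labels are easier to read once they are mapped to class names. The sketch below hard-codes the ten CIFAR-10 class names in their standard index order and uses only the first eight true and predicted labels from the output above; in your own code, the dataset object's classes attribute should give you the same list.

```python
import torch

# CIFAR-10 class names in index order (0 to 9)
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

# First eight entries of the printed tensors above
true = torch.tensor([3, 8, 8, 0, 6, 6, 1, 6])
pred = torch.tensor([3, 9, 8, 8, 4, 6, 3, 6])

for t, p in zip(true.tolist(), pred.tolist()):
    print(f"true: {classes[t]:<10} predicted: {classes[p]}")

# Fraction of correct predictions in this small sample
accuracy = (true == pred).float().mean().item()
print(accuracy)  # 0.5
```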

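These labels correspond to the images shown in the grid plotted by the code above.

Summary

In this tutorial, you learned how to build an image classifier using only a single-layer neural network. Specifically, you learned:

How to use and preprocess built-in datasets in PyTorch.
How to build and train a custom neural network in PyTorch.
How to build an image classifier in PyTorch step by step.
How to make predictions with the trained model in PyTorch.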

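Use PyTorch Deep Learning Models with scikit-learn

Original article

The most popular deep learning libraries in Python are TensorFlow/Keras and PyTorch, thanks to their simplicity. However, scikit-learn remains the most popular general-purpose machine learning library in Python. In this post, you will discover how to use PyTorch deep learning models with the scikit-learn library in Python. This lets you leverage the power of scikit-learn for model evaluation and model hyperparameter optimization. After completing this post, you will know:

How to wrap a PyTorch model for use with the scikit-learn machine learning library
How to easily evaluate PyTorch models using cross-validation in scikit-learn
How to tune PyTorch model hyperparameters using grid search in scikit-learn

Let's get started!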

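Overview

This chapter is in four parts; they are:

Overview of skorch
Evaluate Deep Learning Models with Cross-Validation
Running k-Fold Cross-Validation with scikit-learn
Grid Search Deep Learning Model Parameters

Overview of skorch

PyTorch is a popular library for deep learning in Python, but its focus is deep learning, not machine learning in general. In fact, it strives for minimalism, focusing only on what you need to define and build deep learning models quickly and simply. The scikit-learn library in Python is built on the SciPy stack for efficient numerical computation. It is a fully featured library for general-purpose machine learning and provides many utilities that are useful when developing deep learning models. Not least of which are:

Evaluation of models using resampling methods such as k-fold cross-validation
Efficient search and evaluation of model hyperparameters
Connecting multiple steps of a machine learning workflow into a pipeline

PyTorch cannot work with scikit-learn directly. But thanks to the duck-typing nature of the Python language, it is easy to adapt a PyTorch model for use with scikit-learn. Indeed, the skorch module is built for this purpose. With skorch, you can make your PyTorch model work just like a scikit-learn model. You may find it easier to use.

In the following sections, you will work through an example that uses the NeuralNetBinaryClassifier wrapper for a classification neural network created in PyTorch and used with the scikit-learn library. The test problem is the Sonar dataset. It is a small dataset with all numerical attributes that is easy to work with.

The following examples assume you have successfully installed PyTorch, skorch, and scikit-learn. If you use pip for your Python modules, you can install them with: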

pip install torch skorch scikit-learn

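Evaluate Deep Learning Models with Cross-Validation

The NeuralNet class, or the more specialized NeuralNetClassifier, NeuralNetBinaryClassifier, and NeuralNetRegressor classes in skorch, are factory wrappers for PyTorch models. Their first argument, module, is a class or a function to call to get your model. In return, these wrapper classes let you specify the loss function and the optimizer, and then the training loop comes for free. This is the convenience compared to using PyTorch directly.

Below is a simple example of training a binary classifier on the Sonar dataset: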

import copy

import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.preprocessing import LabelEncoder
from skorch import NeuralNetBinaryClassifier

# Read data
data = pd.read_csv("sonar.csv", header=None)
X = data.iloc[:, 0:60]
y = data.iloc[:, 60]

# Binary encoding of labels
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

# Convert to PyTorch tensors (X is 2D; y stays a 1D vector)
X = torch.tensor(X.values, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32)

# Define the model
class SonarClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(60, 60)
        self.act1 = nn.ReLU()
        self.layer2 = nn.Linear(60, 60)
        self.act2 = nn.ReLU()
        self.layer3 = nn.Linear(60, 60)
        self.act3 = nn.ReLU()
        self.output = nn.Linear(60, 1)

    def forward(self, x):
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.act3(self.layer3(x))
        x = self.output(x)
        return x

# create the skorch wrapper
model = NeuralNetBinaryClassifier(
    SonarClassifier,
    criterion=torch.nn.BCEWithLogitsLoss,
    optimizer=torch.optim.Adam,
    lr=0.0001,
    max_epochs=150,
    batch_size=10
)

# run
model.fit(X, y)

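In this model, you used torch.nn.BCEWithLogitsLoss as the loss function (which is indeed the default for NeuralNetBinaryClassifier). It combines the sigmoid function with binary cross-entropy loss, so you do not need to put a sigmoid function at the output of the model. It is sometimes preferred because it offers better numerical stability.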
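
The equivalence between BCEWithLogitsLoss and a sigmoid followed by plain binary cross-entropy can be checked directly. This is a small sketch with made-up logits and targets, not data from the Sonar example.

```python
import torch
import torch.nn as nn

logits = torch.tensor([-2.0, 0.0, 3.0])    # hypothetical raw model outputs
targets = torch.tensor([0.0, 1.0, 1.0])    # hypothetical binary labels

# Loss computed straight from the logits
loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Same quantity computed as sigmoid followed by binary cross-entropy
loss_sigmoid_bce = nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(loss_with_logits, loss_sigmoid_bce))  # True
```

For moderate logits the two agree; the fused version is the safer choice when logits become very large in magnitude.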

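Moreover, you specified the training parameters, such as the number of epochs and the batch size, in the skorch wrapper. Then you only need to call the fit() function with the input features and the target. The wrapper initializes the model and trains it for you.

Running the code above produces the following: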

  epoch    train_loss    valid_acc    valid_loss     dur
-------  ------------  -----------  ------------  ------
      1        0.6952       0.5476        0.6921  0.0135
      2        0.6930       0.5476        0.6920  0.0114
      3        0.6925       0.5476        0.6919  0.0104
      4        0.6922       0.5238        0.6918  0.0118
      5        0.6919       0.5238        0.6917  0.0112
...
    146        0.2942       0.4524        0.9425  0.0115
    147        0.2920       0.4524        0.9465  0.0123
    148        0.2899       0.4524        0.9495  0.0112
    149        0.2879       0.4524        0.9544  0.0121
    150        0.2859       0.4524        0.9583  0.0118

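Note that skorch is positioned as a wrapper that adapts PyTorch models to the scikit-learn interface. Therefore, you should use the model as if it were a scikit-learn model. For example, to train a binary classification model, the target is expected to be a vector rather than an $n\times 1$ matrix. And for inference, you should use model.predict(X) or model.predict_proba(X). This is also why you should use NeuralNetBinaryClassifier: the classification-related scikit-learn functions are then provided as model methods.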

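Running k-Fold Cross-Validation with scikit-learn

Using a wrapper around your PyTorch model already saves you a lot of boilerplate code for building a custom training loop. But the entire suite of machine learning functions from scikit-learn is the real productivity boost.

One example is the model selection functions from scikit-learn. Say you want to evaluate this model design with k-fold cross-validation. Normally, that means splitting the dataset into $k$ portions, then running a loop that picks one portion as the test set and the rest as the training set, trains a model from scratch, and obtains an evaluation score. It is not difficult, but you need to write several lines of code to implement it.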
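
To make the manual procedure concrete, here is a minimal sketch of such a loop. It uses synthetic data and scikit-learn's LogisticRegression as a stand-in for the neural network, purely as assumptions to keep the example small and self-contained.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic binary classification data (illustrative only, not the Sonar dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kfold.split(X_demo, y_demo):
    clf = LogisticRegression()                      # a fresh model for every fold
    clf.fit(X_demo[train_idx], y_demo[train_idx])   # train on the other k-1 portions
    scores.append(clf.score(X_demo[test_idx], y_demo[test_idx]))  # accuracy on the held-out portion

print(len(scores))  # 5, one score per fold
```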

Indeed, you can make use of the k-fold cross-validation function from scikit-learn, as follows:

from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score

model = NeuralNetBinaryClassifier(
    SonarClassifier,
    criterion=torch.nn.BCEWithLogitsLoss,
    optimizer=torch.optim.Adam,
    lr=0.0001,
    max_epochs=150,
    batch_size=10,
    verbose=False
)

kfold = StratifiedKFold(n_splits=5, shuffle=True)
results = cross_val_score(model, X, y, cv=kfold)
print(results)

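The parameter verbose=False in NeuralNetBinaryClassifier stops the display of progress while the model is trained, since there is a lot of it. The code above prints the validation scores, as follows: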

[0.76190476 0.76190476 0.78571429 0.75609756 0.75609756]

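These are the evaluation scores. Because it is a binary classification model, they are the mean accuracy. There are five of them because they come from k-fold cross-validation with $k=5$, each for a different test set. Usually you evaluate a model with the mean and standard deviation of the cross-validation scores: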

print("mean = %.3f; std = %.3f" % (results.mean(), results.std()))

which gives

mean = 0.764; std = 0.011

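A good model should produce a high score (in this case, an accuracy close to 1) and a low standard deviation. A high standard deviation means the model is not very consistent across different test sets.

Putting everything together, the following is the complete code: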

import copy

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.preprocessing import LabelEncoder
from skorch import NeuralNetBinaryClassifier
from sklearn.model_selection import StratifiedKFold, train_test_split, cross_val_score

# Read data
data = pd.read_csv("sonar.csv", header=None)
X = data.iloc[:, 0:60]
y = data.iloc[:, 60]

# Binary encoding of labels
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

# Convert to PyTorch tensors (X is 2D; y stays a 1D vector)
X = torch.tensor(X.values, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32)

# Define the model
class SonarClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(60, 60)
        self.act1 = nn.ReLU()
        self.layer2 = nn.Linear(60, 60)
        self.act2 = nn.ReLU()
        self.layer3 = nn.Linear(60, 60)
        self.act3 = nn.ReLU()
        self.output = nn.Linear(60, 1)

    def forward(self, x):
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.act3(self.layer3(x))
        x = self.output(x)
        return x

# create the skorch wrapper
model = NeuralNetBinaryClassifier(
    SonarClassifier,
    criterion=torch.nn.BCEWithLogitsLoss,
    optimizer=torch.optim.Adam,
    lr=0.0001,
    max_epochs=150,
    batch_size=10,
    verbose=False
)

# k-fold
kfold = StratifiedKFold(n_splits=5, shuffle=True)
results = cross_val_score(model, X, y, cv=kfold)
print("mean = %.3f; std = %.3f" % (results.mean(), results.std()))

In comparison, the following is an equivalent implementation with a neural network model in scikit-learn:

import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder

# load dataset
data = pd.read_csv("sonar.csv", header=None)
# split into input (X) and output (y) variables, in numpy arrays
X = data.iloc[:, 0:60].values
y = data.iloc[:, 60].values

# binary encoding of labels
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

# create model
model = MLPClassifier(hidden_layer_sizes=(60,60,60), activation='relu',
                      max_iter=150, batch_size=10, verbose=False)

# evaluate using 5-fold cross validation
seed = 42  # arbitrary fixed seed so that the folds are reproducible
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
results = cross_val_score(model, X, y, cv=kfold)
print("mean = %.3f; std = %.3f" % (results.mean(), results.std()))

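This shows how skorch makes a PyTorch model a drop-in replacement for a scikit-learn model.

Grid Search Deep Learning Model Parameters

The previous example showed how easy it is to wrap your deep learning model from PyTorch and use it in functions from the scikit-learn library. In this example, you will go a step further. The function you assign to the module argument when creating the NeuralNetBinaryClassifier or NeuralNetClassifier wrapper can take many arguments. You can use these arguments to further customize the construction of the model. In addition, as you know, you can provide arguments to the fit() function.

In this example, you will use grid search to evaluate different configurations of the neural network model and report the combination that provides the best estimated performance. To make it interesting, modify the PyTorch model so that it takes a parameter deciding how deep you want the model to be: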

class SonarClassifier(nn.Module):
    def __init__(self, n_layers=3):
        super().__init__()
        self.layers = []
        self.acts = []
        for i in range(n_layers):
            self.layers.append(nn.Linear(60, 60))
            self.acts.append(nn.ReLU())
            self.add_module(f"layer{i}", self.layers[-1])
            self.add_module(f"act{i}", self.acts[-1])
        self.output = nn.Linear(60, 1)

    def forward(self, x):
        for layer, act in zip(self.layers, self.acts):
            x = act(layer(x))
        x = self.output(x)
        return x

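In this design, the hidden layers and their activation functions are held in Python lists. Because these PyTorch components are not direct attributes of the class, you would not see them in model.parameters(), which is a problem for training. This is mitigated by registering the components with self.add_module(). An alternative is to use nn.ModuleList() instead of a Python list, which gives PyTorch enough clues to find the components of the model.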
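
The difference between a plain Python list and nn.ModuleList is easy to observe by counting the parameter tensors each model reports. The two class names below, PlainList and WithModuleList, are hypothetical and exist only for this comparison.

```python
import torch.nn as nn

class PlainList(nn.Module):
    def __init__(self, n_layers=3):
        super().__init__()
        # A plain Python list: these layers are NOT registered with the module
        self.layers = [nn.Linear(60, 60) for _ in range(n_layers)]
        self.output = nn.Linear(60, 1)

class WithModuleList(nn.Module):
    def __init__(self, n_layers=3):
        super().__init__()
        # nn.ModuleList registers every layer it holds
        self.layers = nn.ModuleList([nn.Linear(60, 60) for _ in range(n_layers)])
        self.output = nn.Linear(60, 1)

# Each nn.Linear contributes two tensors: a weight and a bias
n_plain = len(list(PlainList().parameters()))
n_registered = len(list(WithModuleList().parameters()))
print(n_plain, n_registered)  # 2 8
```

With the plain list, an optimizer built from parameters() would only ever update the output layer.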

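The skorch wrapper stays the same. With it, you get a scikit-learn compatible model. As you can see, the wrapper has parameters for setting up the deep learning model as well as training parameters such as the learning rate (lr), so there are many possible variations. The GridSearchCV function from scikit-learn provides grid search cross-validation. You can provide a list of values for each parameter and ask scikit-learn to try all combinations and report the best set of parameters according to the metric you specified. An example is as follows: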

from sklearn.model_selection import GridSearchCV

model = NeuralNetBinaryClassifier(
    SonarClassifier,
    criterion=torch.nn.BCEWithLogitsLoss,
    optimizer=torch.optim.Adam,
    lr=0.0001,
    max_epochs=150,
    batch_size=10,
    verbose=False
)

param_grid = {
    'module__n_layers': [1, 3, 5], 
    'lr': [0.1, 0.01, 0.001, 0.0001],
    'max_epochs': [100, 150],
}

grid_search = GridSearchCV(model, param_grid, scoring='accuracy', verbose=1, cv=3)
result = grid_search.fit(X, y)

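You pass model, which is a skorch wrapper, to GridSearchCV(). You also pass param_grid, which specifies the parameters to vary:

The n_layers parameter of the PyTorch model (that is, the SonarClassifier class), which controls the depth of the neural network.
The lr parameter of the wrapper, which controls the learning rate of the optimizer.
The max_epochs parameter of the wrapper, which controls the number of training epochs.

Note the use of a double underscore to pass a parameter to the PyTorch model. In fact, this lets you configure other parameters too. For example, you can set optimizer__weight_decay to pass the weight_decay parameter to the Adam optimizer (used for L2 regularization).

Running this may take a while because it tries all combinations, each evaluated with 3-fold cross-validation. You do not want to run this often, but it is useful for designing a model.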
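
To see why the search is slow, you can count the work up front. The sketch below uses scikit-learn's ParameterGrid on the same grid as above; the multiplication by 3 reflects cv=3 and does not count the final refit on the whole dataset.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'module__n_layers': [1, 3, 5],
    'lr': [0.1, 0.01, 0.001, 0.0001],
    'max_epochs': [100, 150],
}

# 3 depths x 4 learning rates x 2 epoch settings
n_combinations = len(ParameterGrid(param_grid))

# Each combination is trained once per fold with cv=3
n_fits = n_combinations * 3
print(n_combinations, n_fits)  # 24 72
```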

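After the grid search finishes, the performance and configuration of the best model are displayed, followed by the performance of all parameter combinations, as follows: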

print("Best: %f using %s" % (result.best_score_, result.best_params_))
means = result.cv_results_['mean_test_score']
stds = result.cv_results_['std_test_score']
params = result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

This gives:

Best: 0.649551 using {'lr': 0.001, 'max_epochs': 150, 'module__n_layers': 1}
0.533678 (0.003611) with: {'lr': 0.1, 'max_epochs': 100, 'module__n_layers': 1}
0.533678 (0.003611) with: {'lr': 0.1, 'max_epochs': 100, 'module__n_layers': 3}
0.533678 (0.003611) with: {'lr': 0.1, 'max_epochs': 100, 'module__n_layers': 5}
0.533678 (0.003611) with: {'lr': 0.1, 'max_epochs': 150, 'module__n_layers': 1}
0.533678 (0.003611) with: {'lr': 0.1, 'max_epochs': 150, 'module__n_layers': 3}
0.533678 (0.003611) with: {'lr': 0.1, 'max_epochs': 150, 'module__n_layers': 5}
0.644651 (0.062160) with: {'lr': 0.01, 'max_epochs': 100, 'module__n_layers': 1}
0.567495 (0.049728) with: {'lr': 0.01, 'max_epochs': 100, 'module__n_layers': 3}
0.533678 (0.003611) with: {'lr': 0.01, 'max_epochs': 100, 'module__n_layers': 5}
0.615804 (0.061966) with: {'lr': 0.01, 'max_epochs': 150, 'module__n_layers': 1}
0.620290 (0.078243) with: {'lr': 0.01, 'max_epochs': 150, 'module__n_layers': 3}
0.533678 (0.003611) with: {'lr': 0.01, 'max_epochs': 150, 'module__n_layers': 5}
0.635335 (0.108412) with: {'lr': 0.001, 'max_epochs': 100, 'module__n_layers': 1}
0.582126 (0.058072) with: {'lr': 0.001, 'max_epochs': 100, 'module__n_layers': 3}
0.563423 (0.136916) with: {'lr': 0.001, 'max_epochs': 100, 'module__n_layers': 5}
0.649551 (0.075676) with: {'lr': 0.001, 'max_epochs': 150, 'module__n_layers': 1}
0.558178 (0.071443) with: {'lr': 0.001, 'max_epochs': 150, 'module__n_layers': 3}
0.567909 (0.088623) with: {'lr': 0.001, 'max_epochs': 150, 'module__n_layers': 5}
0.557971 (0.041416) with: {'lr': 0.0001, 'max_epochs': 100, 'module__n_layers': 1}
0.587026 (0.079951) with: {'lr': 0.0001, 'max_epochs': 100, 'module__n_layers': 3}
0.606349 (0.092394) with: {'lr': 0.0001, 'max_epochs': 100, 'module__n_layers': 5}
0.563147 (0.099652) with: {'lr': 0.0001, 'max_epochs': 150, 'module__n_layers': 1}
0.534023 (0.057187) with: {'lr': 0.0001, 'max_epochs': 150, 'module__n_layers': 3}
0.634921 (0.057235) with: {'lr': 0.0001, 'max_epochs': 150, 'module__n_layers': 5}

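Running this on your workstation may take about 5 minutes (on CPU rather than GPU). The output above shows the results. You can see that the grid search found that a learning rate of 0.001 with 150 epochs and only a single hidden layer achieved the best cross-validation score of about 65% on this problem.

In fact, you can first see whether standardizing the input features improves the results. Since the wrapper lets you use a PyTorch model with scikit-learn, you can also apply scikit-learn's standardizer on the fly and create a machine learning pipeline: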

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('float32', FunctionTransformer(func=lambda X: torch.tensor(X, dtype=torch.float32),
                                    validate=False)),
    ('sonarmodel', model.initialize()),
])

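The new object you created, pipe, is another scikit-learn model that works just like the model object, except that a standard scaler is applied to the data before it is passed to the neural network. Therefore, you can run a grid search on this pipeline with only a small tweak to the way parameters are specified: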

param_grid = {
    'sonarmodel__module__n_layers': [1, 3, 5], 
    'sonarmodel__lr': [0.1, 0.01, 0.001, 0.0001],
    'sonarmodel__max_epochs': [100, 150],
}

grid_search = GridSearchCV(pipe, param_grid, scoring='accuracy', verbose=1, cv=3)
result = grid_search.fit(X, y)
print("Best: %f using %s" % (result.best_score_, result.best_params_))
means = result.cv_results_['mean_test_score']
stds = result.cv_results_['std_test_score']
params = result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

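There are two things to note here. First, PyTorch models run on 32-bit floats by default, whereas NumPy arrays are usually 64-bit floats. These data types do not match, and scikit-learn's scaler always returns a NumPy array. You therefore need a type conversion in the middle of the pipeline, done with a FunctionTransformer object.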
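
The dtype mismatch can be reproduced in isolation. The sketch below uses a tiny hand-written array and a single nn.Linear layer as assumptions; it shows that a default NumPy array becomes a float64 tensor and that casting to float32 lets the layer accept it.

```python
import numpy as np
import torch
import torch.nn as nn

X_np = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])  # NumPy defaults to float64
layer = nn.Linear(3, 1)                               # PyTorch weights default to float32

x64 = torch.from_numpy(X_np)
print(x64.dtype)  # torch.float64

# Feeding float64 data to a float32 layer is expected to fail with a dtype error
try:
    layer(x64)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True
print(mismatch_raised)

# The same conversion that the FunctionTransformer step performs
x32 = torch.tensor(X_np, dtype=torch.float32)
out = layer(x32)
print(out.shape)  # torch.Size([2, 1])
```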

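Second, every step in a scikit-learn pipeline is referred to by name, such as scaler and sonarmodel. The parameters set on the pipeline therefore need to carry the name as well. In the example above, sonarmodel__module__n_layers is used as a parameter for the grid search. It refers to the sonarmodel part of the pipeline (your skorch wrapper), the module part within it (your PyTorch model), and its n_layers parameter. Note the use of a double underscore to separate the levels of the hierarchy.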
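
The same double-underscore convention works for any scikit-learn pipeline, not only with skorch. As a minimal sketch, the example below swaps the skorch wrapper for a LogisticRegression step named clf (an assumption made to keep the example free of the Sonar data) and sets nested parameters through set_params().

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe_demo = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', LogisticRegression()),
])

# <step name>__<parameter name> addresses a parameter of one pipeline step
pipe_demo.set_params(clf__C=0.5, scaler__with_mean=False)

print(pipe_demo.named_steps['clf'].C)             # 0.5
print(pipe_demo.named_steps['scaler'].with_mean)  # False
```

GridSearchCV uses this same mechanism when it applies each entry of param_grid to the pipeline.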

Putting everything together, the following is the complete code:

import copy

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler, LabelEncoder
from skorch import NeuralNetBinaryClassifier

# Read data
data = pd.read_csv("sonar.csv", header=None)
X = data.iloc[:, 0:60]
y = data.iloc[:, 60]

# Binary encoding of labels
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

# Convert to PyTorch tensors (X is 2D; y stays a 1D vector)
X = torch.tensor(X.values, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32)

class SonarClassifier(nn.Module):
    def __init__(self, n_layers=3):
        super().__init__()
        self.layers = []
        self.acts = []
        for i in range(n_layers):
            self.layers.append(nn.Linear(60, 60))
            self.acts.append(nn.ReLU())
            self.add_module(f"layer{i}", self.layers[-1])
            self.add_module(f"act{i}", self.acts[-1])
        self.output = nn.Linear(60, 1)

    def forward(self, x):
        for layer, act in zip(self.layers, self.acts):
            x = act(layer(x))
        x = self.output(x)
        return x

model = NeuralNetBinaryClassifier(
    SonarClassifier,
    criterion=torch.nn.BCEWithLogitsLoss,
    optimizer=torch.optim.Adam,
    lr=0.0001,
    max_epochs=150,
    batch_size=10,
    verbose=False
)

pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('float32', FunctionTransformer(func=lambda X: torch.tensor(X, dtype=torch.float32),
                                    validate=False)),
    ('sonarmodel', model.initialize()),
])

param_grid = {
    'sonarmodel__module__n_layers': [1, 3, 5], 
    'sonarmodel__lr': [0.1, 0.01, 0.001, 0.0001],
    'sonarmodel__max_epochs': [100, 150],
}

grid_search = GridSearchCV(pipe, param_grid, scoring='accuracy', verbose=1, cv=3)
result = grid_search.fit(X, y)
print("Best: %f using %s" % (result.best_score_, result.best_params_))
means = result.cv_results_['mean_test_score']
stds = result.cv_results_['std_test_score']
params = result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))

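Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Online Resources

skorch documentation
Stratified K-Folds cross-validator. scikit-learn documentation.
Grid search cross-validator. scikit-learn documentation.
Pipeline. scikit-learn documentation.

Summary

In this chapter, you discovered how to wrap your PyTorch deep learning models and use them in the scikit-learn general machine learning library. You learned:

Specifically how to wrap PyTorch models so that they can be used with the scikit-learn machine learning library.
How to use a wrapped PyTorch model as part of evaluating model performance in scikit-learn.
How to perform hyperparameter tuning in scikit-learn using a wrapped PyTorch model.

You can see that using scikit-learn for standard machine learning operations, such as model evaluation and model hyperparameter optimization, can save a lot of time over implementing these schemes yourself. Wrapping your model lets you leverage the powerful tools from scikit-learn to fit your deep learning models into your general machine learning process.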

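Activation Functions in PyTorch

Original article: machinelearningmastery.com/activation-functions-in-pytorch/

As neural networks become increasingly popular in machine learning, it is important to understand the role that activation functions play in their implementation. In this article, you will explore the concept of activation functions, which are applied to the output of each neuron in a neural network to introduce non-linearity into the model. Without activation functions, a neural network would simply be a series of linear transformations, which would limit its ability to learn complex patterns and relationships in data.

PyTorch offers a variety of activation functions, each with its own properties and use cases. Some common activation functions in PyTorch include ReLU, sigmoid, and tanh. Choosing the right activation function for a particular problem can be an important consideration for achieving good performance in a neural network. You will see how to train a neural network in PyTorch with different activation functions and analyze their performance.

In this tutorial, you will learn:

About various activation functions used in neural network architectures.
How activation functions can be implemented in PyTorch.
How activation functions compare with each other on a real problem.

Let's get started.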

Image generated by Adrian Tam using Stable Diffusion. Some rights reserved.

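Overview

This tutorial is in four parts; they are:

Logistic Activation Function
Tanh Activation Function
ReLU Activation Function
Exploring Activation Functions in a Neural Network

Logistic Activation Function

You will start with the logistic function, a commonly used activation function in neural networks that is also known as the sigmoid function. It takes any input and maps it to a value between 0 and 1, which can be interpreted as a probability. This makes it particularly useful for binary classification tasks, where the network needs to predict the probability of an input belonging to one of two classes.

One of the main advantages of the logistic function is that it is differentiable, so it can be used with the backpropagation algorithm to train neural networks. It also has a smooth, bounded gradient, which helps avoid problems such as exploding gradients. However, it can introduce vanishing gradients during training.

Now, apply the logistic function to a tensor using PyTorch and plot it to see how it looks.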

# importing the libraries
import torch
import matplotlib.pyplot as plt

# create a PyTorch tensor
x = torch.linspace(-10, 10, 100)

# apply the logistic activation function to the tensor
y = torch.sigmoid(x)

# plot the results with a custom color
plt.plot(x.numpy(), y.numpy(), color='purple')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Logistic Activation Function')
plt.show()

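In the example above, you used the torch.sigmoid() function from the PyTorch library to apply the logistic activation function to the tensor x. You also used the matplotlib library to create the plot with a custom color.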
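
The vanishing gradient behaviour mentioned earlier can be seen by asking autograd for the slope of the sigmoid at a few points. This is a small sketch with three hand-picked inputs; the derivative of the sigmoid is sigmoid(x) * (1 - sigmoid(x)), which peaks at 0.25 when x = 0.

```python
import torch

x = torch.tensor([-10.0, 0.0, 10.0], requires_grad=True)
y = torch.sigmoid(x)

# Summing lets a single backward() call fill in dy/dx for every element
y.sum().backward()
print(x.grad)  # about 4.5e-05 at both ends and 0.25 in the middle

# Analytical derivative for comparison
manual = (y * (1 - y)).detach()
print(torch.allclose(x.grad, manual))  # True
```

At inputs of ±10 the slope is several thousand times smaller than at 0, which is how gradients shrink as they pass through saturated sigmoid units.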

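Tanh Activation Function

Next, you will look at the tanh activation function, which outputs values between $-1$ and $1$, centered around 0. This helps keep the output of a neural network layer close to 0, which is useful for normalization purposes. Tanh is a smooth and continuous activation function, which makes it easier to optimize during gradient descent.

Like the logistic activation function, tanh is susceptible to the vanishing gradient problem, especially in deep neural networks. This is because the slope of the function becomes very small for large or small input values, making it difficult for gradients to propagate through the network.

In addition, because it uses exponential functions, tanh can be computationally expensive, especially for large tensors or when used in deep neural networks with many layers.

Here is an example of how to apply tanh to a tensor and visualize it.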

# apply the tanh activation function to the tensor
y = torch.tanh(x)

# plot the results with a custom color
plt.plot(x.numpy(), y.numpy(), color='blue')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Tanh Activation Function')
plt.show()
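
Two properties of tanh discussed above can be verified numerically: its slope is 1 - tanh(x)^2, which collapses towards 0 for large inputs, and it is a rescaled and shifted sigmoid. The sketch below checks both on the same input range as the plot.

```python
import torch

x = torch.linspace(-10, 10, 100, requires_grad=True)
y = torch.tanh(x)
y.sum().backward()

# The autograd slope matches the analytical derivative 1 - tanh(x)^2
manual_grad = 1 - y.detach() ** 2
print(torch.allclose(x.grad, manual_grad, atol=1e-6))  # True

# tanh(x) equals 2 * sigmoid(2x) - 1
rescaled_sigmoid = 2 * torch.sigmoid(2 * x.detach()) - 1
print(torch.allclose(y.detach(), rescaled_sigmoid, atol=1e-6))  # True

# The slope is largest near 0 and close to zero at the ends of the range
print(x.grad.max().item(), x.grad[0].item())
```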

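ReLU Activation Function

ReLU (Rectified Linear Unit) is another commonly used activation function in neural networks. Unlike the sigmoid and tanh functions, ReLU does not saturate for positive inputs: its output keeps growing linearly instead of flattening out. If the input is positive, ReLU outputs that value directly; if it is negative, ReLU outputs 0.

This simple, piecewise linear function has several advantages over the sigmoid and tanh activation functions. First, it is computationally more efficient, which makes it well suited to large-scale neural networks. Second, ReLU has been shown to be less susceptible to the vanishing gradient problem, because its slope stays constant at 1 for all positive inputs. In addition, ReLU can help sparsify the activation of neurons in a network, which may lead to better generalization.

Here is an example of how to apply the ReLU activation function to a PyTorch tensor x and plot the result.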

# apply the ReLU activation function to the tensor
y = torch.relu(x)

# plot the results with a custom color
plt.plot(x.numpy(), y.numpy(), color='green')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('ReLU Activation Function')
plt.show()
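
Both the sparsity and the gradient behaviour of ReLU can be checked on the same input range used for the plot. In this sketch, half of the 100 evenly spaced inputs are negative, so half of the outputs are exactly zero, and the gradient takes only the values 0 and 1.

```python
import torch

x = torch.linspace(-10, 10, 100, requires_grad=True)
y = torch.relu(x)
y.sum().backward()

# Fraction of outputs that are exactly zero (the "inactive" units)
zero_fraction = (y == 0).float().mean().item()
print(zero_fraction)  # 0.5

# Slope is 0 for negative inputs and 1 for positive inputs
grad_values = x.grad.unique()
print(grad_values)  # tensor([0., 1.])
```

The zero slope on the negative side is also why a unit that only ever receives negative inputs stops learning, which Leaky ReLU addresses with a small non-zero slope.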

Below is the complete code to plot all of the activation functions above.

# importing the libraries
import torch
import matplotlib.pyplot as plt

# create a PyTorch tensor
x = torch.linspace(-10, 10, 100)

# apply the logistic activation function to the tensor and plot
y = torch.sigmoid(x)
plt.plot(x.numpy(), y.numpy(), color='purple')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Logistic Activation Function')
plt.show()

# apply the tanh activation function to the tensor and plot
y = torch.tanh(x)
plt.plot(x.numpy(), y.numpy(), color='blue')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('Tanh Activation Function')
plt.show()

# apply the ReLU activation function to the tensor and plot
y = torch.relu(x)
plt.plot(x.numpy(), y.numpy(), color='green')
plt.xlabel('Input')
plt.ylabel('Output')
plt.title('ReLU Activation Function')
plt.show()

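Exploring Activation Functions in a Neural Network

Activation functions play a vital role in training deep learning models, because they introduce non-linearity into the network and enable it to learn complex patterns.

Take the popular MNIST dataset, which consists of 70,000 grayscale images of handwritten digits at 28×28 pixels. You will create a simple feedforward neural network to classify these digits and experiment with different activation functions: ReLU, Sigmoid, Tanh, and Leaky ReLU.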

import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Load the MNIST dataset
transform = transforms.ToTensor()
train_dataset = datasets.MNIST(root='data/', train=True, transform=transform, download=True)
test_dataset = datasets.MNIST(root='data/', train=False, transform=transform, download=True)

train_loader = DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=64, shuffle=False)

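Create a NeuralNetwork class that inherits from nn.Module. The class has three linear layers and takes an activation function as an input argument. The forward method defines the forward pass of the network, applying the activation function after each linear layer except the last one.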

import torch
import torch.nn as nn
import torch.optim as optim

class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, activation_function):
        super(NeuralNetwork, self).__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.layer2 = nn.Linear(hidden_size, hidden_size)
        self.layer3 = nn.Linear(hidden_size, num_classes)
        self.activation_function = activation_function

    def forward(self, x):
        x = self.activation_function(self.layer1(x))
        x = self.activation_function(self.layer2(x))
        x = self.layer3(x)
        return x

You have added an activation_function parameter to the NeuralNetwork class, which lets you plug in any activation function you would like to experiment with.
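
As a quick check that the activation really is interchangeable, the sketch below repeats the class definition and runs a random batch through networks built with three different activations. The random batch and the custom negative_slope=0.1 for Leaky ReLU are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, activation_function):
        super().__init__()
        self.layer1 = nn.Linear(input_size, hidden_size)
        self.layer2 = nn.Linear(hidden_size, hidden_size)
        self.layer3 = nn.Linear(hidden_size, num_classes)
        self.activation_function = activation_function

    def forward(self, x):
        x = self.activation_function(self.layer1(x))
        x = self.activation_function(self.layer2(x))
        return self.layer3(x)  # no activation after the last layer

# Hypothetical batch of 5 flattened 28x28 images
batch = torch.randn(5, 784)

shapes = {}
for act in (nn.ReLU(), nn.Tanh(), nn.LeakyReLU(negative_slope=0.1)):
    net = NeuralNetwork(784, 128, 10, act)
    out = net(batch)
    shapes[type(act).__name__] = tuple(out.shape)

print(shapes)  # every entry is (5, 10)
```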

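Training and Testing the Model with Different Activation Functions

Create a few functions to help with training. The train() function trains the network for one epoch. It iterates over the training data loader, computes the loss, and performs backpropagation and optimization. The test() function evaluates the network on the test dataset and computes the test loss and accuracy.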

def train(network, data_loader, criterion, optimizer, device):
    network.train()
    running_loss = 0.0

    for data, target in data_loader:
        data, target = data.to(device), target.to(device)
        data = data.view(data.shape[0], -1)

        optimizer.zero_grad()
        output = network(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()

        running_loss += loss.item() * data.size(0)

    return running_loss / len(data_loader.dataset)

def test(network, data_loader, criterion, device):
    network.eval()
    correct = 0
    total = 0
    test_loss = 0.0

    with torch.no_grad():
        for data, target in data_loader:
            data, target = data.to(device), target.to(device)
            data = data.view(data.shape[0], -1)

            output = network(data)
            loss = criterion(output, target)
            test_loss += loss.item() * data.size(0)
            _, predicted = torch.max(output.data, 1)
            total += target.size(0)
            correct += (predicted == target).sum().item()

    return test_loss / len(data_loader.dataset), 100 * correct / total
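
Both functions multiply each batch loss by the batch size before dividing by the dataset size. The sketch below, using five made-up per-sample losses and a batch size of 4, illustrates why: when the last batch is smaller, a plain average of batch means no longer equals the true per-sample average, while the size-weighted version does.

```python
# Five hypothetical per-sample losses; with batch_size=4 the last batch has one sample
per_sample_losses = [1.0, 1.0, 1.0, 1.0, 5.0]
batch_size = 4

batches = [per_sample_losses[i:i + batch_size]
           for i in range(0, len(per_sample_losses), batch_size)]
batch_means = [sum(b) / len(b) for b in batches]   # [1.0, 5.0]

# Averaging the batch means over-weights the small last batch
mean_of_batch_means = sum(batch_means) / len(batch_means)

# Weighting each batch mean by its size, as train() and test() do
weighted_mean = sum(m * len(b) for m, b in zip(batch_means, batches)) / len(per_sample_losses)

true_mean = sum(per_sample_losses) / len(per_sample_losses)
print(mean_of_batch_means, weighted_mean, true_mean)  # 3.0 1.8 1.8
```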

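To compare them, create a dictionary of activation functions and iterate over it. For each activation function, you instantiate the NeuralNetwork class, define the loss function (CrossEntropyLoss), and set up the optimizer (Adam). Then you train the model for a number of epochs, calling the train() and test() functions in each epoch to evaluate the model's performance. You store the training loss, test loss, and test accuracy of each epoch in the results dictionary.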

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
input_size = 784
hidden_size = 128
num_classes = 10
num_epochs = 10
learning_rate = 0.001

activation_functions = {
    'ReLU': nn.ReLU(),
    'Sigmoid': nn.Sigmoid(),
    'Tanh': nn.Tanh(),
    'LeakyReLU': nn.LeakyReLU()
}

results = {}

# Train and test the model with different activation functions
for name, activation_function in activation_functions.items():
    print(f"Training with {name} activation function...")

    model = NeuralNetwork(input_size, hidden_size, num_classes, activation_function).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)

    train_loss_history = []
    test_loss_history = []
    test_accuracy_history = []

    for epoch in range(num_epochs):
        train_loss = train(model, train_loader, criterion, optimizer, device)
        test_loss, test_accuracy = test(model, test_loader, criterion, device)

        train_loss_history.append(train_loss)
        test_loss_history.append(test_loss)
        test_accuracy_history.append(test_accuracy)

        print(f"Epoch [{epoch+1}/{num_epochs}], Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.2f}%")

    results[name] = {
        'train_loss_history': train_loss_history,
        'test_loss_history': test_loss_history,
        'test_accuracy_history': test_accuracy_history
    }

Running the code above prints:

Training with ReLU activation function...
Epoch [1/10], Test Loss: 0.1589, Test Accuracy: 95.02%
Epoch [2/10], Test Loss: 0.1138, Test Accuracy: 96.52%
Epoch [3/10], Test Loss: 0.0886, Test Accuracy: 97.15%
Epoch [4/10], Test Loss: 0.0818, Test Accuracy: 97.50%
Epoch [5/10], Test Loss: 0.0783, Test Accuracy: 97.47%
Epoch [6/10], Test Loss: 0.0754, Test Accuracy: 97.80%
Epoch [7/10], Test Loss: 0.0832, Test Accuracy: 97.56%
Epoch [8/10], Test Loss: 0.0783, Test Accuracy: 97.78%
Epoch [9/10], Test Loss: 0.0789, Test Accuracy: 97.75%
Epoch [10/10], Test Loss: 0.0735, Test Accuracy: 97.99%
Training with Sigmoid activation function...
Epoch [1/10], Test Loss: 0.2420, Test Accuracy: 92.81%
Epoch [2/10], Test Loss: 0.1718, Test Accuracy: 94.99%
Epoch [3/10], Test Loss: 0.1339, Test Accuracy: 96.06%
Epoch [4/10], Test Loss: 0.1141, Test Accuracy: 96.42%
Epoch [5/10], Test Loss: 0.1004, Test Accuracy: 97.00%
Epoch [6/10], Test Loss: 0.0909, Test Accuracy: 97.10%
Epoch [7/10], Test Loss: 0.0846, Test Accuracy: 97.28%
Epoch [8/10], Test Loss: 0.0797, Test Accuracy: 97.42%
Epoch [9/10], Test Loss: 0.0785, Test Accuracy: 97.58%
Epoch [10/10], Test Loss: 0.0795, Test Accuracy: 97.58%
Training with Tanh activation function...
Epoch [1/10], Test Loss: 0.1660, Test Accuracy: 95.17%
Epoch [2/10], Test Loss: 0.1152, Test Accuracy: 96.47%
Epoch [3/10], Test Loss: 0.1057, Test Accuracy: 96.86%
Epoch [4/10], Test Loss: 0.0865, Test Accuracy: 97.21%
Epoch [5/10], Test Loss: 0.0760, Test Accuracy: 97.61%
Epoch [6/10], Test Loss: 0.0856, Test Accuracy: 97.23%
Epoch [7/10], Test Loss: 0.0735, Test Accuracy: 97.66%
Epoch [8/10], Test Loss: 0.0790, Test Accuracy: 97.67%
Epoch [9/10], Test Loss: 0.0805, Test Accuracy: 97.47%
Epoch [10/10], Test Loss: 0.0834, Test Accuracy: 97.82%
Training with LeakyReLU activation function...
Epoch [1/10], Test Loss: 0.1587, Test Accuracy: 95.14%
Epoch [2/10], Test Loss: 0.1084, Test Accuracy: 96.37%
Epoch [3/10], Test Loss: 0.0861, Test Accuracy: 97.22%
Epoch [4/10], Test Loss: 0.0883, Test Accuracy: 97.06%
Epoch [5/10], Test Loss: 0.0870, Test Accuracy: 97.37%
Epoch [6/10], Test Loss: 0.0929, Test Accuracy: 97.26%
Epoch [7/10], Test Loss: 0.0824, Test Accuracy: 97.54%
Epoch [8/10], Test Loss: 0.0785, Test Accuracy: 97.77%
Epoch [9/10], Test Loss: 0.0908, Test Accuracy: 97.92%
Epoch [10/10], Test Loss: 0.1012, Test Accuracy: 97.76%

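You can use Matplotlib to create plots that compare the performance of each activation function. You can create three separate plots to visualize the training loss, test loss, and test accuracy of each activation function over the epochs.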

import matplotlib.pyplot as plt

# Plot the training loss
plt.figure()
for name, data in results.items():
    plt.plot(data['train_loss_history'], label=name)
plt.xlabel('Epoch')
plt.ylabel('Training Loss')
plt.legend()
plt.show()

# Plot the testing loss
plt.figure()
for name, data in results.items():
    plt.plot(data['test_loss_history'], label=name)
plt.xlabel('Epoch')
plt.ylabel('Testing Loss')
plt.legend()
plt.show()

# Plot the testing accuracy
plt.figure()
for name, data in results.items():
    plt.plot(data['test_accuracy_history'], label=name)
plt.xlabel('Epoch')
plt.ylabel('Testing Accuracy')
plt.legend()
plt.show()

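These plots give a visual comparison of the performance of each activation function. By analyzing the results, you can determine which activation function works best for the specific task and dataset used in this example.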
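Alongside the plots, a short numeric summary can make the comparison easier to read. The sketch below is only an illustration: it assumes a dictionary with the same layout as results, filled here with just the last three test accuracies (epochs 8 to 10) copied from the log above, and the summarize() helper is a hypothetical name that is not part of the tutorial code.

```python
# Hypothetical excerpt of the `results` dictionary: only the test accuracies
# of epochs 8-10 for each run, copied from the training log above.
results = {
    'ReLU':      {'test_accuracy_history': [97.78, 97.75, 97.99]},
    'Sigmoid':   {'test_accuracy_history': [97.42, 97.58, 97.58]},
    'Tanh':      {'test_accuracy_history': [97.67, 97.47, 97.82]},
    'LeakyReLU': {'test_accuracy_history': [97.77, 97.92, 97.76]},
}

def summarize(results):
    """Return {name: (final accuracy, best accuracy)} for every run."""
    summary = {}
    for name, data in results.items():
        history = data['test_accuracy_history']
        summary[name] = (history[-1], max(history))
    return summary

summary = summarize(results)
for name, (final_acc, best_acc) in summary.items():
    print(f"{name:10s} final: {final_acc:.2f}%  best: {best_acc:.2f}%")

# Activation function with the highest final test accuracy in this excerpt
best_name = max(summary, key=lambda name: summary[name][0])
print("Highest final accuracy:", best_name)
```

With the full results dictionary from the training loop, the same helper would cover all ten epochs rather than the three shown here.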

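Summary

In this tutorial, you implemented some of the most popular activation functions in PyTorch. You also saw how to train a neural network in PyTorch on the popular MNIST dataset, tried the ReLU, Sigmoid, Tanh, and Leaky ReLU activation functions, and analyzed their performance by plotting the training loss, test loss, and test accuracy.

The choice of activation function can affect model performance. In the run shown above the effect was modest: all four activation functions finished within about half a percentage point of each other in test accuracy. Keep in mind that the best activation function may vary with the task and the dataset.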

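Building a binary classification model in PyTorch

Original: machinelearningmastery.com/building-a-binary-classification-model-in-pytorch/

PyTorch is a library for deep learning. Deep learning models are commonly applied to regression and classification problems.

In this post, you will discover how to use PyTorch to develop and evaluate neural network models for binary classification problems.

After completing this post, you will know:

How to load training data and make it available to PyTorch
How to design and train a neural network
How to evaluate the performance of a neural network model using k-fold cross validation
How to run a model in inference mode
How to create a receiver operating characteristic curve for a binary classification model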

Let's get started!

Building a binary classification model in PyTorch

Photo by David Tang. Some rights reserved.

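Description of the dataset

The dataset you will use in this tutorial is the Sonar dataset.

It is a dataset that describes sonar chirp returns bouncing off different surfaces. The 60 input variables are the strength of the returns at different angles. It is a binary classification problem that requires a model to differentiate rocks from metal cylinders.

You can learn more about this dataset on the UCI Machine Learning Repository. You can download the dataset for free and place it in your working directory with the filename sonar.csv.

This is a well-understood dataset. All variables are continuous and generally in the range 0 to 1. The output variable is a string, "M" for mine (the metal cylinder) and "R" for rock, which needs to be converted to the integers 0 and 1.

A benefit of using this dataset is that it is a standard benchmark problem. This means we have some idea of the expected skill of a good model. Using cross validation, a neural network should be able to achieve an accuracy of 84% to 88%.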

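Loading the dataset

If you have downloaded the dataset in CSV format and saved it as sonar.csv in the local directory, you can load it with pandas. There are 60 input variables (X) and one output variable (y). Because the file contains mixed data (strings and numbers), it is easier to read with pandas than with other tools such as NumPy.

The data can be read as follows: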

import pandas as pd

# Read data
data = pd.read_csv("sonar.csv", header=None)
X = data.iloc[:, 0:60]
y = data.iloc[:, 60]

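This is a binary classification dataset. You would prefer numeric labels over string labels. You can do the conversion with LabelEncoder in scikit-learn. LabelEncoder maps each label to an integer. In this case, there are only two labels, and they become 0 and 1.

To use it, first call fit() so that it learns which labels are available, then call transform() to do the actual conversion. Below is how you use LabelEncoder to convert y from strings into 0 and 1: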

from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

You can inspect the labels with:

print(encoder.classes_)

The output is:

['M' 'R']

If you run print(y), you will see the following:

[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

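You can see that the labels are converted into 0 and 1. From encoder.classes_, you know that 0 means "M" and 1 means "R". In the context of binary classification, they are also called the negative and positive classes, respectively.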
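To make the mapping concrete, here is a minimal pure-Python sketch of the idea: the distinct labels are sorted and each one is given its position as an integer code. It is an illustration under that assumption, not scikit-learn's implementation; fit_label_mapping() and the sample labels are made up for this example.

```python
def fit_label_mapping(labels):
    """Map each distinct label to an integer, following sorted label order."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}

labels = ["R", "R", "M", "R", "M"]           # made-up sample of Sonar-style labels
mapping = fit_label_mapping(labels)          # plays the role of fit()
encoded = [mapping[label] for label in labels]   # plays the role of transform()

# Reverse lookup, to recover the original strings from the integer codes
inverse = {index: label for label, index in mapping.items()}
decoded = [inverse[code] for code in encoded]

print(mapping)   # {'M': 0, 'R': 1}
print(encoded)   # [1, 1, 0, 1, 0]
print(decoded)   # ['R', 'R', 'M', 'R', 'M']
```

Because "M" sorts before "R", it receives the code 0, which matches the order shown by encoder.classes_.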

Afterward, you should convert them into PyTorch tensors, as this is the format a PyTorch model works with.

import torch

X = torch.tensor(X.values, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

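Creating a model

Now you are ready to train a neural network model.

As you have seen in some previous posts, the simplest neural network model is a 3-layer model that has only one hidden layer. A deep learning model usually refers to one with more than one hidden layer. All neural network models have parameters called weights. As a rule of thumb, the more parameters a model has, the more powerful we expect it to be. Should you use a model with fewer layers but more parameters per layer, or a model with more layers but fewer parameters each? Let's find out.

A model with more parameters on each layer is called a wider model. In this example, the input data has 60 features to predict one binary variable. You can build a wide model with one hidden layer of 180 neurons (three times the number of input features). Such a model can be built with PyTorch: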

import torch.nn as nn

class Wide(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(60, 180)
        self.relu = nn.ReLU()
        self.output = nn.Linear(180, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.hidden(x))
        x = self.sigmoid(self.output(x))
        return x

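Because it is a binary classification problem, the output has to be a vector of length 1. You also want the output to be between 0 and 1, so that you can read it as a probability, or the model's confidence that the input belongs to the "positive" class.

A model with more layers is called a deeper model. Considering that the previous model has one layer with 180 neurons, you can try one with three layers of 60 neurons each instead. Such a model can be built with PyTorch: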

class Deep(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(60, 60)
        self.act1 = nn.ReLU()
        self.layer2 = nn.Linear(60, 60)
        self.act2 = nn.ReLU()
        self.layer3 = nn.Linear(60, 60)
        self.act3 = nn.ReLU()
        self.output = nn.Linear(60, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.act3(self.layer3(x))
        x = self.sigmoid(self.output(x))
        return x

You can confirm that these two models have a similar number of parameters, as follows:

# Compare model sizes
model1 = Wide()
model2 = Deep()
print(sum([x.reshape(-1).shape[0] for x in model1.parameters()]))  # 11161
print(sum([x.reshape(-1).shape[0] for x in model2.parameters()]))  # 11041

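model1.parameters() returns all the parameters of the model, each as a PyTorch tensor. You can then reshape each tensor into a vector and count its length with x.reshape(-1).shape[0]. Summing these lengths gives the total number of parameters in each model.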
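The two totals can also be checked by hand. A fully connected layer with n_in inputs and n_out outputs has n_in × n_out weights plus n_out biases. The sketch below applies that formula to the layer sizes of Wide and Deep; linear_params() is a helper name introduced only for this illustration.

```python
def linear_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out

# Wide: 60 -> 180 -> 1
wide_total = linear_params(60, 180) + linear_params(180, 1)

# Deep: 60 -> 60 -> 60 -> 60 -> 1
deep_total = 3 * linear_params(60, 60) + linear_params(60, 1)

print(wide_total)  # 11161
print(deep_total)  # 11041
```

The results agree with the counts printed from model1.parameters() and model2.parameters().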

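Comparing models with cross validation

Should you use the wide model or the deep model? One way to tell is to compare them with cross validation.

It is a technique that uses a "training set" of data to train the model and then a "test set" of data to see how accurately the model predicts. The result on the test set is what you should focus on. However, you do not want to test a model only once, because an extremely good or bad result may be down to chance. You want to run this process $k$ times with different training and test sets, so that you are comparing model designs rather than the result of one particular training run.

The technique you can use here is called k-fold cross validation. It splits a larger dataset into $k$ portions, takes one portion as the test set, and uses the other $k-1$ portions as the training set. There are $k$ different such combinations. Therefore, you can repeat the experiment $k$ times and take the average result.

In scikit-learn, there is a function for stratified k-fold. Stratified means that when the data is split into $k$ portions, the algorithm looks at the labels (i.e., the positive and negative classes in a binary classification problem) to ensure that each portion contains roughly the same proportion of each class as the whole dataset.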
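To see what stratification means in practice, here is a small pure-Python sketch. It is only an illustration of the idea, not scikit-learn's algorithm: it does no shuffling, and stratified_folds() and the toy labels are assumptions made for this example. The indices of each class are dealt round-robin into the folds, so every fold ends up with a similar class mix.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, k):
    """Split sample indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # deal the indices of each class round-robin across the folds
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

labels = [1] * 10 + [0] * 15          # toy labels: 10 positive, 15 negative
folds = stratified_folds(labels, k=5)

for fold_number, test_idx in enumerate(folds):
    # every other fold together forms the training set for this round
    train_idx = [i for fold in folds if fold is not test_idx for i in fold]
    counts = Counter(labels[i] for i in test_idx)
    print(fold_number, len(train_idx), len(test_idx), dict(counts))
```

Each of the five rounds trains on 20 samples and tests on 5, and every test fold holds 2 positive and 3 negative samples, the same 40/60 mix as the whole toy dataset.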

Running k-fold cross validation is straightforward, as in the following code:

# define 5-fold cross validation test harness
kfold = StratifiedKFold(n_splits=5, shuffle=True)
cv_scores = []
for train, test in kfold.split(X, y):
    # create model, train, and get accuracy
    model = Wide()
    acc = model_train(model, X[train], y[train], X[test], y[test])
    print("Accuracy (wide): %.2f" % acc)
    cv_scores.append(acc)

# evaluate the model
acc = np.mean(cv_scores)
std = np.std(cv_scores)
print("Model accuracy: %.2f%% (+/- %.2f%%)" % (acc*100, std*100))

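In short, you use StratifiedKFold() from scikit-learn to split the dataset. This function returns indices, so you can create the split datasets with X[train] and X[test] and call them the training set and the validation set (so as not to confuse the latter with the "test set", which is used later, once the model design has been chosen). You assume there is a function that runs the training loop on a model and gives you the accuracy on the validation set. You can then find the mean and standard deviation of this score as the performance metric of the model design. Note that you need to create a new model on every iteration of the for loop above, because you should not retrain an already trained model in k-fold cross validation.

The training loop can be defined as follows: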

import copy
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
import tqdm

def model_train(model, X_train, y_train, X_val, y_val):
    # loss function and optimizer
    loss_fn = nn.BCELoss()  # binary cross entropy
    optimizer = optim.Adam(model.parameters(), lr=0.0001)

    n_epochs = 250   # number of epochs to run
    batch_size = 10  # size of each batch
    batch_start = torch.arange(0, len(X_train), batch_size)

    # Hold the best model
    best_acc = - np.inf   # init to negative infinity
    best_weights = None

    for epoch in range(n_epochs):
        model.train()
        with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
            bar.set_description(f"Epoch {epoch}")
            for start in bar:
                # take a batch
                X_batch = X_train[start:start+batch_size]
                y_batch = y_train[start:start+batch_size]
                # forward pass
                y_pred = model(X_batch)
                loss = loss_fn(y_pred, y_batch)
                # backward pass
                optimizer.zero_grad()
                loss.backward()
                # update weights
                optimizer.step()
                # print progress
                acc = (y_pred.round() == y_batch).float().mean()
                bar.set_postfix(
                    loss=float(loss),
                    acc=float(acc)
                )
        # evaluate accuracy at end of each epoch
        model.eval()
        y_pred = model(X_val)
        acc = (y_pred.round() == y_val).float().mean()
        acc = float(acc)
        if acc > best_acc:
            best_acc = acc
            best_weights = copy.deepcopy(model.state_dict())
    # restore model and return best accuracy
    model.load_state_dict(best_weights)
    return best_acc

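The training loop above contains the usual elements: the forward pass, the backward pass, and the gradient descent weight update. But it is extended with an evaluation step after each epoch: you run the model in evaluation mode and check how well it predicts the validation set. The accuracy on the validation set is remembered together with the model weights. At the end of training, the best weights are restored to the model and the best accuracy is returned. This return value is the best you encountered over the many epochs of training, and it is based on the validation set.

Note that you set disable=True in tqdm above. You can set it to False to see the training set loss and accuracy as training progresses.

Remember that the goal is to pick the best design and then train the model again. For that training, you want an evaluation score so that you know what to expect in production. Therefore, you should split the entire dataset you obtained into a training set and a test set, and then further split the training set in k-fold cross validation.

With these, here is how you can compare the two model designs: run k-fold cross validation on each and compare the accuracy: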

from sklearn.model_selection import StratifiedKFold, train_test_split

# train-test split: Hold out the test set for final model evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# define 5-fold cross validation test harness
kfold = StratifiedKFold(n_splits=5, shuffle=True)
cv_scores_wide = []
for train, test in kfold.split(X_train, y_train):
    # create model, train, and get accuracy
    model = Wide()
    acc = model_train(model, X_train[train], y_train[train], X_train[test], y_train[test])
    print("Accuracy (wide): %.2f" % acc)
    cv_scores_wide.append(acc)
cv_scores_deep = []
for train, test in kfold.split(X_train, y_train):
    # create model, train, and get accuracy
    model = Deep()
    acc = model_train(model, X_train[train], y_train[train], X_train[test], y_train[test])
    print("Accuracy (deep): %.2f" % acc)
    cv_scores_deep.append(acc)

# evaluate the model
wide_acc = np.mean(cv_scores_wide)
wide_std = np.std(cv_scores_wide)
deep_acc = np.mean(cv_scores_deep)
deep_std = np.std(cv_scores_deep)
print("Wide: %.2f%% (+/- %.2f%%)" % (wide_acc*100, wide_std*100))
print("Deep: %.2f%% (+/- %.2f%%)" % (deep_acc*100, deep_std*100))

You may see output like the following:

Accuracy (wide): 0.72
Accuracy (wide): 0.66
Accuracy (wide): 0.83
Accuracy (wide): 0.76
Accuracy (wide): 0.83
Accuracy (deep): 0.90
Accuracy (deep): 0.72
Accuracy (deep): 0.93
Accuracy (deep): 0.69
Accuracy (deep): 0.76
Wide: 75.86% (+/- 6.54%)
Deep: 80.00% (+/- 9.61%)

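The deeper model has the higher mean accuracy here (80.00% versus 75.86%), but its standard deviation is also higher (9.61% versus 6.54%). The gap between the means is smaller than either standard deviation, so these five folds favor the deeper model only weakly.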
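The comparison can be reproduced from the per-fold accuracies printed above. This is a sketch under one stated assumption: it uses the accuracies as printed, rounded to two decimals, so the means and standard deviations differ slightly from the unrounded figures in the log. statistics.pstdev computes the population standard deviation, which is also what np.std returns by default.

```python
from statistics import mean, pstdev

# Per-fold accuracies as printed above (rounded to two decimals)
wide = [0.72, 0.66, 0.83, 0.76, 0.83]
deep = [0.90, 0.72, 0.93, 0.69, 0.76]

wide_mean, wide_std = mean(wide), pstdev(wide)
deep_mean, deep_std = mean(deep), pstdev(deep)
print(f"Wide: {wide_mean:.2%} (+/- {wide_std:.2%})")
print(f"Deep: {deep_mean:.2%} (+/- {deep_std:.2%})")

# How large is the difference compared with the fold-to-fold spread?
gap = deep_mean - wide_mean
print(f"Gap between means: {gap:.2%}")
print("Gap smaller than either spread:", gap < min(wide_std, deep_std))
```

When the gap is smaller than the spread, as it is here, more folds or repeated runs would be a reasonable next step before committing to one design.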

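Retraining the final model

Now that you know which design to pick, you want to rebuild the model and retrain it. Usually in k-fold cross validation, you use a smaller dataset to make training faster. The final accuracy is not a concern there, because the point of k-fold cross validation is to tell which design is better. For the final model, you want to provide more data and produce a better model, since this is what you will use in production.

As you have already split the data into a training set and a test set, these are what you will use. In Python code,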

# rebuild model with full set of training data
if wide_acc > deep_acc:
    print("Retrain a wide model")
    model = Wide()
else:
    print("Retrain a deep model")
    model = Deep()
acc = model_train(model, X_train, y_train, X_test, y_test)
print(f"Final model accuracy: {acc*100:.2f}%")

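You can reuse the model_train() function, as it does all the required training and validation. This is because the training procedure is the same for the final model as in k-fold cross validation. Note that model_train() keeps the weights that scored best on whatever set is passed as validation data; since the test set plays that role here, the reported accuracy is likely to be somewhat optimistic.

This model is what you can use in production. Usually, unlike training, prediction in production is done one data sample at a time. The following demonstrates inference by running five samples from the test set through the model: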

model.eval()
with torch.no_grad():
    # Test out inference with 5 samples
    for i in range(5):
        y_pred = model(X_test[i:i+1])
        print(f"{X_test[i].numpy()} -> {y_pred[0].numpy()} (expected {y_test[i].numpy()})")

Its output should look like the following:

[0.0265 0.044  0.0137 0.0084 0.0305 0.0438 0.0341 0.078  0.0844 0.0779
 0.0327 0.206  0.1908 0.1065 0.1457 0.2232 0.207  0.1105 0.1078 0.1165
 0.2224 0.0689 0.206  0.2384 0.0904 0.2278 0.5872 0.8457 0.8467 0.7679
 0.8055 0.626  0.6545 0.8747 0.9885 0.9348 0.696  0.5733 0.5872 0.6663
 0.5651 0.5247 0.3684 0.1997 0.1512 0.0508 0.0931 0.0982 0.0524 0.0188
 0.01   0.0038 0.0187 0.0156 0.0068 0.0097 0.0073 0.0081 0.0086 0.0095] -> [0.9583146] (expected [1.])
...

[0.034  0.0625 0.0381 0.0257 0.0441 0.1027 0.1287 0.185  0.2647 0.4117
 0.5245 0.5341 0.5554 0.3915 0.295  0.3075 0.3021 0.2719 0.5443 0.7932
 0.8751 0.8667 0.7107 0.6911 0.7287 0.8792 1.     0.9816 0.8984 0.6048
 0.4934 0.5371 0.4586 0.2908 0.0774 0.2249 0.1602 0.3958 0.6117 0.5196
 0.2321 0.437  0.3797 0.4322 0.4892 0.1901 0.094  0.1364 0.0906 0.0144
 0.0329 0.0141 0.0019 0.0067 0.0099 0.0042 0.0057 0.0051 0.0033 0.0058] -> [0.01937182] (expected [0.])

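You run the code under the torch.no_grad() context because you are sure there is no need to run the optimizer on the result. Hence you want to relieve the tensors involved from remembering how the values were computed.

The output of a binary classification neural network is between 0 and 1 (because of the sigmoid function at the end). From encoder.classes_, you can see that 0 means "M" and 1 means "R". For a value between 0 and 1, you can simply round it to the nearest integer and interpret the 0-1 result, i.e.,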

y_pred = model(X_test[i:i+1])
y_pred = y_pred.round() # 0 or 1

or use any other threshold to quantize the value into 0 or 1, i.e.,

threshold = 0.68
y_pred = model(X_test[i:i+1])
y_pred = (y_pred > threshold).float() # 0.0 or 1.0

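In fact, rounding to the nearest integer is equivalent to using 0.5 as the threshold. A good model should be robust to the choice of threshold. That is the case when the model outputs exactly 0 or 1. Otherwise, you would prefer a model that seldom reports values in the middle but often returns values close to 0 or close to 1. To see whether your model is good, you can use the receiver operating characteristic (ROC) curve, which plots the true positive rate against the false positive rate of the model at various thresholds. You can make use of scikit-learn and matplotlib to plot the ROC: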

from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

with torch.no_grad():
    # Plot the ROC curve
    y_pred = model(X_test)
    fpr, tpr, thresholds = roc_curve(y_test, y_pred)
    plt.plot(fpr, tpr) # ROC curve = TPR vs FPR
    plt.title("Receiver Operating Characteristics")
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.show()

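The plot shows the ROC curve. The curve always starts at the lower left corner and ends at the upper right corner. The closer the curve is to the upper left corner, the better the model.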
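To see where the points on an ROC curve come from, the sketch below computes the false positive rate and true positive rate by hand at a few thresholds. It is a toy illustration: the labels and scores are made up, roc_point() is a hypothetical helper, and a sample is treated as positive when its score is greater than or equal to the threshold.

```python
def roc_point(y_true, y_score, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and truth == 1:
            tp += 1
        elif predicted == 1 and truth == 0:
            fp += 1
        elif predicted == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return fp / (fp + tn), tp / (tp + fn)

# Made-up labels and model outputs: three negatives, three positives
y_true = [0, 0, 0, 1, 1, 1]
y_score = [0.1, 0.4, 0.7, 0.35, 0.8, 0.9]

# From a threshold above every score down to one below every score
for threshold in (1.1, 0.75, 0.5, 0.3, 0.0):
    fpr, tpr = roc_point(y_true, y_score, threshold)
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
```

A threshold above every score gives the lower left corner (0, 0) and a threshold of 0 gives the upper right corner (1, 1), which is why the curve always runs between those two corners.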

Complete code

Putting everything together, the following is the complete code of the above:

import copy

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import tqdm
from sklearn.metrics import roc_curve
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.preprocessing import LabelEncoder

# Read data
data = pd.read_csv("sonar.csv", header=None)
X = data.iloc[:, 0:60]
y = data.iloc[:, 60]

# Binary encoding of labels
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

# Convert to 2D PyTorch tensors
X = torch.tensor(X.values, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

# Define two models
class Wide(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(60, 180)
        self.relu = nn.ReLU()
        self.output = nn.Linear(180, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.hidden(x))
        x = self.sigmoid(self.output(x))
        return x

class Deep(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(60, 60)
        self.act1 = nn.ReLU()
        self.layer2 = nn.Linear(60, 60)
        self.act2 = nn.ReLU()
        self.layer3 = nn.Linear(60, 60)
        self.act3 = nn.ReLU()
        self.output = nn.Linear(60, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.act3(self.layer3(x))
        x = self.sigmoid(self.output(x))
        return x

# Compare model sizes
model1 = Wide()
model2 = Deep()
print(sum([x.reshape(-1).shape[0] for x in model1.parameters()]))  # 11161
print(sum([x.reshape(-1).shape[0] for x in model2.parameters()]))  # 11041

# Helper function to train one model
def model_train(model, X_train, y_train, X_val, y_val):
    # loss function and optimizer
    loss_fn = nn.BCELoss()  # binary cross entropy
    optimizer = optim.Adam(model.parameters(), lr=0.0001)

    n_epochs = 250   # number of epochs to run
    batch_size = 10  # size of each batch
    batch_start = torch.arange(0, len(X_train), batch_size)

    # Hold the best model
    best_acc = - np.inf   # init to negative infinity
    best_weights = None

    for epoch in range(n_epochs):
        model.train()
        with tqdm.tqdm(batch_start, unit="batch", mininterval=0, disable=True) as bar:
            bar.set_description(f"Epoch {epoch}")
            for start in bar:
                # take a batch
                X_batch = X_train[start:start+batch_size]
                y_batch = y_train[start:start+batch_size]
                # forward pass
                y_pred = model(X_batch)
                loss = loss_fn(y_pred, y_batch)
                # backward pass
                optimizer.zero_grad()
                loss.backward()
                # update weights
                optimizer.step()
                # print progress
                acc = (y_pred.round() == y_batch).float().mean()
                bar.set_postfix(
                    loss=float(loss),
                    acc=float(acc)
                )
        # evaluate accuracy at end of each epoch
        model.eval()
        y_pred = model(X_val)
        acc = (y_pred.round() == y_val).float().mean()
        acc = float(acc)
        if acc > best_acc:
            best_acc = acc
            best_weights = copy.deepcopy(model.state_dict())
    # restore model and return best accuracy
    model.load_state_dict(best_weights)
    return best_acc

# train-test split: Hold out the test set for final model evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# define 5-fold cross validation test harness
kfold = StratifiedKFold(n_splits=5, shuffle=True)
cv_scores_wide = []
for train, test in kfold.split(X_train, y_train):
    # create model, train, and get accuracy
    model = Wide()
    acc = model_train(model, X_train[train], y_train[train], X_train[test], y_train[test])
    print("Accuracy (wide): %.2f" % acc)
    cv_scores_wide.append(acc)
cv_scores_deep = []
for train, test in kfold.split(X_train, y_train):
    # create model, train, and get accuracy
    model = Deep()
    acc = model_train(model, X_train[train], y_train[train], X_train[test], y_train[test])
    print("Accuracy (deep): %.2f" % acc)
    cv_scores_deep.append(acc)

# evaluate the model
wide_acc = np.mean(cv_scores_wide)
wide_std = np.std(cv_scores_wide)
deep_acc = np.mean(cv_scores_deep)
deep_std = np.std(cv_scores_deep)
print("Wide: %.2f%% (+/- %.2f%%)" % (wide_acc*100, wide_std*100))
print("Deep: %.2f%% (+/- %.2f%%)" % (deep_acc*100, deep_std*100))

# rebuild model with full set of training data
if wide_acc > deep_acc:
    print("Retrain a wide model")
    model = Wide()
else:
    print("Retrain a deep model")
    model = Deep()
acc = model_train(model, X_train, y_train, X_test, y_test)
print(f"Final model accuracy: {acc*100:.2f}%")

model.eval()
with torch.no_grad():
    # Test out inference with 5 samples
    for i in range(5):
        y_pred = model(X_test[i:i+1])
        print(f"{X_test[i].numpy()} -> {y_pred[0].numpy()} (expected {y_test[i].numpy()})")

    # Plot the ROC curve
    y_pred = model(X_test)
    fpr, tpr, thresholds = roc_curve(y_test, y_pred)
    plt.plot(fpr, tpr) # ROC curve = TPR vs FPR
    plt.title("Receiver Operating Characteristics")
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.show()

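Summary

In this post, you discovered how to use PyTorch to build a binary classification model.

You learned how to work through a binary classification problem step by step with PyTorch, specifically:

How to load and prepare data for use in PyTorch
How to create neural network models and compare them using k-fold cross validation
How to train a binary classification model and obtain its receiver operating characteristic curve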

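Building a convolutional neural network in PyTorch

Original: machinelearningmastery.com/building-a-convolutional-neural-network-in-pytorch/

Neural networks are built from layers connected to each other. There are many different kinds of layers. For image-related applications, you will almost always find convolutional layers. This is a layer with very few parameters but applied over a large input. It is powerful because it can preserve the spatial structure of the image. Therefore, it is used to produce state-of-the-art results in computer vision neural networks. In this post, you will learn about the convolutional layer and the network built from it. After completing this post, you will know:

What convolutional layers and pooling layers are
How they fit together in a neural network
How to design a neural network that uses convolutional layers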

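Let's get started!

Building a convolutional neural network in PyTorch

Photo by Donna Elliot. Some rights reserved.

Overview

This post is in four parts; they are

The case for convolutional neural networks
Building blocks of convolutional neural networks
An example of a convolutional neural network
What is in a feature map?

The case for convolutional neural networks

Let's consider building a neural network that takes a grayscale image as input, which is the simplest use case of deep learning for computer vision.

A grayscale image is an array of pixels. Each pixel usually has a value in the range 0 to 255. A 32×32 image has 1024 pixels. Feeding it into a neural network means the first layer has at least 1024 input weights.

Looking at individual pixel values tells you little about the picture, because the information is hidden in the spatial structure (e.g., whether there is a horizontal or a vertical line in the picture). Hence a traditional neural network will find it difficult to extract information from image input.

A convolutional neural network uses convolutional layers to preserve the spatial information of the pixels. It learns how alike neighboring pixels are and generates feature representations. What the convolutional layers see in a picture is, to some degree, invariant to distortion. For example, a convolutional neural network may predict the same result even if the input image is shifted in color, rotated, or rescaled. Moreover, convolutional layers have fewer weights and are therefore easier to train.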

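Building blocks of convolutional neural networks

The simplest use case of a convolutional neural network is classification. You will find it contains three types of layers:

Convolutional layers
Pooling layers
Fully connected layers

The neurons on a convolutional layer are called filters. In image applications, it is usually a 2D convolutional layer. A filter is a 2D patch (e.g., 3×3 pixels) that is applied to the pixels of the input image. The size of this 2D patch is also called the receptive field, meaning how large a portion of the image it can see at a time.

A filter of a convolutional layer is multiplied with the input pixels, and the results are summed up. This sum is one pixel value of the output. The filter moves across the input image to fill in all the pixel values of the output. Usually, multiple filters are applied to the same input, producing multiple output tensors. These output tensors are called the feature maps generated by this layer. They are stacked together as one tensor and passed on as the input to the next layer.

Example of applying a filter to a two-dimensional input to create a feature map

The output of a convolutional layer is called a feature map because it usually learns features of the input image, for example, whether there is a vertical line at a particular position. Learning features from pixels helps in understanding the image at a higher level. Multiple convolutional layers are stacked together to infer higher-level features from lower-level details.

A pooling layer downsamples the feature maps of the previous layer. It is usually used after a convolutional layer to consolidate the learned features. It can compress and generalize the feature representation. A pooling layer also has a receptive field, and it usually takes the average (average pooling) or the maximum (max pooling) over that receptive field.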
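The multiply-and-sum of a filter and the downsampling of a pooling layer can be traced by hand on a tiny input. The sketch below is a plain-Python illustration with made-up data, not how PyTorch implements these layers; conv2d() and max_pool2d() are hypothetical helpers that work on lists of lists, with no padding and a stride of 1 for the filter. As in deep learning libraries, the filter is applied without flipping.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1), multiply and sum."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def max_pool2d(feature_map, size=2):
    """Take the maximum over non-overlapping size x size windows."""
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]

# 6x6 toy image: dark left half (0), bright right half (1)
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# 3x3 filter that responds to a dark-to-bright vertical edge
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = conv2d(image, kernel)   # 4x4 feature map
pooled = max_pool2d(feature_map)      # 2x2 after max pooling

for row in feature_map:
    print(row)       # [0, 3, 3, 0] on every row
print(pooled)        # [[3, 3], [3, 3]]
```

The feature map is nonzero only where the 3×3 window straddles the vertical edge, and max pooling keeps that response while halving the width and height.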

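Fully connected layers are usually the final layers of a network. They take the features consolidated by the preceding convolutional and pooling layers as input and produce the prediction. There may be several fully connected layers stacked together. In the case of classification, you usually see a softmax function applied to the output of the final fully connected layer to produce a probability-like classification result.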

An example of a convolutional neural network

The following is a program that does image classification on the CIFAR-10 dataset.

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision

transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

batch_size = 32
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False)

class CIFAR10Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=(3,3), stride=1, padding=1)
        self.act1 = nn.ReLU()
        self.drop1 = nn.Dropout(0.3)

        self.conv2 = nn.Conv2d(32, 32, kernel_size=(3,3), stride=1, padding=1)
        self.act2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=(2, 2))

        self.flat = nn.Flatten()

        self.fc3 = nn.Linear(8192, 512)
        self.act3 = nn.ReLU()
        self.drop3 = nn.Dropout(0.5)

        self.fc4 = nn.Linear(512, 10)

    def forward(self, x):
        # input 3x32x32, output 32x32x32
        x = self.act1(self.conv1(x))
        x = self.drop1(x)
        # input 32x32x32, output 32x32x32
        x = self.act2(self.conv2(x))
        # input 32x32x32, output 32x16x16
        x = self.pool2(x)
        # input 32x16x16, output 8192
        x = self.flat(x)
        # input 8192, output 512
        x = self.act3(self.fc3(x))
        x = self.drop3(x)
        # input 512, output 10
        x = self.fc4(x)
        return x

model = CIFAR10Model()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

n_epochs = 20
for epoch in range(n_epochs):
    model.train()  # enable dropout while training
    for inputs, labels in trainloader:
        # forward, backward, and then weight update
        y_pred = model(inputs)
        loss = loss_fn(y_pred, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # evaluate with dropout disabled and without tracking gradients
    model.eval()
    acc = 0
    count = 0
    with torch.no_grad():
        for inputs, labels in testloader:
            y_pred = model(inputs)
            acc += (torch.argmax(y_pred, 1) == labels).float().sum()
            count += len(labels)
    acc /= count
    print("Epoch %d: model accuracy %.2f%%" % (epoch, acc*100))

torch.save(model.state_dict(), "cifar10model.pth")

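The CIFAR-10 dataset provides images as 32×32 pixel RGB color pictures (i.e., 3 color channels). There are 10 classes, labeled with the integers 0 to 9. Whenever you work with images in PyTorch neural network models, you will find the sister library torchvision useful. In the above, you used it to download the CIFAR-10 dataset from the Internet and convert it into PyTorch tensors: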

...
transform = torchvision.transforms.Compose([torchvision.transforms.ToTensor()])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

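You also used a DataLoader in PyTorch to help create batches for training. Training optimizes the cross-entropy loss of the model using stochastic gradient descent. It is a classification model, so classification accuracy is more intuitive than cross-entropy. It is computed at the end of each epoch by comparing the index of the largest output logit with the label from the dataset: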

...
acc += (torch.argmax(y_pred, 1) == labels).float().sum()

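It takes a while to run the program above to train the network. This network should be able to achieve above 70% classification accuracy.

It is typical in an image classification network to have convolutional, dropout, and pooling layers interleaved in the early stages. Then, at a later stage, the output of the convolutional layers is flattened and processed by some fully connected layers.

What is in a feature map?

There are two convolutional layers in the network defined above. Both are defined with a 3×3 kernel size, so each looks at 9 pixels at a time to produce one output pixel. Note that the first convolutional layer takes the RGB image as input, so each pixel has three channels. The second convolutional layer takes a feature map with 32 channels as input, so each "pixel" it sees has 32 values. Thus the second convolutional layer has more parameters even though the two have the same receptive field.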
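The difference in size can be worked out by hand. Each filter of a 2D convolutional layer has one weight per input channel per kernel position, plus one bias. The sketch below applies this to the two layers defined above; conv2d_params() is a helper name introduced only for this illustration.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Each filter has in_channels * k * k weights plus one bias."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)

# conv1: 3 input channels (RGB), 32 filters, 3x3 kernel
conv1_params = conv2d_params(3, 32, 3)

# conv2: 32 input channels (feature maps), 32 filters, 3x3 kernel
conv2_params = conv2d_params(32, 32, 3)

print(conv1_params)  # 896
print(conv2_params)  # 9248
```

Both layers look at the same 3×3 window, but the second has roughly ten times as many parameters because each of its filters spans 32 input channels instead of 3.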

Let's see what is in the feature maps. Suppose we pick one input sample from the training set:

import matplotlib.pyplot as plt

plt.imshow(trainset.data[7])
plt.show()

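You should see that this is an image of a horse, 32×32 pixels with RGB channels.

First, you need to convert it into a PyTorch tensor and make it a batch of one image. A PyTorch model expects each image as a tensor in the format (channel, height, width), but the data you read is in the format (height, width, channel). If you use torchvision to transform the image into PyTorch tensors, this format conversion is done automatically. Otherwise, you need to permute the dimensions before use.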
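The dimension change can be checked on a stand-in tensor before applying it to real data. The sketch below assumes a made-up 32×32 image in (height, width, channel) layout with only its first channel filled in, so that the channel can be traced after the reordering; it does not load CIFAR-10.

```python
import torch

# Stand-in for one image in (height, width, channel) layout
image_hwc = torch.zeros(32, 32, 3)
image_hwc[:, :, 0] = 1.0                 # fill the first (red) channel only

image_chw = image_hwc.permute(2, 0, 1)   # -> (channel, height, width)
batch = image_chw.unsqueeze(0)           # -> (batch, channel, height, width)

print(tuple(batch.shape))                # (1, 3, 32, 32)
print(batch[0, 0].sum().item())          # 1024.0, the filled channel is now first
print(batch[0, 1].sum().item())          # 0.0
```

Wrapping the image in a list before calling torch.tensor(), as the tutorial code does, adds the batch dimension in a different way; unsqueeze(0) here has the same effect on the shape.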

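Afterward, pass it through the first convolutional layer of the model and capture the output. You need to tell PyTorch that no gradient is needed for this calculation, as you are not going to optimize the model weights: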

X = torch.tensor([trainset.data[7]], dtype=torch.float32).permute(0,3,1,2)
model.eval()
with torch.no_grad():
    feature_maps = model.conv1(X)

The feature maps are in one tensor. You can visualize them using matplotlib:

fig, ax = plt.subplots(4, 8, sharex=True, sharey=True, figsize=(16,8))
for i in range(0, 32):
    row, col = i//8, i%8
    ax[row][col].imshow(feature_maps[0][i])
plt.show()

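This displays the 32 feature maps in a 4×8 grid.

The feature maps are so called because they highlight certain features of the input image. A feature is identified using a small window (in this case, a 3×3 pixel filter). The input image has three color channels. Each channel has a different filter applied, and their results are combined into one output feature.

You can similarly display the feature maps output by the second convolutional layer, as follows: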

X = torch.tensor([trainset.data[7]], dtype=torch.float32).permute(0,3,1,2)

model.eval()
with torch.no_grad():
    feature_maps = model.act1(model.conv1(X))
    feature_maps = model.drop1(feature_maps)
    feature_maps = model.conv2(feature_maps)

fig, ax = plt.subplots(4, 8, sharex=True, sharey=True, figsize=(16,8))
for i in range(0, 32):
    row, col = i//8, i%8
    ax[row][col].imshow(feature_maps[0][i])
plt.show()

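This displays the 32 feature maps of the second convolutional layer.

Compared with the output of the first convolutional layer, the feature maps from the second convolutional layer look blurrier and more abstract, but they are more useful for the model to identify objects.

Putting everything together, the code below loads the model saved in the previous section and generates the feature maps: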

import torch
import torch.nn as nn
import torchvision
import matplotlib.pyplot as plt

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True)

class CIFAR10Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=(3,3), stride=1, padding=1)
        self.act1 = nn.ReLU()
        self.drop1 = nn.Dropout(0.3)

        self.conv2 = nn.Conv2d(32, 32, kernel_size=(3,3), stride=1, padding=1)
        self.act2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=(2, 2))

        self.flat = nn.Flatten()

        self.fc3 = nn.Linear(8192, 512)
        self.act3 = nn.ReLU()
        self.drop3 = nn.Dropout(0.5)

        self.fc4 = nn.Linear(512, 10)

    def forward(self, x):
        # input 3x32x32, output 32x32x32
        x = self.act1(self.conv1(x))
        x = self.drop1(x)
        # input 32x32x32, output 32x32x32
        x = self.act2(self.conv2(x))
        # input 32x32x32, output 32x16x16
        x = self.pool2(x)
        # input 32x16x16, output 8192
        x = self.flat(x)
        # input 8192, output 512
        x = self.act3(self.fc3(x))
        x = self.drop3(x)
        # input 512, output 10
        x = self.fc4(x)
        return x

model = CIFAR10Model()
model.load_state_dict(torch.load("cifar10model.pth"))

plt.imshow(trainset.data[7])
plt.show()

X = torch.tensor([trainset.data[7]], dtype=torch.float32).permute(0,3,1,2)
model.eval()
with torch.no_grad():
    feature_maps = model.conv1(X)
fig, ax = plt.subplots(4, 8, sharex=True, sharey=True, figsize=(16,8))
for i in range(0, 32):
    row, col = i//8, i%8
    ax[row][col].imshow(feature_maps[0][i])
plt.show()

with torch.no_grad():
    feature_maps = model.act1(model.conv1(X))
    feature_maps = model.drop1(feature_maps)
    feature_maps = model.conv2(feature_maps)
fig, ax = plt.subplots(4, 8, sharex=True, sharey=True, figsize=(16,8))
for i in range(0, 32):
    row, col = i//8, i%8
    ax[row][col].imshow(feature_maps[0][i])
plt.show()

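Further reading

This section provides more resources on the topic if you are looking to go deeper.

Posts

How convolutional layers work in deep learning neural networks
Training a classifier, from the PyTorch tutorials

Books

Chapter 9: Convolutional Networks, Deep Learning (amzn.to/2Dl124s), 20….

API

nn.Conv2d layer in PyTorch

Summary

In this post, you learned how to use a convolutional neural network to handle image input and how to visualize the feature maps.

Specifically, you learned:

The structure of a typical convolutional neural network
What effect the filter size has on a convolutional layer
What effect stacking convolutional layers has in a network
How to extract and visualize feature maps from a convolutional neural network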

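Building a logistic regression classifier in PyTorch

Original: machinelearningmastery.com/building-a-logistic-regression-classifier-in-pytorch/

Logistic regression is a type of regression that predicts the probability of an event. It is used for classification problems and has many applications in machine learning, artificial intelligence, and data mining.

The formula of logistic regression applies a sigmoid function to the output of a linear function. This article discusses how you can build a logistic regression classifier. While previously you have been working on a single-variable dataset, here we will use the popular MNIST dataset to train and test our model. After going through this article, you will learn:

How to use logistic regression in PyTorch and how it can be applied to real-world problems.
How to load and analyze torchvision datasets.
How to build and train a logistic regression classifier on image datasets.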

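Let's get started!

Building a logistic regression classifier in PyTorch.

Picture by Catgirlmutant. Some rights reserved.

Overview

This tutorial is in four parts; they are

The MNIST dataset
Loading the dataset into DataLoader
Building the model with nn.Module
Training the classifier

The MNIST dataset

You will train and test a logistic regression model with the MNIST dataset. This dataset contains 60,000 images for training and 10,000 images for testing out-of-sample performance.

The MNIST dataset is so popular that it is included with torchvision, the companion library of PyTorch. Here is how you can load the training and test samples of the MNIST dataset in PyTorch.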

import torch
import torchvision.transforms as transforms
from torchvision import datasets

# loading training data
train_dataset = datasets.MNIST(root='./data', 
                               train=True, 
                               transform=transforms.ToTensor(),
                               download=True)
#loading test data
test_dataset = datasets.MNIST(root='./data', 
                              train=False, 
                              transform=transforms.ToTensor())

The dataset will be downloaded and extracted into the data directory, with output like the following.

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
  0%|          | 0/9912422 [00:00<?, ?it/s]
Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz
  0%|          | 0/28881 [00:00<?, ?it/s]
Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz
  0%|          | 0/1648877 [00:00<?, ?it/s]
Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz
  0%|          | 0/4542 [00:00<?, ?it/s]
Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw

Let's verify the number of training and testing samples in the dataset.

print("number of training samples: " + str(len(train_dataset)) + "\n" +
      "number of testing samples: " + str(len(test_dataset)))

It prints

number of training samples: 60000
number of testing samples: 10000

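Each sample in the dataset is a pair of image and label. To inspect the data type and size of the first element in the training data, you can use the type() and size() methods.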

print("datatype of the 1st training sample: ", train_dataset[0][0].type())
print("size of the 1st training sample: ", train_dataset[0][0].size())

This prints

datatype of the 1st training sample:  torch.FloatTensor
size of the 1st training sample:  torch.Size([1, 28, 28])

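You can access samples from a dataset with list indexing. The first sample in the dataset is a FloatTensor holding a $28\times 28$-pixel grayscale image (i.e., one channel), hence the size [1, 28, 28].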
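
To try the same indexing pattern without downloading anything, here is a small sketch that uses a TensorDataset filled with random values as a stand-in for MNIST. The names fake_images, fake_labels, and fake_dataset are hypothetical; only the shapes mirror the real dataset. One difference to keep in mind is that a TensorDataset returns the label as a tensor, while the MNIST dataset returns a plain integer.

```python
import torch
from torch.utils.data import TensorDataset

# Random stand-in data with MNIST-like shapes (assumption: only 8 samples)
fake_images = torch.rand(8, 1, 28, 28)      # float values in [0, 1), similar to ToTensor() output
fake_labels = torch.randint(0, 10, (8,))    # integer class labels from 0 to 9
fake_dataset = TensorDataset(fake_images, fake_labels)

# indexing returns an (image, label) pair
image, label = fake_dataset[0]
print(image.type(), image.size())  # torch.FloatTensor torch.Size([1, 28, 28])
print(int(label))
```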

Now, let's check the labels of the first two samples in the training set.

# check the labels of the first two training samples
print("label of the first training sample: ", train_dataset[0][1])
print("label of the second training sample: ", train_dataset[1][1])

This shows

label of the first training sample:  5
label of the second training sample:  0

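From the above, you can see that the first two images in the training set represent "5" and "0". Let's show these two images to confirm.

import matplotlib.pyplot as plt

img_5 = train_dataset[0][0].numpy().reshape(28, 28)
plt.imshow(img_5, cmap='gray')
plt.show()
img_0 = train_dataset[1][0].numpy().reshape(28, 28)
plt.imshow(img_0, cmap='gray')
plt.show()

You should be able to see these two digits: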

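Load Dataset into DataLoader

Usually, you do not use the dataset directly in training but through a DataLoader class. This allows you to read data in batches rather than sample by sample.

In the following, data is loaded into a DataLoader with a batch size of 32.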

...
from torch.utils.data import DataLoader

# load train and test data samples into dataloader
batch_size = 32
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)
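
To illustrate what reading in batches means, the sketch below iterates over a DataLoader built on random stand-in data rather than MNIST. The dataset of 100 samples and the names fake_dataset and fake_loader are assumptions made for this example only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in for MNIST (assumption: 100 samples, not the real data)
fake_dataset = TensorDataset(torch.rand(100, 1, 28, 28),
                             torch.randint(0, 10, (100,)))
fake_loader = DataLoader(dataset=fake_dataset, batch_size=32, shuffle=True)

# each iteration yields one batch of images and the matching labels
batch_sizes = []
for images, labels in fake_loader:
    batch_sizes.append(images.shape[0])

print(batch_sizes)       # [32, 32, 32, 4]: the last batch holds the remainder
print(len(fake_loader))  # 4 batches per epoch
```

With shuffle=True the order of samples changes every epoch, but the batch sizes stay the same.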


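Build the Model with nn.Module

Let's build the model class with nn.Module for our logistic regression model. This class is similar to the one in the previous posts, but the numbers of inputs and outputs are configurable.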

# build custom module for logistic regression
class LogisticRegression(torch.nn.Module):    
    # build the constructor
    def __init__(self, n_inputs, n_outputs):
        super(LogisticRegression, self).__init__()
        self.linear = torch.nn.Linear(n_inputs, n_outputs)
    # make predictions
    def forward(self, x):
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred

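This model will take a $28\times 28$-pixel image of a handwritten digit as input and classify it into one of the 10 output classes, the digits 0 to 9. So, here is how you can instantiate the model.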

# instantiate the model
n_inputs = 28*28 # makes a 1D vector of 784
n_outputs = 10
log_regr = LogisticRegression(n_inputs, n_outputs)
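
As a quick shape check, the sketch below pushes a random stand-in batch through a linear layer followed by a sigmoid, the same two operations the model above performs. It is not the trained model, and the batch of 32 random images is an assumption for illustration.

```python
import torch

# Stand-in batch with the MNIST image shape (random values, not real digits)
images = torch.rand(32, 1, 28, 28)

# view(-1, 28*28) flattens each image into a row of 784 values
flat = images.view(-1, 28 * 28)

# a single linear layer followed by sigmoid, as in the model above
linear = torch.nn.Linear(28 * 28, 10)
outputs = torch.sigmoid(linear(flat))

print(flat.shape)     # torch.Size([32, 784])
print(outputs.shape)  # torch.Size([32, 10])
```

Each row of outputs holds 10 scores, one per digit class.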

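Training the Classifier

You will train this model with stochastic gradient descent as the optimizer, a learning rate of 0.001, and cross-entropy as the loss metric.

Then, the model is trained for 50 epochs. Note that you use the view() method to flatten each image matrix into a row so that it fits the input shape of the logistic regression model.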

...

# defining the optimizer
optimizer = torch.optim.SGD(log_regr.parameters(), lr=0.001)
# defining Cross-Entropy loss
criterion = torch.nn.CrossEntropyLoss()

epochs = 50
Loss = []
acc = []
for epoch in range(epochs):
    # training pass over all mini-batches
    for i, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = log_regr(images.view(-1, 28*28))
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    # record the loss of the last training batch in this epoch
    Loss.append(loss.item())
    # evaluate accuracy on the test set; no gradients are needed here
    correct = 0
    with torch.no_grad():
        for images, labels in test_loader:
            outputs = log_regr(images.view(-1, 28*28))
            _, predicted = torch.max(outputs, 1)
            correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / len(test_dataset)
    acc.append(accuracy)
    print('Epoch: {}. Loss: {}. Accuracy: {}'.format(epoch, loss.item(), accuracy))

During training, you should see progress like the following:

Epoch: 0. Loss: 2.211054563522339. Accuracy: 61.63
Epoch: 1. Loss: 2.1178536415100098. Accuracy: 74.81
Epoch: 2. Loss: 2.0735440254211426. Accuracy: 78.47
Epoch: 3. Loss: 2.040225028991699. Accuracy: 80.17
Epoch: 4. Loss: 1.9637292623519897. Accuracy: 81.05
Epoch: 5. Loss: 2.000900983810425. Accuracy: 81.44
...
Epoch: 45. Loss: 1.6549798250198364. Accuracy: 86.3
Epoch: 46. Loss: 1.7053509950637817. Accuracy: 86.31
Epoch: 47. Loss: 1.7396119832992554. Accuracy: 86.36
Epoch: 48. Loss: 1.6963073015213013. Accuracy: 86.37
Epoch: 49. Loss: 1.6838685274124146. Accuracy: 86.46

After only 50 epochs of training, you have reached an accuracy of around 86%. The accuracy can be improved further if the model is trained longer.
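
One observation about the log above: the loss levels off around 1.6 to 1.7 rather than approaching zero. A likely reason is that nn.CrossEntropyLoss applies log-softmax to its input internally, while the model here passes it sigmoid outputs, which are confined to the range (0, 1). The sketch below uses hand-made tensors, not the trained model, to show that even an ideal sigmoid output cannot bring the loss below about 1.46 for 10 classes, whereas raw scores (logits) can.

```python
import math
import torch

criterion = torch.nn.CrossEntropyLoss()
label = torch.tensor([3])

# best case for sigmoid outputs: 1 for the true class, 0 for the rest
best_sigmoid = torch.zeros(1, 10)
best_sigmoid[0, 3] = 1.0
floor = criterion(best_sigmoid, label).item()
print(floor)  # equals log(1 + 9/e), about 1.46

# raw logits are unbounded, so the loss can get arbitrarily close to zero
logits = torch.zeros(1, 10)
logits[0, 3] = 20.0
logit_loss = criterion(logits, label).item()
print(logit_loss)
```

If you want to experiment, a variant of the model whose forward() returns self.linear(x) without the sigmoid can be trained with the same loop. This is a suggestion for further exploration; the results reported in this tutorial come from the sigmoid version.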

Let's visualize how the loss and accuracy evolve. The following is the loss:

plt.plot(Loss)
plt.xlabel("no. of epochs")
plt.ylabel("total loss")
plt.title("Loss")
plt.show()

And this is for the accuracy:

plt.plot(acc)
plt.xlabel("no. of epochs")
plt.ylabel("total accuracy")
plt.title("Accuracy")
plt.show()
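
The accuracy values plotted above come from comparing the class with the highest score against the true label. The sketch below walks through that calculation on a tiny hand-made example; the scores and labels are illustrative values, not model output.

```python
import torch

# hand-made scores for 4 samples and 3 classes (illustrative values only)
outputs = torch.tensor([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1],
                        [0.3, 0.3, 0.4],
                        [0.6, 0.3, 0.1]])
labels = torch.tensor([1, 0, 2, 1])

# torch.max along dim 1 returns (largest values, their indices);
# the indices are the predicted classes
_, predicted = torch.max(outputs, 1)
correct = (predicted == labels).sum().item()
accuracy = 100 * correct / len(labels)

print(predicted)  # tensor([1, 0, 2, 0])
print(accuracy)   # 75.0
```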

Putting everything together, the following is the complete code:

import torch
import torchvision.transforms as transforms
from torchvision import datasets
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

# loading training data
train_dataset = datasets.MNIST(root='./data', 
                               train=True, 
                               transform=transforms.ToTensor(),
                               download=True)
# loading test data
test_dataset = datasets.MNIST(root='./data', 
                              train=False, 
                              transform=transforms.ToTensor())

print("number of training samples: " + str(len(train_dataset)) + "\n" +
      "number of testing samples: " + str(len(test_dataset)))
print("datatype of the 1st training sample: ", train_dataset[0][0].type())
print("size of the 1st training sample: ", train_dataset[0][0].size())

# check the labels of the first two training samples
print("label of the first training sample: ", train_dataset[0][1])
print("label of the second training sample: ", train_dataset[1][1])

img_5 = train_dataset[0][0].numpy().reshape(28, 28)
plt.imshow(img_5, cmap='gray')
plt.show()
img_0 = train_dataset[1][0].numpy().reshape(28, 28)
plt.imshow(img_0, cmap='gray')
plt.show()

# load train and test data samples into dataloader
batch_size = 32
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

# build custom module for logistic regression
class LogisticRegression(torch.nn.Module):    
    # build the constructor
    def __init__(self, n_inputs, n_outputs):
        super().__init__()
        self.linear = torch.nn.Linear(n_inputs, n_outputs)
    # make predictions
    def forward(self, x):
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred

# instantiate the model
n_inputs = 28*28 # makes a 1D vector of 784
n_outputs = 10
log_regr = LogisticRegression(n_inputs, n_outputs)

# defining the optimizer
optimizer = torch.optim.SGD(log_regr.parameters(), lr=0.001)
# defining Cross-Entropy loss
criterion = torch.nn.CrossEntropyLoss()

epochs = 50
Loss = []
acc = []
for epoch in range(epochs):
    # training pass over all mini-batches
    for i, (images, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = log_regr(images.view(-1, 28*28))
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    # record the loss of the last training batch in this epoch
    Loss.append(loss.item())
    # evaluate accuracy on the test set; no gradients are needed here
    correct = 0
    with torch.no_grad():
        for images, labels in test_loader:
            outputs = log_regr(images.view(-1, 28*28))
            _, predicted = torch.max(outputs, 1)
            correct += (predicted == labels).sum().item()
    accuracy = 100 * correct / len(test_dataset)
    acc.append(accuracy)
    print('Epoch: {}. Loss: {}. Accuracy: {}'.format(epoch, loss.item(), accuracy))

plt.plot(Loss)
plt.xlabel("no. of epochs")
plt.ylabel("total loss")
plt.title("Loss")
plt.show()

plt.plot(acc)
plt.xlabel("no. of epochs")
plt.ylabel("total accuracy")
plt.title("Accuracy")
plt.show()

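Summary

In this tutorial, you learned how to build a multi-class logistic regression classifier in PyTorch. In particular, you learned:

How to use logistic regression in PyTorch and how it can be applied to real-world problems.
How to load and analyze torchvision datasets.
How to build and train a logistic regression classifier on image datasets.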
