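Machine Learning Mastery PyTorch Tutorials (Part 7)

Original: Machine Learning Mastery

License: CC BY-NC-SA 4.0

Training a Multi-Target Multilinear Regression Model in PyTorch

Original: machinelearningmastery.com/training-a-multi-target-multilinear-regression-model-in-pytorch/

A multi-target multilinear regression model is a machine learning model that takes one or more features as input and makes several predictions at once. In a previous post, we discussed how to make simple predictions with multilinear regression and generate multiple outputs. Here, we'll build the model and train it on a dataset.

In this post, we'll generate a dataset and define our model, along with an optimizer and a loss function. Then we'll train the model and visualize the results of the training process. In particular, we'll explain:

How to train a multi-target multilinear regression model in PyTorch.
How to generate a simple dataset and feed it to the model.
How to build the model using built-in packages in PyTorch.
How to train the model with mini-batch gradient descent and visualize the results.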

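Let's get started!

Overview

This tutorial is in four parts; they are:

Creating the Data Class
Building the Model with nn.Module
Training with Mini-Batch Gradient Descent
Plotting the Progress

Creating the Data Class

We need data to train our model. In PyTorch, we can use the Dataset class. First, we'll create a data class that includes a constructor, a __getitem__() method that returns a data sample, and a __len__() method that lets us check the length of the data. We generate the data in the constructor from a linear model. Note that torch.mm() is used for matrix multiplication, so the tensor shapes must be compatible: here x has shape (40, 2) and w has shape (2, 2), which gives an output of shape (40, 2).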

import torch
from torch.utils.data import Dataset, DataLoader
torch.manual_seed(42)

# Creating the dataset class
class Data(Dataset):
    # Constructor
    def __init__(self):
        self.x = torch.zeros(40, 2)
        self.x[:, 0] = torch.arange(-2, 2, 0.1)
        self.x[:, 1] = torch.arange(-2, 2, 0.1)
        w = torch.tensor([[1.0, 2.0], [2.0, 4.0]])
        b = 1
        func = torch.mm(self.x, w) + b    
        self.y = func + 0.2 * torch.randn((self.x.shape[0],1))
        self.len = self.x.shape[0]
    # Getter
    def __getitem__(self, idx):          
        return self.x[idx], self.y[idx] 
    # getting data length
    def __len__(self):
        return self.len

Then we can create the dataset object for training.

# Creating dataset object
data_set = Data()
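
As a quick sanity check on the shapes involved, the sketch below rebuilds the same tensors outside the class and prints their shapes. The variable names mirror the constructor above; the standalone layout is only for illustration.

```python
import torch

torch.manual_seed(42)

# Rebuild the tensors from the Data constructor to inspect their shapes
x = torch.zeros(40, 2)
x[:, 0] = torch.arange(-2, 2, 0.1)
x[:, 1] = torch.arange(-2, 2, 0.1)
w = torch.tensor([[1.0, 2.0], [2.0, 4.0]])
b = 1

func = torch.mm(x, w) + b                    # (40, 2) @ (2, 2) -> (40, 2)
noise = 0.2 * torch.randn((x.shape[0], 1))   # (40, 1), broadcast across both targets
y = func + noise

print(x.shape, w.shape, func.shape, noise.shape, y.shape)
```

Because the noise tensor has a single column, broadcasting adds the same noise value to both targets of a given sample.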

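Building the Model with `nn.Module`

PyTorch's nn.Module provides all the methods and attributes we need to build our multilinear regression model. It will also help us build more complex neural network architectures in later tutorials of this series.

We'll make our model class a subclass of nn.Module so that it inherits all of that functionality. Our model consists of a constructor and a forward() function for making predictions.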

...
# Creating a custom Multiple Linear Regression Model
class MultipleLinearRegression(torch.nn.Module):
    # Constructor
    def __init__(self, input_dim, output_dim):
        super(MultipleLinearRegression, self).__init__()
        self.linear = torch.nn.Linear(input_dim, output_dim)
    # Prediction
    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred

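Since we need to handle multiple outputs, let's create a model object with two inputs and two outputs. We'll also list the model parameters.

The code below creates the model and prints its parameters; the weights are randomly initialized.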

...
# Creating the model object
MLR_model = MultipleLinearRegression(2,2)
print("The parameters: ", list(MLR_model.parameters()))

Here's what the output looks like.

The parameters:  [Parameter containing:
tensor([[ 0.2236, -0.0123],
        [ 0.5534, -0.5024]], requires_grad=True), Parameter containing:
tensor([ 0.0445, -0.4826], requires_grad=True)]
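
To make the layout of these parameters concrete, here is a minimal sketch showing that nn.Linear stores its weight with shape (out_features, in_features) and computes x @ W.T + b. The 3-input, 2-output layer is a hypothetical choice, picked only so that the two dimensions are distinguishable.

```python
import torch

torch.manual_seed(0)

linear = torch.nn.Linear(3, 2)   # hypothetical layer: 3 input features, 2 outputs
x = torch.randn(5, 3)            # a batch of 5 samples

# nn.Linear stores weight with shape (out_features, in_features)
manual = x @ linear.weight.T + linear.bias
print(linear.weight.shape, linear.bias.shape)
print(torch.allclose(linear(x), manual, atol=1e-6))
```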

We'll train the model using stochastic gradient descent with a learning rate of 0.1. To measure the model loss, we'll use mean squared error.

# defining the model optimizer
optimizer = torch.optim.SGD(MLR_model.parameters(), lr=0.1)
# defining the loss criterion
criterion = torch.nn.MSELoss()

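PyTorch has a DataLoader class that lets us feed data to the model. It not only loads the data in batches but, together with a Dataset, also lets us apply transformations on the fly. Before we start training, let's define our dataloader object and set the batch size.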

# Creating the dataloader
train_loader = DataLoader(dataset=data_set, batch_size=2)
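
To see what this batch size means in practice, the sketch below counts the batches per epoch. It uses a TensorDataset of zeros as a stand-in with the same size as the tutorial's dataset (40 samples), which is an assumption made to keep the example self-contained.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with the same size as the tutorial's dataset (40 samples)
x = torch.zeros(40, 2)
y = torch.zeros(40, 2)
loader = DataLoader(TensorDataset(x, y), batch_size=2)

batches_per_epoch = len(loader)          # ceil(40 / 2)
x_batch, y_batch = next(iter(loader))
print(batches_per_epoch, x_batch.shape, y_batch.shape)
```

With 20 batches per epoch, 20 epochs of training amount to 400 parameter updates.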

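Training with Mini-Batch Gradient Descent

With everything in place, we can write the training loop. We create an empty list to store the model loss and train the model for 20 epochs.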

# Train the model
losses = []
epochs = 20
for epoch in range(epochs):
    for x,y in train_loader:
        y_pred = MLR_model(x)
        loss = criterion(y_pred, y)
        losses.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()   
    print(f"epoch = {epoch}, loss = {loss}")
print("Done training!")

If you run this, you should see output similar to the following:

epoch = 0, loss = 0.052659016102552414
epoch = 1, loss = 0.13005244731903076
epoch = 2, loss = 0.13508380949497223
epoch = 3, loss = 0.1353638768196106
epoch = 4, loss = 0.13537931442260742
epoch = 5, loss = 0.13537974655628204
epoch = 6, loss = 0.13537967205047607
epoch = 7, loss = 0.13538001477718353
epoch = 8, loss = 0.13537967205047607
epoch = 9, loss = 0.13537967205047607
epoch = 10, loss = 0.13538001477718353
epoch = 11, loss = 0.13537967205047607
epoch = 12, loss = 0.13537967205047607
epoch = 13, loss = 0.13538001477718353
epoch = 14, loss = 0.13537967205047607
epoch = 15, loss = 0.13537967205047607
epoch = 16, loss = 0.13538001477718353
epoch = 17, loss = 0.13537967205047607
epoch = 18, loss = 0.13537967205047607
epoch = 19, loss = 0.13538001477718353
Done training!

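Plotting the Progress

Since this is a linear regression model, training should be fast. We can visualize how the model loss falls over the training iterations; the losses list holds one value per mini-batch.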

import matplotlib.pyplot as plt

plt.plot(losses)
plt.xlabel("no. of iterations")
plt.ylabel("total loss")
plt.show()
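
The per-iteration curve can be noisy. If a per-epoch view is preferred, the recorded values can be averaged within each epoch. The sketch below uses made-up loss values (3 epochs of 4 batches), not results from this tutorial.

```python
import torch

# Hypothetical per-iteration losses: 3 epochs with 4 batches each
losses = [4.0, 3.0, 2.0, 1.0,  1.0, 0.8, 0.6, 0.4,  0.4, 0.3, 0.2, 0.1]
batches_per_epoch = 4

# One row per epoch, then average across the batches of that epoch
per_epoch = torch.tensor(losses).view(-1, batches_per_epoch).mean(dim=1)
print(per_epoch)
```

For the training loop above, batches_per_epoch would be len(train_loader).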

Putting everything together, the following is the complete code.

import matplotlib.pyplot as plt
import torch
from torch.utils.data import Dataset, DataLoader
torch.manual_seed(42)

# Creating the dataset class
class Data(Dataset):
    # Constructor
    def __init__(self):
        self.x = torch.zeros(40, 2)
        self.x[:, 0] = torch.arange(-2, 2, 0.1)
        self.x[:, 1] = torch.arange(-2, 2, 0.1)
        w = torch.tensor([[1.0, 2.0], [2.0, 4.0]])
        b = 1
        func = torch.mm(self.x, w) + b    
        self.y = func + 0.2 * torch.randn((self.x.shape[0],1))
        self.len = self.x.shape[0]
    # Getter
    def __getitem__(self, idx):          
        return self.x[idx], self.y[idx] 
    # getting data length
    def __len__(self):
        return self.len

# Creating dataset object
data_set = Data()

# Creating a custom Multiple Linear Regression Model
class MultipleLinearRegression(torch.nn.Module):
    # Constructor
    def __init__(self, input_dim, output_dim):
        super(MultipleLinearRegression, self).__init__()
        self.linear = torch.nn.Linear(input_dim, output_dim)
    # Prediction
    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred

# Creating the model object
MLR_model = MultipleLinearRegression(2,2)
print("The parameters: ", list(MLR_model.parameters()))

# defining the model optimizer
optimizer = torch.optim.SGD(MLR_model.parameters(), lr=0.1)
# defining the loss criterion
criterion = torch.nn.MSELoss()

# Creating the dataloader
train_loader = DataLoader(dataset=data_set, batch_size=2)

# Train the model
losses = []
epochs = 20
for epoch in range(epochs):
    for x,y in train_loader:
        y_pred = MLR_model(x)
        loss = criterion(y_pred, y)
        losses.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()   
    print(f"epoch = {epoch}, loss = {loss}")
print("Done training!")

# Plot the losses
plt.plot(losses)
plt.xlabel("no. of iterations")
plt.ylabel("total loss")
plt.show()

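Summary

In this tutorial, you learned the steps required to train a multi-target multilinear regression model in PyTorch. In particular, you learned:

How to train a multi-target multilinear regression model in PyTorch.
How to generate a simple dataset and feed it to the model.
How to build the model using built-in packages in PyTorch.
How to train the model with mini-batch gradient descent and visualize the results.

Training a PyTorch Model with DataLoader and Dataset

Original: machinelearningmastery.com/training-a-pytorch-model-with-dataloader-and-dataset/

When you build and train a PyTorch deep learning model, you can provide the training data in several different ways. Ultimately, a PyTorch model works like a function that takes a PyTorch tensor and returns another tensor. You have a lot of freedom in how to get the input tensors. Probably the easiest way is to prepare one large tensor holding the entire dataset and extract a small batch from it in each training step. But you will find that using DataLoader saves you a few lines of data-handling code.

In this post, you will see how to use Dataset and DataLoader in PyTorch. After finishing this post, you will know:

How to create and use a DataLoader to train your PyTorch model
How to use the Dataset class to generate data on the fly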

Let's get started!

Photo by Emmanuel Appiah. Some rights reserved.

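Overview

This post is in three parts; they are:

What Is DataLoader?
Using DataLoader in a Training Loop
Creating a Data Iterator Using the Dataset Class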

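What Is `DataLoader`?

To train a deep learning model, you need data. Usually the data is available as a dataset, which contains many data samples or instances. You can ask the model to process one sample at a time, but usually you let the model process a batch of several samples. You can create a batch by extracting a slice from the dataset using slicing syntax on the tensor. For better training quality, you may also want to shuffle the entire dataset in each epoch so that no two batches are the same across the whole training loop. Sometimes you may introduce data augmentation to manually add more variance to the data. This is common in image-related tasks, where you can randomly tilt or zoom the images a bit to generate many samples from a few images.

You can imagine there would be a lot of code to write to do all of this. It is much easier with DataLoader.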
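
For comparison, this is roughly what the manual approach looks like: a minimal sketch of shuffling and slicing one epoch by hand. The toy tensors and the batch size are hypothetical, chosen only so the example runs on its own.

```python
import torch

torch.manual_seed(0)

# Hypothetical dataset: 10 samples, 3 features, one target each
X = torch.arange(30, dtype=torch.float32).reshape(10, 3)
y = torch.arange(10, dtype=torch.float32).reshape(-1, 1)
batch_size = 4

seen = []
perm = torch.randperm(len(X))               # new random order for this epoch
for start in range(0, len(X), batch_size):
    idx = perm[start:start + batch_size]    # indices of one batch
    X_batch, y_batch = X[idx], y[idx]
    seen.extend(y_batch.flatten().tolist())
    print(X_batch.shape, y_batch.shape)
```

The last batch is smaller than the others because 10 is not a multiple of 4; DataLoader handles the same bookkeeping for you.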

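The following is an example of how to create a DataLoader and take a batch from it. The example uses the sonar dataset, converts it into PyTorch tensors, and passes them on to DataLoader: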

import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader
from sklearn.preprocessing import LabelEncoder

# Read data, convert to NumPy arrays
data = pd.read_csv("sonar.csv", header=None)
X = data.iloc[:, 0:60].values
y = data.iloc[:, 60].values

# encode class values as integers
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

# convert into PyTorch tensors
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

# create DataLoader, then take one batch
loader = DataLoader(list(zip(X,y)), shuffle=True, batch_size=16)
for X_batch, y_batch in loader:
    print(X_batch, y_batch)
    break

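If you run the code above, you will see that X_batch and y_batch are PyTorch tensors. loader is an instance of the DataLoader class and works like an iterable. Each time you read from it, you get a batch of features and targets from the original dataset.

When you create a DataLoader instance, you need to provide a list of sample pairs. Each pair is one data sample: the features and the corresponding target. A list works because DataLoader expects to use len() to find the total size of the dataset and array indexing to retrieve a particular sample. The batch size is a parameter of DataLoader, so it knows how to create batches from the entire dataset. For training, you should almost always use shuffle=True so that the samples are shuffled each time the data is loaded. This is useful because in each epoch you read every batch once. When you move from one epoch to the next, DataLoader knows that you have exhausted all the batches, so it reshuffles and you get a new combination of samples.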
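
The reshuffling between epochs can be observed directly. The sketch below is an illustration on hypothetical toy data in which the target doubles as a sample id, so the order in which samples arrive can be printed for two consecutive epochs.

```python
import torch
from torch.utils.data import DataLoader

torch.manual_seed(0)

# Hypothetical toy data: the target doubles as a sample id
X = torch.arange(8, dtype=torch.float32).reshape(-1, 1)
y = torch.arange(8, dtype=torch.float32).reshape(-1, 1)
loader = DataLoader(list(zip(X, y)), shuffle=True, batch_size=4)

orders = []
for epoch in range(2):
    ids = []
    for X_batch, y_batch in loader:
        ids.extend(int(v) for v in y_batch.flatten())
    orders.append(ids)
    print(f"epoch {epoch}: {ids}")
```

Each epoch visits every sample exactly once, but the order, and therefore the composition of the batches, generally differs from one epoch to the next.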

Using `DataLoader` in a Training Loop

The following is an example of using DataLoader in a training loop:

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split

# train-test split for evaluation of the model
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# set up DataLoader for training set
loader = DataLoader(list(zip(X_train, y_train)), shuffle=True, batch_size=16)

# create model
model = nn.Sequential(
    nn.Linear(60, 60),
    nn.ReLU(),
    nn.Linear(60, 30),
    nn.ReLU(),
    nn.Linear(30, 1),
    nn.Sigmoid()
)

# Train the model
n_epochs = 200
loss_fn = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
model.train()
for epoch in range(n_epochs):
    for X_batch, y_batch in loader:
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# evaluate accuracy after training
model.eval()
y_pred = model(X_test)
acc = (y_pred.round() == y_test).float().mean()
acc = float(acc)
print("Model accuracy: %.2f%%" % (acc*100))

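You can see that the training loop becomes simpler once you have created the DataLoader instance. In the example above, only the training set is wrapped in a DataLoader because you need to loop through it in batches. You could also create a DataLoader for the test set and use it for model evaluation, but since the accuracy is computed over the entire test set rather than per batch, the benefit of DataLoader is not significant.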
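
If the test set were too large to pass through the model at once, batch-wise evaluation would look roughly like the sketch below. The untrained model and the random test data are hypothetical stand-ins so that the example is self-contained; only the accumulation pattern is the point.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

torch.manual_seed(0)

# Hypothetical stand-ins for the trained model and the test split
model = nn.Sequential(nn.Linear(60, 1), nn.Sigmoid())
X_test = torch.randn(50, 60)
y_test = torch.randint(0, 2, (50, 1)).float()
test_loader = DataLoader(list(zip(X_test, y_test)), batch_size=16)

model.eval()
correct, total = 0, 0
with torch.no_grad():                      # no gradients needed for evaluation
    for X_batch, y_batch in test_loader:
        y_pred = model(X_batch)
        correct += (y_pred.round() == y_batch).sum().item()
        total += len(y_batch)
acc = correct / total
print("Model accuracy: %.2f%%" % (acc * 100))
```

Counting correct predictions and dividing once at the end gives the same accuracy as evaluating the whole test set in one call, even when the last batch is smaller.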

Putting everything together, below is the complete code.

import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

# Read data, convert to NumPy arrays
data = pd.read_csv("sonar.csv", header=None)
X = data.iloc[:, 0:60].values
y = data.iloc[:, 60].values

# encode class values as integers
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y)

# convert into PyTorch tensors
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)

# train-test split for evaluation of the model
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# set up DataLoader for training set
loader = DataLoader(list(zip(X_train, y_train)), shuffle=True, batch_size=16)

# create model
model = nn.Sequential(
    nn.Linear(60, 60),
    nn.ReLU(),
    nn.Linear(60, 30),
    nn.ReLU(),
    nn.Linear(30, 1),
    nn.Sigmoid()
)

# Train the model
n_epochs = 200
loss_fn = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
model.train()
for epoch in range(n_epochs):
    for X_batch, y_batch in loader:
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# evaluate accuracy after training
model.eval()
y_pred = model(X_test)
acc = (y_pred.round() == y_test).float().mean()
acc = float(acc)
print("Model accuracy: %.2f%%" % (acc*100))

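Creating a Data Iterator Using the `Dataset` Class

In PyTorch, there is a Dataset class that works closely with the DataLoader class. Recall that DataLoader expects its first argument to support len() and array indexing. The Dataset class is a base class for this. The reason you may want to use the Dataset class is that some special handling is needed before a data sample can be obtained. For example, the data may have to be read from a database or disk, and you may want to keep only a few samples in memory rather than prefetch everything. Another example is real-time preprocessing of the data, such as the random augmentation that is common in image tasks.

To use the Dataset class, you just subclass it and implement two member functions. Below is an example: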

from torch.utils.data import Dataset

class SonarDataset(Dataset):
    def __init__(self, X, y):
        # convert into PyTorch tensors and remember them
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.float32)

    def __len__(self):
        # this should return the size of the dataset
        return len(self.X)

    def __getitem__(self, idx):
        # this should return one sample from the dataset
        features = self.X[idx]
        target = self.y[idx]
        return features, target

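This is not the most powerful way to use Dataset, but it is simple enough to demonstrate how it works. With this, you can create a DataLoader and use it for model training. Modifying the previous example gives the following: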

...

# set up DataLoader for training set
dataset = SonarDataset(X_train, y_train)
loader = DataLoader(dataset, shuffle=True, batch_size=16)

# create model
model = nn.Sequential(
    nn.Linear(60, 60),
    nn.ReLU(),
    nn.Linear(60, 30),
    nn.ReLU(),
    nn.Linear(30, 1),
    nn.Sigmoid()
)

# Train the model
n_epochs = 200
loss_fn = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
model.train()
for epoch in range(n_epochs):
    for X_batch, y_batch in loader:
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# evaluate accuracy after training
model.eval()
y_pred = model(torch.tensor(X_test, dtype=torch.float32))
y_test = torch.tensor(y_test, dtype=torch.float32)
acc = (y_pred.round() == y_test).float().mean()
acc = float(acc)
print("Model accuracy: %.2f%%" % (acc*100))

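You set up dataset as an instance of SonarDataset, in which you implemented the __len__() and __getitem__() functions. It takes the place of the list that was used to set up the DataLoader instance in the previous example. Afterward, everything in the training loop is the same. Note that the example still feeds the test set to the model directly as PyTorch tensors. Because SonarDataset and the evaluation code call torch.tensor() themselves, this version is written for X_train, y_train, X_test, and y_test held as NumPy arrays.

In the __getitem__() function, you take an integer that works like an array index and return a pair: the features and the target. You can implement anything in this function: run some code to generate a synthetic data sample, read data on the fly from the internet, or add random variations to the data. It is also useful when you cannot keep the entire dataset in memory, since you can load only the samples you need.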
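
As one illustration of generating data inside __getitem__(), the sketch below defines a dataset that stores no samples at all and builds each one when it is requested. The class name NoisyLineDataset, the line y = 2x + 1, and the noise level are all hypothetical choices made for this example.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class NoisyLineDataset(Dataset):
    """Hypothetical dataset that generates each sample when it is requested."""
    def __init__(self, n_samples, noise=0.1):
        self.n_samples = n_samples
        self.noise = noise

    def __len__(self):
        return self.n_samples

    def __getitem__(self, idx):
        # derive the feature from the index, then add fresh random noise
        x = torch.tensor([idx / self.n_samples], dtype=torch.float32)
        y = 2 * x + 1 + self.noise * torch.randn(1)
        return x, y

torch.manual_seed(0)
dataset = NoisyLineDataset(100)
loader = DataLoader(dataset, shuffle=True, batch_size=10)
X_batch, y_batch = next(iter(loader))
print(len(dataset), X_batch.shape, y_batch.shape)
```

Since the noise is drawn on every access, the same index yields a slightly different target each time, which is the same mechanism random augmentation relies on.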

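In fact, since you created a PyTorch dataset, you don't need scikit-learn to split the data into a training set and a test set. In the torch.utils.data submodule, there is a random_split() function that works with the Dataset class for the same purpose. A complete example is below: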

import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, random_split, default_collate
from sklearn.preprocessing import LabelEncoder

# Read data, convert to NumPy arrays
data = pd.read_csv("sonar.csv", header=None)
X = data.iloc[:, 0:60].values
y = data.iloc[:, 60].values

# encode class values as integers
encoder = LabelEncoder()
encoder.fit(y)
y = encoder.transform(y).reshape(-1, 1)

class SonarDataset(Dataset):
    def __init__(self, X, y):
        # convert into PyTorch tensors and remember them
        self.X = torch.tensor(X, dtype=torch.float32)
        self.y = torch.tensor(y, dtype=torch.float32)

    def __len__(self):
        # this should return the size of the dataset
        return len(self.X)

    def __getitem__(self, idx):
        # this should return one sample from the dataset
        features = self.X[idx]
        target = self.y[idx]
        return features, target

# set up DataLoader for data set
dataset = SonarDataset(X, y)
trainset, testset = random_split(dataset, [0.7, 0.3])
loader = DataLoader(trainset, shuffle=True, batch_size=16)

# create model
model = nn.Sequential(
    nn.Linear(60, 60),
    nn.ReLU(),
    nn.Linear(60, 30),
    nn.ReLU(),
    nn.Linear(30, 1),
    nn.Sigmoid()
)

# Train the model
n_epochs = 200
loss_fn = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
model.train()
for epoch in range(n_epochs):
    for X_batch, y_batch in loader:
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

# create one test tensor from the testset
X_test, y_test = default_collate(testset)
model.eval()
y_pred = model(X_test)
acc = (y_pred.round() == y_test).float().mean()
acc = float(acc)
print("Model accuracy: %.2f%%" % (acc*100))

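This is very similar to the previous example. Note that the PyTorch model still needs a tensor as input, not a Dataset. Hence, in the above, you use the default_collate() function to collect the samples from the dataset into tensors.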
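
The splitting and collating steps can be tried in isolation. The sketch below uses a small hypothetical TensorDataset and passes integer lengths to random_split(), which is an alternative to the fractions used above; the samples are gathered into an explicit list before collating.

```python
import torch
from torch.utils.data import TensorDataset, random_split, default_collate

torch.manual_seed(0)

# Hypothetical dataset: 10 samples with 60 features and one target each
dataset = TensorDataset(torch.randn(10, 60), torch.zeros(10, 1))
trainset, testset = random_split(dataset, [7, 3])   # integer lengths

# Stack the individual (features, target) pairs into two tensors
X_test, y_test = default_collate([testset[i] for i in range(len(testset))])
print(X_test.shape, y_test.shape)
```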

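Summary

In this post, you learned how to use DataLoader to create shuffled batches of data and how to use Dataset to provide data samples. Specifically, you learned:

DataLoader as a convenient way of providing batches of data to the training loop
How to use Dataset to produce data samples
How to combine Dataset and DataLoader to generate batches of data on the fly for model training

Training a Single-Output Multilinear Regression Model in PyTorch

Original: machinelearningmastery.com/training-a-single-output-multilinear-regression-model-in-pytorch/

A neural network architecture is built from hundreds of neurons, where each neuron takes multiple inputs and performs a multilinear regression operation to make a prediction. In the previous tutorial, we built a single-output multilinear regression model that used only a forward function for prediction.

In this tutorial, we'll add an optimizer to our single-output multilinear regression model and perform backpropagation to reduce the model's loss. In particular, we'll demonstrate:

How to build a single-output multilinear regression model in PyTorch.
How to use PyTorch built-in packages to create complex models.
How to train a single-output multilinear regression model with mini-batch gradient descent in PyTorch.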

Let's get started.

Overview

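This tutorial is in four parts; they are:

Building the Dataset Class
Building the Model Class
Training the Model with Mini-Batch Gradient Descent
Plotting the Graph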

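Building the Dataset Class

As in the previous tutorials, we'll create a sample dataset to run our experiments on. Our data class includes a dataset constructor, a __getitem__() method to fetch data samples, and a __len__() function to get the length of the created data. Here is how it looks.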

import torch
from torch.utils.data import Dataset

# Creating the dataset class
class Data(Dataset):
    # Constructor
    def __init__(self):
        self.x = torch.zeros(40, 2)
        self.x[:, 0] = torch.arange(-2, 2, 0.1)
        self.x[:, 1] = torch.arange(-2, 2, 0.1)
        self.w = torch.tensor([[1.0], [1.0]])
        self.b = 1
        self.func = torch.mm(self.x, self.w) + self.b    
        self.y = self.func + 0.2 * torch.randn((self.x.shape[0],1))
        self.len = self.x.shape[0]
    # Getter
    def __getitem__(self, index):          
        return self.x[index], self.y[index] 
    # getting data length
    def __len__(self):
        return self.len

With this, we can easily create the dataset object.

# Creating dataset object
data_set = Data()

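Building the Model Class

Now that we have the dataset, let's build a custom multilinear regression model class. As discussed in the previous tutorial, we define a class and make it a subclass of nn.Module. As a result, the class inherits all of its methods and attributes.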

...
# Creating a custom Multiple Linear Regression Model
class MultipleLinearRegression(torch.nn.Module):
    # Constructor
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.linear = torch.nn.Linear(input_dim, output_dim)
    # Prediction
    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred

We'll create a model object with an input size of 2 and an output size of 1. We can also print out all model parameters using the parameters() method.

...
# Creating the model object
MLR_model = MultipleLinearRegression(2,1)
print("The parameters: ", list(MLR_model.parameters()))

Here's what the output looks like.

The parameters:  [Parameter containing:
tensor([[ 0.2236, -0.0123]], requires_grad=True), Parameter containing:
tensor([0.5534], requires_grad=True)]

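To train our multilinear regression model, we also need to define the optimizer and the loss criterion. We'll use the stochastic gradient descent optimizer and mean squared error loss, with the learning rate kept at 0.1.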

# defining the model optimizer
optimizer = torch.optim.SGD(MLR_model.parameters(), lr=0.1)
# defining the loss criterion
criterion = torch.nn.MSELoss()
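
To show what one optimizer step does with these settings, the sketch below compares optimizer.step() against the plain SGD update rule, parameter minus learning rate times gradient. The two-sample mini-batch is made up for the illustration.

```python
import torch

torch.manual_seed(0)

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.MSELoss()

# One hypothetical mini-batch of two samples
x = torch.tensor([[1.0, 1.0], [2.0, 2.0]])
y = torch.tensor([[3.0], [5.0]])

loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()

# Expected result of plain SGD: parameter - lr * gradient
expected_w = model.weight.detach() - 0.1 * model.weight.grad
expected_b = model.bias.detach() - 0.1 * model.bias.grad

optimizer.step()
print(torch.allclose(model.weight.detach(), expected_w),
      torch.allclose(model.bias.detach(), expected_b))
```

This equivalence holds for SGD without momentum or weight decay, which is how the optimizer is configured in this tutorial.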

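Training the Model with Mini-Batch Gradient Descent

Before we start the training process, let's load our data into the DataLoader and define the batch size for training.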

from torch.utils.data import DataLoader

# Creating the dataloader
train_loader = DataLoader(dataset=data_set, batch_size=2)

We'll start the training and let the process run for 20 epochs, using the same for-loop as in our previous tutorial.

# Train the model
Loss = []
epochs = 20
for epoch in range(epochs):
    for x,y in train_loader:
        y_pred = MLR_model(x)
        loss = criterion(y_pred, y)
        Loss.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()   
    print(f"epoch = {epoch}, loss = {loss}")
print("Done training!")

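In the training loop above, the loss is reported once per epoch; the value printed is the loss of the last mini-batch in that epoch. You should see output similar to the following: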

epoch = 0, loss = 0.06849382817745209
epoch = 1, loss = 0.07729718089103699
epoch = 2, loss = 0.0755983218550682
epoch = 3, loss = 0.07591515779495239
epoch = 4, loss = 0.07585576921701431
epoch = 5, loss = 0.07586675882339478
epoch = 6, loss = 0.07586495578289032
epoch = 7, loss = 0.07586520910263062
epoch = 8, loss = 0.07586534321308136
epoch = 9, loss = 0.07586508244276047
epoch = 10, loss = 0.07586508244276047
epoch = 11, loss = 0.07586508244276047
epoch = 12, loss = 0.07586508244276047
epoch = 13, loss = 0.07586508244276047
epoch = 14, loss = 0.07586508244276047
epoch = 15, loss = 0.07586508244276047
epoch = 16, loss = 0.07586508244276047
epoch = 17, loss = 0.07586508244276047
epoch = 18, loss = 0.07586508244276047
epoch = 19, loss = 0.07586508244276047
Done training!

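This training loop is typical in PyTorch. You will use it often in future projects.

Plotting the Graph

Lastly, let's plot the graph to visualize how the loss decreases during training and converges to a certain point.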

...
import matplotlib.pyplot as plt

# Plot the graph for epochs and loss
plt.plot(Loss)
plt.xlabel("Iterations ")
plt.ylabel("total loss ")
plt.show()

Loss during training

Putting everything together, the following is the complete code.

# Importing libraries and packages
import numpy as np
import torch
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader

torch.manual_seed(42)

# Creating the dataset class
class Data(Dataset):
    # Constructor
    def __init__(self):
        self.x = torch.zeros(40, 2)
        self.x[:, 0] = torch.arange(-2, 2, 0.1)
        self.x[:, 1] = torch.arange(-2, 2, 0.1)
        self.w = torch.tensor([[1.0], [1.0]])
        self.b = 1
        self.func = torch.mm(self.x, self.w) + self.b    
        self.y = self.func + 0.2 * torch.randn((self.x.shape[0],1))
        self.len = self.x.shape[0]
    # Getter
    def __getitem__(self, index):          
        return self.x[index], self.y[index] 
    # getting data length
    def __len__(self):
        return self.len

# Creating dataset object
data_set = Data()

# Creating a custom Multiple Linear Regression Model
class MultipleLinearRegression(torch.nn.Module):
    # Constructor
    def __init__(self, input_dim, output_dim):
        super().__init__()
        self.linear = torch.nn.Linear(input_dim, output_dim)
    # Prediction
    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred

# Creating the model object
MLR_model = MultipleLinearRegression(2,1)
# defining the model optimizer
optimizer = torch.optim.SGD(MLR_model.parameters(), lr=0.1)
# defining the loss criterion
criterion = torch.nn.MSELoss()
# Creating the dataloader
train_loader = DataLoader(dataset=data_set, batch_size=2)

# Train the model
Loss = []
epochs = 20
for epoch in range(epochs):
    for x,y in train_loader:
        y_pred = MLR_model(x)
        loss = criterion(y_pred, y)
        Loss.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()   
    print(f"epoch = {epoch}, loss = {loss}")
print("Done training!")

# Plot the graph for epochs and loss
plt.plot(Loss)
plt.xlabel("Iterations ")
plt.ylabel("total loss ")
plt.show()

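Summary

In this tutorial, you learned how to build a single-output multilinear regression model in PyTorch. In particular, you learned:

How to build a single-output multilinear regression model in PyTorch.
How to use PyTorch built-in packages to create complex models.
How to train a single-output multilinear regression model with mini-batch gradient descent in PyTorch.

Training and Validation Data in PyTorch

Original: machinelearningmastery.com/training-and-validation-data-in-pytorch/

Training data is the set of data that a machine learning algorithm uses to learn. It is also called the training set. Validation data is a set of data that the algorithm uses to test its accuracy. Validating an algorithm's performance means comparing its predicted output with the known ground truth in the validation data.

The training data is usually large and complex, while the validation data is usually smaller. The more training samples there are, the better the model tends to perform. For instance, in a spam detection task, if the training set has 10 spam emails and 10 non-spam emails, it could be difficult for the model to detect spam in a new email because there isn't enough information about what spam looks like. However, with 10 million spam emails and 10 million non-spam emails, detecting new spam would be much easier because the model has seen so many examples of what spam looks like.

In this tutorial, you will learn about training and validation data in PyTorch. We'll also demonstrate the importance of training and validation data for machine learning models in general, with a focus on neural networks. In particular, you'll learn:

The concept of training and validation data in PyTorch.
How data is split into training and validation sets in PyTorch.
How to build a simple linear regression model with built-in functions in PyTorch.
How to use various learning rates to train our model in order to get the desired accuracy.
How to tune hyperparameters in order to obtain the best model for your data.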

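Let's get started!

Picture by Markus Krisetya. Some rights reserved.

Overview

This tutorial is in three parts; they are:

Building the Data Class for Training and Validation Sets
Building and Training the Model
Visualizing the Results

Building the Data Class for Training and Validation Sets

First, we load the libraries we need in this tutorial.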

import numpy as np
import matplotlib.pyplot as plt
import torch
from torch.utils.data import Dataset, DataLoader

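We'll start by building a custom dataset class to produce enough synthetic data. This lets us create separate training and validation sets. Moreover, we'll add some steps to include outliers in the training data.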

# Creating our dataset class
class Build_Data(Dataset):
    # Constructor
    def __init__(self, train = True):
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        self.func = -5 * self.x + 1
        self.y = self.func + 0.4 * torch.randn(self.x.size())
        self.len = self.x.shape[0]
        # adding some outliers
        if train == True:
            self.y[10:12] = 0
            self.y[30:35] = 25
        else:
            pass                
    # Getting the data
    def __getitem__(self, index):    
        return self.x[index], self.y[index]    
    # Getting length of the data
    def __len__(self):
        return self.len

train_set = Build_Data()
val_set = Build_Data(train=False)

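For the training set, the train parameter defaults to True. If it is set to False, validation data is produced instead, without the outliers. We create the training set and the validation set as separate objects.

Now, let's visualize our data. You'll see the outliers near $x=-2$ and from $x=0$ to $x=0.4$.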

# Plotting and visualizing the data points
plt.plot(train_set.x.numpy(), train_set.y.numpy(), 'b+', label='y')
plt.plot(train_set.x.numpy(), train_set.func.numpy(), 'r', label='func')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid(True, color='y')
plt.show()

Training and validation datasets

The complete code to generate the plot above is as follows.

import numpy as np
import matplotlib.pyplot as plt
import torch
from torch.utils.data import Dataset, DataLoader

# Creating our dataset class
class Build_Data(Dataset):
    # Constructor
    def __init__(self, train = True):
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        self.func = -5 * self.x + 1
        self.y = self.func + 0.4 * torch.randn(self.x.size())
        self.len = self.x.shape[0]
        # adding some outliers
        if train == True:
            self.y[10:12] = 0
            self.y[30:35] = 25
        else:
            pass
    # Getting the data
    def __getitem__(self, index):
        return self.x[index], self.y[index]
    # Getting length of the data
    def __len__(self):
        return self.len

train_set = Build_Data()
val_set = Build_Data(train=False)

# Plotting and visualizing the data points
plt.plot(train_set.x.numpy(), train_set.y.numpy(), 'b+', label='y')
plt.plot(train_set.x.numpy(), train_set.func.numpy(), 'r', label='func')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.grid(True, color='y')
plt.show()

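Building and Training the Model

The nn package in PyTorch provides many useful functions. We'll use the linear regression model and the loss criterion from the nn package, and the DataLoader from the torch.utils.data package.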

...
model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
trainloader = DataLoader(dataset=train_set, batch_size=1)

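We'll create a list of various learning rates to train multiple models in one go. This is a common practice among deep learning practitioners, who tune different hyperparameters to get the best model. We'll store both training and validation losses in tensors and create an empty list Models to store our models as well. Later on, we'll plot the graphs to evaluate our models.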

...
learning_rates = [0.1, 0.01, 0.001, 0.0001]
train_err = torch.zeros(len(learning_rates))
val_err = torch.zeros(len(learning_rates))
Models = []

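To train the models, we'll use the various learning rates with the stochastic gradient descent (SGD) optimizer. Results for training and validation data will be saved along with the models in the list. We'll train all models for 20 epochs.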

...
epochs = 20

# iterate through the list of various learning rates 
for i, learning_rate in enumerate(learning_rates):
    model = torch.nn.Linear(1, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)
    for epoch in range(epochs):
        for x, y in trainloader:
            y_hat = model(x)
            loss = criterion(y_hat, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # training data
    Y_hat = model(train_set.x)
    train_loss = criterion(Y_hat, train_set.y)
    train_err[i] = train_loss.item()

    # validation data
    Y_hat = model(val_set.x)
    val_loss = criterion(Y_hat, val_set.y)
    val_err[i] = val_loss.item()
    Models.append(model)

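The code above collects the training and validation losses separately. This helps us understand how well the training went, for example whether we are overfitting. If the loss on the validation set is very different from the loss on the training set, the trained model fails to generalize to data it has not seen, i.e., the validation set.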
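
Since these losses are only measured and never backpropagated, they can also be computed without gradient tracking. The sketch below wraps that in a small helper; the function name evaluate, the untrained model, and the synthetic splits are hypothetical stand-ins, not part of the original code.

```python
import torch

torch.manual_seed(0)

def evaluate(model, criterion, x, y):
    """Return the loss on (x, y) as a float without tracking gradients."""
    with torch.no_grad():
        return criterion(model(x), y).item()

# Hypothetical stand-ins for a trained model and the two data splits
model = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss()
x = torch.arange(-3, 3, 0.1).view(-1, 1)
y_train = -5 * x + 1 + 0.4 * torch.randn(x.size())
y_val = -5 * x + 1 + 0.4 * torch.randn(x.size())

train_loss = evaluate(model, criterion, x, y_train)
val_loss = evaluate(model, criterion, x, y_val)
print(f"train = {train_loss:.3f}, val = {val_loss:.3f}, gap = {val_loss - train_loss:.3f}")
```

The gap between the two numbers is the quantity to watch when judging whether a model generalizes.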

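Visualizing the Results

In the above, we use the same model (linear regression) and train it for a fixed number of epochs. The only variation is the learning rate. We can then compare which learning rate performs best in terms of convergence.

Let's visualize the loss on the training and validation data for each learning rate. In the original run, the loss is lowest at a learning rate of 0.001, meaning the model converges faster at this learning rate for this data.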

plt.semilogx(np.array(learning_rates), train_err.numpy(), label = 'total training loss')
plt.semilogx(np.array(learning_rates), val_err.numpy(), label = 'total validation loss')
plt.ylabel('Total Loss')
plt.xlabel('learning rate')
plt.legend()
plt.show()

Loss vs. learning rate
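
Besides reading the best learning rate off the plot, it can be picked programmatically from the validation losses. The numbers in the sketch below are made up for illustration and are not results from this tutorial.

```python
import torch

learning_rates = [0.1, 0.01, 0.001, 0.0001]
# Hypothetical validation losses, one per learning rate (not results from the tutorial)
val_err = torch.tensor([30.0, 12.0, 4.0, 9.0])

best = torch.argmin(val_err).item()   # index of the smallest validation loss
print(f"best learning rate: {learning_rates[best]}, validation loss: {val_err[best].item():.2f}")
```

Selecting on validation loss rather than training loss favors the model that generalizes better, which matters here because the training set contains outliers.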

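Let's also plot the predictions from each of the models on the validation data. A well-converged model should fit the data closely, while a model that has not converged produces predictions that are off from the data.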

# plotting the predictions on validation data
for model, learning_rate in zip(Models, learning_rates):
    yhat = model(val_set.x)
    plt.plot(val_set.x.numpy(), yhat.detach().numpy(), label = 'learning rate:' + str(learning_rate))
plt.plot(val_set.x.numpy(), val_set.func.numpy(), 'or', label = 'validation data')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

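The predictions are visualized in the resulting plot.

In the original run, the green line is closer to the validation data points. This is the line with the best learning rate (0.001).

The following is the complete code from creating the data to visualizing the training and validation loss.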

import numpy as np
import matplotlib.pyplot as plt
import torch
from torch.utils.data import Dataset, DataLoader

# Creating our dataset class
class Build_Data(Dataset):
    # Constructor
    def __init__(self, train=True):
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        self.func = -5 * self.x + 1
        self.y = self.func + 0.4 * torch.randn(self.x.size())
        self.len = self.x.shape[0]
        # adding some outliers
        if train == True:
            self.y[10:12] = 0
            self.y[30:35] = 25
        else:
            pass
    # Getting the data
    def __getitem__(self, index):
        return self.x[index], self.y[index]
    # Getting length of the data
    def __len__(self):
        return self.len

train_set = Build_Data()
val_set = Build_Data(train=False)

criterion = torch.nn.MSELoss()
trainloader = DataLoader(dataset=train_set, batch_size=1)

learning_rates = [0.1, 0.01, 0.001, 0.0001]
train_err = torch.zeros(len(learning_rates))
val_err = torch.zeros(len(learning_rates))
Models = []

epochs = 20

# iterate through the list of various learning rates 
for i, learning_rate in enumerate(learning_rates):
    model = torch.nn.Linear(1, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr = learning_rate)
    for epoch in range(epochs):
        for x, y in trainloader:
            y_hat = model(x)
            loss = criterion(y_hat, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # training data
    Y_hat = model(train_set.x)
    train_loss = criterion(Y_hat, train_set.y)
    train_err[i] = train_loss.item()

    # validation data
    Y_hat = model(val_set.x)
    val_loss = criterion(Y_hat, val_set.y)
    val_err[i] = val_loss.item()
    Models.append(model)

plt.semilogx(np.array(learning_rates), train_err.numpy(), label = 'total training loss')
plt.semilogx(np.array(learning_rates), val_err.numpy(), label = 'total validation loss')
plt.ylabel('Total Loss')
plt.xlabel('learning rate')
plt.legend()
plt.show()

# plotting the predictions on validation data
for model, learning_rate in zip(Models, learning_rates):
    yhat = model(val_set.x)
    plt.plot(val_set.x.numpy(), yhat.detach().numpy(), label = 'learning rate:' + str(learning_rate))
plt.plot(val_set.x.numpy(), val_set.func.numpy(), 'or', label = 'validation data')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

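Summary

In this tutorial, you learned the concept of training and validation data in PyTorch. In particular, you learned:

The concept of training and validation data in PyTorch.
How data is split into training and validation sets in PyTorch.
How to build a simple linear regression model with built-in functions in PyTorch.
How to use various learning rates to train our model in order to get the desired accuracy.
How to tune hyperparameters in order to obtain the best model for your data.

Training Logistic Regression with Cross-Entropy Loss in PyTorch

Original: machinelearningmastery.com/training-logistic-regression-with-cross-entropy-loss-in-pytorch/

In the previous part of our PyTorch series, we demonstrated how badly initialized weights can affect the accuracy of a classification model when mean squared error (MSE) loss is used. We noticed that the model didn't converge during training and its accuracy was significantly reduced.

In the following, you will see what happens if you randomly initialize the weights and use cross-entropy as the loss function for model training. This loss function fits logistic regression and other classification problems better. Therefore, cross-entropy loss is used for most classification problems today.

In this tutorial, you will train a logistic regression model using cross-entropy loss and make predictions on test data. In particular, you will learn:

How to train a logistic regression model with cross-entropy loss in PyTorch.
How cross-entropy loss can influence the model accuracy.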

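Let's get started.

Picture by Y K. Some rights reserved.

Overview

This tutorial is in three parts; they are:

Preparing the Data and Building a Model
Model Training with Cross-Entropy
Verifying with Test Data

Preparing the Data and Building a Model

Just like in the previous tutorials, you will build a class to get the dataset for the experiments. In general, such a dataset is split into training and test samples, where the test samples are unseen data used to measure the performance of the trained model.

First, we create a Dataset class: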

import torch
from torch.utils.data import Dataset, DataLoader

# Creating the dataset class
class Data(Dataset):
    # Constructor
    def __init__(self):
        self.x = torch.arange(-2, 2, 0.1).view(-1, 1)
        self.y = torch.zeros(self.x.shape[0], 1)
        self.y[self.x[:, 0] > 0.2] = 1
        self.len = self.x.shape[0]
    # Getter
    def __getitem__(self, idx):          
        return self.x[idx], self.y[idx] 
    # getting data length
    def __len__(self):
        return self.len

Then, instantiate the dataset object.

# Creating dataset object
data_set = Data()

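Next, you'll build a custom module for our logistic regression model. It is based on the attributes and methods of PyTorch's nn.Module. This package lets us build sophisticated custom modules for deep learning models and makes the overall process a lot easier.

The module consists of only one linear layer, as follows: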

# build custom module for logistic regression
class LogisticRegression(torch.nn.Module):    
    # build the constructor
    def __init__(self, n_inputs):
        super().__init__()
        self.linear = torch.nn.Linear(n_inputs, 1)
    # make predictions
    def forward(self, x):
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred

Let's create the model object.

log_regr = LogisticRegression(1)

The model should have randomly initialized weights. You can check this by printing its state:

print("checking parameters: ", log_regr.state_dict())

You may see something like the following (the exact values depend on the random seed):

checking parameters:  OrderedDict([('linear.weight', tensor([[-0.0075]])), ('linear.bias', tensor([0.5364]))])

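Model Training with Cross-Entropy

Recall that this model did not converge when you used these parameter values with MSE loss in the previous tutorial. Let's see what happens with cross-entropy loss.

Since you are performing logistic regression with a single output, this is a classification problem with two classes. In other words, it is a binary classification problem, so we use binary cross-entropy. You set up the optimizer and the loss function as follows.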

...
optimizer = torch.optim.SGD(log_regr.parameters(), lr=2)
# binary cross-entropy
criterion = torch.nn.BCELoss()
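
To make the role of BCELoss more concrete, the following is a minimal sketch that computes binary cross-entropy by hand and compares it with the built-in criterion. The probabilities and labels are made-up values for illustration only; they are not taken from the tutorial's dataset.

```python
import torch

# Hypothetical predicted probabilities and 0/1 labels, for illustration only
y_pred = torch.tensor([[0.9], [0.2], [0.6]])
y_true = torch.tensor([[1.0], [0.0], [1.0]])

# Binary cross-entropy written out: -[y*log(p) + (1-y)*log(1-p)], averaged
manual_bce = -(y_true * torch.log(y_pred)
               + (1 - y_true) * torch.log(1 - y_pred)).mean()

# The built-in criterion used in this tutorial (default reduction is the mean)
builtin_bce = torch.nn.BCELoss()(y_pred, y_true)

print(manual_bce.item(), builtin_bce.item())
```

Both numbers should agree up to floating-point error. Note that BCELoss expects probabilities, which is why the model applies a sigmoid in forward().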

Next, we prepare a DataLoader and train the model for 50 epochs.

# load data into the dataloader
train_loader = DataLoader(dataset=data_set, batch_size=2)
# Train the model
Loss = []
epochs = 50
for epoch in range(epochs):
    for x,y in train_loader:
        y_pred = log_regr(x)
        loss = criterion(y_pred, y)
        Loss.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()   
    print(f"epoch = {epoch}, loss = {loss}")
print("Done!")

During training, the loop prints one line per epoch giving the epoch number and the loss of the last batch processed in that epoch.

The loss decreases during training and converges to a minimum. Let's also plot the training curve.

import matplotlib.pyplot as plt

plt.plot(Loss)
plt.xlabel("no. of iterations")
plt.ylabel("total loss")
plt.show()

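You will see the following:

Verifying with Test Data

The plot above shows that the model fits the training data well. Finally, let's check the accuracy of its predictions. Note that the code below reuses data_set, so it scores the model on the same points it was trained on rather than on a held-out split.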

# get the model predictions on test data
y_pred = log_regr(data_set.x)
label = y_pred > 0.5 # setting the threshold between zero and one.
print("model accuracy on test data: ",
      torch.mean((label == data_set.y.type(torch.ByteTensor)).type(torch.float)))

This gives

model accuracy on test data:  tensor(1.)

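When the model was trained with mean squared error (MSE) loss, it did not perform well: the accuracy was about 57%. Here, we get a perfect prediction. This is partly because the model is simple, a logistic function of one variable, and partly because the training was set up correctly. In this experiment, cross-entropy loss therefore improved the accuracy significantly compared with MSE loss.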
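
If you want a true held-out test set, one option is torch.utils.data.random_split. The following is a minimal sketch, assuming an 80/20 split (the ratio is an assumption, not something the tutorial specifies) and rebuilding the same toy data with TensorDataset instead of the Data class.

```python
import torch
from torch.utils.data import TensorDataset, random_split

torch.manual_seed(0)

# Same toy data as in this tutorial: x in [-2, 2), label 1 where x > 0.2
x = torch.arange(-2, 2, 0.1).view(-1, 1)
y = (x > 0.2).float()
full_set = TensorDataset(x, y)

# Hypothetical 80/20 split; the ratio is an assumption for illustration
n_test = len(full_set) // 5
n_train = len(full_set) - n_test
train_set, test_set = random_split(full_set, [n_train, n_test])

print(len(train_set), len(test_set))
```

You would then pass train_set to the DataLoader for training and compute the accuracy on the samples in test_set only.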

Putting everything together, the following is the complete code:

import matplotlib.pyplot as plt 
import torch
from torch.utils.data import Dataset, DataLoader
torch.manual_seed(0)

# Creating the dataset class
class Data(Dataset):
    def __init__(self):
        self.x = torch.arange(-2, 2, 0.1).view(-1, 1)
        self.y = torch.zeros(self.x.shape[0], 1)
        self.y[self.x[:, 0] > 0.2] = 1
        self.len = self.x.shape[0]

    def __getitem__(self, idx):          
        return self.x[idx], self.y[idx] 

    def __len__(self):
        return self.len

# building dataset object
data_set = Data()

# build custom module for logistic regression
class LogisticRegression(torch.nn.Module):    
    # build the constructor
    def __init__(self, n_inputs):
        super().__init__()
        self.linear = torch.nn.Linear(n_inputs, 1)
    # make predictions
    def forward(self, x):
        y_pred = torch.sigmoid(self.linear(x))
        return y_pred

log_regr = LogisticRegression(1)
print("checking parameters: ", log_regr.state_dict())

optimizer = torch.optim.SGD(log_regr.parameters(), lr=2)
# binary cross-entropy
criterion = torch.nn.BCELoss()

# load data into the dataloader
train_loader = DataLoader(dataset=data_set, batch_size=2)
# Train the model
Loss = []
epochs = 50
for epoch in range(epochs):
    for x,y in train_loader:
        y_pred = log_regr(x)
        loss = criterion(y_pred, y)
        Loss.append(loss.item())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()   
    print(f"epoch = {epoch}, loss = {loss}")
print("Done!")

plt.plot(Loss)
plt.xlabel("no. of iterations")
plt.ylabel("total loss")
plt.show()

# get the model predictions on test data
y_pred = log_regr(data_set.x)
label = y_pred > 0.5 # setting the threshold between zero and one.
print("model accuracy on test data: ",
      torch.mean((label == data_set.y.type(torch.ByteTensor)).type(torch.float)))

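Summary

In this tutorial, you learned how cross-entropy loss affects the performance of a classification model. In particular, you learned:

How to train a logistic regression model with cross-entropy loss in PyTorch.
How cross-entropy loss affects model accuracy.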

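Two-Dimensional Tensors in PyTorch

Original: machinelearningmastery.com/two-dimensional-tensors-in-pytorch/

Two-dimensional tensors are analogous to two-dimensional matrices. Like a matrix, a two-dimensional tensor has $n$ rows and $m$ columns.

Take a grayscale image as an example: it is a two-dimensional matrix of numeric values, commonly known as pixels. Each value ranges from 0 to 255 and represents a pixel intensity. The lowest intensity (0) represents the black regions of the image, and the highest intensity (255) represents the white regions. With PyTorch, such a two-dimensional image or matrix can be converted into a two-dimensional tensor.

In the previous post, we learned about one-dimensional tensors in PyTorch and applied some useful tensor operations. In this tutorial, we will apply those operations to two-dimensional tensors using the PyTorch library. Specifically, we will learn:

How to create two-dimensional tensors in PyTorch and explore their types and shapes.
About slicing and indexing operations on two-dimensional tensors in detail.
How to apply a number of methods to tensors, such as tensor addition and multiplication.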

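Let's get started!

Two-Dimensional Tensors in PyTorch

Photo by dylan dolte. Some rights reserved.

Tutorial Overview

This tutorial is divided into parts; they are:

Types and Shapes of Two-Dimensional Tensors
Converting Two-Dimensional Tensors into NumPy Arrays
Converting Pandas Series to Two-Dimensional Tensors
Indexing and Slicing Operations on Two-Dimensional Tensors
Operations on Two-Dimensional Tensors

Types and Shapes of Two-Dimensional Tensors

Let's first import a few necessary libraries that we'll use in this tutorial.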

import torch
import numpy as np 
import pandas as pd

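To check the type and shape of a two-dimensional tensor, we'll use the same PyTorch methods introduced previously for one-dimensional tensors. But do they work the same way for two-dimensional tensors?

Let's demonstrate by converting a two-dimensional list of integers into a two-dimensional tensor object. As an example, we'll create a 2D list and apply torch.tensor() for the conversion.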

example_2D_list = [[5, 10, 15, 20],
                   [25, 30, 35, 40],
                   [45, 50, 55, 60]]
list_to_tensor = torch.tensor(example_2D_list)
print("Our New 2D Tensor from 2D List is: ", list_to_tensor)

Our New 2D Tensor from 2D List is:  tensor([[ 5, 10, 15, 20],
        [25, 30, 35, 40],
        [45, 50, 55, 60]])

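As you can see, torch.tensor() works well for two-dimensional tensors too. Now, let's use the shape attribute and the size() and ndimension() methods to return the shape, size, and number of dimensions of the tensor object.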

print("Getting the shape of tensor object: ", list_to_tensor.shape)
print("Getting the size of tensor object: ", list_to_tensor.size())
print("Getting the dimensions of tensor object: ", list_to_tensor.ndimension())

Getting the shape of tensor object:  torch.Size([3, 4])
Getting the size of tensor object:  torch.Size([3, 4])
Getting the dimensions of tensor object:  2

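Converting Two-Dimensional Tensors into NumPy Arrays

PyTorch allows us to convert a two-dimensional tensor to a NumPy array and then back to a tensor. Let's find out how.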

# Converting two_D tensor to numpy array

twoD_tensor_to_numpy = list_to_tensor.numpy()
print("Converting two_Dimensional tensor to numpy array:")
print("Numpy array after conversion: ", twoD_tensor_to_numpy)
print("Data type after conversion: ", twoD_tensor_to_numpy.dtype)

print("***************************************************************")

# Converting numpy array back to a tensor

back_to_tensor = torch.from_numpy(twoD_tensor_to_numpy)
print("Converting numpy array back to two_Dimensional tensor:")
print("Tensor after conversion:", back_to_tensor)
print("Data type after conversion: ", back_to_tensor.dtype)

Converting two_Dimensional tensor to numpy array:
Numpy array after conversion:  [[ 5 10 15 20]
 [25 30 35 40]
 [45 50 55 60]]
Data type after conversion:  int64
***************************************************************
Converting numpy array back to two_Dimensional tensor:
Tensor after conversion: tensor([[ 5, 10, 15, 20],
        [25, 30, 35, 40],
        [45, 50, 55, 60]])
Data type after conversion:  torch.int64
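
One detail worth knowing is that, for CPU tensors, numpy() does not copy the data: the array and the tensor share the same memory. The following is a minimal sketch with a small made-up tensor that shows the effect and one way to get an independent copy.

```python
import torch

t = torch.tensor([[5, 10], [15, 20]])
arr = t.numpy()            # NumPy view of the same CPU memory
arr[0, 0] = -1             # modify the array in place...
print(t[0, 0])             # ...and the tensor sees the change

independent = t.clone().numpy()   # clone first to get an independent copy
independent[0, 1] = 99
print(t[0, 1])             # unchanged by the edit to the copy
```

The same sharing applies in the other direction with torch.from_numpy(), so clone the tensor or copy the array if the two should evolve separately.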

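Converting Pandas Series to Two-Dimensional Tensors

Similarly, we can convert a pandas DataFrame to a tensor. The steps are the same as for one-dimensional tensors: use the values attribute to get the NumPy array, then use torch.from_numpy() to turn it into a tensor.

Here is how we'll do it.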

# Converting Pandas Dataframe to a Tensor

dataframe = pd.DataFrame({'x':[22,24,26],'y':[42,52,62]})

print("Pandas to numpy conversion: ", dataframe.values)
print("Data type before tensor conversion: ", dataframe.values.dtype)

print("***********************************************")

pandas_to_tensor = torch.from_numpy(dataframe.values)
print("Getting new tensor: ", pandas_to_tensor)
print("Data type after conversion to tensor: ", pandas_to_tensor.dtype)

Pandas to numpy conversion:  [[22 42]
 [24 52]
 [26 62]]
Data type before tensor conversion:  int64
***********************************************
Getting new tensor:  tensor([[22, 42],
        [24, 52],
        [26, 62]])
Data type after conversion to tensor:  torch.int64

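Indexing and Slicing Operations on Two-Dimensional Tensors

For indexing, different elements of a tensor object can be accessed using square brackets. Simply put the corresponding indices inside the brackets to access the desired elements.

In the example below, we'll create a tensor and access certain elements using two different methods. Note that indexing is zero-based, so each index is one less than the element's actual row or column position in the tensor.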

example_tensor = torch.tensor([[10, 20, 30, 40],
                               [50, 60, 70, 80],
                               [90, 100, 110, 120]])
print("Accessing element in 2nd row and 2nd column: ", example_tensor[1, 1])
print("Accessing element in 2nd row and 2nd column: ", example_tensor[1][1])

print("********************************************************")

print("Accessing element in 3rd row and 4th column: ", example_tensor[2, 3])
print("Accessing element in 3rd row and 4th column: ", example_tensor[2][3])

Accessing element in 2nd row and 2nd column:  tensor(60)
Accessing element in 2nd row and 2nd column:  tensor(60)
********************************************************
Accessing element in 3rd row and 4th column:  tensor(120)
Accessing element in 3rd row and 4th column:  tensor(120)

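What if we need to access two or more elements at the same time? That's where tensor slicing comes in. Let's use the previous example to access the first two elements of the second row and the first three elements of the third row.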

example_tensor = torch.tensor([[10, 20, 30, 40],
                               [50, 60, 70, 80],
                               [90, 100, 110, 120]])
print("Accessing first two elements of the second row: ", example_tensor[1, 0:2])
print("Accessing first two elements of the second row: ", example_tensor[1][0:2])

print("********************************************************")

print("Accessing first three elements of the third row: ", example_tensor[2, 0:3])
print("Accessing first three elements of the third row: ", example_tensor[2][0:3])

Accessing first two elements of the second row:  tensor([50, 60])
Accessing first two elements of the second row:  tensor([50, 60])
********************************************************
Accessing first three elements of the third row:  tensor([ 90, 100, 110])
Accessing first three elements of the third row:  tensor([ 90, 100, 110])
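
Slicing also works across rows and columns at once. The following is a short sketch on the same example tensor; it also illustrates a pitfall, namely that chaining two bracketed slices is not equivalent to a single comma-separated slice.

```python
import torch

example_tensor = torch.tensor([[10, 20, 30, 40],
                               [50, 60, 70, 80],
                               [90, 100, 110, 120]])

second_column = example_tensor[:, 1]        # every row, column index 1
block = example_tensor[0:2, 1:3]            # rows 0-1, columns 1-2

# Chained brackets slice the rows twice, so this is NOT a column selection
chained = example_tensor[0:2][1:3]          # rows 0-1 first, then row 1 of that

print(second_column)
print(block)
print(chained)
```

For single indices the two styles give the same element, as shown earlier, but with slices the comma form is the one that addresses rows and columns separately.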

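Operations on Two-Dimensional Tensors

There are many operations you can perform on two-dimensional tensors with the PyTorch framework. Here, we'll cover tensor addition, scalar multiplication, and matrix multiplication.

Adding Two-Dimensional Tensors

Adding two tensors is similar to matrix addition. It is a straightforward process: you only need the addition (+) operator to perform it. Let's add two tensors in the example below.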

A = torch.tensor([[5, 10],
                  [50, 60], 
                  [100, 200]]) 
B = torch.tensor([[10, 20], 
                  [60, 70], 
                  [200, 300]])
add = A + B
print("Adding A and B to get: ", add)

Adding A and B to get:  tensor([[ 15,  30],
        [110, 130],
        [300, 500]])

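Scalar and Matrix Multiplication of Two-Dimensional Tensors

Scalar multiplication of a two-dimensional tensor is identical to scalar multiplication of a matrix. For instance, multiplying a tensor by a scalar, say 4, multiplies every element of the tensor by 4.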

new_tensor = torch.tensor([[1, 2, 3], 
                           [4, 5, 6]]) 
mul_scalar = 4 * new_tensor
print("result of scalar multiplication: ", mul_scalar)

result of scalar multiplication:  tensor([[ 4,  8, 12],
        [16, 20, 24]])

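For the multiplication of two-dimensional tensors, torch.mm() makes things easy in PyTorch. As with matrix multiplication in linear algebra, the number of columns in tensor A (here 2×3) must equal the number of rows in tensor B (here 3×2).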

A = torch.tensor([[3, 2, 1], 
                  [1, 2, 1]])
B = torch.tensor([[3, 2], 
                  [1, 1], 
                  [2, 1]])
A_mult_B = torch.mm(A, B)
print("multiplying A with B: ", A_mult_B)

multiplying A with B:  tensor([[13,  9],
        [ 7,  5]])
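
As a side note, the @ operator performs the same matrix product for 2D tensors, while * multiplies element by element. The following is a minimal sketch that reuses the tensors A and B from above to contrast the two.

```python
import torch

A = torch.tensor([[3, 2, 1],
                  [1, 2, 1]])
B = torch.tensor([[3, 2],
                  [1, 1],
                  [2, 1]])

product_mm = torch.mm(A, B)     # matrix product, shape (2, 2)
product_at = A @ B              # the @ operator gives the same result here
elementwise = A * A             # * multiplies element by element

print(product_at)
print(elementwise)
```

Writing A * B here would raise an error, because the shapes 2×3 and 3×2 cannot be broadcast together for element-wise multiplication.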

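Further Reading

Developed at the same time as TensorFlow, PyTorch used to have a simpler syntax until TensorFlow adopted Keras in its 2.x version. To learn the basics of PyTorch, you may want to read the PyTorch tutorials:

pytorch.org/tutorials/

In particular, the basics of PyTorch tensors can be found on the Tensor tutorial page:

pytorch.org/tutorials/beginner/basics/tensorqs_tutorial.html

There are also quite a few books on PyTorch that are suitable for beginners. A more recently published book is preferable, as the tools and syntax are actively evolving. One example is

Deep Learning with PyTorch by Eli Stevens, Luca Antiga, and Thomas Viehmann, 2020.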

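Summary

In this tutorial, you learned about two-dimensional tensors in PyTorch.

Specifically, you learned:

How to create two-dimensional tensors in PyTorch and explore their types and shapes.
About slicing and indexing operations on two-dimensional tensors in detail.
How to apply a number of methods to tensors, such as tensor addition and multiplication.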

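Understand Model Behavior During Training by Visualizing Metrics

Original: machinelearningmastery.com/understand-model-behavior-during-training-by-visualizing-metrics/

You can learn a lot about neural networks and deep learning models by observing their performance over time during training. For example, if you see the training accuracy getting worse as the epochs go by, you know there is an issue with the optimization; probably the learning rate is too high. In this post, you will discover how you can review and visualize the performance of PyTorch models over time during training. After completing this post, you will know:

What metrics to collect during training
How to plot the metrics on the training and validation datasets
How to interpret the plot to learn about the model and the training progress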

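Let's get started!

Understand Model Behavior During Training by Visualizing Metrics

Photo by Alison Pang. Some rights reserved.

Overview

This chapter is in two parts; they are:

Collecting Metrics from a Training Loop
Plotting Training History

Collecting Metrics from a Training Loop

In deep learning, training a model with a gradient descent algorithm means taking a forward pass to infer the loss metric from the input using the model and a loss function, then a backward pass to compute the gradient from the loss metric, and finally an update step that applies the gradient to the model parameters. While these are the basic steps you must take, you can do a bit more along the way to collect additional information.

A correctly trained model should see its loss metric decrease, since the loss is the objective being optimized. Which loss metric to use depends on the problem.

For regression problems, the closer the model's prediction is to the actual value, the better. Therefore, you want to keep track of the mean squared error (MSE), or sometimes the root mean squared error (RMSE), mean absolute error (MAE), or mean absolute percentage error (MAPE). Although it is not used as a loss metric, you may also be interested in the maximum error produced by your model.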
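
The regression metrics named above can each be computed in a line or two of tensor arithmetic. The following is a minimal sketch on made-up targets and predictions (the numbers are for illustration only).

```python
import torch

# Hypothetical targets and predictions, for illustration only
y_true = torch.tensor([[2.0], [4.0], [5.0], [10.0]])
y_pred = torch.tensor([[2.5], [3.0], [5.0], [8.0]])

err = y_pred - y_true
mse = (err ** 2).mean()
rmse = mse.sqrt()
mae = err.abs().mean()
mape = (err.abs() / y_true.abs()).mean()   # undefined if a target is zero
max_err = err.abs().max()

print(float(mse), float(rmse), float(mae), float(mape), float(max_err))
```

Here mse and mae match what nn.MSELoss and nn.L1Loss would return with their default mean reduction.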

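For classification problems, the loss metric is usually cross-entropy. But the value of cross-entropy is not very intuitive. Therefore, you may also want to keep track of the prediction accuracy, true positive rate, precision, recall, F1 score, and so on.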
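
For a binary classifier, these metrics follow from counting true positives, false positives, and false negatives. The following is a minimal sketch, assuming sigmoid outputs thresholded at 0.5; the probabilities and labels are made up for illustration.

```python
import torch

# Hypothetical sigmoid outputs and 0/1 labels, for illustration only
y_prob = torch.tensor([0.9, 0.8, 0.4, 0.3, 0.7, 0.1])
y_true = torch.tensor([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])

y_hat = (y_prob > 0.5).float()           # threshold at 0.5
tp = ((y_hat == 1) & (y_true == 1)).sum().item()
fp = ((y_hat == 1) & (y_true == 0)).sum().item()
fn = ((y_hat == 0) & (y_true == 1)).sum().item()

accuracy = (y_hat == y_true).float().mean().item()
precision = tp / (tp + fp)
recall = tp / (tp + fn)                  # also called the true positive rate
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, f1)
```

In a real training loop, you would guard against division by zero, for example when the model predicts no positives at all in the first epochs.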

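Collecting these metrics from a training loop is straightforward. Let's begin with a basic regression example using PyTorch and the California housing dataset: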

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Read data
data = fetch_california_housing()
X, y = data.data, data.target

# train-test split for model evaluation
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# Standardizing data
scaler = StandardScaler()
scaler.fit(X_train_raw)
X_train = scaler.transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

# Convert to 2D PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32).reshape(-1, 1)

# Define the model
model = nn.Sequential(
    nn.Linear(8, 24),
    nn.ReLU(),
    nn.Linear(24, 12),
    nn.ReLU(),
    nn.Linear(12, 6),
    nn.ReLU(),
    nn.Linear(6, 1)
)

# loss function and optimizer
loss_fn = nn.MSELoss()  # mean square error
optimizer = optim.Adam(model.parameters(), lr=0.001)

n_epochs = 100   # number of epochs to run
batch_size = 32  # size of each batch
batch_start = torch.arange(0, len(X_train), batch_size)

for epoch in range(n_epochs):
    for start in batch_start:
        # take a batch
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        # forward pass
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        # update weights
        optimizer.step()

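This implementation is primitive, but in each step you obtain loss as a tensor, which is what the optimizer relies on to improve the model. To follow the progress of the training, you can of course print this loss metric at every step. But you can also save the value so that you can visualize it later. When you do that, be careful to save only the value and not the tensor. The PyTorch tensor here remembers how its value was computed so that automatic differentiation can be performed. This additional data occupies memory, and you do not need it.

Hence, you can modify the training loop as follows: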

mse_history = []

for epoch in range(n_epochs):
    for start in batch_start:
        # take a batch
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        # forward pass
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        mse_history.append(float(loss))
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        # update weights
        optimizer.step()
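
To see the difference between keeping the tensor and keeping its value, the following is a small sketch with a throwaway linear model and random data (both are assumptions for illustration, not part of the housing example).

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
x = torch.randn(4, 3)
y = torch.randn(4, 1)

loss = nn.MSELoss()(model(x), y)

as_tensor = loss            # still attached to the autograd graph
as_float = float(loss)      # plain Python number, same as loss.item()

print(as_tensor.requires_grad, as_tensor.grad_fn is not None)
print(type(as_float), as_float == loss.item())
```

Appending as_tensor to a history list would keep the whole computation graph of that step alive, while as_float is just a number.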

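When training a model, you should also evaluate it with a test set that is kept separate from the training set. This is usually done once per epoch, after all the training steps in that epoch. The test results can also be saved for visualization later. In fact, you can obtain multiple metrics from the test set if you want to. Hence, you can add the following to the training loop: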

mae_fn = nn.L1Loss()  # create a function to compute MAE
train_mse_history = []
test_mse_history = []
test_mae_history = []

for epoch in range(n_epochs):
    model.train()
    for start in batch_start:
        # take a batch
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        # forward pass
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        train_mse_history.append(float(loss))
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        # update weights
        optimizer.step()
    # validate model on test set
    model.eval()
    with torch.no_grad():
        y_pred = model(X_test)
        mse = loss_fn(y_pred, y_test)
        mae = mae_fn(y_pred, y_test)
        test_mse_history.append(float(mse))
        test_mae_history.append(float(mae))

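You can define your own function to compute a metric or use one that is already implemented in the PyTorch library. It is good practice to switch the model to evaluation mode when evaluating. It is also good practice to run the evaluation under the no_grad() context, so that you explicitly tell PyTorch that you do not intend to run automatic differentiation on the tensors.

However, there is a problem with the code above: the MSE of the training set is computed once per training step from a single batch, whereas the metrics of the test set are computed once per epoch from the entire test set. They are not directly comparable. In fact, if you look at the MSE from the training steps, you will find it very noisy. A better approach is to summarize the MSE values of the same epoch into a single number (e.g., their mean) so that you can compare it with the test set's metric.

With this change, the following is the complete code: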

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Read data
data = fetch_california_housing()
X, y = data.data, data.target

# train-test split for model evaluation
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# Standardizing data
scaler = StandardScaler()
scaler.fit(X_train_raw)
X_train = scaler.transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

# Convert to 2D PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32).reshape(-1, 1)

# Define the model
model = nn.Sequential(
    nn.Linear(8, 24),
    nn.ReLU(),
    nn.Linear(24, 12),
    nn.ReLU(),
    nn.Linear(12, 6),
    nn.ReLU(),
    nn.Linear(6, 1)
)

# loss function, metrics, and optimizer
loss_fn = nn.MSELoss()  # mean square error
mae_fn = nn.L1Loss()  # mean absolute error
optimizer = optim.Adam(model.parameters(), lr=0.001)

n_epochs = 100   # number of epochs to run
batch_size = 32  # size of each batch
batch_start = torch.arange(0, len(X_train), batch_size)

train_mse_history = []
test_mse_history = []
test_mae_history = []

for epoch in range(n_epochs):
    model.train()
    epoch_mse = []
    for start in batch_start:
        # take a batch
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        # forward pass
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        epoch_mse.append(float(loss))
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        # update weights
        optimizer.step()
    mean_mse = sum(epoch_mse) / len(epoch_mse)
    train_mse_history.append(mean_mse)
    # validate model on test set
    model.eval()
    with torch.no_grad():
        y_pred = model(X_test)
        mse = loss_fn(y_pred, y_test)
        mae = mae_fn(y_pred, y_test)
        test_mse_history.append(float(mse))
        test_mae_history.append(float(mae))
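
One subtlety of the simple mean used above is that it weights every batch equally, even when the last batch is smaller. The following is a minimal sketch of a sample-weighted average, using random data and a fixed linear model that is not being updated (both are assumptions for illustration), so that the result can be checked against the MSE over the full set.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(70, 8)          # 70 samples: batches of 32, 32 and 6
y = torch.randn(70, 1)
model = nn.Linear(8, 1)
loss_fn = nn.MSELoss()
batch_size = 32

total, count, batch_losses = 0.0, 0, []
with torch.no_grad():
    for start in range(0, len(X), batch_size):
        X_batch = X[start:start+batch_size]
        y_batch = y[start:start+batch_size]
        loss = float(loss_fn(model(X_batch), y_batch))
        batch_losses.append(loss)
        total += loss * len(X_batch)    # weight by the number of samples
        count += len(X_batch)

weighted_mse = total / count
simple_mean = sum(batch_losses) / len(batch_losses)
full_mse = float(loss_fn(model(X), y))
print(weighted_mse, simple_mean, full_mse)
```

During real training the model changes after every step, so neither average equals the MSE of the final model on the training set; the weighting only removes the bias from unequal batch sizes.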

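Plotting Training History

In the code above, you collected the metrics in Python lists, one value per epoch. Therefore, it is easy to plot them as a line graph using matplotlib. Below is an example: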

import matplotlib.pyplot as plt
import numpy as np

plt.plot(np.sqrt(train_mse_history), label="Train RMSE")
plt.plot(np.sqrt(test_mse_history), label="Test RMSE")
plt.plot(test_mae_history, label="Test MAE")
plt.xlabel("epochs")
plt.legend()
plt.show()

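It plots, for example, the following:

Plots like this can provide useful information about the training of the model, such as:

Its speed of convergence over epochs (the slope)
Whether the model has already converged (the plateau of the line)
Whether the model is over-learning the training data (the inflection of the validation line)

In a regression example like the one above, the MAE and MSE metrics should both decrease as the model gets better. In a classification example, however, the accuracy metric should increase while the cross-entropy loss should decrease as training progresses. This is what you expect to see in the plot.

These curves should eventually flatten, meaning that you cannot improve the model any further with the current dataset, model design, and algorithm. You want this to happen as soon as possible, so that your model converges faster and your training is efficient. You also want the metrics to flatten in a region of high accuracy or low loss, so that your model is effective at prediction.

Another property to watch is the difference between the training and validation metrics. In the plot above, you see that the training set's RMSE is higher than the test set's at the beginning, but the curves soon cross and the test set's RMSE ends up higher. This is expected, because the model will eventually fit the training set better, but it is the test set that predicts how the model will perform on future, unseen data.

You need to be cautious when interpreting the curves or metrics at a microscopic scale. In the plot above, you see that the training set's RMSE is extremely large compared with the test set's at epoch 0. Their true difference may not be that drastic. Since you collected the training set's RMSE from the MSE of every step in the first epoch, your model probably did poorly in the first few steps and much better in the last few steps of that epoch. Averaging across all the steps may not be a fair comparison, because the test set's MSE is based on the model after the last step.

If you see that the training set's metric is much better than the test set's, your model is overfitting. This can be a hint that you should stop training at an earlier epoch, or that your model design needs some regularization, such as a dropout layer.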
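
Stopping at an earlier epoch can be decided from the recorded validation history. The following is a minimal sketch of one common rule, stopping after the metric has failed to improve for a fixed number of epochs. The function name early_stop_epoch, the patience parameter, and the history values are all hypothetical and introduced only for illustration.

```python
def early_stop_epoch(val_history, patience=3):
    """Return the epoch index at which training would stop, or None.

    Stops once the validation metric has not improved on its best value
    for `patience` consecutive epochs (lower is better).
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, value in enumerate(val_history):
        if value < best:
            best = value
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical validation losses: improving, then drifting upward
history = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.63, 0.66]
stop_at = early_stop_epoch(history, patience=3)
print(stop_at)
```

In practice you would run the same check inside the training loop after each evaluation and keep a copy of the model weights from the best epoch.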

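In the plot above, although you collected the mean squared error (MSE) for the regression problem, you plotted the root mean squared error (RMSE) instead, so that you can compare it with the mean absolute error (MAE) on the same scale. You probably should also collect the MAE of the training set. The two MAE curves should behave similarly to the RMSE curves.

Putting everything together, the following is the complete code: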

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Read data
data = fetch_california_housing()
X, y = data.data, data.target

# train-test split for model evaluation
X_train_raw, X_test_raw, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True)

# Standardizing data
scaler = StandardScaler()
scaler.fit(X_train_raw)
X_train = scaler.transform(X_train_raw)
X_test = scaler.transform(X_test_raw)

# Convert to 2D PyTorch tensors
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32).reshape(-1, 1)

# Define the model
model = nn.Sequential(
    nn.Linear(8, 24),
    nn.ReLU(),
    nn.Linear(24, 12),
    nn.ReLU(),
    nn.Linear(12, 6),
    nn.ReLU(),
    nn.Linear(6, 1)
)

# loss function, metrics, and optimizer
loss_fn = nn.MSELoss()  # mean square error
mae_fn = nn.L1Loss()  # mean absolute error
optimizer = optim.Adam(model.parameters(), lr=0.001)

n_epochs = 100   # number of epochs to run
batch_size = 32  # size of each batch
batch_start = torch.arange(0, len(X_train), batch_size)

train_mse_history = []
test_mse_history = []
test_mae_history = []

for epoch in range(n_epochs):
    model.train()
    epoch_mse = []
    for start in batch_start:
        # take a batch
        X_batch = X_train[start:start+batch_size]
        y_batch = y_train[start:start+batch_size]
        # forward pass
        y_pred = model(X_batch)
        loss = loss_fn(y_pred, y_batch)
        epoch_mse.append(float(loss))
        # backward pass
        optimizer.zero_grad()
        loss.backward()
        # update weights
        optimizer.step()
    mean_mse = sum(epoch_mse) / len(epoch_mse)
    train_mse_history.append(mean_mse)
    # validate model on test set
    model.eval()
    with torch.no_grad():
        y_pred = model(X_test)
        mse = loss_fn(y_pred, y_test)
        mae = mae_fn(y_pred, y_test)
        test_mse_history.append(float(mse))
        test_mae_history.append(float(mae))

plt.plot(np.sqrt(train_mse_history), label="Train RMSE")
plt.plot(np.sqrt(test_mse_history), label="Test RMSE")
plt.plot(test_mae_history, label="Test MAE")
plt.xlabel("epochs")
plt.legend()
plt.show()

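Further Reading

This section provides more resources on the topic if you are looking to go deeper.

APIs

nn.L1Loss from the PyTorch documentation
nn.MSELoss from the PyTorch documentation

Summary

In this chapter, you discovered the importance of collecting and reviewing metrics while training a deep learning model. You learned:

What metrics to look for during model training
How to compute and collect metrics in a PyTorch training loop
How to visualize the metrics from a training loop
How to interpret the metrics to infer details about the training experience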

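Using Activation Functions in Deep Learning Models

Original: machinelearningmastery.com/using-activation-functions-in-deep-learning-models/

A deep learning model in its simplest form is layers of perceptrons connected in tandem. Without any activation functions, they are just matrix multiplications with limited power, regardless of how many layers there are. Activation functions are what allow a neural network to approximate a wide variety of nonlinear functions. In PyTorch, there are many activation functions available for use in your deep learning models. In this post, you will see how the choice of activation function can affect the model. Specifically, you will learn:

What the common activation functions are
What the properties of activation functions are
How different activation functions affect how fast the model learns
How the choice of activation function can address the vanishing gradient problem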

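Let's get started!

Using Activation Functions in Deep Learning Models

Overview

This post is in three parts; they are:

A Toy Model of Binary Classification
Why Nonlinear Functions?
The Effect of Activation Functions

A Toy Model of Binary Classification

Let's start with a simple example of binary classification. Here you use the make_circles() function from scikit-learn to create a synthetic dataset for binary classification. The dataset has two features: the x- and y-coordinates of a point. Each point belongs to one of two classes. You can generate 1000 data points and visualize them as follows: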

from sklearn.datasets import make_circles
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim

# Make data: Two circles on x-y plane as a classification problem
X, y = make_circles(n_samples=1000, factor=0.5, noise=0.1)
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y.reshape(-1, 1), dtype=torch.float32)

plt.figure(figsize=(8,6))
plt.scatter(X[:,0], X[:,1], c=y)
plt.show()

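The dataset is visualized as follows:

This dataset is special because it is simple but not linearly separable: it is not possible to find a straight line that separates the two classes. The challenge is getting your neural network to figure out the circular boundary between the classes.

Let's create a deep learning model for this problem. To keep things simple, you do not do cross-validation. You may find that the neural network overfits the data, but that does not affect the discussion below. The model has 4 hidden layers, and the output layer produces a sigmoid value (0 to 1) for binary classification. The model's constructor takes a parameter that specifies the activation function to use in the hidden layers. You implement the training loop as a function because you will run it several times.

The implementation is as follows: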

class Model(nn.Module):
    def __init__(self, activation=nn.ReLU):
        super().__init__()
        self.layer0 = nn.Linear(2,5)
        self.act0 = activation()
        self.layer1 = nn.Linear(5,5)
        self.act1 = activation()
        self.layer2 = nn.Linear(5,5)
        self.act2 = activation()
        self.layer3 = nn.Linear(5,5)
        self.act3 = activation()
        self.layer4 = nn.Linear(5,1)
        self.act4 = nn.Sigmoid()

    def forward(self, x):
        x = self.act0(self.layer0(x))
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.act3(self.layer3(x))
        x = self.act4(self.layer4(x))
        return x

def train_loop(model, X, y, n_epochs=300, batch_size=32):
    loss_fn = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.0001)
    batch_start = torch.arange(0, len(X), batch_size)

    bce_hist = []
    acc_hist = []

    for epoch in range(n_epochs):
        # train model with optimizer
        model.train()
        for start in batch_start:
            X_batch = X[start:start+batch_size]
            y_batch = y[start:start+batch_size]
            y_pred = model(X_batch)
            loss = loss_fn(y_pred, y_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        # evaluate BCE and accuracy at end of each epoch
        model.eval()
        with torch.no_grad():
            y_pred = model(X)
            bce = float(loss_fn(y_pred, y))
            acc = float((y_pred.round() == y).float().mean())
        bce_hist.append(bce)
        acc_hist.append(acc)
        # print metrics every 10 epochs
        if (epoch+1) % 10 == 0:
            print("Before epoch %d: BCE=%.4f, Accuracy=%.2f%%" % (epoch+1, bce, acc*100))
    return bce_hist, acc_hist

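At the end of each epoch, you evaluate the model on the entire dataset. The evaluation results are returned when training completes. Next, you create a model, train it, and plot the training history. The activation function you use is the rectified linear unit, or ReLU, which is the most common activation function today: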

activation = nn.ReLU
model = Model(activation=activation)
bce_hist, acc_hist = train_loop(model, X, y)
plt.plot(bce_hist, label="BCE")
plt.plot(acc_hist, label="Accuracy")
plt.xlabel("Epochs")
plt.ylim(0, 1)
plt.title(str(activation))
plt.legend()
plt.show()

Running this gives you the following:

Before epoch 10: BCE=0.7025, Accuracy=50.00%
Before epoch 20: BCE=0.6990, Accuracy=50.00%
Before epoch 30: BCE=0.6959, Accuracy=50.00%
...
Before epoch 280: BCE=0.3051, Accuracy=96.30%
Before epoch 290: BCE=0.2785, Accuracy=96.90%
Before epoch 300: BCE=0.2543, Accuracy=97.00%

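And this plot:

This model works well. After 300 epochs, it exceeds 90% accuracy (97% in the run shown above). However, ReLU is not the only activation function. Historically, the sigmoid function and the hyperbolic tangent were common in the neural network literature. If you are curious, below is how you can compare these three activation functions using matplotlib: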

x = torch.linspace(-4, 4, 200)
relu = nn.ReLU()(x)
tanh = nn.Tanh()(x)
sigmoid = nn.Sigmoid()(x)

plt.plot(x, sigmoid, label="sigmoid")
plt.plot(x, tanh, label="tanh")
plt.plot(x, relu, label="ReLU")
plt.ylim(-1.5, 2)
plt.legend()
plt.show()

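ReLU is called the rectified linear unit because it is the linear function $y=x$ for positive $x$ and stays at zero for negative $x$. Mathematically, it is $y=\max(0, x)$. The hyperbolic tangent ( $y=\tanh(x)=\dfrac{e^x - e^{-x}}{e^x+e^{-x}}$ ) goes smoothly from -1 to +1, while the sigmoid function ( $y=\sigma(x)=\dfrac{1}{1+e^{-x}}$ ) goes from 0 to +1.

If you try to differentiate these functions, you will find that ReLU is the simplest: the gradient is 1 in the positive region and 0 otherwise. The hyperbolic tangent has a steeper slope, so its gradient is greater than that of the sigmoid function.

All these functions are non-decreasing. Hence, their gradients are never negative. This is one of the criteria for an activation function to be suitable for use in neural networks.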
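
These slopes can be checked numerically with autograd. The following is a minimal sketch; the helper slope_at is a hypothetical name introduced here, and the evaluation points are chosen only for illustration.

```python
import torch
import torch.nn as nn

def slope_at(fn, value):
    """Derivative of fn at a single point, computed by autograd."""
    x = torch.tensor(value, requires_grad=True)
    fn(x).backward()
    return x.grad.item()

relu_pos = slope_at(nn.ReLU(), 2.0)         # positive side of ReLU
relu_neg = slope_at(nn.ReLU(), -2.0)        # negative side of ReLU
tanh_zero = slope_at(nn.Tanh(), 0.0)        # steepest point of tanh
sigmoid_zero = slope_at(nn.Sigmoid(), 0.0)  # steepest point of sigmoid

print(relu_pos, relu_neg, tanh_zero, sigmoid_zero)
```

At its steepest point the hyperbolic tangent has slope 1, while the sigmoid reaches only 0.25, which is the sense in which tanh has the larger gradient.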

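Why Nonlinear Functions?

You might be wondering why there is so much hype about nonlinear activation functions. Or why can't we just use an identity function after the weighted linear combination of the activations from the previous layer? The reason is that using multiple linear layers is basically the same as using a single linear layer. This can be seen through a simple example. Suppose you have a neural network with one hidden layer that has two hidden neurons.

If you use a linear hidden layer, you can rewrite the output layer as a linear combination of the original input variables. With more neurons and weights, the equation would be a lot longer, with more nesting and more multiplications between the weights of successive layers. However, the idea remains the same: you can represent the entire network as a single linear layer. To make the network represent more complex functions, you need nonlinear activation functions.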
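
In matrix form, the argument reads $W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2)$, which is again a single linear layer. The following is a minimal sketch that verifies this numerically for a network with two hidden neurons and no activation; the layer sizes and the random inputs are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer1 = nn.Linear(2, 2)   # "hidden" layer with no activation
layer2 = nn.Linear(2, 1)   # output layer

# Collapse the two layers: W = W2 W1, b = W2 b1 + b2
merged = nn.Linear(2, 1)
with torch.no_grad():
    merged.weight.copy_(layer2.weight @ layer1.weight)
    merged.bias.copy_(layer2.weight @ layer1.bias + layer2.bias)

x = torch.randn(5, 2)
stacked_out = layer2(layer1(x))
merged_out = merged(x)
print(torch.allclose(stacked_out, merged_out, atol=1e-6))
```

Inserting any nonlinear activation between layer1 and layer2 breaks this equivalence, which is exactly what gives the deeper network its extra expressive power.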

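The Effect of Activation Functions

To illustrate how much impact activation functions have on a model, let's modify the training loop function to capture more data: the gradients at each training step. Your model has four hidden layers and one output layer. In each step, the backward pass computes the gradient of each layer's weights, and the optimizer updates the weights based on the result. You should observe how the gradients change as training progresses. Therefore, the training loop function is modified to collect the mean absolute value of the gradient of each layer at each step, as follows: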

def train_loop(model, X, y, n_epochs=300, batch_size=32):
    loss_fn = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.0001)
    batch_start = torch.arange(0, len(X), batch_size)

    bce_hist = []
    acc_hist = []
    grad_hist = [[],[],[],[],[]]

    for epoch in range(n_epochs):
        # train model with optimizer
        model.train()
        layer_grad = [[],[],[],[],[]]
        for start in batch_start:
            X_batch = X[start:start+batch_size]
            y_batch = y[start:start+batch_size]
            y_pred = model(X_batch)
            loss = loss_fn(y_pred, y_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # collect mean absolute value of gradients
            layers = [model.layer0, model.layer1, model.layer2, model.layer3, model.layer4]
            for n,layer in enumerate(layers):
                mean_grad = float(layer.weight.grad.abs().mean())
                layer_grad[n].append(mean_grad)
        # evaluate BCE and accuracy at end of each epoch
        model.eval()
        with torch.no_grad():
            y_pred = model(X)
            bce = float(loss_fn(y_pred, y))
            acc = float((y_pred.round() == y).float().mean())
        bce_hist.append(bce)
        acc_hist.append(acc)
        for n, grads in enumerate(layer_grad):
            grad_hist[n].append(sum(grads)/len(grads))
        # print metrics every 10 epochs
        if epoch % 10 == 9:
            print("Epoch %d: BCE=%.4f, Accuracy=%.2f%%" % (epoch, bce, acc*100))
    return bce_hist, acc_hist, grad_hist

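At the end of the inner for-loop, the gradients of the layer weights have already been computed by the backward pass, and you can access them with model.layer0.weight.grad. Like the weights, the gradients are tensors. You take the absolute value of each element and then compute the mean over all elements. This value depends on the batch and can be very noisy. Therefore, you average all such mean absolute values at the end of each epoch.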
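
The following is a minimal sketch of that gradient access on a single standalone layer. The layer size, the random input, and the squared-output loss are assumptions chosen only to have something to differentiate.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(2, 5)
x = torch.randn(8, 2)

loss = layer(x).pow(2).mean()   # any scalar loss will do for this sketch
print(layer.weight.grad)        # None: no backward pass has run yet

loss.backward()
grad = layer.weight.grad        # same shape as layer.weight
mean_abs_grad = float(grad.abs().mean())
print(grad.shape, mean_abs_grad)
```

The grad attribute stays populated after optimizer.step(), which is why the training loop can read it at the end of each step, until the next call to optimizer.zero_grad() resets it.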

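Note that there are five layers in your neural network (the hidden layers and the output layer combined). If you visualize them, you can see the pattern of each layer's gradient across the epochs. Below, you run the training loop as before and plot the cross-entropy, the accuracy, and the mean absolute gradient of each layer: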

activation = nn.ReLU
model = Model(activation=activation)
bce_hist, acc_hist, grad_hist = train_loop(model, X, y)

fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].plot(bce_hist, label="BCE")
ax[0].plot(acc_hist, label="Accuracy")
ax[0].set_xlabel("Epochs")
ax[0].set_ylim(0, 1)
for n, grads in enumerate(grad_hist):
    ax[1].plot(grads, label="layer"+str(n))
ax[1].set_xlabel("Epochs")
fig.suptitle(str(activation))
ax[0].legend()
ax[1].legend()
plt.show()

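Running the code above produces the following plot:

In the plot above, you can see how the accuracy increases and how the cross-entropy loss decreases. At the same time, you can see that the gradient of each layer fluctuates in a similar range; pay particular attention to the lines corresponding to the first layer and the last layer. This behavior is ideal.

Let's repeat the same with a sigmoid activation: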

activation = nn.Sigmoid
model = Model(activation=activation)
bce_hist, acc_hist, grad_hist = train_loop(model, X, y)

fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].plot(bce_hist, label="BCE")
ax[0].plot(acc_hist, label="Accuracy")
ax[0].set_xlabel("Epochs")
ax[0].set_ylim(0, 1)
for n, grads in enumerate(grad_hist):
    ax[1].plot(grads, label="layer"+str(n))
ax[1].set_xlabel("Epochs")
fig.suptitle(str(activation))
ax[0].legend()
ax[1].legend()
plt.show()

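Its plot is as follows:

You can see that after 300 epochs, the final result is much worse than with ReLU activation. Indeed, you may need many more epochs for this model to converge. The reason is easy to find in the chart on the right, where you can see that the gradient is significant only for the output layer, while the gradients of all the hidden layers are virtually zero. This is the vanishing gradient effect, a problem for many neural network models that use sigmoid activation functions.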
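
A rough way to see why the gradient vanishes: the derivative of the sigmoid never exceeds 0.25, and backpropagation multiplies one such factor per layer. The following is a simplified sketch that chains four sigmoids with no weights in between (an assumption that ignores the effect of the weight matrices) and measures the resulting gradient.

```python
import torch

# Chain four sigmoids, loosely mimicking four hidden layers with unit weights
x = torch.tensor(0.0, requires_grad=True)
h = x
for _ in range(4):
    h = torch.sigmoid(h)
h.backward()
chained_grad = x.grad.item()

# Upper bound: the sigmoid derivative never exceeds 0.25
upper_bound = 0.25 ** 4
print(chained_grad, upper_bound)
```

The gradient reaching the input is below 0.25 to the fourth power, which is under 0.4% of the gradient at the output, and the real network's small initial weights typically shrink it further.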

The hyperbolic tangent function has a shape similar to the sigmoid function, but its curve is steeper. Let's see how it behaves:

activation = nn.Tanh
model = Model(activation=activation)
bce_hist, acc_hist, grad_hist = train_loop(model, X, y)

fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].plot(bce_hist, label="BCE")
ax[0].plot(acc_hist, label="Accuracy")
ax[0].set_xlabel("Epochs")
ax[0].set_ylim(0, 1)
for n, grads in enumerate(grad_hist):
    ax[1].plot(grads, label="layer"+str(n))
ax[1].set_xlabel("Epochs")
fig.suptitle(str(activation))
ax[0].legend()
ax[1].legend()
plt.show()

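Here is the result:

The result looks better than with sigmoid activation but is still worse than with ReLU. In fact, from the gradient chart, you can notice that the gradients at the hidden layers are significant, but the gradient at the first hidden layer is clearly an order of magnitude smaller than that at the output layer. Thus, backpropagation is not very effective at carrying the gradient back toward the input.

This is the reason you see ReLU activation in almost every neural network model today. It is not only because ReLU is simpler and its derivative is much faster to compute than those of the other activation functions, but also because it can make the model converge faster.

Indeed, you can sometimes do better than ReLU. In PyTorch, you have a number of ReLU variants. Let's look at two of them. You can compare the three variants of ReLU as follows: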

x = torch.linspace(-8, 8, 200)
relu = nn.ReLU()(x)
relu6 = nn.ReLU6()(x)
leaky = nn.LeakyReLU()(x)

plt.plot(x, relu, label="ReLU")
plt.plot(x, relu6, label="ReLU6")
plt.plot(x, leaky, label="LeakyReLU")
plt.legend()
plt.show()

First is ReLU6, which is ReLU with the output capped at 6.0 whenever the input exceeds 6.0:

activation = nn.ReLU6
model = Model(activation=activation)
bce_hist, acc_hist, grad_hist = train_loop(model, X, y)

fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].plot(bce_hist, label="BCE")
ax[0].plot(acc_hist, label="Accuracy")
ax[0].set_xlabel("Epochs")
ax[0].set_ylim(0, 1)
for n, grads in enumerate(grad_hist):
    ax[1].plot(grads, label="layer"+str(n))
ax[1].set_xlabel("Epochs")
fig.suptitle(str(activation))
ax[0].legend()
ax[1].legend()
plt.show()

Next is leaky ReLU, whose negative half is no longer flat but a gently slanted line. The rationale is to keep a small positive gradient in that region.

activation = nn.LeakyReLU
model = Model(activation=activation)
bce_hist, acc_hist, grad_hist = train_loop(model, X, y)

fig, ax = plt.subplots(1, 2, figsize=(12, 5))
ax[0].plot(bce_hist, label="BCE")
ax[0].plot(acc_hist, label="Accuracy")
ax[0].set_xlabel("Epochs")
ax[0].set_ylim(0, 1)
for n, grads in enumerate(grad_hist):
    ax[1].plot(grads, label="layer"+str(n))
ax[1].set_xlabel("Epochs")
fig.suptitle(str(activation))
ax[0].legend()
ax[1].legend()
plt.show()
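The small positive gradient on the negative side can be checked directly with autograd. In this sketch the helper `grad_at` and the test input of -3.0 are assumptions made for illustration. `nn.LeakyReLU` uses `negative_slope=0.01` by default, and the slope can be changed through that argument.

```python
import torch
import torch.nn as nn

def grad_at(act, value):
    """Gradient of the activation output with respect to a single input."""
    x = torch.tensor([value], requires_grad=True)
    act(x).sum().backward()
    return x.grad.item()

relu_grad = grad_at(nn.ReLU(), -3.0)                        # flat region
leaky_grad = grad_at(nn.LeakyReLU(), -3.0)                  # default slope
steeper_grad = grad_at(nn.LeakyReLU(negative_slope=0.1), -3.0)

print("ReLU:", relu_grad)
print("LeakyReLU (default):", leaky_grad)
print("LeakyReLU (negative_slope=0.1):", steeper_grad)
```

With plain ReLU a unit whose input is negative passes no gradient back, whereas leaky ReLU always passes a small one.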

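You can see that all these variants give similar accuracy after 300 epochs, but the history curves show that some reach high accuracy faster than others. This is due to the interplay between the gradient of the activation function and the optimizer. There is no golden rule that a single activation function works best, but a good design helps with:

passing the loss signal from the output layer all the way back to the input layer during backpropagation
keeping the gradient computation stable under specific conditions, e.g., limited floating-point precision
providing enough contrast between different inputs so that the backward pass can make accurate adjustments to the parameters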

Below is the complete code to generate all the plots above:

from sklearn.datasets import make_circles
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim

# Make data: Two circles on x-y plane as a classification problem
X, y = make_circles(n_samples=1000, factor=0.5, noise=0.1)
X = torch.tensor(X, dtype=torch.float32)
y = torch.tensor(y.reshape(-1, 1), dtype=torch.float32)

# Binary classification model
class Model(nn.Module):
    def __init__(self, activation=nn.ReLU):
        super().__init__()
        self.layer0 = nn.Linear(2,5)
        self.act0 = activation()
        self.layer1 = nn.Linear(5,5)
        self.act1 = activation()
        self.layer2 = nn.Linear(5,5)
        self.act2 = activation()
        self.layer3 = nn.Linear(5,5)
        self.act3 = activation()
        self.layer4 = nn.Linear(5,1)
        self.act4 = nn.Sigmoid()

    def forward(self, x):
        x = self.act0(self.layer0(x))
        x = self.act1(self.layer1(x))
        x = self.act2(self.layer2(x))
        x = self.act3(self.layer3(x))
        x = self.act4(self.layer4(x))
        return x

# train the model and produce history
def train_loop(model, X, y, n_epochs=300, batch_size=32):
    loss_fn = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.0001)
    batch_start = torch.arange(0, len(X), batch_size)

    bce_hist = []
    acc_hist = []
    grad_hist = [[],[],[],[],[]]

    for epoch in range(n_epochs):
        # train model with optimizer
        model.train()
        layer_grad = [[],[],[],[],[]]
        for start in batch_start:
            X_batch = X[start:start+batch_size]
            y_batch = y[start:start+batch_size]
            y_pred = model(X_batch)
            loss = loss_fn(y_pred, y_batch)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            # collect mean absolute value of gradients
            layers = [model.layer0, model.layer1, model.layer2, model.layer3, model.layer4]
            for n,layer in enumerate(layers):
                mean_grad = float(layer.weight.grad.abs().mean())
                layer_grad[n].append(mean_grad)
        # evaluate BCE and accuracy at end of each epoch
        model.eval()
        with torch.no_grad():
            y_pred = model(X)
            bce = float(loss_fn(y_pred, y))
            acc = float((y_pred.round() == y).float().mean())
        bce_hist.append(bce)
        acc_hist.append(acc)
        for n, grads in enumerate(layer_grad):
            grad_hist[n].append(sum(grads)/len(grads))
        # print metrics every 10 epochs
        if epoch % 10 == 9:
            print("Epoch %d: BCE=%.4f, Accuracy=%.2f%%" % (epoch, bce, acc*100))
    # return the per-epoch gradient history (not the last epoch's per-batch list)
    return bce_hist, acc_hist, grad_hist

# pick different activation functions and compare the result visually
for activation in [nn.Sigmoid, nn.Tanh, nn.ReLU, nn.ReLU6, nn.LeakyReLU]:
    model = Model(activation=activation)
    bce_hist, acc_hist, grad_hist = train_loop(model, X, y)

    fig, ax = plt.subplots(1, 2, figsize=(12, 5))
    ax[0].plot(bce_hist, label="BCE")
    ax[0].plot(acc_hist, label="Accuracy")
    ax[0].set_xlabel("Epochs")
    ax[0].set_ylim(0, 1)
    for n, grads in enumerate(grad_hist):
        ax[1].plot(grads, label="layer"+str(n))
    ax[1].set_xlabel("Epochs")
    fig.suptitle(str(activation))
    ax[0].legend()
    ax[1].legend()
    plt.show()
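The training loop above lists the five linear layers by hand when it collects gradients. If the number of layers changes, one alternative is to walk over the model's submodules instead. The following is a minimal sketch of that idea. The helper name `mean_abs_grads` and the small stand-in network are hypothetical and not part of the original code.

```python
import torch
import torch.nn as nn

def mean_abs_grads(model):
    """Mean absolute weight gradient of every nn.Linear layer, in order.

    Hypothetical helper, not part of the original tutorial code.
    """
    return [
        float(m.weight.grad.abs().mean())
        for m in model.modules()
        if isinstance(m, nn.Linear) and m.weight.grad is not None
    ]

# Small stand-in network and random data, just to exercise the helper
torch.manual_seed(0)
net = nn.Sequential(nn.Linear(2, 5), nn.Tanh(), nn.Linear(5, 1), nn.Sigmoid())
X_demo = torch.randn(16, 2)
y_demo = torch.randint(0, 2, (16, 1)).float()

loss = nn.BCELoss()(net(X_demo), y_demo)
loss.backward()
grads = mean_abs_grads(net)
print(grads)
```

Because `modules()` yields submodules in the order they were registered, the returned list follows the same layer order as the hand-written list in `train_loop`.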

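Further Reading

This section provides more resources on the topic if you want to go deeper.

nn.Sigmoid from the PyTorch documentation
nn.Tanh from the PyTorch documentation
nn.ReLU from the PyTorch documentation
nn.ReLU6 from the PyTorch documentation
nn.LeakyReLU from the PyTorch documentation
Vanishing gradient problem, Wikipedia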

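Summary

In this chapter, you learned how to select activation functions for your PyTorch model. You learned:

What the common activation functions are and how they behave
How to use activation functions in your PyTorch model
What the vanishing gradient problem is
How the activation function affects the performance of your model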
