02 PyTorch 2.1: Summary of Basics and the Neural Network Workflow


1 Tensors

What is a tensor? A tensor is the basic unit, the basic building block, for modeling in machine learning and deep learning. In plain language: the most important part of machine learning and deep learning is learning from data, i.e. extracting information from data. So how do we process that data? Simple: we convert it into what computers like best, numbers, and this numeric form is what PyTorch calls a "tensor".

Of course, the data here comes in many forms: text, images, video, audio, and so on.

1.1 Basic forms of a tensor

There are four kinds in total: scalar, vector, matrix, and everything higher-dimensional.

| Type | ndim | shape | Example |
| --- | --- | --- | --- |
| scalar | 0 | torch.Size([]) | torch.tensor(7) |
| vector | 1 | torch.Size([n]) | torch.tensor([1, 2, 3]) |
| matrix | 2 | torch.Size([n, m]) | torch.tensor([[1, 2], [3, 4]]) |
| ... | x | torch.Size([a, b, c, ...]) | torch.tensor([[[[[...]]]]]) |

You can tell how many dimensions a tensor has from the number of square brackets used when creating it. For a scalar, tensor.item() extracts the underlying Python number.
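A quick check of the bracket-counting rule (output values shown in comments):

```python
import torch

scalar = torch.tensor(7)
vector = torch.tensor([1, 2, 3])
matrix = torch.tensor([[1, 2], [3, 4]])

print(scalar.ndim, scalar.shape)  # 0 torch.Size([])
print(vector.ndim, vector.shape)  # 1 torch.Size([3])
print(matrix.ndim, matrix.shape)  # 2 torch.Size([2, 2])
print(scalar.item())              # 7 (plain Python int)
```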

1.2 Ways to create a tensor

torch.rand(size=(a, b)), torch.zeros(size=(a, b)), torch.ones(size=(a, b)), torch.arange(start, end, step)
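For example (shapes and values here are chosen arbitrarily for illustration):

```python
import torch

rand_t = torch.rand(size=(2, 3))    # uniform random values in [0, 1)
zeros_t = torch.zeros(size=(2, 3))  # all zeros
ones_t = torch.ones(size=(2, 3))    # all ones
range_t = torch.arange(0, 10, 2)    # tensor([0, 2, 4, 6, 8])
```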

1.3 Tensor dtypes

torch.float32 / torch.float, torch.float16 / torch.half, torch.float64 / torch.double, torch.int16 / torch.short, torch.int32 / torch.int, torch.int64 / torch.long

The list above covers only a few common dtypes; there are many more. torch.float32 is by far the most used. In general, higher-precision dtypes preserve more information from the data, while lower-precision dtypes compute faster.
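The defaults are worth knowing, since they catch people out:

```python
import torch

f32 = torch.tensor([1.0, 2.0])                       # float literals default to torch.float32
f16 = torch.tensor([1.0, 2.0], dtype=torch.float16)  # half precision: faster, less accurate
i64 = torch.tensor([1, 2])                           # integer literals default to torch.int64
print(f32.dtype, f16.dtype, i64.dtype)
```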

1.4 Basic tensor operations

+ - * / all operate element-wise on the tensor. torch.matmul(tensor, tensor) (or torch.mm(tensor, tensor), which is limited to 2-D tensors) is matrix multiplication, which for 1-D tensors reduces to the dot (inner) product of vectors; the inner dimensions must match. Matrix multiplication is a prime showcase of GPU parallelism and greatly speeds up computation.
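A minimal comparison of the two kinds of multiplication:

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])

print(a * b)               # element-wise: tensor([[ 5., 12.], [21., 32.]])
print(torch.matmul(a, b))  # matrix product: tensor([[19., 22.], [43., 50.]])
```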

tensor.min(), tensor.max(), tensor.mean(), tensor.sum(), tensor.argmax(), tensor.argmin()

A note on dtype conversion: tensor.type(torch.float32) returns a new tensor that differs from the original only in dtype. Shorthand methods such as tensor.float() or tensor.long() work the same way.
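For example, aggregating an integer tensor (note that mean() requires a floating-point dtype, hence the cast):

```python
import torch

x = torch.arange(0, 100, 10)         # dtype torch.int64
print(x.min(), x.max())              # tensor(0) tensor(90)
print(x.argmin(), x.argmax())        # index of the min/max: tensor(0) tensor(9)
print(x.type(torch.float32).mean())  # mean() needs a float dtype: tensor(45.)
print(x.float().dtype)               # shorthand cast: torch.float32
```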

torch.reshape() returns a tensor with the new shape (copying if necessary), while tensor.view() returns something pointer-like that shares memory with the original. torch.squeeze() removes dimensions of size 1, and torch.unsqueeze() inserts one. torch.permute() returns a view of the original tensor with its dimensions reordered.
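A sketch of these shape operations (the 224×224×3 "image" is just a stand-in):

```python
import torch

x = torch.arange(1., 10.)               # shape [9]
reshaped = x.reshape(1, 3, 3)           # tensor with the new shape [1, 3, 3]
viewed = x.view(1, 9)                   # shares memory with x (pointer-like)
viewed[0, 0] = 99.                      # ...so modifying the view changes x too
squeezed = reshaped.squeeze()           # drops all size-1 dims -> [3, 3]
unsqueezed = squeezed.unsqueeze(dim=0)  # adds a size-1 dim back -> [1, 3, 3]
permuted = torch.rand(224, 224, 3).permute(2, 0, 1)  # e.g. HWC image -> CHW
```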

1.5 Converting to and from NumPy
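torch.from_numpy() and Tensor.numpy() do the conversion. Note that on CPU both share memory with the source array, and that NumPy defaults to float64 while PyTorch defaults to float32:

```python
import numpy as np
import torch

arr = np.arange(1.0, 8.0)
t = torch.from_numpy(arr)    # NumPy float64 -> tensor of dtype torch.float64
t32 = t.type(torch.float32)  # often cast down to PyTorch's default float32

back = t32.numpy()           # Tensor -> NumPy array
```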

1.6 Reproducibility

Set a random seed to make random results reproducible.

import torch

# Set the random seed
RANDOM_SEED = 42 # try changing this to different values and see what happens to the numbers below
torch.manual_seed(seed=RANDOM_SEED)
random_tensor_C = torch.rand(3, 4)

# Have to reset the seed every time a new rand() is called
# Without this, tensor_D would be different to tensor_C
torch.random.manual_seed(seed=RANDOM_SEED) # try commenting this line out and seeing what happens
random_tensor_D = torch.rand(3, 4)

print(f"Tensor C:\n{random_tensor_C}\n")
print(f"Tensor D:\n{random_tensor_D}\n")
print("Does Tensor C equal Tensor D? (anywhere)")
print(random_tensor_C == random_tensor_D)

2 PyTorch Regression Workflow


The whole pipeline boils down to just a few steps:

  1. Prepare the data: split it into a training set (60-80%), a validation set (10-20%), and a test set (10-20%), convert everything into tensors, and move them onto the target device.
  2. Build the neural network (the model). This involves nn.Module and nn.Parameter; make sure you understand them: nn.Module is the basic building block the whole model is assembled from, and nn.Parameter holds the parameters the model will learn.
  3. With the model built, set up training: choose a loss function and an optimizer, set the number of epochs, and run the loop (forward pass, compute the loss, zero the gradients, backpropagate, update the parameters).
  4. Training and testing usually run inside the same loop. Remember to call model.eval() and wrap inference in a with torch.inference_mode(): context, mainly to speed up inference and disable gradient tracking. Inference itself is just another forward pass, and this is a good place to print information such as the loss, accuracy, and other evaluation metrics.
  5. Once everything runs cleanly, save the model's parameters so they can be reused elsewhere.

model.parameters() vs. model.state_dict(): the former yields just the parameters that get updated during training, while the latter holds the model's full parameter state as an ordered dict of name → tensor.
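A minimal illustration of the difference:

```python
import torch
from torch import nn

layer = nn.Linear(in_features=2, out_features=1)

# parameters(): an iterator over the trainable tensors (this is what the optimizer gets)
n_params = sum(p.numel() for p in layer.parameters())  # 2 weights + 1 bias = 3

# state_dict(): an ordered dict mapping names to tensors (this is what gets saved)
print(layer.state_dict().keys())  # odict_keys(['weight', 'bias'])
```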

Finally, here is the full workflow code:

# Import PyTorch and matplotlib
import torch
from torch import nn # nn contains all of PyTorch's building blocks for neural networks
import matplotlib.pyplot as plt

# Check PyTorch version
torch.__version__

# Setup device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# 1. Data preparation
# Create weight and bias
weight = 0.7
bias = 0.3

# Create range values
start = 0
end = 1
step = 0.02

# Create X and y (features and labels)
X = torch.arange(start, end, step).unsqueeze(dim=1) # without unsqueeze, errors will happen later on (shapes within linear layers)
y = weight * X + bias 
X[:10], y[:10]

# Split data
train_split = int(0.8 * len(X))
X_train, y_train = X[:train_split], y[:train_split]
X_test, y_test = X[train_split:], y[train_split:]

len(X_train), len(y_train), len(X_test), len(y_test)

# 2. Build the model
# Subclass nn.Module to make our model
class LinearRegressionModelV2(nn.Module):
    def __init__(self):
        super().__init__()
        # Use nn.Linear() for creating the model parameters
        self.linear_layer = nn.Linear(in_features=1, 
                                      out_features=1)
    
    # Define the forward computation (input data x flows through nn.Linear())
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear_layer(x)

# Set the manual seed when creating the model (this isn't always needed but is used for demonstrative purposes, try commenting it out and seeing what happens)
torch.manual_seed(42)
model_1 = LinearRegressionModelV2()
model_1, model_1.state_dict()

# 3. Train the model
# Create loss function
loss_fn = nn.L1Loss()

# Create optimizer
optimizer = torch.optim.SGD(params=model_1.parameters(), # optimize newly created model's parameters
                            lr=0.01)

torch.manual_seed(42)

# Set the number of epochs 
epochs = 1000 

# Put the model and data on the available device
# Without this, error will happen (not all model/data on device)
model_1 = model_1.to(device)
X_train = X_train.to(device)
X_test = X_test.to(device)
y_train = y_train.to(device)
y_test = y_test.to(device)

for epoch in range(epochs):
    ### Training
    model_1.train() # train mode is on by default after construction

    # 1. Forward pass
    y_pred = model_1(X_train)

    # 2. Calculate loss
    loss = loss_fn(y_pred, y_train)

    # 3. Zero grad optimizer
    optimizer.zero_grad()

    # 4. Loss backward
    loss.backward()

    # 5. Step the optimizer
    optimizer.step()

    ### Testing
    model_1.eval() # put the model in evaluation mode for testing (inference)
    # 1. Forward pass
    with torch.inference_mode():
        test_pred = model_1(X_test)
    
        # 2. Calculate the loss
        test_loss = loss_fn(test_pred, y_test)

    if epoch % 100 == 0:
        print(f"Epoch: {epoch} | Train loss: {loss} | Test loss: {test_loss}")
        
# 4. Save the model
from pathlib import Path

# 1. Create models directory 
MODEL_PATH = Path("models")
MODEL_PATH.mkdir(parents=True, exist_ok=True)

# 2. Create model save path 
MODEL_NAME = "01_pytorch_workflow_model_1.pth"
MODEL_SAVE_PATH = MODEL_PATH / MODEL_NAME

# 3. Save the model state dict 
print(f"Saving model to: {MODEL_SAVE_PATH}")
torch.save(obj=model_1.state_dict(), # only saving the state_dict() only saves the models learned parameters
           f=MODEL_SAVE_PATH) 
           
# Instantiate a fresh instance of LinearRegressionModelV2
loaded_model_1 = LinearRegressionModelV2()

# Load model state dict 
loaded_model_1.load_state_dict(torch.load(MODEL_SAVE_PATH))

# Put model to target device (if your data is on GPU, model will have to be on GPU to make predictions)
loaded_model_1.to(device)

print(f"Loaded model:\n{loaded_model_1}")
print(f"Model on device:\n{next(loaded_model_1.parameters()).device}")

3 PyTorch Classification Workflow

Classification is slightly more involved than regression: regression outputs the predicted value directly, while classification must further map that output to a class. There are three kinds of classification problems: binary classification, multi-class classification, and multi-label classification.

The first step is still data preparation, though it may be a bit more involved here because the data carries more: the inputs may contain multiple features, and we also need to mind the dtype when converting them to tensors.

The second step is building the model. Since classes cannot always be separated by a straight line, a single linear layer may not be enough, so non-linear elements are mixed in, typically a ReLU after each linear layer. For convenience we can use nn.Sequential() and write the linear/non-linear combination inside it. Note, however, that a network built this way only flows straight through the layers in the order listed in nn.Sequential; when some layers need to feed back into earlier ones, you must define the layers manually and spell out the order yourself in forward().
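As a sketch of the "manual forward" case: the hypothetical SkipBlock below (a name and structure invented for illustration) adds a residual connection, y = x + f(x), something nn.Sequential alone cannot express:

```python
import torch
from torch import nn

class SkipBlock(nn.Module):
    """Hypothetical block: nn.Sequential can't route x around a layer,
    so we define the layers ourselves and wire them up in forward()."""
    def __init__(self, features=8):
        super().__init__()
        self.linear = nn.Linear(features, features)
        self.relu = nn.ReLU()

    def forward(self, x):
        return x + self.relu(self.linear(x))  # residual connection back to the input

out = SkipBlock()(torch.rand(4, 8))  # shape is preserved: [4, 8]
```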

In the second step we also choose the loss function. Binary classification generally uses nn.BCEWithLogitsLoss rather than nn.BCELoss, because BCELoss does not apply the sigmoid for us; if you then forget to add the sigmoid manually during training, the inputs drift out of range and can trigger a CUDA error 😭😭 (this tormented me for over an hour). Multi-class classification uses nn.CrossEntropyLoss. Incidentally, regression usually uses nn.L1Loss or nn.MSELoss.
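To see that nn.BCEWithLogitsLoss really is BCELoss with the sigmoid built in (and implemented in a more numerically stable way), compare the two on the same logits:

```python
import torch
from torch import nn

logits = torch.tensor([2.0, -1.0, 0.5])
targets = torch.tensor([1.0, 0.0, 1.0])

# BCEWithLogitsLoss applies the sigmoid internally
loss_a = nn.BCEWithLogitsLoss()(logits, targets)

# BCELoss expects probabilities, so the sigmoid must be applied manually
loss_b = nn.BCELoss()(torch.sigmoid(logits), targets)

print(loss_a, loss_b)  # the two values match
```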

When handling classification we must convert the model's raw outputs, the logits, into class labels we can read (0, 1, 2, ...). Logits can be arbitrary real numbers, so they don't translate directly into labels. Instead we use sigmoid to map them into the 0-1 range (for multi-class, use softmax, where the values sum to 1; note that softmax needs a dim argument: with dim=1 the softmax runs over dimension 1. If that's not intuitive, build a 3-D tensor, try different dim values, and check along which dimension the values sum to 1; a quick experiment makes it clear). With sigmoid, a value below 0.5 means the model most likely predicts class 0, otherwise class 1. Make sure the relationship between pred_logits, pred_probs, and pred_labels is clear.
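A small sketch of the logits → probabilities → labels pipeline for both cases (the logit values are made up):

```python
import torch

# Binary: one logit per sample -> sigmoid -> threshold at 0.5
pred_logits = torch.tensor([2.0, -1.0])
pred_probs = torch.sigmoid(pred_logits)
pred_labels = (pred_probs >= 0.5).long()    # tensor([1, 0])

# Multi-class: one logit per class -> softmax over dim=1 -> argmax
mc_logits = torch.tensor([[1.0, 3.0, 0.2], [0.1, 0.2, 2.5]])
mc_probs = torch.softmax(mc_logits, dim=1)  # each row now sums to 1
mc_labels = mc_probs.argmax(dim=1)          # tensor([1, 2])
```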

Here are some techniques for addressing underfitting:

| Model improvement technique | What does it do? |
| --- | --- |
| Add more layers | Each layer potentially increases the learning capabilities of the model, with each layer able to learn some kind of new pattern in the data. More layers are often referred to as making your neural network deeper. |
| Add more hidden units | Similar to the above: more hidden units per layer means a potential increase in the learning capabilities of the model. More hidden units are often referred to as making your neural network wider. |
| Fit for longer (more epochs) | Your model might learn more if it has more opportunities to look at the data. |
| Change the activation functions | Some data just can't be fit with only straight lines (like what we've seen); using non-linear activation functions can help with this (hint, hint). |
| Change the learning rate | Less model-specific, but still related: the learning rate of the optimizer decides how much a model should change its parameters each step. Too much and the model overcorrects; too little and it doesn't learn enough. |
| Change the loss function | Again, less model-specific but still important: different problems require different loss functions. For example, a binary cross entropy loss function won't work with a multi-class classification problem. |

Here is a code example:

# Import dependencies
import torch
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

# Set the hyperparameters for data creation
NUM_CLASSES = 4
NUM_FEATURES = 2
RANDOM_SEED = 42

# 1. Create multi-class data
X_blob, y_blob = make_blobs(n_samples=1000,
    n_features=NUM_FEATURES, # X features
    centers=NUM_CLASSES, # y labels 
    cluster_std=1.5, # give the clusters a little shake up (try changing this to 1.0, the default)
    random_state=RANDOM_SEED
)

# 2. Turn data into tensors
X_blob = torch.from_numpy(X_blob).type(torch.float)
y_blob = torch.from_numpy(y_blob).type(torch.LongTensor)
print(X_blob[:5], y_blob[:5])

# 3. Split into train and test sets
X_blob_train, X_blob_test, y_blob_train, y_blob_test = train_test_split(X_blob,
    y_blob,
    test_size=0.2,
    random_state=RANDOM_SEED
)

# 4. Plot data
plt.figure(figsize=(10, 7))
plt.scatter(X_blob[:, 0], X_blob[:, 1], c=y_blob, cmap=plt.cm.RdYlBu);
# Create device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"

from torch import nn

# Build model
class BlobModel(nn.Module):
    def __init__(self, input_features, output_features, hidden_units=8):
        """Initializes all required hyperparameters for a multi-class classification model.

        Args:
            input_features (int): Number of input features to the model.
            out_features (int): Number of output features of the model
              (how many classes there are).
            hidden_units (int): Number of hidden units between layers, default 8.
        """
        super().__init__()
        self.linear_layer_stack = nn.Sequential(
            nn.Linear(in_features=input_features, out_features=hidden_units),
            # nn.ReLU(), # <- does our dataset require non-linear layers? (try uncommenting and see if the results change)
            nn.Linear(in_features=hidden_units, out_features=hidden_units),
            # nn.ReLU(), # <- does our dataset require non-linear layers? (try uncommenting and see if the results change)
            nn.Linear(in_features=hidden_units, out_features=output_features), # how many classes are there?
        )
    
    def forward(self, x):
        return self.linear_layer_stack(x)

# Create an instance of BlobModel and send it to the target device
model_4 = BlobModel(input_features=NUM_FEATURES, 
                    output_features=NUM_CLASSES, 
                    hidden_units=8).to(device)
                    
# Create loss and optimizer
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_4.parameters(), 
                            lr=0.1) # exercise: try changing the learning rate here and seeing what happens to the model's performance
# Calculate accuracy (a classification metric): percentage of correct predictions
def accuracy_fn(y_true, y_pred):
    correct = torch.eq(y_true, y_pred).sum().item()
    return (correct / len(y_pred)) * 100

# Fit the model
torch.manual_seed(42)

# Set number of epochs
epochs = 100

# Put data to target device
X_blob_train, y_blob_train = X_blob_train.to(device), y_blob_train.to(device)
X_blob_test, y_blob_test = X_blob_test.to(device), y_blob_test.to(device)

for epoch in range(epochs):
    ### Training
    model_4.train()

    # 1. Forward pass
    y_logits = model_4(X_blob_train) # model outputs raw logits 
    y_pred = torch.softmax(y_logits, dim=1).argmax(dim=1) # go from logits -> prediction probabilities -> prediction labels
    # print(y_logits)
    # 2. Calculate loss and accuracy
    loss = loss_fn(y_logits, y_blob_train) 
    acc = accuracy_fn(y_true=y_blob_train,
                      y_pred=y_pred)

    # 3. Optimizer zero grad
    optimizer.zero_grad()

    # 4. Loss backwards
    loss.backward()

    # 5. Optimizer step
    optimizer.step()

    ### Testing
    model_4.eval()
    with torch.inference_mode():
      # 1. Forward pass
      test_logits = model_4(X_blob_test)
      test_pred = torch.softmax(test_logits, dim=1).argmax(dim=1)
      # 2. Calculate test loss and accuracy
      test_loss = loss_fn(test_logits, y_blob_test)
      test_acc = accuracy_fn(y_true=y_blob_test,
                             y_pred=test_pred)

    # Print out what's happening
    if epoch % 10 == 0:
        print(f"Epoch: {epoch} | Loss: {loss:.5f}, Acc: {acc:.2f}% | Test Loss: {test_loss:.5f}, Test Acc: {test_acc:.2f}%") 

Finally, a few more evaluation metrics for classification:

| Metric name / Evaluation method | Definition | Code |
| --- | --- | --- |
| Accuracy | Out of 100 predictions, how many does your model get correct? E.g. 95% accuracy means it gets 95/100 predictions correct. | torchmetrics.Accuracy() or sklearn.metrics.accuracy_score() |
| Precision | Proportion of true positives over the total number of predicted positives, i.e. TP / (TP + FP). Higher precision leads to fewer false positives (model predicts 1 when it should have been 0). | torchmetrics.Precision() or sklearn.metrics.precision_score() |
| Recall | Proportion of true positives over the total number of true positives and false negatives, i.e. TP / (TP + FN). Higher recall leads to fewer false negatives (model predicts 0 when it should have been 1). | torchmetrics.Recall() or sklearn.metrics.recall_score() |
| F1-score | Combines precision and recall into one metric. 1 is best, 0 is worst. | torchmetrics.F1Score() or sklearn.metrics.f1_score() |
| Confusion matrix | Compares the predicted values with the true values in a tabular way; if 100% correct, all values in the matrix lie on the diagonal from top left to bottom right. | torchmetrics.ConfusionMatrix() or sklearn.metrics.confusion_matrix() |
| Classification report | Collection of some of the main classification metrics, such as precision, recall, and F1-score. | sklearn.metrics.classification_report() |
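As a quick illustration using sklearn.metrics (the labels are a made-up toy example; the torchmetrics classes behave equivalently):

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]  # one false negative at index 2, no false positives

acc = accuracy_score(y_true, y_pred)    # 5/6 predictions correct
prec = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/3 = 1.0
rec = recall_score(y_true, y_pred)      # TP / (TP + FN) = 3/4 = 0.75
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
cm = confusion_matrix(y_true, y_pred)   # rows = true class, columns = predicted class
```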