Building the Model Layers

I. What is a neural network?

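A neural network is a collection of neurons connected in layers. Each neuron is a small computing unit that performs simple calculations, and together the neurons solve a problem. Neurons are organized in layers, of which there are three types: the input layer, the hidden layers, and the output layer (the hidden layers are internal, so how they work is not visible from outside the network). Every layer except the input layer contains a number of neurons. Neural networks mimic the way the human brain processes information.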

II. Components of a neural network

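Activation function: determines whether a neuron should be activated. The computations in a neural network include applying the activation function. If a neuron is activated, its input is considered important. There are different kinds of activation functions, and which one to use depends on what you want the output to be. Another important role of the activation function is to add non-linearity to the model, so the network can represent relationships that a purely linear function cannot.
- $f(x)=\left\{\begin{array}{ll} 0, & \text { if } x<0 \\ 1, & \text { if } x \geq 0 \end{array}\right.$ , Binary step: sets the output node to 1 if the input is zero or positive, and to 0 if it is negative.
- $f(x)=\frac{1}{1+e^{-x}}$ , Sigmoid: maps the input to a value in (0, 1), which can be read as the probability of the output being 1 rather than 0. The outcome with the higher probability is taken as the result.
- $f(x)=\frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$ , Tanh: maps the input to a value in (-1, 1), not (0, 1), indicating how strongly the output leans toward 1 or -1. It is commonly used in classification problems.
- $f(x)=\left\{\begin{array}{ll} 0, & \text { if } x<0 \\ x, & \text { if } x \geq 0 \end{array}\right.$ , ReLU: sets the output node to 0 if the input is negative, and keeps the value unchanged if it is zero or positive.
Weights: influence how close the network's output comes to the expected output value. When an input enters a neuron, it is multiplied by a weight, and the result is either observed as output or passed on to the next layer of the network for further processing. The weights of all neurons in a layer are organized into a single tensor, so each layer keeps its own weight tensor, separate from those of the other layers.
Bias: makes up the difference between the activation function's output and its intended output. A low bias suggests that the network is making more assumptions about the form of the output, whereas a high bias makes fewer assumptions about the form of the output.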
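The four activation functions listed above can be tried out on a small tensor. This is a minimal sketch; the sample values in `x` are made up for illustration, and the binary step is written by hand as a comparison.

```python
import torch

# Illustrative inputs (not from the tutorial's data)
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

binary_step = (x >= 0).float()  # 1 where x >= 0, otherwise 0
sigmoid = torch.sigmoid(x)      # values strictly between 0 and 1
tanh = torch.tanh(x)            # values strictly between -1 and 1
relu = torch.relu(x)            # negatives become 0, other values pass through

print(binary_step)
print(sigmoid)
print(tanh)
print(relu)
```

Printing the results side by side makes the different output ranges easy to compare.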

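In other words, the output of a neural network layer is obtained by multiplying each input by its corresponding weight, summing the products, adding the bias, and passing the result through the activation function.

$x=\sum(\text{weights} * \text{inputs})+\text{bias}$, and the output is $f(x)$, where $f$ is the activation function.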
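As a worked instance of the formula above, here is a single neuron computed by hand. The input, weight, and bias values are arbitrary numbers chosen for illustration, and ReLU is assumed as the activation function.

```python
import torch

# Arbitrary illustrative values
inputs = torch.tensor([1.0, 2.0, 3.0])
weights = torch.tensor([0.5, -1.0, 0.25])
bias = torch.tensor(0.5)

# x = sum(weights * inputs) + bias = 0.5 - 2.0 + 0.75 + 0.5 = -0.25
x = torch.sum(weights * inputs) + bias

# f(x) with f = ReLU: the negative pre-activation is clamped to 0
output = torch.relu(x)

print(x.item(), output.item())
```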

III. Building a neural network

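A neural network is composed of layers/modules that perform operations on data. The torch.nn namespace provides all the building blocks needed to build a neural network. Every module in PyTorch subclasses nn.Module. A neural network is itself a module that consists of other modules (layers). This nested structure makes it easy to build and manage complex architectures.

Here, nn is simply short for neural network.

Below, we build a neural network to classify images from the FashionMNIST dataset.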

%matplotlib inline
import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

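1. Get a hardware device for training

If a hardware accelerator such as a GPU is available, the model can be trained on it directly. We check whether torch.cuda is available and fall back to the CPU otherwise.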

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Using {} device'.format(device))

Using cuda device

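2. Define the class

We define our own neural network by subclassing nn.Module, and initialize the network's layers in the __init__ method. Every nn.Module subclass implements the operations applied to the input data in its forward method.

Our neural network is composed of the following:

An input layer with 28x28, i.e. 784, features/pixels
The first linear module takes the 784 input features and transforms them into a hidden layer with 512 features (by computing weight * input + bias with its learned weights)
A ReLU activation is applied to the result (ReLU is a piecewise function split at 0: values greater than or equal to 0 are kept, all others become 0)
The second linear module takes the 512 features from the first hidden layer as input and transforms them into the next hidden layer with 512 features
A ReLU activation is applied to the result
The third linear module takes the 512 features from the second hidden layer as input and transforms them into an output layer with 10 values, one per class
A ReLU activation is applied to the result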

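Why apply ReLU after every hidden layer?
To introduce non-linearity. Without an activation function in between, consecutive linear layers collapse into a single linear transformation, so adding more layers would not make the model any more expressive.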
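The need for a non-linearity between linear layers can be checked numerically. The sketch below uses two small, hypothetical layers (`lin1`, `lin2`, with sizes chosen only for illustration) and shows that stacking them without an activation gives the same result as one linear map with combined weights.

```python
import torch
from torch import nn

torch.manual_seed(0)
lin1 = nn.Linear(4, 8)   # hypothetical layer sizes
lin2 = nn.Linear(8, 3)
x = torch.rand(5, 4)

# Two linear layers applied back to back, no activation in between
stacked = lin2(lin1(x))

# The same mapping as one linear transformation:
# W = W2 W1 and b = W2 b1 + b2
W = lin2.weight @ lin1.weight
b = lin2.weight @ lin1.bias + lin2.bias
single = x @ W.T + b

print(torch.allclose(stacked, single, atol=1e-5))
```

Inserting a ReLU between `lin1` and `lin2` breaks this equivalence in general, which is what lets deeper networks represent more than a single linear layer can.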

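When defining our own neural network, we inherit from nn.Module and reimplement two methods: the constructor __init__ and forward. A few conventions are worth noting:

Layers with learnable parameters (such as fully connected and convolutional layers) are generally placed in the constructor __init__(); layers without parameters can of course be placed there as well.
Layers without learnable parameters (such as ReLU or dropout) may or may not be placed in the constructor. If they are left out of __init__, the corresponding nn.functional functions can be used in forward instead. (BatchNorm layers, by contrast, have learnable parameters by default and belong in __init__.)
forward must be overridden: it implements the model's behavior and is where the connections between the layers are defined.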
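The conventions above can be sketched as follows. This is an illustrative variant, not the tutorial's model: the class name `FunctionalNetwork` and the attribute names `fc1` and `fc2` are made up here. The parameterized Linear layers live in `__init__`, while the parameter-free ReLU is applied through `torch.nn.functional` inside `forward`.

```python
import torch
from torch import nn
import torch.nn.functional as F

class FunctionalNetwork(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        # Layers with learnable parameters are created in the constructor
        self.fc1 = nn.Linear(28 * 28, 512)
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        x = self.flatten(x)
        # ReLU has no parameters, so the functional form is enough here
        x = F.relu(self.fc1(x))
        return self.fc2(x)

net = FunctionalNetwork()
out = net(torch.rand(2, 28, 28))
print(out.shape)
```

Because ReLU is not registered as a submodule in this variant, it will not appear when the model is printed; only the Flatten and Linear layers will.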

class NeuralNetwork(nn.Module):
    def __init__(self): # initialize the layers of the network
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten() # flatten each 28x28 image into a row of 784 values
        
        # three linear layers, each followed by a ReLU activation
        
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x): # forward pass: defines how the input flows through the layers
        x = self.flatten(x)
        logits = self.linear_relu_stack(x) # pass the flattened input through the Linear + ReLU stack
        return logits

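Keep the difference between defining a function and defining a class in mind:
1. The keyword differs: class for a class, def for a function
2. What appears in the parentheses differs: for a class it is the parent class being inherited from, for a function it is the function's parameters

Next, we create an instance of the NeuralNetwork class defined above, move it to device, and print it.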

model = NeuralNetwork().to(device) # .to(device) moves the model's parameters and buffers to the chosen device
print(model)

NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
)

Difference between Module.to and Tensor.to: the former works in place, the latter does not (it returns a new tensor when a conversion is needed).
Module.to modifies the module in place, and it can be called in three forms:
1. to(device=None, dtype=None, non_blocking=False)
2. to(dtype, non_blocking=False)
3. to(tensor, non_blocking=False)
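The in-place versus out-of-place behavior can be observed directly. The sketch below converts the dtype rather than the device so that it also runs on a machine without a GPU; the layer and tensor are throwaway examples, and the default dtype is assumed to be float32.

```python
import torch
from torch import nn

layer = nn.Linear(2, 2)
same_layer = layer.to(torch.float64)  # converts the module's parameters in place
print(same_layer is layer)            # the module itself is returned
print(layer.weight.dtype)

t = torch.zeros(2)
t64 = t.to(torch.float64)             # returns a new tensor; t is left unchanged
print(t64 is t)
print(t.dtype)
```

This is why `model = NeuralNetwork().to(device)` works with or without the assignment, while a tensor must be reassigned, as in `x = x.to(device)`.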

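To use the model, we pass it the input data. This executes the model's forward method, along with some background operations such as registered hooks. However, do not call model.forward() directly! Calling the model on an input returns a tensor with 10 values per sample, containing the raw predicted value for each class.

We obtain the prediction probabilities by passing the result through an instance of nn.Softmax.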

X = torch.rand(1, 28, 28, device=device)
logits = model(X) 
pred_probab = nn.Softmax(dim=1)(logits)
y_pred = pred_probab.argmax(1)
print(f"Predicted class: {y_pred}")

Predicted class: tensor([5], device='cuda:0')
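One concrete reason for the advice against calling model.forward() directly is that hooks registered on a module only run when the module itself is called. The sketch below uses a small stand-in layer and a hypothetical `calls` list to record hook invocations.

```python
import torch
from torch import nn

calls = []  # hypothetical log of hook invocations
layer = nn.Linear(4, 2)
# The hook returns None (list.append returns None), so the output is not modified
layer.register_forward_hook(lambda module, inputs, output: calls.append("hook"))

x = torch.rand(1, 4)

layer(x)                      # calling the module runs forward and the hook
n_after_call = len(calls)

layer.forward(x)              # calling forward directly skips the hook
n_after_forward = len(calls)

print(n_after_call, n_after_forward)
```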

3. Weights and biases

The nn.Linear module randomly initializes the weights and bias of each layer and stores the values internally as Tensors.

print(f"First Linear weights: {model.linear_relu_stack[0].weight} \n")

print(f"First Linear bias: {model.linear_relu_stack[0].bias} \n")

First Linear weights: Parameter containing:
tensor([[-2.0643e-02, -1.0010e-02,  2.5111e-02,  ...,  1.4962e-02,
          8.7132e-03, -1.6987e-02],
        [-3.4400e-02, -8.4512e-03, -1.1077e-03,  ...,  1.2789e-02,
          1.2304e-02, -6.4526e-05],
        [-3.5437e-02,  1.7918e-02,  1.5651e-02,  ..., -1.0909e-02,
          4.4604e-03,  4.4839e-03],
        ...,
        [-1.3130e-02,  1.2011e-02,  2.3141e-02,  ...,  6.6499e-04,
         -2.6889e-02,  1.6781e-02],
        [ 5.5485e-03, -2.9435e-02,  3.1338e-02,  ..., -2.9512e-02,
          2.0570e-02, -2.9319e-02],
        [-3.0080e-02,  2.0014e-03, -2.0637e-02,  ..., -1.9132e-02,
          5.4276e-03, -2.9796e-02]], device='cuda:0', requires_grad=True) 

First Linear bias: Parameter containing:
tensor([-2.7859e-02, -1.4277e-02, -2.4598e-02, -3.3909e-02, -1.1257e-02,
        -3.0373e-02, -1.4296e-02, -3.3952e-02, -2.0355e-02,  2.0494e-02,
         3.0220e-02, -2.3674e-02,  3.0524e-02, -6.1689e-03,  3.0824e-02,
         2.9576e-02,  1.1716e-02, -2.0029e-03,  1.0179e-02, -2.8748e-02,
         3.0874e-02,  2.4701e-02, -1.3931e-02,  1.9038e-02,  1.8504e-02,
         1.1506e-02,  1.0955e-03,  1.1052e-02,  1.7949e-02,  7.8719e-03,
         ......
        -4.9406e-03, -1.2614e-02, -2.4643e-04,  4.4422e-03, -1.2743e-02,
         1.9200e-02,  2.7365e-02,  2.0517e-02, -1.8013e-02,  2.8233e-02,
        -3.2521e-02,  2.3622e-02, -8.3671e-03,  8.0795e-03,  1.3233e-02,
        -2.7003e-02, -1.4649e-02, -6.7963e-03,  2.4624e-02,  2.1091e-02,
        -1.5413e-02,  2.6431e-02, -1.0906e-02,  2.3642e-02,  5.8323e-05,
         1.6193e-02, -2.7259e-02], device='cuda:0', requires_grad=True)
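The printed values all lie within roughly ±0.036. That is consistent with the default initialization described in the nn.Linear documentation, which draws weights and biases uniformly from [-1/sqrt(in_features), 1/sqrt(in_features)]. The check below is a sketch that assumes this default scheme applies to the installed PyTorch version.

```python
import math
import torch
from torch import nn

layer = nn.Linear(28 * 28, 512)

# Assumed default bound: 1 / sqrt(in_features) = 1 / 28, about 0.0357
bound = 1 / math.sqrt(layer.in_features)
eps = 1e-6  # tolerance for floating-point rounding

weights_in_range = layer.weight.abs().max().item() <= bound + eps
bias_in_range = layer.bias.abs().max().item() <= bound + eps

print(bound, weights_in_range, bias_in_range)
```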

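4. Model layers

Let's break down the layers of the FashionMNIST model. To illustrate, we take a sample minibatch of 3 images of size 28x28 and see what happens to it as we pass it through the network.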

input_image = torch.rand(3,28,28)
print(input_image.size())

torch.Size([3, 28, 28])

input_image # displays a tensor holding three 28 x 28 images

tensor([[[0.1806, 0.1588, 0.2650,  ..., 0.8189, 0.1492, 0.7021],
         [0.6383, 0.5496, 0.8597,  ..., 0.3353, 0.4231, 0.8923],
         [0.4220, 0.5437, 0.4618,  ..., 0.1781, 0.4458, 0.6504],
         ...,
         [0.6918, 0.0460, 0.4659,  ..., 0.2298, 0.4545, 0.3284],
         [0.0353, 0.4444, 0.3548,  ..., 0.1031, 0.7395, 0.7877],
         [0.7347, 0.1196, 0.5165,  ..., 0.8644, 0.3094, 0.3349]],

        [[0.7137, 0.4733, 0.6694,  ..., 0.2383, 0.4780, 0.3434],
         [0.5816, 0.0685, 0.0298,  ..., 0.5669, 0.8380, 0.7312],
         [0.6724, 0.0839, 0.8786,  ..., 0.0268, 0.4224, 0.7359],
         ...,
         [0.2831, 0.0440, 0.1511,  ..., 0.6940, 0.3366, 0.2833],
         [0.8742, 0.1033, 0.1213,  ..., 0.3196, 0.2729, 0.6815],
         [0.3664, 0.8527, 0.7868,  ..., 0.2662, 0.0993, 0.4389]],

        [[0.8529, 0.3103, 0.8327,  ..., 0.8391, 0.0164, 0.4294],
         [0.3549, 0.7392, 0.1445,  ..., 0.3816, 0.6529, 0.9349],
         [0.6571, 0.7573, 0.8282,  ..., 0.7254, 0.2802, 0.1497],
         ...,
         [0.4622, 0.1789, 0.3085,  ..., 0.0842, 0.4107, 0.5507],
         [0.4934, 0.0883, 0.2976,  ..., 0.0023, 0.4198, 0.7771],
         [0.4539, 0.7521, 0.4816,  ..., 0.8054, 0.3958, 0.7503]]])

① nn.Flatten

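We initialize the nn.Flatten layer to convert each 2D 28x28 image into a contiguous array of 784 pixel values (the minibatch dimension, dim=0, is maintained). Each pixel is passed to the input layer of the neural network.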

flatten = nn.Flatten()
flat_image = flatten(input_image)
print(flat_image.size())

torch.Size([3, 784])
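For comparison, the same flattening can be written without the layer. This sketch assumes a fresh random batch named `images`; both `reshape` and `torch.flatten` with `start_dim=1` keep the minibatch dimension and merge the rest.

```python
import torch
from torch import nn

images = torch.rand(3, 28, 28)  # stand-in minibatch of 3 images

flat_a = nn.Flatten()(images)                       # layer form, default start_dim=1
flat_b = images.reshape(images.size(0), -1)         # keep dim 0, merge the rest
flat_c = torch.flatten(images, start_dim=1)         # functional form

print(flat_a.shape, torch.equal(flat_a, flat_b), torch.equal(flat_a, flat_c))
```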

② nn.Linear

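The linear layer is a module that applies a linear transformation to its input using its stored weights and biases. The grayscale value of each pixel in the input layer is fed to the neurons in the hidden layer for computation. The transformation is computed as weight * input + bias, which is a linear operation.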

layer1 = nn.Linear(in_features=28*28, out_features=20)
hidden1 = layer1(flat_image)
print(hidden1.size())

torch.Size([3, 20])
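The weight * input + bias computation can be reproduced by hand from the layer's stored parameters. In this sketch the batch `x` is random stand-in data; note that nn.Linear stores its weight with shape (out_features, in_features), so the manual version multiplies by the transpose.

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(in_features=28 * 28, out_features=20)
x = torch.rand(3, 28 * 28)  # stand-in for three flattened images

# Manual linear transformation: x W^T + b
manual = x @ layer.weight.T + layer.bias

print(layer.weight.shape)                           # (out_features, in_features)
print(torch.allclose(layer(x), manual, atol=1e-5))
```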

③ nn.ReLU

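Non-linear activations are what create the complex mappings between the model's inputs and outputs. They are applied after linear transformations to introduce non-linearity, helping neural networks learn a wide variety of phenomena. In this model we use nn.ReLU between our linear layers, though other non-linear activation functions could be used as well.

ReLU takes the output of the linear layer's computation and replaces the negative values with zeros.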

Linear output: ${ x = {weight * input + bias}}$ .
ReLU: $f(x)= \begin{cases} 0, & \text{if } x < 0\\ x, & \text{if } x\geq 0\\ \end{cases}$

# Effect of ReLU
print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

Before ReLU: tensor([[-0.0820,  0.0107, -0.3294,  0.1028,  0.3456, -0.2562, -0.0999, -0.4788,
          0.5453,  0.1438,  0.8770, -0.1515,  0.4959,  0.5390,  0.0439,  0.4358,
         -0.1566,  0.5860,  0.3900,  0.4657],
        [-0.4172,  0.1570, -0.2344,  0.0813,  0.2614, -0.3595,  0.0420, -0.2112,
          0.7050,  0.1588,  0.3820, -0.2690,  0.3867,  0.7539, -0.1585,  0.5034,
         -0.0430,  0.4146,  0.5382,  0.4226],
        [-0.5348,  0.3887, -0.6873,  0.2107,  0.4464, -0.3431, -0.3271, -0.2255,
          0.2931,  0.3856,  0.3697, -0.2962,  0.5845,  0.8527, -0.0195,  0.3592,
         -0.1541,  0.4704,  0.1831,  0.8558]], grad_fn=<AddmmBackward0>)


After ReLU: tensor([[0.0000, 0.0107, 0.0000, 0.1028, 0.3456, 0.0000, 0.0000, 0.0000, 0.5453,
         0.1438, 0.8770, 0.0000, 0.4959, 0.5390, 0.0439, 0.4358, 0.0000, 0.5860,
         0.3900, 0.4657],
        [0.0000, 0.1570, 0.0000, 0.0813, 0.2614, 0.0000, 0.0420, 0.0000, 0.7050,
         0.1588, 0.3820, 0.0000, 0.3867, 0.7539, 0.0000, 0.5034, 0.0000, 0.4146,
         0.5382, 0.4226],
        [0.0000, 0.3887, 0.0000, 0.2107, 0.4464, 0.0000, 0.0000, 0.0000, 0.2931,
         0.3856, 0.3697, 0.0000, 0.5845, 0.8527, 0.0000, 0.3592, 0.0000, 0.4704,
         0.1831, 0.8558]], grad_fn=<ReluBackward0>)

④ nn.Sequential

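nn.Sequential is an ordered container of modules. The data is passed through all the modules in the same order as they are defined. Sequential containers can be used to put together a quick network like seq_modules.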

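seq_modules = nn.Sequential(
    flatten, # flatten each image into a row of 784 values
    layer1, # linear layer
    nn.ReLU(), # non-linear activation
    nn.Linear(20, 10) # arguments are the input and output sizes; nn.Linear initializes the weights and bias automatically
)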
input_image = torch.rand(3,28,28)
logits = seq_modules(input_image)
logits.shape

torch.Size([3, 10])
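What nn.Sequential does can be written out as an explicit loop. The container below, named `seq` here, is rebuilt from fresh layers so the snippet is self-contained; it is a sketch of the behavior, not a look at the library's internals.

```python
import torch
from torch import nn

seq = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
)
x = torch.rand(3, 28, 28)

# Feed the data through each module in definition order
out = x
for module in seq:
    out = module(out)

print(torch.allclose(out, seq(x)))
print(len(seq), seq[1])  # the container also supports len() and indexing
```

Indexing is what makes expressions such as `model.linear_relu_stack[0].weight` possible.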

⑤ nn.Softmax

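The last linear layer of the neural network returns logits, raw values in [-infty, infty], which are passed to the nn.Softmax module. Softmax is an activation function used to compute probability outputs, and it is only used on the output layer of a neural network. The logits are scaled to values in [0, 1] that represent the model's predicted probability for each class (commonly used for classification, taking the class with the highest probability as the output). The dim parameter indicates the dimension along which the values must sum to 1. The node with the highest probability predicts the desired output.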

softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
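pred_probab # every row sums to 1; the printed values may appear to sum to slightly more or less because they are rounded for display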

tensor([[0.1157, 0.0989, 0.0702, 0.1203, 0.1093, 0.1068, 0.0883, 0.0785, 0.0864,
         0.1256], # displayed values sum to 1
        [0.1183, 0.1078, 0.0719, 0.1274, 0.1060, 0.1052, 0.0917, 0.0801, 0.0873,
         0.1042], # displayed values sum to 0.9999 only because of rounding
        [0.1153, 0.1056, 0.0810, 0.1269, 0.1109, 0.1035, 0.0820, 0.0750, 0.0797,
         0.1201]], grad_fn=<SoftmaxBackward0>)
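To confirm that each row really sums to 1 at full precision, softmax can be computed by hand and compared with the module. The logits below are made-up values for illustration.

```python
import torch
from torch import nn

# Made-up logits: two samples, three classes
logits = torch.tensor([[1.0, 2.0, 3.0],
                       [0.0, 0.0, 0.0]])

probs = nn.Softmax(dim=1)(logits)

# Manual softmax along dim=1: exp(x_i) / sum_j exp(x_j)
manual = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)

row_sums = probs.sum(dim=1)
print(torch.allclose(probs, manual))
print(row_sums)
```

Equal logits, as in the second row, yield equal probabilities, here one third for each class.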

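5. Model parameters

Many layers inside a neural network are parameterized, i.e. they have associated weights and biases that are optimized during training. Subclassing nn.Module automatically tracks all fields defined inside the model object, and makes all parameters accessible through the model's parameters() or named_parameters() methods.

Below, we iterate over each parameter and print its size and a preview of its values.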

print("Model structure: ", model, "\n\n")

for name, param in model.named_parameters(): # iterate over every parameter tensor (weights and biases)
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

Model structure:  NeuralNetwork(
  (flatten): Flatten(start_dim=1, end_dim=-1)
  (linear_relu_stack): Sequential(
    (0): Linear(in_features=784, out_features=512, bias=True)
    (1): ReLU()
    (2): Linear(in_features=512, out_features=512, bias=True)
    (3): ReLU()
    (4): Linear(in_features=512, out_features=10, bias=True)
    (5): ReLU()
  )
) 


Layer: linear_relu_stack.0.weight | Size: torch.Size([512, 784]) | Values : tensor([[-2.0643e-02, -1.0010e-02,  2.5111e-02,  ...,  1.4962e-02,
          8.7132e-03, -1.6987e-02],
        [-3.4400e-02, -8.4512e-03, -1.1077e-03,  ...,  1.2789e-02,
          1.2304e-02, -6.4526e-05]], device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.0.bias | Size: torch.Size([512]) | Values : tensor([-0.0279, -0.0143], device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.weight | Size: torch.Size([512, 512]) | Values : tensor([[ 0.0292,  0.0030,  0.0112,  ...,  0.0099, -0.0405, -0.0232],
        [-0.0163, -0.0240,  0.0004,  ...,  0.0181,  0.0338,  0.0156]],
       device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.2.bias | Size: torch.Size([512]) | Values : tensor([-0.0201,  0.0041], device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.4.weight | Size: torch.Size([10, 512]) | Values : tensor([[-0.0197,  0.0282, -0.0331,  ...,  0.0116,  0.0324,  0.0258],
        [ 0.0246, -0.0118, -0.0015,  ..., -0.0424,  0.0015,  0.0236]],
       device='cuda:0', grad_fn=<SliceBackward0>) 

Layer: linear_relu_stack.4.bias | Size: torch.Size([10]) | Values : tensor([-0.0240,  0.0121], device='cuda:0', grad_fn=<SliceBackward0>)
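The sizes printed above also give the total number of learnable parameters. The sketch below rebuilds a stand-in stack with the same Linear sizes (the Flatten and ReLU layers contribute no parameters) and sums the element counts.

```python
from torch import nn

# Stand-in with the same Linear layer sizes as the model above
stack = nn.Sequential(
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)

# 784*512 + 512, plus 512*512 + 512, plus 512*10 + 10
total = sum(p.numel() for p in stack.parameters())
trainable = sum(p.numel() for p in stack.parameters() if p.requires_grad)

print(total, trainable)  # 669706 669706
```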

PyTorch 03 - Building the model layers