Neural Network Layers

Convolution Layers

CONV2D

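CLASS torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

dilation: spacing between kernel elements (used for dilated convolution)
groups: usually left at 1; change it only for grouped convolution (rarely needed)
bias: usually True; controls whether a learnable constant is added to the convolution result
padding_mode: how the padded border is filled, e.g. 'zeros'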

Applies a 2D convolution over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size (N, C_{in}, H, W) and output (N, C_{out}, H_{out}, W_{out}) can be precisely described as:

\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k=0}^{C_{in}-1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)

Parameters

in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel (3 means 3×3)
stride (int or tuple, optional) – Stride of the convolution, i.e. how far the kernel moves at each step. Default: 1
padding (int, tuple or str, optional) – Padding added to all four sides of the input, i.e. around the image border. Default: 0
padding_mode (str, optional) – How the padding is filled: 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels (usually left at 1). Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output (usually left at True). Default: True

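When out_channels is set to 2, the layer creates 2 kernels and convolves the input with each of them separately, giving 2 output channels (illustrated in the original figure):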
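As a minimal sketch of this, assuming a made-up single-channel 5×5 input (the tensor values and variable names are only for illustration), the layer below holds two 3×3 kernels and returns two output channels:

```python
import torch
from torch import nn

# Hypothetical 1-channel 5x5 input, shaped (N, C, H, W)
x = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)

# out_channels=2 creates two kernels, one per output channel
conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3)

y = conv(x)
print(conv.weight.shape)  # torch.Size([2, 1, 3, 3])
print(y.shape)            # torch.Size([1, 2, 3, 3])
```

The weight tensor is laid out as (out_channels, in_channels, kernel_height, kernel_width), so its first dimension counts the kernels.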

Example

import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10(root="./dataset", train=False, download=True,
                                       transform=torchvision.transforms.ToTensor())

dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

# Build a simple neural network
class XiaoMo(nn.Module):
    def __init__(self):
        super(XiaoMo, self).__init__()

        # Define the convolution layer
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        return x

xiaomo = XiaoMo()
print(xiaomo)
'''
XiaoMo(
  (conv1): Conv2d(3, 6, kernel_size=(3, 3), stride=(1, 1))
)
'''


writer = SummaryWriter("logs")
step = 0
for data in dataloader:
    imgs, targets = data
    output = xiaomo(imgs)
    print(imgs.shape)  # torch.Size([64, 3, 32, 32])
    print(output.shape)  # torch.Size([64, 6, 30, 30])  # the image shrinks after the convolution (32 -> 30)

    writer.add_images(tag="input", img_tensor=imgs, global_step=step)
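    # The output must be reshaped because a 6-channel image cannot be displayed
    # This moves the extra channels into the batch dimension, so the effective batch_size grows
    # -1 lets reshape infer the size of that dimension automatically
    output = torch.reshape(input=output, shape=[-1,3,30,30])
    writer.add_images(tag="output", img_tensor=output, global_step=step)
    step += 1

writer.close()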

Result:

VGG16

Derivation:
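The original derivation figure is not reproduced here. As a rough sketch of how such a derivation can go, the helper below (conv_out_size is a hypothetical name, not a PyTorch function) applies the output-size formula from the Conv2d documentation, H_out = floor((H_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1). It assumes a VGG16-style first layer: 224×224 input, 3×3 kernel, stride 1, and an output that must stay 224×224.

```python
def conv_out_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output height or width of Conv2d along one spatial dimension."""
    return (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Try a few padding values and see which one keeps 224 -> 224
for padding in range(4):
    print(padding, conv_out_size(224, kernel_size=3, stride=1, padding=padding))
# 0 222
# 1 224
# 2 226
# 3 228
```

Under these assumptions padding=1 is the value that preserves the spatial size. The same helper gives 30 for the CIFAR10 example above (size_in=32, kernel_size=3, padding=0).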

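Using Max Pooling Layers

Max pooling is sometimes also called downsampling.

Common pooling operations

nn.MaxPool2d max pooling, i.e. downsampling (the most commonly used)
nn.MaxUnpool2d upsampling (a partial inverse of MaxPool2d)
nn.AvgPool2d average pooling
nn.AdaptiveMaxPool2d adaptive max pooling

Why maxpool?

Max pooling aims to keep the salient features of the input while greatly reducing the amount of data to process.

For example, a 5x5 input image can shrink to 3x3 after pooling, so the rest of the network has less data to compute on and runs faster.

It is like watching an online video in 1080p versus 720p: the two can be thought of as the versions before and after pooling.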

MAXPOOL2D

CLASS torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

Parameters

kernel_size (Union[int, Tuple[int, int]]) – the size of the window to take a max over; either a single int or a tuple (H, W)
stride (Union[int, Tuple[int, int]]) – the stride of the window. Default value is kernel_size (note that this default differs from the convolution layer, where stride defaults to 1)
padding (Union[int, Tuple[int, int]]) – implicit negative infinity padding to be added on both sides
dilation (Union[int, Tuple[int, int]]) – a parameter that controls the stride of elements in the window (the same idea as dilated convolution)
return_indices (bool) – if True, will return the max indices along with the outputs. Useful for torch.nn.MaxUnpool2d later (rarely used, usually left unset)
ceil_mode (bool) – when True, will use ceil instead of floor to compute the output shape. Default: False. ceil keeps the partial window at the border; floor discards it
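To illustrate how return_indices pairs with nn.MaxUnpool2d, here is a small sketch on a made-up 4×4 tensor (the values and variable names are assumptions for illustration only):

```python
import torch
from torch import nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

# Hypothetical input holding the values 0..15, shaped (N, C, H, W)
x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

pooled, indices = pool(x)           # indices record where each max came from
restored = unpool(pooled, indices)  # maxima go back to their positions, the rest is 0

print(pooled.shape)    # torch.Size([1, 1, 2, 2])
print(restored.shape)  # torch.Size([1, 1, 4, 4])
```

Unpooling only restores the maxima; every other position is filled with zeros, which is why it is a partial inverse rather than a true one.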

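About dilation: dilation spaces out the elements inside the window; in convolution layers the same idea is called dilated (atrous) convolution.

About ceil_mode: when the window slides to a position where it only partly overlaps the input (in the illustrated case, step 2 covers just 6 elements), should a result still be computed and kept for those elements?

With ceil_mode=True the result is kept; with ceil_mode=False it is discarded.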
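The effect on output shape can be checked with a quick sketch. It assumes a random batch with CIFAR10-like dimensions (64, 3, 32, 32); the tensor is made up and only the shapes matter:

```python
import torch
from torch import nn

# Made-up batch with CIFAR10-like dimensions
x = torch.rand(64, 3, 32, 32)

floor_pool = nn.MaxPool2d(kernel_size=3, ceil_mode=False)
ceil_pool = nn.MaxPool2d(kernel_size=3, ceil_mode=True)

print(floor_pool(x).shape)  # torch.Size([64, 3, 10, 10])
print(ceil_pool(x).shape)   # torch.Size([64, 3, 11, 11])
```

With a 3×3 window and stride 3, ten full windows fit into 32 pixels and 2 pixels are left over; ceil_mode=True keeps one extra, partial window for them.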

Example

import torch
import torchvision
from torch import nn
from torch.nn import MaxPool2d
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

dataset = torchvision.datasets.CIFAR10(root='./dataset', train=False, download=True,
                                       transform=torchvision.transforms.ToTensor())

dataloader = DataLoader(dataset, batch_size=64, shuffle=True)

# Define the neural network
class XiaoMo(nn.Module):
    def __init__(self):
        super(XiaoMo, self).__init__()
        self.maxpool1 = MaxPool2d(kernel_size=3,ceil_mode=True)
        # self.maxpool2 = MaxPool2d(kernel_size=3,ceil_mode=False)

    def forward(self, input):
        output1 = self.maxpool1(input)
        # output2 = self.maxpool2(input)
        return output1  # , output2

xiaomo = XiaoMo()

writer = SummaryWriter("logs")
step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_images(tag="input", img_tensor=imgs, global_step=step)
    output1 = xiaomo(imgs)
    writer.add_images(tag="output", img_tensor=output1, global_step=step)
    step += 1    

writer.close()

'''
# Note: input is a 2-D matrix here, while this example feeds the pooling layer a 4-D tensor: (N, C, H_in, W_in)
# Note: the dtype must be floating point
input = torch.tensor([[1,2,0,3,1],
                      [0,1,2,3,1],
                      [1,2,1,0,0],
                      [5,2,3,1,1],
                      [2,1,0,1,1]], dtype=torch.float32)

input = torch.reshape(input, (-1, 1, 5, 5))

print(input.shape)  # torch.Size([1, 1, 5, 5])

# Define the neural network
class XiaoMo(nn.Module):
    def __init__(self):
        super(XiaoMo, self).__init__()
        self.maxpool1 = MaxPool2d(kernel_size=3,ceil_mode=True)
        self.maxpool2 = MaxPool2d(kernel_size=3,ceil_mode=False)

    def forward(self, input):
        output1 = self.maxpool1(input)
        output2 = self.maxpool2(input)
        return output1, output2

xiaomo = XiaoMo()
output1, output2 = xiaomo(input)
print(output1, output2)
# tensor([[[[2., 3.],
#           [5., 1.]]]]) tensor([[[[2.]]]])
'''

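The results show that pooling has an effect similar to pixelating (mosaicking) the image.

Non-linear Activations

The purpose is to introduce non-linearity into the network, so that stacked layers can represent more than a single linear map, which improves the model's ability to generalize.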

RELU

CLASS torch.nn.ReLU(inplace=False)

ReLU(x)=(x)^+=max(0,x)

Parameters

inplace – can optionally do the operation in-place. Default: False

Controls whether the input variable is overwritten with the result. It is usually left as False so that the original data is preserved.
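A short sketch of the difference, using two made-up tensors (names a and b are illustrative):

```python
import torch
from torch import nn

# inplace=False: the input tensor is left untouched
a = torch.tensor([-1.0, 2.0])
out = nn.ReLU(inplace=False)(a)
print(a.tolist())    # [-1.0, 2.0]
print(out.tolist())  # [0.0, 2.0]

# inplace=True: the input tensor itself is overwritten
b = torch.tensor([-1.0, 2.0])
nn.ReLU(inplace=True)(b)
print(b.tolist())    # [0.0, 2.0]
```

In-place mode saves a little memory, at the cost of losing the original values.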

SIGMOID

CLASS torch.nn.Sigmoid

Sigmoid(x)=\sigma(x)=\frac{1}{1+e^{-x}}
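As a sanity check of the formula, the sketch below compares nn.Sigmoid with a hand-written version on a few made-up values:

```python
import torch
from torch import nn

x = torch.tensor([-2.0, 0.0, 2.0])

manual = 1 / (1 + torch.exp(-x))  # the formula written out by hand
builtin = nn.Sigmoid()(x)

print(torch.allclose(manual, builtin))  # True
print(builtin[1].item())                # 0.5, since sigmoid(0) = 1/2
```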

Example

import torch
import torchvision
from torch import nn
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter

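# ReLU and Sigmoid work element-wise on tensors of any shape;
# the reshape below adds batch and channel dimensions to match the (N, C, H, W) image batches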
input = torch.tensor([[1, -0.5],
                       [-1, 3]])

input = torch.reshape(input, (-1, 1, 2, 2))
print(input.shape)  # torch.Size([1, 1, 2, 2])

dataset = torchvision.datasets.CIFAR10(root='./dataset', train=False, download=True, transform=torchvision.transforms.ToTensor())
dataloader = DataLoader(dataset, batch_size=64)

class XiaoMo(nn.Module):
    def __init__(self):
        super(XiaoMo, self).__init__()
        self.relu = nn.ReLU(inplace=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, input):
        output = self.sigmoid(input)
        return output

xiaomo = XiaoMo()
# With forward() switched to self.relu, the small tensor above gives:
# output = xiaomo(input)
# print(output)
# tensor([[[[1., 0.],
#           [0., 3.]]]])

writer = SummaryWriter("logs")
step = 0
for data in dataloader:
    imgs, targets = data
    writer.add_images(tag="input", img_tensor=imgs, global_step=step)

    output = xiaomo(imgs)
    writer.add_images(tag="output", img_tensor=output, global_step=step)

    step += 1

writer.close()

Linear Layers and Other Layers

Normalization Layers

eg: nn.BatchNorm2d

Normalizing the input can speed up the training of a neural network. The method comes from the paper: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

CLASS torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, device=None, dtype=None)

Parameters

num_features (int) – C from an expected input of size (N, C, H, W)
eps (float) – a value added to the denominator for numerical stability. Default: 1e-5
momentum (float) – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
affine (bool) – a boolean value that when set to True, this module has learnable affine parameters. Default: True
track_running_stats (bool) – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics, and initializes statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Default: True

Shape:

Input: (N, C, H, W)
Output: (N, C, H, W) (same shape as input)
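A minimal sketch of what the layer does in training mode, assuming a made-up batch whose values have mean around 2 and standard deviation around 5:

```python
import torch
from torch import nn

torch.manual_seed(0)
# Made-up batch: 8 images, 3 channels, 4x4 pixels
x = torch.randn(8, 3, 4, 4) * 5 + 2

bn = nn.BatchNorm2d(num_features=3)  # num_features must equal C
bn.train()                           # training mode uses batch statistics
y = bn(x)

print(y.shape)                               # torch.Size([8, 3, 4, 4])
print(y.mean(dim=(0, 2, 3)))                 # per-channel mean, close to 0
print(y.var(dim=(0, 2, 3), unbiased=False))  # per-channel variance, close to 1
```

Each channel is normalized over the batch and spatial dimensions; the learnable affine parameters start at weight 1 and bias 0, so they do not change the result here.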

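Recurrent Layers

Layers designed for sequence data. Their key property is that they retain and use information from earlier time steps, which makes them well suited to tasks with strong sequential dependence, such as text, speech, and time series.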

nn.RNN
nn.LSTM
nn.GRU
...
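As an illustration of how information is carried across time steps, here is a small nn.LSTM sketch. The sizes (10 input features, 20 hidden units, sequences of length 5) are arbitrary assumptions:

```python
import torch
from torch import nn

lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)

# Made-up batch: 3 sequences, 5 time steps, 10 features per step
x = torch.randn(3, 5, 10)

# output holds the hidden state at every time step;
# h_n and c_n are the final hidden and cell states
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([3, 5, 20])
print(h_n.shape)     # torch.Size([1, 3, 20])
print(c_n.shape)     # torch.Size([1, 3, 20])
```

For this single-layer, unidirectional LSTM, the last time step of output is the same as h_n.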

Transformer Layers

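Usually not needed~

Transformer layers are the basic building blocks of the Transformer model, a deep learning architecture originally designed for natural language processing (NLP) tasks such as machine translation and text generation. The model was first proposed by Vaswani et al. in the 2017 paper "Attention Is All You Need" and quickly became a revolutionary technique in NLP.

Key characteristics of Transformer layers:

Attention Mechanism:

The core of a Transformer layer is the attention mechanism, in particular self-attention, which lets the model assign different importance to different parts of the input depending on the task.

This allows the model to focus on the relevant parts of the input when performing a specific task.

Layer structure:

A typical Transformer layer consists of a Multi-Head Self-Attention mechanism and a Position-wise Feed-Forward Network.

Normalization and Residual Connections are also key components of these layers; they help stabilize training and speed it up.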
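PyTorch packages this structure as nn.TransformerEncoderLayer. The sketch below only inspects shapes and sub-modules; the sizes (d_model=32, nhead=4, dim_feedforward=64) are arbitrary assumptions:

```python
import torch
from torch import nn

layer = nn.TransformerEncoderLayer(
    d_model=32, nhead=4, dim_feedforward=64, batch_first=True
)
layer.eval()  # turn off dropout so the output is deterministic

# Made-up batch: 2 sequences, 10 positions, 32 features per position
x = torch.randn(2, 10, 32)
y = layer(x)

print(y.shape)                         # torch.Size([2, 10, 32])
print(type(layer.self_attn).__name__)  # MultiheadAttention
print(layer.linear1, layer.linear2)    # the position-wise feed-forward network
print(layer.norm1, layer.norm2)        # the two normalization layers
```

The output has the same shape as the input, which is what allows several such layers to be stacked.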

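No recurrence or convolution:

Unlike recurrent neural networks (RNNs) or convolutional neural networks (CNNs), Transformers use neither recurrence nor convolution. They process the whole input at once, which allows efficient parallel computation.

Positional Encoding:

Because self-attention by itself is insensitive to the order of its inputs, Transformers add positional encodings to preserve the sequence order of the input data.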
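One common choice is the fixed sinusoidal encoding from "Attention Is All You Need". The sketch below is one way to write it; the function name sinusoidal_positional_encoding is hypothetical, and it assumes d_model is even:

```python
import math
import torch

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a (seq_len, d_model) tensor of sine/cosine position codes."""
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    # Frequencies 1 / 10000^(2i / d_model) for i = 0, 1, ..., d_model/2 - 1
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even feature indices
    pe[:, 1::2] = torch.cos(position * div_term)  # odd feature indices
    return pe

pe = sinusoidal_positional_encoding(seq_len=10, d_model=32)

# Made-up token embeddings: 2 sequences, 10 positions, 32 features
x = torch.randn(2, 10, 32)
x_with_pos = x + pe  # pe broadcasts over the batch dimension

print(pe.shape)          # torch.Size([10, 32])
print(x_with_pos.shape)  # torch.Size([2, 10, 32])
```

Each position receives a distinct vector, so identical tokens at different positions no longer look the same to the attention layers.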

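Applications:

Language modeling and translation: Transformers set new standards in language modeling and machine translation.

Text generation: used for high-quality text generation in models such as GPT (Generative Pre-trained Transformer).

Other NLP tasks: such as text summarization, sentiment analysis, and question answering.

Advantages:

Parallelization: data can be processed in parallel, so computation is efficient.

Scalability: Transformer models, especially large ones such as the GPT and BERT families, have shown good scalability and performance gains as data and parameter counts grow.

Flexibility: they adapt to a wide range of NLP tasks and have been extended to other fields, such as image processing (Vision Transformer).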

Linear Layers (often used)

Linear Layers	explanation
nn.Identity	A placeholder identity operator that is argument-insensitive.
nn.Linear	Applies a linear transformation to the incoming data: y = xA^T + b
nn.Bilinear	Applies a bilinear transformation to the incoming data: y = x_1^T A x_2 + b
nn.LazyLinear	A `torch.nn.Linear` module where in_features is inferred.

CLASS torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)

Parameters

in_features (int) – size of each input sample
out_features (int) – size of each output sample
bias (bool) – If set to False, the layer will not learn an additive bias. Default: True
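To connect these parameters with the formula y = xA^T + b, here is a small sketch with arbitrary sizes (4 input features, 2 output features, a batch of 3 made-up samples):

```python
import torch
from torch import nn

linear = nn.Linear(in_features=4, out_features=2)

# Made-up batch of 3 samples with 4 features each
x = torch.randn(3, 4)
y = linear(x)

# The same computation written out by hand: y = x A^T + b
manual = x @ linear.weight.T + linear.bias

print(linear.weight.shape)                   # torch.Size([2, 4])
print(y.shape)                               # torch.Size([3, 2])
print(torch.allclose(y, manual, atol=1e-6))  # True
```

The weight A is stored with shape (out_features, in_features), which is why the formula uses its transpose.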

In the figure above, in_features=d, out_features=L

Example

import torch
import torchvision
from torch import nn
from torch.nn import Linear
from torch.utils.data import DataLoader

dataset = torchvision.datasets.CIFAR10(root="./dataset", train=True, download=True,
                                       transform=torchvision.transforms.ToTensor())

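# drop_last=True discards the final partial batch, so every flattened batch has 64*3*32*32 = 196608 values
dataloader = DataLoader(dataset, batch_size=64, drop_last=True)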

class XiaoMo(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = Linear(in_features=196608, out_features=10)

    def forward(self, input):
        output = self.linear1(input)
        return output

xiaomo = XiaoMo()

for imgs, targets in dataloader:
    print(imgs.shape)  # torch.Size([64, 3, 32, 32])

    # Flatten each batch of images into one dimension
    # torch.reshape(imgs, (1, 1, 1, -1)) would give torch.Size([1, 1, 1, 196608]) instead
    # output = torch.reshape(imgs, (1, 1, 1, -1))
    output = torch.flatten(imgs)
    print(output.shape)  # torch.Size([196608])

    output = xiaomo(output)
    print(output.shape)  # torch.Size([10])  (torch.Size([1, 1, 1, 10]) with the reshape variant)

Dropout Layers

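The purpose is to prevent overfitting.

nn.Dropout during training, sets elements of the input to 0 with probability p, using samples from a Bernoulli distribution
nn.Dropout2d randomly sets entire channels to 0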
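A brief sketch of nn.Dropout in training and evaluation mode, using a made-up tensor of ones and p=0.5:

```python
import torch
from torch import nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()
y_train = drop(x)  # each element is either 0 or scaled to 1 / (1 - p) = 2

drop.eval()
y_eval = drop(x)   # in eval mode dropout does nothing

print(y_train.unique())        # only the values 0 and 2 can appear
print(torch.equal(y_eval, x))  # True
```

The surviving elements are scaled by 1 / (1 - p) during training so that the expected value of the output matches the input.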

Sparse Layers

nn.Embedding mostly used in natural language processing
nn.EmbeddingBag
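As a small illustration, nn.Embedding maps integer ids to learnable vectors. The vocabulary size (10), vector size (4), and token ids below are made up:

```python
import torch
from torch import nn

# A lookup table with 10 rows (one per token id), each a 4-dimensional vector
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)

# Made-up batch of token ids, shaped (batch, seq_len); ids must be integers in [0, 10)
token_ids = torch.tensor([[1, 2, 3], [4, 5, 6]])
vectors = embedding(token_ids)

print(vectors.shape)  # torch.Size([2, 3, 4])
print(torch.equal(vectors[0, 0], embedding.weight[1]))  # True: id 1 selects row 1
```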
