02 - PyTorch Study Notes


PyTorch Learning

  • installing PyTorch and configuring the related environment

    Pytorch安装教程(最全最详细版)-CSDN博客

    PyTorch安装最全流程_pycharm添加新的work torch-CSDN博客

    Refer to the two blog posts above; pay attention to your machine's CUDA driver version!
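
    Once installed, a quick sanity check (a minimal sketch; the printed version string will differ per machine) confirms that torch imports and that CUDA is visible:

      import torch

      print(torch.__version__)          # installed PyTorch version
      print(torch.cuda.is_available())  # True if the CUDA driver/toolkit is set up correctly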

  • what is PyTorch?

    • a machine learning framework in Python

    • 2 main features:

      • N-dimensional tensor computation on GPUs (like NumPy)
      • automatic differentiation for training deep neural networks
  • training & testing neural networks in PyTorch

    step 1: data import

    • torch.utils.data.Dataset  # stores data samples and expected values
      torch.utils.data.DataLoader  # groups data in batches, enables multiprocessing

      dataset = MyDataset(file)
      # shuffle=True for training, False for testing
      dataloader = DataLoader(dataset, batch_size, shuffle=True)
      
    • custom dataset:

    02-1.png

    02-2.png
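
    The screenshots above outline the three required methods. A minimal sketch of a custom Dataset might look like the following (the CSV file name and column layout are assumptions for illustration):

      import numpy as np
      import torch
      from torch.utils.data import Dataset, DataLoader

      class MyDataset(Dataset):
          # read data and preprocess (a CSV of feature columns plus a final target column is assumed)
          def __init__(self, file):
              data = np.loadtxt(file, delimiter=",", dtype=np.float32)
              self.x = torch.from_numpy(data[:, :-1])
              self.y = torch.from_numpy(data[:, -1])

          # return one sample (input, expected value) at a time
          def __getitem__(self, index):
              return self.x[index], self.y[index]

          # return the number of samples
          def __len__(self):
              return len(self.x)

      # dataset = MyDataset("data.csv")  # hypothetical file name
      # dataloader = DataLoader(dataset, batch_size=16, shuffle=True)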

    • tensors (high-dimensional matrices/arrays; a data structure):

    1-D tensor; 2-D tensor; 3-D tensor

    dim in PyTorch == axis in NumPy

    02-3.png
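
    For example (a quick sketch; the array values are arbitrary), summing along dim=0 in PyTorch matches summing along axis=0 in NumPy:

      import numpy as np
      import torch

      x = torch.tensor([[1, 2, 3], [4, 5, 6]])
      x.sum(dim=0)                                   # tensor([5, 7, 9])
      np.array([[1, 2, 3], [4, 5, 6]]).sum(axis=0)   # array([5, 7, 9])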

    • how to create tensors:

    02-4.png
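
    The screenshot summarizes the common constructors; a brief sketch:

      import numpy as np
      import torch

      # directly from data (a Python list) or from a NumPy array
      x = torch.tensor([[1., -1.], [-1., 1.]])
      x = torch.from_numpy(np.array([[1., -1.], [-1., 1.]]))

      # tensors filled with zeros or ones of a given shape
      x = torch.zeros([2, 2])
      x = torch.ones([1, 2, 5])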

    • common operations for tensors:

      • numerical operations:

        addition, summation, subtraction, mean, power

        z = x + y     # addition
        y = x.sum()   # summation
        z = x - y     # subtraction
        y = x.mean()  # mean
        y = x.pow(2)  # power
        
      • matrix operations:

        transpose, squeeze (remove a dimension of length 1), unsqueeze, cat

        x = torch.zeros([2, 1, 3])
        x = x.transpose(0, 1)
        x.shape  # torch.Size([1, 2, 3])

        x = x.squeeze(0)  # remove the specified dimension with length = 1
        x.shape  # torch.Size([2, 3])

        x = x.unsqueeze(1)  # insert a new dimension of length 1 at the specified position
        x.shape  # torch.Size([2, 1, 3])
        

        02-5.png

        02-6.png

        x = torch.zeros([2, 1, 3])
        y = torch.zeros([2, 2, 3])
        z = torch.zeros([2, 3, 3])

        w = torch.cat([x, y, z], dim=1)  # concatenate multiple tensors along an existing dimension
        w.shape  # torch.Size([2, 6, 3])
        
    • data types

      | data type               | dtype       | tensor            |
      | ----------------------- | ----------- | ----------------- |
      | 32-bit floating point   | torch.float | torch.FloatTensor |
      | 64-bit integer (signed) | torch.long  | torch.LongTensor  |

      Errors may occur when the model and data use different data types.

    • PyTorch vs. NumPy

      • similar attributes

        | PyTorch                | NumPy                |
        | ---------------------- | -------------------- |
        | x.shape                | x.shape              |
        | x.dtype                | x.dtype              |
        | x.reshape() / x.view() | x.reshape()          |
        | x.squeeze()            | x.squeeze()          |
        | x.unsqueeze(1)         | np.expand_dims(x, 1) |
    • tensors & modules are computed on the CPU by default (see the sketch below for moving them to a GPU)
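
      A minimal sketch of moving a tensor (and, via the same .to() call, a model) onto the GPU when one is available:

      import torch

      x = torch.zeros([2, 2])  # created on the CPU by default
      device = "cuda" if torch.cuda.is_available() else "cpu"
      x = x.to(device)         # move the tensor to the chosen device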

    • tensors - gradient calculation

      x = torch.tensor(np.array([[1., 0.], [-1., 1.]]), requires_grad=True)
      z = x.pow(2).sum()
      z.backward()  # note: the gradient of z with respect to x is stored in x.grad
      x.grad
      '''
      tensor([[ 2.,  0.],
              [-2.,  2.]], dtype=torch.float64)
      '''
      

      02-7.png

    step 2: how to do training

    torch.nn.Module

    • define neural network
    • loss function
    • optimization algorithm

    network layers

    • linear layer (fully-connected layer)

      nn.Linear(in_features,out_features)
      

      02-8.png

      $\vec y = \vec b + W\vec x$

      02-9.png

    network parameters

    layer = torch.nn.Linear(32,64)
    layer.weight.shape  # torch.Size([64,32])
    
    layer.bias.shape  # torch.Size([64])
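
    Continuing the snippet above (a quick sketch; the batch size of 5 is arbitrary), the layer maps in_features to out_features along the last dimension:

    x = torch.zeros([5, 32])  # a batch of 5 input vectors, each with 32 features
    y = layer(x)
    y.shape  # torch.Size([5, 64])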
    

    non-linear activation functions

    • sigmoid activation

      nn.Sigmoid()
      
    • ReLU activation

      nn.ReLU()
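
      For instance, ReLU clips negative values to zero (a quick sketch with arbitrary input values):

      import torch
      import torch.nn as nn

      act = nn.ReLU()
      x = torch.tensor([-1.0, 0.0, 2.0])
      act(x)  # tensor([0., 0., 2.])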
      
    • build your own neural network

      import torch.nn as nn

      class MyModel(nn.Module):
          # initialize your model & define layers
          def __init__(self):
              super(MyModel, self).__init__()
              self.net = nn.Sequential(
                  nn.Linear(10, 32),  # 1st layer: input is a vector with last dim = 10
                  nn.Sigmoid(),       # 2nd layer
                  nn.Linear(32, 1)    # 3rd layer: output is a vector with last dim = 1
              )

          # compute output of your NN
          # defines how the model transforms an input vector into its output
          def forward(self, x):
              return self.net(x)
      

      Equivalent to:

      import torch.nn as nn

      class MyModel(nn.Module):
          def __init__(self):
              super(MyModel, self).__init__()
              self.layer1 = nn.Linear(10, 32)
              self.layer2 = nn.Sigmoid()
              self.layer3 = nn.Linear(32, 1)

          def forward(self, x):
              out = self.layer1(x)
              out = self.layer2(out)
              out = self.layer3(out)
              return out
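
      Either version can then be used the same way (a quick usage sketch; the input values are arbitrary):

      import torch

      model = MyModel()
      x = torch.zeros([4, 10])  # batch of 4 input vectors with 10 features
      out = model(x)            # calls forward(); out.shape == torch.Size([4, 1])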
      

    step 3: define the loss function

    torch.nn.MSELoss
    torch.nn.CrossEntropyLoss
    # etc.
    
    • Mean Squared Error (for regression tasks)

      criterion = nn.MSELoss()
      
    • Cross Entropy (for classification tasks)

      criterion = nn.CrossEntropyLoss()
      
    • loss = criterion(model_output,expected_value)
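
      For instance, with the MSE criterion (a minimal sketch; the tensors are made up for illustration):

      import torch
      import torch.nn as nn

      criterion = nn.MSELoss()
      pred = torch.tensor([1.0, 2.0, 3.0])    # model output
      target = torch.tensor([1.0, 2.0, 5.0])  # expected values
      loss = criterion(pred, target)          # mean squared error: (0 + 0 + 4) / 3 ≈ 1.3333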
      

    step 4: select an optimization algorithm

    torch.optim
    
    • gradient-based optimization algorithms that adjust network parameters to reduce error

    • e.g. Stochastic Gradient Descent (SGD)

      torch.optim.SGD(model.parameters(),lr,momentum=0)
      

      for every batch of data:

      • call optimizer.zero_grad() to reset gradients of model parameters

        reset the gradients computed previously to zero

      • call loss.backward() to backpropagate gradients of the prediction loss

        compute the gradients layer by layer from the loss

      • call optimizer.step() to adjust model parameters

        adjust the model parameters using the gradients computed in the previous step

    step 5: entire procedure

    • training setup

      dataset = MyDataset(file)  # read data via MyDataset
      tr_set = DataLoader(dataset, 16, shuffle=True)  # put dataset into DataLoader
      model = MyModel().to(device)  # construct model and move to device (cpu/cuda)
      criterion = nn.MSELoss()  # set loss function
      optimizer = torch.optim.SGD(model.parameters(), 0.1)  # set optimizer
      
    • training loop

      for epoch in range(n_epochs):  # iterate n_epochs
          model.train()  # set model to train mode
          
          # process each batch of data from the dataloader
          # x: input; y: expected value
          for x,y in tr_set:  # iterate through the dataloader
              optimizer.zero_grad()  # set gradient to zero
              x,y = x.to(device),y.to(device)  # move data to device(cpu/cuda)
              pred = model(x)  # forward pass (compute output)
              loss = criterion(pred,y)  # compute loss
              loss.backward()  # compute gradient(backpropagation)
              optimizer.step()  # update model with optimizer
      
    • validation loop

      model.eval()  # set model to evaluation mode !
      total_loss = 0
      ​
      for x,y in dv_set:  # iterate through the dataloader
          x,y = x.to(device),y.to(device)  # move data to device(cpu/cuda)
          
          with torch.no_grad():  # disable gradient calculation !
              pred = model(x)  # forward pass (compute output)
              loss = criterion(pred,y)  # compute loss
          
          total_loss += loss.cpu().item()*len(x)  # accumulate loss

      avg_loss = total_loss / len(dv_set.dataset)  # compute averaged loss after the loop
      

      On why gradient calculation is disabled while iterating over the dataset (answer from ChatGPT 3.5):

      • Reduce computational overhead: gradients are only needed during training, where the error is backpropagated and the model parameters are updated. During validation we only care about the model's predictions and performance metrics, so disabling gradient computation significantly reduces the overhead and speeds up validation.
      • Avoid unnecessary memory usage: computing gradients requires storing intermediate results, which can consume a lot of memory, especially for large networks and large datasets. Disabling gradient computation during validation avoids this and frees the memory.
      • Keep model parameters unchanged: during validation we want the parameters to stay fixed so that the model's performance can be evaluated. If gradients were still computed, the parameters could be modified by mistake during validation.
      • Typically, in the validation loop you disable gradient computation with PyTorch's torch.no_grad() context manager, or by setting requires_grad to False on the relevant tensors. This ensures that no gradients are computed during validation while the model can still be used for forward passes and evaluation.

      In short, disabling gradient computation when validating the model improves efficiency and keeps the model parameters stable, which matters when evaluating the model's performance on new data.

    • testing loop

      model.eval()  # set model to evaluation mode
      preds = []
      ​
      for x in tt_set:  # iterate through the dataloader
          x = x.to(device)  # move data to device (cpu/cuda)
          
          with torch.no_grad():  # disable gradient calculation
              pred = model(x)  # forward pass (compute output)
              preds.append(pred.cpu())  # collect prediction
      
    • notice: model.eval(), torch.no_grad()

      • model.eval()

        changes behaviour of some model layers, such as dropout and batch normalization

        some layers in the model may behave differently during training and during testing!

      • with torch.no_grad():

        • prevents calculations from being added into gradient computation graph
        • usually used to prevent accidental training on validation/testing data
    • save/load trained models

      • save:

        torch.save(model.state_dict(),path)
        
      • load:

        ckpt = torch.load(path)
        model.load_state_dict(ckpt)
        
  • PyTorch深度学习快速入门教程(绝对通俗易懂!)【小土堆】哔哩哔哩bilibili

    Finally, a recommendation for a tutorial video that I personally think is quite good~