CV with PyTorch 05 - Solving vision problems with MobileNet

330 阅读11分钟

使用MobileNet解决视觉问题

import torch
import torch.nn as nn
import torchvision
import matplotlib.pyplot as plt
from torchinfo import summary
import os

from pytorchcv import train, display_dataset, train_long, load_cats_dogs_dataset, validate, common_transform
dataset, train_loader, test_loader = load_cats_dogs_dataset()
F:\anaconda3\lib\site-packages\PIL\TiffImagePlugin.py:822: UserWarning: Truncated File Read
  warnings.warn(str(msg))

一、MobileNet

在上一单元中,我们已经看到了用于图像分类的 ResNet 架构。 更轻量级的 ResNet 模拟是 MobileNet,它使用所谓的 Inverted Residual Blocks。 让我们加载预先训练好的移动网络,看看它是如何工作的:

model = torch.hub.load('pytorch/vision:v0.6.0', 'mobilenet_v2', pretrained=True)
model.eval()
print(model)
Downloading: "https://github.com/pytorch/vision/archive/v0.6.0.zip" to C:\Users\yzx20/.cache\torch\hub\v0.6.0.zip
Downloading: "https://download.pytorch.org/models/mobilenet_v2-b0353104.pth" to C:\Users\yzx20/.cache\torch\hub\checkpoints\mobilenet_v2-b0353104.pth



  0%|          | 0.00/13.6M [00:00<?, ?B/s]


MobileNetV2(
  (features): Sequential(
    (0): ConvNormActivation(
      (0): Conv2d(3, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
    )
    (1): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
          (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): Conv2d(32, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (2): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(16, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=96, bias=False)
          (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(96, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (3): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(144, 144, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=144, bias=False)
          (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(144, 24, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(24, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (4): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(24, 144, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(144, 144, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=144, bias=False)
          (1): BatchNorm2d(144, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(144, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (5): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (6): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=192, bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (7): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(32, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(192, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=192, bias=False)
          (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(192, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (8): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (9): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (10): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(384, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (11): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=384, bias=False)
          (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(384, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (12): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (13): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(576, 576, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=576, bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(576, 96, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (14): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(96, 576, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(576, 576, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=576, bias=False)
          (1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(576, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (15): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (16): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(960, 160, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (17): InvertedResidual(
      (conv): Sequential(
        (0): ConvNormActivation(
          (0): Conv2d(160, 960, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (1): ConvNormActivation(
          (0): Conv2d(960, 960, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=960, bias=False)
          (1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU6(inplace=True)
        )
        (2): Conv2d(960, 320, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (3): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (18): ConvNormActivation(
      (0): Conv2d(320, 1280, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (1): BatchNorm2d(1280, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU6(inplace=True)
    )
  )
  (classifier): Sequential(
    (0): Dropout(p=0.2, inplace=False)
    (1): Linear(in_features=1280, out_features=1000, bias=True)
  )
)

接下来我们将其运用到我们的数据集并确保能够生效

sample_image = dataset[0][0].unsqueeze(0)
res = model(sample_image)
print(res[0].argmax())
tensor(281)

结果(281)就是ImageNet类号,我们在上一个单元中讲过。

  • 注意: MobileNet 和全尺寸 ResNet 模型中的参数数量存在显着差异。 在某些方面,MobileNet 比 VGG 模型系列更紧凑,但准确度较低。 然而,参数数量的减少自然会导致模型精度有所下降。

二、使用MobileNet进行迁移学习

现在让我们执行与上一单元相同的迁移学习过程,但使用的是 MobileNet。 首先,让我们冻结模型的所有参数:

for x in model.parameters():
    x.requires_grad = False

然后,替换最终的分类器。 我们还将模型转移到我们的默认训练设备(GPU 或 CPU):

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.classifier = nn.Linear(1280,2)
model = model.to(device)
summary(model,input_size=(1,3,244,244))
===============================================================================================
Layer (type:depth-idx)                        Output Shape              Param #
===============================================================================================
MobileNetV2                                   --                        --
├─Sequential: 1-1                             [1, 1280, 8, 8]           --
│    └─ConvNormActivation: 2-1                [1, 32, 122, 122]         --
│    │    └─Conv2d: 3-1                       [1, 32, 122, 122]         (864)
│    │    └─BatchNorm2d: 3-2                  [1, 32, 122, 122]         (64)
│    │    └─ReLU6: 3-3                        [1, 32, 122, 122]         --
│    └─InvertedResidual: 2-2                  [1, 16, 122, 122]         --
│    │    └─Sequential: 3-4                   [1, 16, 122, 122]         (896)
│    └─InvertedResidual: 2-3                  [1, 24, 61, 61]           --
│    │    └─Sequential: 3-5                   [1, 24, 61, 61]           (5,136)
│    └─InvertedResidual: 2-4                  [1, 24, 61, 61]           --
│    │    └─Sequential: 3-6                   [1, 24, 61, 61]           (8,832)
│    └─InvertedResidual: 2-5                  [1, 32, 31, 31]           --
│    │    └─Sequential: 3-7                   [1, 32, 31, 31]           (10,000)
│    └─InvertedResidual: 2-6                  [1, 32, 31, 31]           --
│    │    └─Sequential: 3-8                   [1, 32, 31, 31]           (14,848)
│    └─InvertedResidual: 2-7                  [1, 32, 31, 31]           --
│    │    └─Sequential: 3-9                   [1, 32, 31, 31]           (14,848)
│    └─InvertedResidual: 2-8                  [1, 64, 16, 16]           --
│    │    └─Sequential: 3-10                  [1, 64, 16, 16]           (21,056)
│    └─InvertedResidual: 2-9                  [1, 64, 16, 16]           --
│    │    └─Sequential: 3-11                  [1, 64, 16, 16]           (54,272)
│    └─InvertedResidual: 2-10                 [1, 64, 16, 16]           --
│    │    └─Sequential: 3-12                  [1, 64, 16, 16]           (54,272)
│    └─InvertedResidual: 2-11                 [1, 64, 16, 16]           --
│    │    └─Sequential: 3-13                  [1, 64, 16, 16]           (54,272)
│    └─InvertedResidual: 2-12                 [1, 96, 16, 16]           --
│    │    └─Sequential: 3-14                  [1, 96, 16, 16]           (66,624)
│    └─InvertedResidual: 2-13                 [1, 96, 16, 16]           --
│    │    └─Sequential: 3-15                  [1, 96, 16, 16]           (118,272)
│    └─InvertedResidual: 2-14                 [1, 96, 16, 16]           --
│    │    └─Sequential: 3-16                  [1, 96, 16, 16]           (118,272)
│    └─InvertedResidual: 2-15                 [1, 160, 8, 8]            --
│    │    └─Sequential: 3-17                  [1, 160, 8, 8]            (155,264)
│    └─InvertedResidual: 2-16                 [1, 160, 8, 8]            --
│    │    └─Sequential: 3-18                  [1, 160, 8, 8]            (320,000)
│    └─InvertedResidual: 2-17                 [1, 160, 8, 8]            --
│    │    └─Sequential: 3-19                  [1, 160, 8, 8]            (320,000)
│    └─InvertedResidual: 2-18                 [1, 320, 8, 8]            --
│    │    └─Sequential: 3-20                  [1, 320, 8, 8]            (473,920)
│    └─ConvNormActivation: 2-19               [1, 1280, 8, 8]           --
│    │    └─Conv2d: 3-21                      [1, 1280, 8, 8]           (409,600)
│    │    └─BatchNorm2d: 3-22                 [1, 1280, 8, 8]           (2,560)
│    │    └─ReLU6: 3-23                       [1, 1280, 8, 8]           --
├─Linear: 1-2                                 [1, 2]                    2,562
===============================================================================================
Total params: 2,226,434
Trainable params: 2,562
Non-trainable params: 2,223,872
Total mult-adds (M): 378.33
===============================================================================================
Input size (MB): 0.71
Forward/backward pass size (MB): 130.67
Params size (MB): 8.91
Estimated Total Size (MB): 140.29
===============================================================================================

下面进行实际训练:

train_long(model,train_loader,test_loader,loss_fn=torch.nn.CrossEntropyLoss(),epochs=1,print_freq=90)
Epoch 0, minibatch 0: train acc = 0.40625, train loss = 0.023406166583299637
Epoch 0, minibatch 90: train acc = 0.9261675824175825, train loss = 0.00612303057869712
Epoch 0, minibatch 180: train acc = 0.9431975138121547, train loss = 0.004874951931653102
Epoch 0, minibatch 270: train acc = 0.9440728782287823, train loss = 0.005060039763081118
Epoch 0, minibatch 360: train acc = 0.9464162049861495, train loss = 0.00536170263369658
Epoch 0, minibatch 450: train acc = 0.9479628603104213, train loss = 0.005560285500570835
Epoch 0, minibatch 540: train acc = 0.9488216266173752, train loss = 0.005871331184937201
Epoch 0 done, validation acc = 0.93535, validation loss = 0.012154721832275391

三、提示

请注意,MobileNet 的准确度与 VGG-16 几乎相同,但略低于全尺寸 ResNet。 小型模型(例如 MobileNet 或 ResNet-18)的主要优点是它们可以在移动设备上使用。 1. pytorch.org/mobile/andr… 这是在 Android 设备上使用 ResNet-18 的官方示例, 2. heartbeat.fritz.ai/pytorch-mob… 这里是使用 MobileNet 的类似示例。

四、总结

  1. 在此模块中,你已了解卷积神经网络如何工作以及它们如何捕获二维图像中的模式。 事实上,CNN 还可用于发现一维信号(如声波或时序)和多维结构(例如视频中的事件,其中某些模式在一些帧中重复)中的模式。

  2. 此外,CNN 是用于解决更复杂的计算机视觉任务(如映像生成)的简单构建基块。 生成对抗性网络可用于生成给定数据集中类似的映像,例如,可用于生成计算机生成的图画。 同样,CNN 用于对象检测、实例分段等。 我们还用单独的一节课程了解了如何实现神经网络来解决这些问题,我们建议你继续学习掌握计算机视觉!