11. Basic Semantic Segmentation Models


1. Semantic Segmentation Basics

Bilinear Interpolation

Bilinear interpolation is one way to perform upsampling.

Linear interpolation

On a line, given two points $(x_1, y_1)$ and $(x_2, y_2)$, the value $y_3$ at position $x_3$ can be computed from:

$\frac{y_3-y_2}{x_3-x_2}=\frac{y_1-y_2}{x_1-x_2}$

Here $x$ denotes a position and $y$ denotes a pixel value.

Bilinear interpolation

(Figure: bilinear-interpolation diagram with corner points $Q_{11}$, $Q_{12}$, $Q_{21}$, $Q_{22}$, intermediate points $R_1$, $R_2$, and target point $P$.)

Note: $x$ and $y$ are both coordinates here; interpolating horizontally first or vertically first works the same way.

$Q_{12}$ has known position $x_1$ and pixel value $Q_{12}$; $Q_{22}$ has known position $x_2$ and pixel value $Q_{22}$. By linear interpolation, given the position $x$ of $R_2$, its pixel value follows from:

$\frac{x_1-x}{Q_{12}-R_2}=\frac{x_1-x_2}{Q_{12}-Q_{22}}$

The pixel value of $R_1$ is obtained the same way.

With $R_1$ and $R_2$, one more linear interpolation gives the pixel value at $P$. This is bilinear interpolation.
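The two linear interpolations followed by a third can be written out directly (a minimal sketch in plain Python; the point names follow the figure above):

```python
def lerp(x1, y1, x2, y2, x):
    """Linear interpolation: value y at position x on the line through (x1, y1) and (x2, y2)."""
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def bilinear(q11, q21, q12, q22, x1, x2, y1, y2, x, y):
    """Bilinear interpolation at (x, y) from four corner pixel values.

    q11 = value at (x1, y1), q21 = value at (x2, y1),
    q12 = value at (x1, y2), q22 = value at (x2, y2).
    """
    r1 = lerp(x1, q11, x2, q21, x)  # interpolate along x at row y1 -> R1
    r2 = lerp(x1, q12, x2, q22, x)  # interpolate along x at row y2 -> R2
    return lerp(y1, r1, y2, r2, y)  # interpolate along y between R1 and R2 -> P

# at the center of a unit square the result is the mean of the four corners
print(bilinear(0.0, 10.0, 20.0, 30.0, 0, 1, 0, 1, 0.5, 0.5))  # → 15.0
```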

Transposed Convolution

Transposed convolution is another way to upsample; it is also called deconvolution.

Where transposed convolution comes from

Transposed convolution only restores the *shape* of a convolution's input from its output; it does not invert the values.

Ordinary convolution: in code, the equivalent matrix-multiplication form is usually used, and the result is then reshaped to 2×2.

(Figure: ordinary convolution written as a matrix multiplication.)

Transposed convolution:

(Figure: the corresponding matrix form for transposed convolution.)

Given the input feature map and the transposed-convolution parameters, how to find the output feature-map size

Procedure:

  1. Pad K−P−1 rows/columns around the input feature map.
  2. Insert S−1 rows/columns between the input feature map's pixels.
  3. Flip the kernel up-down and left-right.
  4. Run an ordinary convolution. Steps 1 and 2 give the padded input size:

$I'=2(K-P-1)+I+(S-1)(I-1)$

The ordinary convolution then uses $P'=0$, the same kernel size $K$, and $S'=1$, so its output is:

$O=\frac{I'-K+2P'}{S'}+1$

Example: input I=3, S=2, P=1, K=3. Compute the transposed-convolution output feature-map size:

  1. Pad K−P−1 = 1 row/column of zeros around the input feature map;

  2. Insert S−1 = 1 row/column of zeros between input pixels, giving $I'=2\times 1+3+1\times 2=7$;

  3. Flip the kernel up-down and left-right;

  4. Run the ordinary convolution to get the output feature-map size: $O=\frac{7-3}{1}+1=5$

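The worked example can be cross-checked against PyTorch's nn.ConvTranspose2d (a quick sketch; single channels chosen arbitrarily):

```python
import torch
import torch.nn as nn

# I=3, S=2, P=1, K=3 — the worked example above
tconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1)
y = tconv(torch.randn(1, 1, 3, 3))
print(y.shape)  # torch.Size([1, 1, 5, 5])
```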

Given the input feature map, how to choose transposed-convolution parameters to reach a target output size

Method: run the computation in reverse — treat the target output feature map as the input of an ordinary convolution and solve for kernel_size, s, and p.

Example: a transposed convolution should turn a 35×35 feature map into a 71×71 feature map. Swap the roles: treat the 71×71 map as the convolution's input and the 35×35 map as its output.

Input: 71×71 feature map; output: 35×35 feature map.

The convolution formula gives k=3, p=0, s=2: $\lfloor\frac{71-3+0}{2}\rfloor+1=35$
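The chosen parameters can be verified with PyTorch's nn.ConvTranspose2d (a quick sketch; with k=3 and s=2, padding must be 0 to land exactly on 71):

```python
import torch
import torch.nn as nn

# target: 35×35 → 71×71 with k=3, p=0, s=2: (35-1)*2 - 2*0 + 3 = 71
tconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=0)
y = tconv(torch.randn(1, 1, 35, 35))
print(y.shape)  # torch.Size([1, 1, 71, 71])
```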

Transposed-convolution parameter list

The main parameters of PyTorch's transposed convolution, nn.ConvTranspose2d():

in_channels: int,
out_channels: int,
kernel_size: _size_2_t,
stride: _size_2_t = 1,
padding: _size_2_t = 0,
output_padding: _size_2_t = 0,
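Of these, output_padding is the one parameter the size derivation above did not cover: it adds extra rows/columns on one side of the output, which lets a transposed convolution hit sizes like "exactly 2×". PyTorch's full size formula is O = (I−1)·S − 2P + K + output_padding. A quick sketch:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 16, 16)
# without output_padding: (16-1)*2 - 2*1 + 3 = 31
t1 = nn.ConvTranspose2d(1, 1, 3, stride=2, padding=1)
# with output_padding=1: 31 + 1 = 32, i.e. exact 2x upsampling
t2 = nn.ConvTranspose2d(1, 1, 3, stride=2, padding=1, output_padding=1)
print(t1(x).shape, t2(x).shape)  # [1, 1, 31, 31] and [1, 1, 32, 32]
```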

Dilated Convolution

Dilated convolution is also known as atrous convolution.

Advantages of dilated convolution

  • Enlarges the kernel's receptive field without adding parameters, so each output value covers a wider range of the input.
  • With suitable padding, the output feature map can keep the same size as the input.

Dilated-convolution computation

Effective kernel size after dilation = $dilation\_rate \times (kernel\_size - 1)+1$

The output feature-map size then follows the ordinary convolution formula, using this effective kernel size.

Example: input feature map 7×7, dilation_rate=2, kernel_size=3, s=1, p=0. Compute the output feature-map size.


Effective kernel size = 2×(3−1) + 1 = 5

Output feature map = (7−5)/1 + 1 = 3
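This can be confirmed with nn.Conv2d's dilation argument (a quick sketch):

```python
import torch
import torch.nn as nn

# input 7×7, kernel 3, dilation 2 → effective kernel 2*(3-1)+1 = 5
conv = nn.Conv2d(1, 1, kernel_size=3, dilation=2, stride=1, padding=0)
y = conv(torch.randn(1, 1, 7, 7))
print(y.shape)  # torch.Size([1, 1, 3, 3])
```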

2. FCN

Fully Convolutional Network

VGG

VGG-16 consists of: (convolutions + pooling) + fully connected layers (4096 + 4096 + 1000) + softmax.

In the figure below, the bottom row shows VGG16's operations and the top row shows the output feature map after each operation.

Figure 2-1: VGG16 structure

FCN structure

FCN has two main parts: a feature-extraction network and a prediction network.

  1. Feature-extraction network: the input passes through VGG16 up to the layer before the fully connected layers, producing a feature map downsampled 32× relative to the input; then through two convolutions converted from VGG16's last two fully connected layers; finally, a 1×1 convolution with num_cls filters produces a num_cls-channel feature map.

  2. Prediction network: a transposed convolution upsamples 32× back to the original image size with num_cls channels; softmax then gives each pixel's probability for each class.

Taking FCN-32s in Figure 2-2 as an example, the pipeline is:

  1. Convolution:
  • A 224×224×3 image passes through VGG16 up to the layer before the fully connected layers; the feature map is downsampled 32×, giving 7×7×512.

  • It then passes through the two convolutions converted from VGG16's last two fully connected layers (a 7×7×512, p=3, s=1 convolution with 4096 filters, then a 1×1×4096, p=0, s=1 convolution with 4096 filters), giving 7×7×4096.

  • A final 1×1×4096, p=0, s=1 convolution with num_cls filters gives 7×7×num_cls.

  2. Transposed convolution: one 64×64×num_cls, p=16, s=32 transposed convolution with num_cls filters upsamples 32× to the original size, 224×224×num_cls. Note: this transposed convolution is initialized with bilinear-interpolation weights.

  3. Prediction: softmax over the 224×224×num_cls feature map gives each pixel's probability for each class.
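The 32× upsampling step at the end can be sanity-checked on its own; here num_cls = 21 (the PASCAL VOC class count) is an assumed example:

```python
import torch
import torch.nn as nn

num_cls = 21
# 64×64 kernel, stride 32, padding 16: (7-1)*32 - 2*16 + 64 = 224
up = nn.ConvTranspose2d(num_cls, num_cls, kernel_size=64, stride=32, padding=16)
y = up(torch.randn(1, num_cls, 7, 7))
print(y.shape)  # torch.Size([1, 21, 224, 224])
```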

Figure 2-2: FCN-32s structure

Figure 2-3: In FCN, VGG's last two fully connected layers are converted to convolutions

FCN code

import torch.nn as nn
from torchvision import models

pre_model = models.vgg16_bn(weights=models.VGG16_BN_Weights.DEFAULT)

class FCN32(nn.Module):
    def __init__(self, num_classes = 1):
        super(FCN32, self).__init__()

        self.stage1 = pre_model.features[:7]
        self.stage2 = pre_model.features[7:14]
        self.stage3 = pre_model.features[14:24]
        self.stage4 = pre_model.features[24:34]
        self.stage5 = pre_model.features[34:]


        self.stage6 = nn.Sequential(
                    nn.Conv2d(512, 4096, kernel_size=(7,7), stride=1, padding=3),
                    nn.ReLU(inplace=True),
                    nn.Dropout())
        self.stage7 = nn.Sequential(
                    nn.Conv2d(4096, 4096, kernel_size=(1,1), stride=1),
                    nn.ReLU(inplace=True),
                    nn.Dropout())

        self.conv1 = nn.Sequential(
                    nn.Conv2d(4096, num_classes, kernel_size=(1,1), stride=1))

        self.invers_conv1 = nn.ConvTranspose2d(num_classes,num_classes,kernel_size=(64,64), stride=32, padding=16)


    def forward(self, x):
        out = self.stage1(x)
        out = self.stage2(out)
        out = self.stage3(out)
        out = self.stage4(out)
        out = self.stage5(out)


        out = self.stage6(out)
        out = self.stage7(out)
        out = self.conv1(out)

        out = self.invers_conv1(out)
        return out 

3. Unet

Unet structure

Unet has three main parts: a backbone feature-extraction network, an enhanced feature-extraction network, and a prediction network.

  1. Backbone feature extraction (Encoder): the input goes through two 3×3 convolutions, then 4 downsampling steps; each step halves the spatial size and is followed by two 3×3 convolutions that double the channel count.

  2. Enhanced feature extraction (Decoder): the backbone's deepest feature map is upsampled 4 times; each step doubles the spatial size and halves the channels, the result is fused with the matching encoder feature map (concat), and two 3×3 convolutions then halve the channels again.

  3. Prediction network: a 1×1 convolution with num_cls filters, followed by softmax/sigmoid, gives each pixel's probability for each class.

Note: Unet's input size should be divisible by 2⁴ = 16 so that the encoder and decoder feature maps match; if it is not, resize the input first.
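The divisibility note can be checked by tracing spatial sizes: MaxPool2d(2) floors odd sizes, while ConvTranspose2d(..., 2, stride=2) doubles exactly, so a non-multiple-of-16 input comes back at a different size and the skip-connection concat fails. A small sketch:

```python
def trace(size, downs=4):
    """Trace a spatial size through `downs` poolings and `downs` 2x upsamplings."""
    for _ in range(downs):
        size //= 2          # MaxPool2d(2) floors odd sizes
    for _ in range(downs):
        size *= 2           # ConvTranspose2d(c_in, c_out, 2, stride=2) doubles exactly
    return size

print(trace(224))  # 224 — multiple of 16, encoder/decoder shapes match
print(trace(225))  # 224 — mismatch with the 225-sized encoder map, concat fails
```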

Figure 3-1: Unet structure

Unet code

Model construction

if args.arch == 'UNet':
    model = Unet(3, 1).to(device)

Unet structure: a binary-segmentation code example

import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(DoubleConv, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True)
        )

    def forward(self, input):
        return self.conv(input)


class Unet(nn.Module):
    def __init__(self, in_ch, out_ch):
        super(Unet, self).__init__()
        
        # backbone feature extraction (encoder)
        self.conv1 = DoubleConv(in_ch, 32)
        self.pool1 = nn.MaxPool2d(2)
        self.conv2 = DoubleConv(32, 64)
        self.pool2 = nn.MaxPool2d(2)
        self.conv3 = DoubleConv(64, 128)
        self.pool3 = nn.MaxPool2d(2)
        self.conv4 = DoubleConv(128, 256)
        self.pool4 = nn.MaxPool2d(2)
        self.conv5 = DoubleConv(256, 512)
        
        # enhanced feature extraction (decoder)
        self.up6 = nn.ConvTranspose2d(512, 256, 2, stride=2)  # upsampling via transposed convolution
        self.conv6 = DoubleConv(512, 256)
        self.up7 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.conv7 = DoubleConv(256, 128)
        self.up8 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.conv8 = DoubleConv(128, 64)
        self.up9 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.conv9 = DoubleConv(64, 32)
        self.conv10 = nn.Conv2d(32, out_ch, 1)

    def forward(self, x):
        # backbone feature extraction (encoder)
        #print(x.shape)
        c1 = self.conv1(x)
        p1 = self.pool1(c1)
        #print(p1.shape)
        c2 = self.conv2(p1)
        p2 = self.pool2(c2)
        #print(p2.shape)
        c3 = self.conv3(p2)
        p3 = self.pool3(c3)
        #print(p3.shape)
        c4 = self.conv4(p3)
        p4 = self.pool4(c4)
        #print(p4.shape)
        c5 = self.conv5(p4)
        
        # enhanced feature extraction (decoder)
        up_6 = self.up6(c5)
        merge6 = torch.cat([up_6, c4], dim=1)
        c6 = self.conv6(merge6)
        up_7 = self.up7(c6)
        merge7 = torch.cat([up_7, c3], dim=1)
        c7 = self.conv7(merge7)
        up_8 = self.up8(c7)
        merge8 = torch.cat([up_8, c2], dim=1)
        c8 = self.conv8(merge8)
        up_9 = self.up9(c8)
        merge9 = torch.cat([up_9, c1], dim=1)
        c9 = self.conv9(merge9)
        c10 = self.conv10(c9)
        out = torch.sigmoid(c10)
        return out

4. DeepLabv3+

DeepLabv3+ structure

Figure 4-1: DeepLabv3+ structure

DeepLabv3+ has three main parts: the Encoder, the Decoder, and the prediction part.

  1. Encoder: feature extraction, consisting of a backbone and the ASPP module.
  • Backbone: e.g. MobileNetV2 or Xception. Deep features go to ASPP; shallow features go to the Decoder.

  • ASPP: dilated convolutions with different dilation rates capture context at several receptive-field sizes.

  2. Decoder: fuses deep and shallow features. The Encoder's deep features are upsampled and concatenated (concat) with the shallow features, which first pass through a 1×1 convolution.

  3. Prediction part: softmax/sigmoid gives each pixel's probability for each class.
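The Decoder's fusion step can be sketched with dummy tensors; the shapes below follow the comments in the code that follows (mobilenet backbone) and are illustrative only:

```python
import torch
import torch.nn.functional as F

deep = torch.randn(1, 256, 30, 30)      # ASPP output (deep features)
shallow = torch.randn(1, 48, 128, 128)  # low-level features after the 1x1 conv

# upsample the deep features to the shallow features' spatial size, then concat
deep_up = F.interpolate(deep, size=shallow.shape[2:], mode='bilinear', align_corners=True)
fused = torch.cat((deep_up, shallow), dim=1)
print(fused.shape)  # torch.Size([1, 304, 128, 128])
```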

DeepLabv3+ code

import torch
import torch.nn as nn
import torch.nn.functional as F
from nets.xception import xception
from nets.mobilenetv2 import mobilenetv2

class MobileNetV2(nn.Module):
    def __init__(self, downsample_factor=8, pretrained=True):
        super(MobileNetV2, self).__init__()
        from functools import partial
        
        model           = mobilenetv2(pretrained)
        self.features   = model.features[:-1]

        self.total_idx  = len(self.features)
        self.down_idx   = [2, 4, 7, 14]

        if downsample_factor == 8:
            for i in range(self.down_idx[-2], self.down_idx[-1]):
                self.features[i].apply(
                    partial(self._nostride_dilate, dilate=2)
                )
            for i in range(self.down_idx[-1], self.total_idx):
                self.features[i].apply(
                    partial(self._nostride_dilate, dilate=4)
                )
        elif downsample_factor == 16:
            for i in range(self.down_idx[-1], self.total_idx):
                self.features[i].apply(
                    partial(self._nostride_dilate, dilate=2)
                )
        
    def _nostride_dilate(self, m, dilate):
        classname = m.__class__.__name__
        if classname.find('Conv') != -1:
            if m.stride == (2, 2):
                m.stride = (1, 1)
                if m.kernel_size == (3, 3):
                    m.dilation = (dilate//2, dilate//2)
                    m.padding = (dilate//2, dilate//2)
            else:
                if m.kernel_size == (3, 3):
                    m.dilation = (dilate, dilate)
                    m.padding = (dilate, dilate)

    def forward(self, x):
        low_level_features = self.features[:4](x)
        x = self.features[4:](low_level_features)
        return low_level_features, x 


#-----------------------------------------#
#   ASPP feature-extraction module:
#   dilated convolutions with different rates
#-----------------------------------------#
class ASPP(nn.Module):
    def __init__(self, dim_in, dim_out, rate=1, bn_mom=0.1):
        super(ASPP, self).__init__()
        self.branch1 = nn.Sequential(
            nn.Conv2d(dim_in, dim_out, 1, 1, padding=0, dilation=rate, bias=True),
            nn.BatchNorm2d(dim_out, momentum=bn_mom),
            nn.ReLU(inplace=True),
        )
        self.branch2 = nn.Sequential(
            nn.Conv2d(dim_in, dim_out, 3, 1, padding=6*rate, dilation=6*rate, bias=True),
            nn.BatchNorm2d(dim_out, momentum=bn_mom),
            nn.ReLU(inplace=True),
        )
        self.branch3 = nn.Sequential(
            nn.Conv2d(dim_in, dim_out, 3, 1, padding=12*rate, dilation=12*rate, bias=True),
            nn.BatchNorm2d(dim_out, momentum=bn_mom),
            nn.ReLU(inplace=True),
        )
        self.branch4 = nn.Sequential(
            nn.Conv2d(dim_in, dim_out, 3, 1, padding=18*rate, dilation=18*rate, bias=True),
            nn.BatchNorm2d(dim_out, momentum=bn_mom),
            nn.ReLU(inplace=True),
        )
        self.branch5_conv = nn.Conv2d(dim_in, dim_out, 1, 1, 0, bias=True)
        self.branch5_bn = nn.BatchNorm2d(dim_out, momentum=bn_mom)
        self.branch5_relu = nn.ReLU(inplace=True)

        self.conv_cat = nn.Sequential(
            nn.Conv2d(dim_out*5, dim_out, 1, 1, padding=0, bias=True),
            nn.BatchNorm2d(dim_out, momentum=bn_mom),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        [b, c, row, col] = x.size()
        #-----------------------------------------#
        #   five branches in total
        #-----------------------------------------#
        conv1x1 = self.branch1(x)
        conv3x3_1 = self.branch2(x)
        conv3x3_2 = self.branch3(x)
        conv3x3_3 = self.branch4(x)
        #-----------------------------------------#
        #   fifth branch: global average pooling + conv
        #-----------------------------------------#
        global_feature = torch.mean(x, 2, True)
        global_feature = torch.mean(global_feature, 3, True)
        global_feature = self.branch5_conv(global_feature)
        global_feature = self.branch5_bn(global_feature)
        global_feature = self.branch5_relu(global_feature)
        global_feature = F.interpolate(global_feature, (row, col), None, 'bilinear', True)

        #-----------------------------------------#
        #   concatenate the five branches,
        #   then fuse them with a 1x1 convolution
        #-----------------------------------------#
        feature_cat = torch.cat([conv1x1, conv3x3_1, conv3x3_2, conv3x3_3, global_feature], dim=1)
        result = self.conv_cat(feature_cat)
        return result

class DeepLab(nn.Module):
    def __init__(self, num_classes, backbone="mobilenet", pretrained=True, downsample_factor=16):
        super(DeepLab, self).__init__()
        if backbone=="xception":
            #----------------------------------#
            #   two feature maps are produced:
            #   shallow features   [128,128,256]
            #   deep features      [30,30,2048]
            #----------------------------------#
            self.backbone = xception(downsample_factor=downsample_factor, pretrained=pretrained)
            in_channels = 2048
            low_level_channels = 256
        elif backbone=="mobilenet":
            #----------------------------------#
            #   two feature maps are produced:
            #   shallow features   [128,128,24]
            #   deep features      [30,30,320]
            #----------------------------------#
            self.backbone = MobileNetV2(downsample_factor=downsample_factor, pretrained=pretrained)
            in_channels = 320
            low_level_channels = 24
        else:
            raise ValueError('Unsupported backbone - `{}`, Use mobilenet, xception.'.format(backbone))

        #-----------------------------------------#
        #   ASPP feature-extraction module:
        #   dilated convolutions with different rates
        #-----------------------------------------#
        self.aspp = ASPP(dim_in=in_channels, dim_out=256, rate=16//downsample_factor)
        
        #----------------------------------#
        #   shallow-feature branch
        #----------------------------------#
        self.shortcut_conv = nn.Sequential(
            nn.Conv2d(low_level_channels, 48, 1),
            nn.BatchNorm2d(48),
            nn.ReLU(inplace=True)
        )		

        self.cat_conv = nn.Sequential(
            nn.Conv2d(48+256, 256, 3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),

            nn.Conv2d(256, 256, 3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),

            nn.Dropout(0.1),
        )
        self.cls_conv = nn.Conv2d(256, num_classes, 1, stride=1)

    def forward(self, x):
        H, W = x.size(2), x.size(3)
        #-----------------------------------------#
        #   two feature maps:
        #   low_level_features: shallow features, refined by a 1x1 conv
        #   x: deep features, enhanced by the ASPP module
        #-----------------------------------------#
        low_level_features, x = self.backbone(x)
        x = self.aspp(x)
        low_level_features = self.shortcut_conv(low_level_features)
        
        #-----------------------------------------#
        #   upsample the enhanced deep features,
        #   concat with the shallow features, then conv
        #-----------------------------------------#
        x = F.interpolate(x, size=(low_level_features.size(2), low_level_features.size(3)), mode='bilinear', align_corners=True)
        x = self.cat_conv(torch.cat((x, low_level_features), dim=1))
        x = self.cls_conv(x)
        x = F.interpolate(x, size=(H, W), mode='bilinear', align_corners=True)
        return x

5. ASPP-Head

ASPP-Head structure

  • The input passes through an AdaptiveAvgPool2d adaptive average pooling branch, giving 1 feature map, and through the ASPPModule's dilated convolutions with different dilation rates, giving 4 feature maps that carry context at different scales. These 5 feature maps are concatenated (concat), then fused by one more convolution.

  • The fused feature map is passed through softmax for the semantic-segmentation prediction.
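Combining the dilation rates (1, 6, 12, 18) used by ASPPHead below with the effective-kernel-size formula from the dilated-convolution section gives the receptive field of each branch (the dilation-1 branch is a plain 1×1 convolution):

```python
def effective_kernel(k, d):
    """Effective kernel size of a k×k convolution with dilation d."""
    return d * (k - 1) + 1

# branch kernels as built by ASPPModule: 1×1 for dilation 1, otherwise 3×3
for d in (1, 6, 12, 18):
    k = 1 if d == 1 else 3
    print(d, effective_kernel(k, d))  # 1→1, 6→13, 12→25, 18→37
```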

ASPP-Head code

import torch
import torch.nn as nn
from mmcv.cnn import ConvModule

from mmseg.registry import MODELS
from ..utils import resize
from .decode_head import BaseDecodeHead


class ASPPModule(nn.ModuleList):
    """Atrous Spatial Pyramid Pooling (ASPP) Module.

    Args:
        dilations (tuple[int]): Dilation rate of each layer.
        in_channels (int): Input channels.
        channels (int): Channels after modules, before conv_seg.
        conv_cfg (dict|None): Config of conv layers.
        norm_cfg (dict|None): Config of norm layers.
        act_cfg (dict): Config of activation layers.
    """

    def __init__(self, dilations, in_channels, channels, conv_cfg, norm_cfg, act_cfg):
        super().__init__()
        self.dilations = dilations
        self.in_channels = in_channels
        self.channels = channels
        self.conv_cfg = conv_cfg
        self.norm_cfg = norm_cfg
        self.act_cfg = act_cfg
        for dilation in dilations:
            self.append(
                ConvModule(
                    self.in_channels,
                    self.channels,
                    1 if dilation == 1 else 3,
                    dilation=dilation,
                    padding=0 if dilation == 1 else dilation,
                    conv_cfg=self.conv_cfg,
                    norm_cfg=self.norm_cfg,
                    act_cfg=self.act_cfg))
                    

    def forward(self, x):
        """Forward function."""
        aspp_outs = []
        for aspp_module in self:
            aspp_outs.append(aspp_module(x))

        return aspp_outs


@MODELS.register_module()
class ASPPHead(BaseDecodeHead):

    def __init__(self, dilations=(1, 6, 12, 18), **kwargs):
        super().__init__(**kwargs)
        assert isinstance(dilations, (list, tuple))
        self.dilations = dilations
        self.image_pool = nn.Sequential(  # Sequential.__init__: registers AdaptiveAvgPool2d and ConvModule in order; each runs its own __init__
            nn.AdaptiveAvgPool2d(1),
            ConvModule(
                self.in_channels,
                self.channels,
                1,
                conv_cfg=self.conv_cfg,
                norm_cfg=self.norm_cfg,
                act_cfg=self.act_cfg))
        self.aspp_modules = ASPPModule(  # ASPPModule.__init__: builds one conv branch per rate in `dilations`
            dilations,
            self.in_channels,
            self.channels,
            conv_cfg=self.conv_cfg,
            norm_cfg=self.norm_cfg,
            act_cfg=self.act_cfg)
        self.bottleneck = ConvModule(  # ConvModule.__init__: an ordinary conv that fuses the concatenated branches
            (len(dilations) + 1) * self.channels,
            self.channels,
            3,
            padding=1,
            conv_cfg=self.conv_cfg,
            norm_cfg=self.norm_cfg,
            act_cfg=self.act_cfg)

    def _forward_feature(self, inputs):
        """Forward function for feature maps before classifying each pixel with
        ``self.cls_seg`` fc.

        Args:
            inputs (list[Tensor]): List of multi-level img features.

        Returns:
            feats (Tensor): A tensor of shape (batch_size, self.channels,
                H, W) which is feature map for last layer of decoder head.
        """
        x = self._transform_inputs(inputs)
        aspp_outs = [
            resize(
                self.image_pool(x),  # runs the Sequential's forward (global pool + 1x1 conv)
                size=x.size()[2:],
                mode='bilinear',
                align_corners=self.align_corners)
        ]
        aspp_outs.extend(self.aspp_modules(x))  # runs ASPPModule's forward
        aspp_outs = torch.cat(aspp_outs, dim=1)
        feats = self.bottleneck(aspp_outs)  # runs the ConvModule's forward
        return feats

    def forward(self, inputs):
        """Forward function."""
        output = self._forward_feature(inputs)
        output = self.cls_seg(output)
        return output