1. Semantic Segmentation Basics
Bilinear Interpolation
Bilinear interpolation is one way to perform upsampling.
Linear Interpolation
On a line, given two points (x_0, y_0) and (x_1, y_1), the value y at any position x follows from the formula:
y = y_0 + (x - x_0) * (y_1 - y_0) / (x_1 - x_0)
Here x denotes position and y denotes pixel value.
Bilinear Interpolation
Note: x and y are both spatial coordinates; which one is treated as horizontal or vertical does not matter, their roles are symmetric.
Known positions Q_11 = (x_1, y_1) and Q_21 = (x_2, y_1) with pixel values f(Q_11), f(Q_21);
known positions Q_12 = (x_1, y_2) and Q_22 = (x_2, y_2) with pixel values f(Q_12), f(Q_22).
By linear interpolation along x at position R_1 = (x, y_1), the pixel value of R_1 is:
f(R_1) = (x_2 - x)/(x_2 - x_1) * f(Q_11) + (x - x_1)/(x_2 - x_1) * f(Q_21)
Similarly, the pixel value of R_2 = (x, y_2) is:
f(R_2) = (x_2 - x)/(x_2 - x_1) * f(Q_12) + (x - x_1)/(x_2 - x_1) * f(Q_22)
With f(R_1) and f(R_2), one more linear interpolation along y gives the pixel value at P = (x, y):
f(P) = (y_2 - y)/(y_2 - y_1) * f(R_1) + (y - y_1)/(y_2 - y_1) * f(R_2)
This is bilinear interpolation.
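The two-step interpolation above can be sketched in plain Python; the helper names `lerp` and `bilinear_interp` are mine, not from any library:

```python
def lerp(x0, y0, x1, y1, x):
    """Linear interpolation: value at x on the line through (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

def bilinear_interp(x1, y1, x2, y2, q11, q21, q12, q22, x, y):
    """Bilinear interpolation at (x, y), given pixel values at the four corners:
    q11 at (x1, y1), q21 at (x2, y1), q12 at (x1, y2), q22 at (x2, y2)."""
    r1 = lerp(x1, q11, x2, q21, x)  # interpolate along x at row y1
    r2 = lerp(x1, q12, x2, q22, x)  # interpolate along x at row y2
    return lerp(y1, r1, y2, r2, y)  # interpolate along y between the two rows

# At the center of a unit square the result is just the mean of the corners.
print(bilinear_interp(0, 0, 1, 1, 10, 20, 30, 40, 0.5, 0.5))  # 25.0
```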
Transposed Convolution
Transposed convolution is another way to perform upsampling; it is also called deconvolution.
Origin of Transposed Convolution
Transposed convolution only makes the shapes of the input and output matrices invertible; it does not recover the original values.
Ordinary convolution: in code, the last equivalent form (the convolution written as a matrix multiplication) is usually adopted, and the result is then reshaped to 2×2.
Transposed convolution:
Given the input feature map and the transposed-convolution parameters (kernel K, stride S, padding P), how to compute the output feature map size
Steps:
- Pad K-P-1 rows/columns of zeros around the input feature map.
- Insert S-1 rows/columns of zeros between input pixels.
- Flip the kernel up-down and left-right.
- Perform an ordinary convolution. From steps 1 and 2, the padded input size is:
I' = I + (S-1)(I-1) + 2(K-P-1)
with the new convolution using S' = 1, P' = 0, K' = K,
so the ordinary convolution formula gives the output size:
O = (I' - K')/S' + 1 = S(I-1) - 2P + K
Example: input I=3, S=2, P=1, K=3. Compute the transposed-convolution output feature map size:
Pad K-P-1 = 1 row/column of zeros around the input feature map;
insert S-1 = 1 row/column of zeros between input pixels;
flip the kernel up-down and left-right;
an ordinary convolution then yields the output feature map size: O = 2×(3-1) - 2×1 + 3 = 5, i.e. 5×5.
Given the input feature map, how to choose the transposed-convolution parameters to obtain an output feature map of a desired size
Method: run the computation in reverse: treat the desired output feature map as the input of an ordinary convolution, and solve for kernel_size, s, and p.
For example: to turn a 35×35 feature map into a 71×71 feature map with a transposed convolution, swap the roles: treat the 71×71 map as the input and the 35×35 map as the output of an ordinary convolution.
Input: 71×71 feature map; output: 35×35 feature map.
Solving (71 - k + 2p)/s + 1 = 35 gives k=3, p=0, s=2 (check: (35-1)×2 - 2×0 + 3 = 71).
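Both worked examples can be checked numerically with the size formula O = (I-1)·S - 2P + K; the helper name `conv_transpose_out` below is mine:

```python
def conv_transpose_out(i, k, s, p):
    """Output size of a transposed convolution (PyTorch convention, output_padding=0)."""
    return (i - 1) * s - 2 * p + k

print(conv_transpose_out(3, k=3, s=2, p=1))   # 5: the 3x3 -> 5x5 example
print(conv_transpose_out(35, k=3, s=2, p=0))  # 71: the 35x35 -> 71x71 example
```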
Transposed Convolution Parameters
The main parameters of PyTorch's nn.ConvTranspose2d():
in_channels: int,
out_channels: int,
kernel_size: _size_2_t,
stride: _size_2_t = 1,
padding: _size_2_t = 0,
output_padding: _size_2_t = 0,
Dilated Convolution
Dilated convolution is also called atrous convolution.
Advantages of Dilated Convolution
- It enlarges the receptive field of the kernel while keeping the number of parameters unchanged, so each convolution output covers a wider range of the input.
- With matching padding, the output feature map keeps the same size as the input.
Dilated Convolution Computation
Effective kernel size after dilation: k' = dilation_rate × (k - 1) + 1
The output feature map size then follows the ordinary convolution formula, using k'.
Example: input feature map 7×7, dilation_rate=2, kernel_size=3, s=1, p=0. Compute the output feature map size.
Effective kernel size = 2×(3-1) + 1 = 5
Output feature map = (7-5)/1 + 1 = 3, i.e. 3×3
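The example above can be reproduced with two one-line helpers (`dilated_kernel` and `conv_out` are my names, not library functions):

```python
def dilated_kernel(k, d):
    """Effective kernel size of a k x k kernel with dilation rate d."""
    return d * (k - 1) + 1

def conv_out(i, k, s, p):
    """Ordinary convolution output size."""
    return (i - k + 2 * p) // s + 1

k_eff = dilated_kernel(3, 2)         # 2*(3-1) + 1 = 5
print(conv_out(7, k_eff, s=1, p=0))  # (7-5)/1 + 1 = 3
```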
2. FCN
Fully Convolutional Network
VGG
VGG-16 consists of: (convolutions + pooling) + fully connected layers (4096 + 4096 + 1000) + softmax
In the figure, the lower part shows VGG16's convolution operations and the upper part shows the corresponding output feature maps.
Figure 2-1 VGG16 structure
FCN Structure
FCN has two main parts: a feature-extraction network and a prediction network.
- Feature-extraction network: the input passes through VGG16 up to the layer before the fully connected layers, and the output feature map is downsampled 32× relative to the input; it then passes through the two convolutional layers converted from VGG16's last two fully connected layers; finally, num_cls 1×1 convolutions produce a num_cls-channel feature map.
- Prediction network: a transposed convolution upsamples 32× back to the original image size, giving a num_cls-channel output feature map; softmax then gives each pixel's probability for each class.
Taking Figure 2-2 (FCN-32s) as an example, the FCN-32s pipeline is:
- Convolution:
  - A 224×224×3 image passes through VGG16 up to the layer before the fully connected layers; the output feature map is downsampled 32× relative to the input, giving 7×7×512;
  - it then passes through the two convolutional layers converted from VGG16's last two fully connected layers (a 7×7×512, p=3, s=1 convolution with 4096 kernels, and a 1×1×4096, p=0, s=1 convolution with 4096 kernels), giving 7×7×4096;
  - a final 1×1×4096, p=0, s=1 convolution with num_cls kernels gives 7×7×num_cls.
- Transposed convolution: a single 64×64×num_cls, p=16, s=32 transposed convolution with num_cls kernels upsamples 32× to the original image size, giving 224×224×num_cls. Note: this transposed convolution is initialized with bilinear-interpolation weights.
- Prediction: softmax over the 224×224×num_cls feature map gives each pixel's probability for each class.
Figure 2-2 FCN-32s structure
Figure 2-3 Converting VGG's last two fully connected layers into convolutional layers in FCN
FCN Code
pre_model = models.vgg16_bn(weights=models.VGG16_BN_Weights.DEFAULT)
class FCN32(nn.Module):
def __init__(self, num_classes = 1):
super(FCN32, self).__init__()
self.stage1 = pre_model.features[:7]
self.stage2 = pre_model.features[7:14]
self.stage3 = pre_model.features[14:24]
self.stage4 = pre_model.features[24:34]
self.stage5 = pre_model.features[34:]
self.stage6 = nn.Sequential(
nn.Conv2d(512, 4096, kernel_size=(7,7), stride=1, padding=3),
nn.ReLU(inplace=True),
nn.Dropout())
self.stage7 = nn.Sequential(
nn.Conv2d(4096, 4096, kernel_size=(1,1), stride=1),
nn.ReLU(inplace=True),
nn.Dropout())
self.conv1 = nn.Sequential(
nn.Conv2d(4096, num_classes, kernel_size=(1,1), stride=1))
self.invers_conv1 = nn.ConvTranspose2d(num_classes,num_classes,kernel_size=(64,64), stride=32, padding=16)
def forward(self, x):
x_size = x.size()
out = self.stage1(x)
out = self.stage2(out)
out = self.stage3(out)
out = self.stage4(out)
out = self.stage5(out)
out = self.stage6(out)
out = self.stage7(out)
out = self.conv1(out)
out = self.invers_conv1(out)
return out
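As a quick sanity check on the geometry of the layers above (pure arithmetic, no torch needed), we can trace a 224×224 input through the network with the standard size formulas; `conv_out` and `conv_transpose_out` are my helper names:

```python
def conv_out(i, k, s, p):
    """Ordinary convolution output size."""
    return (i - k + 2 * p) // s + 1

def conv_transpose_out(i, k, s, p):
    """Transposed convolution output size (output_padding=0)."""
    return (i - 1) * s - 2 * p + k

size = 224
for _ in range(5):                     # five VGG16 max-pool stages, each halving the size
    size //= 2
print(size)                            # 7
size = conv_out(size, k=7, s=1, p=3)   # stage6: 7x7 conv with p=3 keeps 7x7
size = conv_out(size, k=1, s=1, p=0)   # stage7 and conv1: 1x1 convs keep 7x7
size = conv_transpose_out(size, k=64, s=32, p=16)  # invers_conv1: upsample 32x
print(size)                            # 224: back to the input resolution
```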
3. Unet
Unet Structure
Unet has three main parts: a backbone feature-extraction network (encoder), a feature-fusion network (decoder), and a prediction network.
- Backbone (Encoder): the input first goes through 2 3×3 convolutions, then is downsampled 4 times; each downsampling halves the spatial size and is followed by 2 3×3 convolutions that double the channels.
- Feature fusion (Decoder): the deepest feature map from the backbone is upsampled 4 times; each upsampling doubles the spatial size and halves the channels, the result is concatenated (concat) with the corresponding encoder feature map, and 2 3×3 convolutions then halve the channels again.
- Prediction network: num_cls 1×1 convolutions, followed by softmax/sigmoid, give each pixel's probability for each class.
Note: Unet's input size should be divisible by 2^4 = 16, so that the encoder and decoder feature maps match; if it is not, resize the input first.
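The divisibility note can be handled by resizing the input to the nearest multiple of 16 before feeding the network; `round_to_multiple` is a hypothetical helper, not part of the Unet code below:

```python
def round_to_multiple(size, multiple=16):
    """Round size up to the nearest multiple (2^4 = 16 for this 4-level Unet)."""
    return ((size + multiple - 1) // multiple) * multiple

print(round_to_multiple(224))  # 224: already divisible by 16, unchanged
print(round_to_multiple(300))  # 304: such an input would be resized first
```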
Figure 3-1 Unet structure
Unet Code
Model construction
if args.arch == 'UNet':
model = Unet(3, 1).to(device)
Unet structure: binary segmentation example
import torch.nn as nn
import torch
from torch import autograd
from functools import partial
import torch.nn.functional as F
from torchvision import models
class DoubleConv(nn.Module):
def __init__(self, in_ch, out_ch):
super(DoubleConv, self).__init__()
self.conv = nn.Sequential(
nn.Conv2d(in_ch, out_ch, 3, padding=1),
nn.BatchNorm2d(out_ch),
nn.ReLU(inplace=True),
nn.Conv2d(out_ch, out_ch, 3, padding=1),
nn.BatchNorm2d(out_ch),
nn.ReLU(inplace=True)
)
def forward(self, input):
return self.conv(input)
class Unet(nn.Module):
def __init__(self, in_ch, out_ch):
super(Unet, self).__init__()
        # Backbone: encoder feature extraction
self.conv1 = DoubleConv(in_ch, 32)
self.pool1 = nn.MaxPool2d(2)
self.conv2 = DoubleConv(32, 64)
self.pool2 = nn.MaxPool2d(2)
self.conv3 = DoubleConv(64, 128)
self.pool3 = nn.MaxPool2d(2)
self.conv4 = DoubleConv(128, 256)
self.pool4 = nn.MaxPool2d(2)
self.conv5 = DoubleConv(256, 512)
        # Decoder: feature fusion
        self.up6 = nn.ConvTranspose2d(512, 256, 2, stride=2)  # upsampling via transposed convolution
self.conv6 = DoubleConv(512, 256)
self.up7 = nn.ConvTranspose2d(256, 128, 2, stride=2)
self.conv7 = DoubleConv(256, 128)
self.up8 = nn.ConvTranspose2d(128, 64, 2, stride=2)
self.conv8 = DoubleConv(128, 64)
self.up9 = nn.ConvTranspose2d(64, 32, 2, stride=2)
self.conv9 = DoubleConv(64, 32)
self.conv10 = nn.Conv2d(32, out_ch, 1)
def forward(self, x):
        # Backbone: encoder feature extraction
#print(x.shape)
c1 = self.conv1(x)
p1 = self.pool1(c1)
#print(p1.shape)
c2 = self.conv2(p1)
p2 = self.pool2(c2)
#print(p2.shape)
c3 = self.conv3(p2)
p3 = self.pool3(c3)
#print(p3.shape)
c4 = self.conv4(p3)
p4 = self.pool4(c4)
#print(p4.shape)
c5 = self.conv5(p4)
        # Decoder: feature fusion
up_6 = self.up6(c5)
merge6 = torch.cat([up_6, c4], dim=1)
c6 = self.conv6(merge6)
up_7 = self.up7(c6)
merge7 = torch.cat([up_7, c3], dim=1)
c7 = self.conv7(merge7)
up_8 = self.up8(c7)
merge8 = torch.cat([up_8, c2], dim=1)
c8 = self.conv8(merge8)
up_9 = self.up9(c8)
merge9 = torch.cat([up_9, c1], dim=1)
c9 = self.conv9(merge9)
c10 = self.conv10(c9)
out = nn.Sigmoid()(c10)
return out
4. DeepLabv3+
DeepLabv3+ Structure
Figure 4-1 DeepLabv3+ structure
DeepLabv3+ has three main parts: Encoder, Decoder, and prediction.
- Encoder: feature extraction, consisting of a backbone and an ASPP module.
  - backbone: e.g. MobileNetv2 or Xception, as in the code below. Deep features are fed to ASPP; shallow features are fed to the Decoder.
  - ASPP: dilated convolutions with different dilation rates capture receptive fields of different sizes.
- Decoder: fusion of deep and shallow features. The Encoder's deep features are upsampled and concatenated (concat) with the shallow features after a 1×1 convolution.
- Prediction: softmax/sigmoid gives each pixel's probability for each class.
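Using the dilated-kernel formula from Section 1, the three 3×3 ASPP branches with rates 6, 12, and 18 (the rates used in the code below) have very different effective kernel sizes, and because padding equals the dilation, the spatial size is preserved; `effective_kernel` and `conv_out` are my helper names:

```python
def effective_kernel(k, d):
    """Effective kernel size of a k x k kernel with dilation rate d."""
    return d * (k - 1) + 1

def conv_out(i, k_eff, s, p):
    """Ordinary convolution output size."""
    return (i - k_eff + 2 * p) // s + 1

for d in (6, 12, 18):
    k_eff = effective_kernel(3, d)
    # padding = dilation in the ASPP branches, so a 30x30 input stays 30x30
    print(d, k_eff, conv_out(30, k_eff, s=1, p=d))
```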
DeepLabv3+ Code
import torch
import torch.nn as nn
import torch.nn.functional as F
from nets.xception import xception
from nets.mobilenetv2 import mobilenetv2
class MobileNetV2(nn.Module):
def __init__(self, downsample_factor=8, pretrained=True):
super(MobileNetV2, self).__init__()
from functools import partial
model = mobilenetv2(pretrained)
self.features = model.features[:-1]
self.total_idx = len(self.features)
self.down_idx = [2, 4, 7, 14]
if downsample_factor == 8:
for i in range(self.down_idx[-2], self.down_idx[-1]):
self.features[i].apply(
partial(self._nostride_dilate, dilate=2)
)
for i in range(self.down_idx[-1], self.total_idx):
self.features[i].apply(
partial(self._nostride_dilate, dilate=4)
)
elif downsample_factor == 16:
for i in range(self.down_idx[-1], self.total_idx):
self.features[i].apply(
partial(self._nostride_dilate, dilate=2)
)
def _nostride_dilate(self, m, dilate):
classname = m.__class__.__name__
if classname.find('Conv') != -1:
if m.stride == (2, 2):
m.stride = (1, 1)
if m.kernel_size == (3, 3):
m.dilation = (dilate//2, dilate//2)
m.padding = (dilate//2, dilate//2)
else:
if m.kernel_size == (3, 3):
m.dilation = (dilate, dilate)
m.padding = (dilate, dilate)
def forward(self, x):
low_level_features = self.features[:4](x)
x = self.features[4:](low_level_features)
return low_level_features, x
#-----------------------------------------#
#   ASPP feature-extraction module
#   Dilated convolutions with different dilation rates
#-----------------------------------------#
class ASPP(nn.Module):
def __init__(self, dim_in, dim_out, rate=1, bn_mom=0.1):
super(ASPP, self).__init__()
self.branch1 = nn.Sequential(
nn.Conv2d(dim_in, dim_out, 1, 1, padding=0, dilation=rate,bias=True),
nn.BatchNorm2d(dim_out, momentum=bn_mom),
nn.ReLU(inplace=True),
)
self.branch2 = nn.Sequential(
nn.Conv2d(dim_in, dim_out, 3, 1, padding=6*rate, dilation=6*rate, bias=True),
nn.BatchNorm2d(dim_out, momentum=bn_mom),
nn.ReLU(inplace=True),
)
self.branch3 = nn.Sequential(
nn.Conv2d(dim_in, dim_out, 3, 1, padding=12*rate, dilation=12*rate, bias=True),
nn.BatchNorm2d(dim_out, momentum=bn_mom),
nn.ReLU(inplace=True),
)
self.branch4 = nn.Sequential(
nn.Conv2d(dim_in, dim_out, 3, 1, padding=18*rate, dilation=18*rate, bias=True),
nn.BatchNorm2d(dim_out, momentum=bn_mom),
nn.ReLU(inplace=True),
)
self.branch5_conv = nn.Conv2d(dim_in, dim_out, 1, 1, 0,bias=True)
self.branch5_bn = nn.BatchNorm2d(dim_out, momentum=bn_mom)
self.branch5_relu = nn.ReLU(inplace=True)
self.conv_cat = nn.Sequential(
nn.Conv2d(dim_out*5, dim_out, 1, 1, padding=0,bias=True),
nn.BatchNorm2d(dim_out, momentum=bn_mom),
nn.ReLU(inplace=True),
)
def forward(self, x):
[b, c, row, col] = x.size()
        #-----------------------------------------#
        #   Five branches in total
        #-----------------------------------------#
conv1x1 = self.branch1(x)
conv3x3_1 = self.branch2(x)
conv3x3_2 = self.branch3(x)
conv3x3_3 = self.branch4(x)
        #-----------------------------------------#
        #   Fifth branch: global average pooling + convolution
        #-----------------------------------------#
global_feature = torch.mean(x,2,True)
global_feature = torch.mean(global_feature,3,True)
global_feature = self.branch5_conv(global_feature)
global_feature = self.branch5_bn(global_feature)
global_feature = self.branch5_relu(global_feature)
global_feature = F.interpolate(global_feature, (row, col), None, 'bilinear', True)
        #-----------------------------------------#
        #   Stack the five branches,
        #   then fuse the features with a 1x1 convolution.
        #-----------------------------------------#
feature_cat = torch.cat([conv1x1, conv3x3_1, conv3x3_2, conv3x3_3, global_feature], dim=1)
result = self.conv_cat(feature_cat)
return result
class DeepLab(nn.Module):
def __init__(self, num_classes, backbone="mobilenet", pretrained=True, downsample_factor=16):
super(DeepLab, self).__init__()
if backbone=="xception":
            #----------------------------------#
            #   Obtain two feature maps:
            #   shallow features [128,128,256]
            #   backbone output  [30,30,2048]
            #----------------------------------#
self.backbone = xception(downsample_factor=downsample_factor, pretrained=pretrained)
in_channels = 2048
low_level_channels = 256
elif backbone=="mobilenet":
            #----------------------------------#
            #   Obtain two feature maps:
            #   shallow features [128,128,24]
            #   backbone output  [30,30,320]
            #----------------------------------#
self.backbone = MobileNetV2(downsample_factor=downsample_factor, pretrained=pretrained)
in_channels = 320
low_level_channels = 24
else:
raise ValueError('Unsupported backbone - `{}`, Use mobilenet, xception.'.format(backbone))
        #-----------------------------------------#
        #   ASPP feature-extraction module
        #   Dilated convolutions with different dilation rates
        #-----------------------------------------#
self.aspp = ASPP(dim_in=in_channels, dim_out=256, rate=16//downsample_factor)
        #----------------------------------#
        #   Shallow feature branch
        #----------------------------------#
self.shortcut_conv = nn.Sequential(
nn.Conv2d(low_level_channels, 48, 1),
nn.BatchNorm2d(48),
nn.ReLU(inplace=True)
)
self.cat_conv = nn.Sequential(
nn.Conv2d(48+256, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256),
nn.ReLU(inplace=True),
nn.Dropout(0.5),
nn.Conv2d(256, 256, 3, stride=1, padding=1),
nn.BatchNorm2d(256),
nn.ReLU(inplace=True),
nn.Dropout(0.1),
)
self.cls_conv = nn.Conv2d(256, num_classes, 1, stride=1)
def forward(self, x):
H, W = x.size(2), x.size(3)
        #-----------------------------------------#
        #   Obtain two feature maps:
        #   low_level_features: shallow features, refined by a conv
        #   x: backbone output, enhanced by the ASPP module
        #-----------------------------------------#
low_level_features, x = self.backbone(x)
x = self.aspp(x)
low_level_features = self.shortcut_conv(low_level_features)
        #-----------------------------------------#
        #   Upsample the enhanced features,
        #   concatenate with the shallow features,
        #   then extract features with convolutions.
        #-----------------------------------------#
x = F.interpolate(x, size=(low_level_features.size(2), low_level_features.size(3)), mode='bilinear', align_corners=True)
x = self.cat_conv(torch.cat((x, low_level_features), dim=1))
x = self.cls_conv(x)
x = F.interpolate(x, size=(H, W), mode='bilinear', align_corners=True)
return x
5. ASPP-Head
ASPP-Head Structure
- The input goes through an AdaptiveAvgPool2d adaptive average pooling branch to produce 1 feature map, and through the ASPPModule's dilated convolutions with different dilation rates to produce 4 feature maps containing context at different scales. These 5 feature maps are concatenated (concat), and one more convolution fuses the features.
- The fused feature map is passed through softmax for semantic segmentation prediction.
ASPP-Head Code
import torch
import torch.nn as nn
from mmcv.cnn import ConvModule
from mmseg.registry import MODELS
from ..utils import resize
from .decode_head import BaseDecodeHead
class ASPPModule(nn.ModuleList):
"""Atrous Spatial Pyramid Pooling (ASPP) Module.
Args:
dilations (tuple[int]): Dilation rate of each layer.
in_channels (int): Input channels.
channels (int): Channels after modules, before conv_seg.
conv_cfg (dict|None): Config of conv layers.
norm_cfg (dict|None): Config of norm layers.
act_cfg (dict): Config of activation layers.
"""
def __init__(self, dilations, in_channels, channels, conv_cfg, norm_cfg, act_cfg):
super().__init__()
self.dilations = dilations
self.in_channels = in_channels
self.channels = channels
self.conv_cfg = conv_cfg
self.norm_cfg = norm_cfg
self.act_cfg = act_cfg
for dilation in dilations:
self.append(
ConvModule(
self.in_channels,
self.channels,
1 if dilation == 1 else 3,
dilation=dilation,
padding=0 if dilation == 1 else dilation,
conv_cfg=self.conv_cfg,
norm_cfg=self.norm_cfg,
act_cfg=self.act_cfg))
def forward(self, x):
"""Forward function."""
aspp_outs = []
for aspp_module in self:
aspp_outs.append(aspp_module(x))
return aspp_outs
@MODELS.register_module()
class ASPPHead(BaseDecodeHead):
def __init__(self, dilations=(1, 6, 12, 18), **kwargs):
super().__init__(**kwargs)
assert isinstance(dilations, (list, tuple))
self.dilations = dilations
        self.image_pool = nn.Sequential(  # Sequential __init__: constructs the AdaptiveAvgPool2d and ConvModule objects in order; each runs its own __init__
nn.AdaptiveAvgPool2d(1),
ConvModule(
self.in_channels,
self.channels,
1,
conv_cfg=self.conv_cfg,
norm_cfg=self.norm_cfg,
act_cfg=self.act_cfg))
        self.aspp_modules = ASPPModule(  # ASPPModule __init__: builds one (dilated) conv branch per rate in dilations
dilations,
self.in_channels,
self.channels,
conv_cfg=self.conv_cfg,
norm_cfg=self.norm_cfg,
act_cfg=self.act_cfg)
        self.bottleneck = ConvModule(  # ConvModule __init__: builds an ordinary convolution
(len(dilations) + 1) * self.channels,
self.channels,
3,
padding=1,
conv_cfg=self.conv_cfg,
norm_cfg=self.norm_cfg,
act_cfg=self.act_cfg)
def _forward_feature(self, inputs):
"""Forward function for feature maps before classifying each pixel with
``self.cls_seg`` fc.
Args:
inputs (list[Tensor]): List of multi-level img features.
Returns:
feats (Tensor): A tensor of shape (batch_size, self.channels,
H, W) which is feature map for last layer of decoder head.
"""
x = self._transform_inputs(inputs)
aspp_outs = [
resize(
                self.image_pool(x),  # Sequential forward
size=x.size()[2:],
mode='bilinear',
align_corners=self.align_corners)
]
        aspp_outs.extend(self.aspp_modules(x))  # ASPPModule forward
aspp_outs = torch.cat(aspp_outs, dim=1)
        feats = self.bottleneck(aspp_outs)  # ConvModule forward
return feats
def forward(self, inputs):
"""Forward function."""
output = self._forward_feature(inputs)
output = self.cls_seg(output)
return output