Dilated Convolution


arxiv.org/abs/1702.08…

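Dilated convolution is also known as atrous convolution.

The animations give an intuitive picture of how it works.

Ordinary convolution:

Kernel size kernel=3, padding=1, stride=1 (dilation rate r=1; ignore the dilation rate for now, it has not been introduced yet)

Dilated convolution:

Kernel size kernel=3, padding=1, stride=1, dilation rate r=2

The figures above show that a dilated convolution leaves gaps between the kernel elements when the convolution is computed, and the size of the gap is controlled by the dilation rate. With r=1 there are no gaps between kernel elements, which is an ordinary convolution. With r=2 there is a gap of 1 between adjacent kernel elements.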
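As a minimal sketch of the gap structure described above, the following lists the positions a kernel samples along one axis, relative to its center, and the span it covers. The function names `dilated_offsets` and `effective_kernel_size` are hypothetical helpers introduced only for illustration, and an odd kernel size is assumed.

```python
def dilated_offsets(k: int = 3, r: int = 1) -> list:
    """Offsets (relative to the kernel center) sampled along one axis.

    Assumes an odd kernel size k, as in the examples in this article.
    """
    half = (k - 1) // 2
    return [(i - half) * r for i in range(k)]


def effective_kernel_size(k: int = 3, r: int = 1) -> int:
    """Span covered by a k-element kernel with dilation rate r."""
    return r * (k - 1) + 1


# r=1: adjacent positions, i.e. an ordinary convolution
print(dilated_offsets(3, 1), effective_kernel_size(3, 1))
# r=2: one skipped position between adjacent kernel elements
print(dilated_offsets(3, 2), effective_kernel_size(3, 2))
```

With r=2 the 3-element kernel still has 3 weights per axis but spans 5 positions, which is where the larger receptive field comes from.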

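Purpose

Enlarge the receptive field
Dilated convolutions are usually configured to preserve the height and width of the input feature map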
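To see what preserving the height and width requires, here is a small sketch based on the standard convolution output-size formula (the same one given in the PyTorch `nn.Conv2d` documentation). The helper names `conv_output_size` and `same_padding` are hypothetical, and the input length 7 is an arbitrary illustrative value.

```python
def conv_output_size(n: int, k: int = 3, p: int = 0, s: int = 1, r: int = 1) -> int:
    """Output length along one axis for input length n.

    k: kernel size, p: padding, s: stride, r: dilation rate.
    """
    return (n + 2 * p - r * (k - 1) - 1) // s + 1


def same_padding(k: int = 3, r: int = 1) -> int:
    """Padding that keeps the size unchanged for stride 1 and odd k."""
    return r * (k - 1) // 2


n = 7
print(conv_output_size(n, k=3, p=1, r=1))                    # ordinary conv, size kept
print(conv_output_size(n, k=3, p=1, r=2))                    # r=2 with padding=1 shrinks the map
print(conv_output_size(n, k=3, p=same_padding(3, 2), r=2))   # padding=2 keeps the size
```

So for a 3×3 kernel with stride 1, setting the padding equal to the dilation rate keeps the feature map size unchanged.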

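Why use dilated convolution

In semantic segmentation, the backbone extracts features and shrinks the feature map, which is then upsampled back to the original image size. In the VGG network covered earlier, the feature map is shrunk by max pooling. This reduces the feature map size, but it also discards some detail and some small objects, and these lost objects cannot be recovered by upsampling, so segmentation quality may suffer. If the downsampling is simply removed, however, the feature map keeps its size, the subsequent computation grows, and the receptive field becomes smaller than it was with downsampling. Max pooling therefore cannot just be dropped; dilated convolution can be used to enlarge the receptive field instead.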

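Gridding Effect

Dilated convolutions should not be stacked carelessly, because this easily produces the gridding effect.

Experiment 1

The number in each cell of the figure above is how many times that Layer 1 pixel is used.

The Layer 1 pixels that Layer 4 uses are not contiguous: not every pixel inside the covered region is used. This is the gridding effect. It also discards some detail, so it should be avoided.

Experiment 2

When r is set to 1, 2, 3 for the three layers, every pixel in the region is used.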
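The two experiments can be reproduced in one dimension with a short sketch. It assumes three stacked layers with kernel size 3; the rate sequence [2, 2, 2] is an illustrative choice for a constant dilation rate and is not necessarily the configuration used in the figure. `usage_counts_1d` and `has_holes` are hypothetical helper names.

```python
import numpy as np


def usage_counts_1d(rates, k=3, size=31):
    """Count how often each bottom-layer position feeds one top-layer pixel."""
    counts = np.zeros(size, dtype=np.int64)
    counts[size // 2] = 1  # start from a single pixel of the top layer
    half = (k - 1) // 2
    for r in reversed(rates):  # walk from the top layer down to the bottom
        new = np.zeros_like(counts)
        for pos in np.flatnonzero(counts):
            for i in range(-half, half + 1):
                new[pos + i * r] += counts[pos]
        counts = new
    return counts


def has_holes(counts):
    """True if some position inside the covered span is never used."""
    nz = np.flatnonzero(counts)
    return bool((counts[nz[0]:nz[-1] + 1] == 0).any())


print(usage_counts_1d([2, 2, 2]), has_holes(usage_counts_1d([2, 2, 2])))
print(usage_counts_1d([1, 2, 3]), has_holes(usage_counts_1d([1, 2, 3])))
```

With the constant rate, only every second position is touched, while the increasing rates cover the whole span.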

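Experiment 3

With the same number of parameters (all kernels here are 3×3), the stacked ordinary convolutions have a 7×7 receptive field, while the stacked dilated convolutions have a 13×13 receptive field.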
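The two receptive-field sizes can be checked with a few lines of arithmetic. This sketch assumes three stacked 3×3 layers with stride 1, with rates [1, 1, 1] for the ordinary convolutions and [1, 2, 3] for the dilated ones; `receptive_field` is a hypothetical helper name.

```python
def receptive_field(rates, k=3):
    """Receptive field (per axis) of stacked stride-1 convolutions.

    Each layer with dilation rate r widens the field by (k - 1) * r.
    """
    rf = 1
    for r in rates:
        rf += (k - 1) * r
    return rf


print(receptive_field([1, 1, 1]))  # ordinary 3x3 convolutions
print(receptive_field([1, 2, 3]))  # dilated 3x3 convolutions
```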

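How to design the dilation rates

1. Require $M_2 \le K$

Here $M_i$ is the maximum distance between two nonzero values at layer $i$, defined as $M_i = \max[M_{i+1} - 2r_i,\ M_{i+1} - 2(M_{i+1} - r_i),\ r_i]$ with $M_n = r_n$.

For example, with K=3 and r = [1, 2, 5], $M_2 = 2 \le K$, so the condition is satisfied.

For i=2, $M_2 = \max[M_3 - 2r_2,\ M_3 - 2(M_3 - r_2),\ r_2] = \max[5 - 4,\ 5 - 2(5 - 2),\ 2] = \max[1, -1, 2] = 2$

With K=3 and r = [1, 2, 9], $M_2 = 5 > K$, so the condition is not satisfied.

For i=2, $M_2 = \max[M_3 - 2r_2,\ M_3 - 2(M_3 - r_2),\ r_2] = \max[9 - 4,\ 9 - 2(9 - 2),\ 2] = \max[5, -5, 2] = 5$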

Here we propose a simple solution- hybrid dilated convolution (HDC), to address this theoretical issue. Suppose we have N convolutional layers with kernel size K×K that have dilation rates of $[r_1, \dots, r_i, \dots, r_n]$, the goal of HDC is to let the final size of the RF of a series of convolutional operations fully covers a square region without any holes or missing edges. We define the “maximum distance between two nonzero values” as $M_i = \max[M_{i+1} - 2r_i,\ M_{i+1} - 2(M_{i+1} - r_i),\ r_i]$ with $M_n = r_n$. The design goal is to let $M_2 \le K$. For example, for kernel size K = 3, an r = [1, 2, 5] pattern works as $M_2 = 2$; however, an r = [1, 2, 9] pattern does not work as $M_2 = 5$.
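The HDC criterion can be evaluated mechanically. The sketch below implements the recurrence for the maximum distance between two nonzero values from the HDC paper (written out in the code comment) and checks the two patterns discussed here; `max_nonzero_distance` and `satisfies_hdc` are hypothetical helper names, not part of any library.

```python
def max_nonzero_distance(rates):
    """Return [M_1, ..., M_n] for dilation rates [r_1, ..., r_n].

    Recurrence: M_n = r_n,
                M_i = max(M_{i+1} - 2*r_i, M_{i+1} - 2*(M_{i+1} - r_i), r_i)
    """
    m = rates[-1]
    ms = [m]
    for r in reversed(rates[:-1]):
        m = max(m - 2 * r, m - 2 * (m - r), r)
        ms.append(m)
    return ms[::-1]


def satisfies_hdc(rates, k=3):
    """Check the design goal M_2 <= K (needs at least two layers)."""
    return max_nonzero_distance(rates)[1] <= k


for rates in ([1, 2, 5], [1, 2, 9]):
    print(rates, max_nonzero_distance(rates), satisfies_hdc(rates))
```

For r = [1, 2, 9] the recurrence also gives $M_1 = 3$, which is the gap visible in the bottom-layer usage map.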

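The figures for the two cases are shown below:

For r = [1, 2, 9], the maximum distance between nonzero elements is 3.

The rate sequences given by the authors all start from 1.

If we want the higher-level feature map to use every pixel of the bottom layer, we need $M_1 = 1$. Since $M_1 = \max[M_2 - 2r_1,\ M_2 - 2(M_2 - r_1),\ r_1]$ is the maximum of three numbers, $M_1 \ge r_1$, so $r_1 \le 1$, and therefore $r_1 = 1$.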

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap


def dilated_conv_one_pixel(center,
                           feature_map: np.ndarray,
                           k: int = 3,
                           r: int = 1,
                           v: int = 1):
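    """
    With the dilated kernel centered at the given coordinates, find which
    pixels are used and add the increment v at each of those positions.
    Args:
        center: (x, y) coordinates of the dilated kernel center
        feature_map: map recording how many times each pixel is used
        k: kernel size of the dilated convolution
        r: dilation rate of the dilated convolution
        v: increment added to the usage count
    """
    assert divmod(k, 2)[1] == 1  # the kernel size must be odd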

    # left-top: (x, y)
    left_top = (center[0] - ((k - 1) // 2) * r, center[1] - ((k - 1) // 2) * r)
    for i in range(k):
        for j in range(k):
            feature_map[left_top[1] + i * r][left_top[0] + j * r] += v


def dilated_conv_all_map(dilated_map: np.ndarray,
                         k: int = 3,
                         r: int = 1):
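    """
    Given which pixels of the output feature map are used and how often,
    compute which pixels of the input feature map are used and how often
    for a dilated convolution with kernel size k and dilation rate r.
    Args:
        dilated_map: map recording how many times each output pixel is used
        k: kernel size of the dilated convolution
        r: dilation rate of the dilated convolution
    """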
    new_map = np.zeros_like(dilated_map)
    for i in range(dilated_map.shape[0]):
        for j in range(dilated_map.shape[1]):
            if dilated_map[i][j] > 0:
                dilated_conv_one_pixel((j, i), new_map, k=k, r=r, v=dilated_map[i][j])

    return new_map


def plot_map(matrix: np.ndarray):
    plt.figure()

    c_list = ['white', 'blue', 'red']
    new_cmp = LinearSegmentedColormap.from_list('chaos', c_list)
    plt.imshow(matrix, cmap=new_cmp)

    ax = plt.gca()
    ax.set_xticks(np.arange(-0.5, matrix.shape[1], 1), minor=True)
    ax.set_yticks(np.arange(-0.5, matrix.shape[0], 1), minor=True)

    # show the color bar
    plt.colorbar()

    # annotate each cell with its usage count
    thresh = 5
    for x in range(matrix.shape[1]):
        for y in range(matrix.shape[0]):
            # note: index as matrix[y, x], not matrix[x, y]
            info = int(matrix[y, x])
            ax.text(x, y, info,
                    verticalalignment='center',
                    horizontalalignment='center',
                    color="white" if info > thresh else "black")
    ax.grid(which='minor', color='black', linestyle='-', linewidth=1.5)
    plt.show()
    plt.close()


def main():
    # bottom to top
    dilated_rates = [1, 2, 9]
    # init feature map
    size = 31
    m = np.zeros(shape=(size, size), dtype=np.int32)
    center = size // 2
    m[center][center] = 1
    # print(m)
    # plot_map(m)

    for index, dilated_r in enumerate(dilated_rates[::-1]):
        new_map = dilated_conv_all_map(m, r=dilated_r)
        m = new_map
    print(m)
    plot_map(m)


if __name__ == '__main__':
    main()

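2. Arrange the dilation rates in a sawtooth pattern, for example a repeating rising group such as [1, 2, 3, 1, 2, 3]

3. The dilation rates must not share a common divisor greater than 1

For example, 2, 4, 8 have the common divisor 2, and with these rates the gridding effect still occurs.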
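The common-divisor rule is easy to check with the standard library. This is a minimal sketch; `common_divisor` is a hypothetical helper name.

```python
from functools import reduce
from math import gcd


def common_divisor(rates):
    """Greatest common divisor of all dilation rates in the group."""
    return reduce(gcd, rates)


print(common_divisor([2, 4, 8]))  # greater than 1: gridding remains
print(common_divisor([1, 2, 5]))  # equal to 1: passes this rule
```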

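Results of the HDC guidelines

The first row is the ground truth (GT), the second row uses dilated convolutions without the guidelines described here, and the third row uses dilated convolutions designed with them.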