ShuffleNetV1



ShuffleNet V1:arxiv.org/abs/1707.01…

[figure: ShuffleNetV1 network configuration]

Note: the first 1×1 convolution of the first block in stage2 does not use group convolution, because the input feature map has very few channels, so grouping is unnecessary there.

  • The first block of each stage uses stride=2
  • Each stage's output channels are double those of the previous stage
  • Within each block, the first 1×1 convolution and the DW convolution use 1/4 as many channels as the second 1×1 convolution (similar to a ResNet bottleneck block)

The first building block in each stage is applied with stride = 2. Other hyper-parameters within a stage stay the same, and for the next stage the output channels are doubled. Similar to [9], we set the number of bottleneck channels to 1/4 of the output channels for each ShuffleNet unit.
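The channel bookkeeping described above can be sketched as follows (a minimal sketch; the stage2 width of 240 is taken from the paper's g=3 configuration, and `stage_channels` is a hypothetical helper, not from the paper):

```python
# Sketch of ShuffleNetV1 stage-level channel bookkeeping:
# each later stage doubles the previous stage's output channels,
# and bottleneck channels are 1/4 of the block's output channels.
def stage_channels(stage2_out=240, num_stages=3):
    outs = [stage2_out * (2 ** i) for i in range(num_stages)]
    # the first 1x1 conv and the DW conv inside a block use out/4 channels
    bottlenecks = [c // 4 for c in outs]
    return outs, bottlenecks

outs, mids = stage_channels()
print(outs)  # [240, 480, 960]
print(mids)  # [60, 120, 240]
```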

Network highlights:

[figure: ShuffleNetV1 highlights]

  • A channel shuffle module is added after the group convolutions
  • The two 1×1 convolutions in each block are replaced with group convolutions

Network structure

[figure: ShuffleNetV1 building blocks]

channel shuffle

This addresses a weakness of group convolution: although groups reduce the parameter count and computation, each group only fuses information within itself, and there is no information exchange between groups. Channel shuffle interleaves channels across groups so that the next group convolution sees inputs drawn from every group.
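The shuffle itself is just a reshape–transpose–reshape. A minimal numpy sketch (the standard formulation; PyTorch implementations do the same with `view`/`transpose`):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Reshape (N, C, H, W) -> (N, g, C//g, H, W), swap the group axis with
    the channel-within-group axis, then flatten back. The result interleaves
    channels from different groups."""
    n, c, h, w = x.shape
    assert c % groups == 0
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # swap group and per-group channel axes
    return x.reshape(n, c, h, w)

# 6 channels in 3 groups: channel order [0,1,2,3,4,5] -> [0,2,4,1,3,5]
x = np.arange(6, dtype=np.float32).reshape(1, 6, 1, 1)
print(channel_shuffle(x, 3).flatten())  # [0. 2. 4. 1. 3. 5.]
```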

ShuffleNet Units

This modifies the 1×1 convolutions: in the original designs, the 1×1 (pointwise) convolutions account for most of the computation.

Among them, state-of-the-art networks such as Xception [3] and ResNeXt [40] introduce efficient depthwise separable convolutions or group convolutions into the building blocks to strike an excellent trade-off between representation capability and computational cost. However, we notice that both designs do not fully take the 1 × 1 convolutions (also called pointwise convolutions in [12]) into account, which require considerable complexity. For example, in ResNeXt [40] only 3 × 3 layers are equipped with group convolutions. As a result, for each residual unit in ResNeXt the pointwise convolutions occupy 93.4% multiplication-adds (cardinality = 32 as suggested in [40]). In tiny networks, expensive pointwise convolutions result in limited number of channels to meet the complexity constraint, which might significantly damage the accuracy.

Question: why concat here? Doesn't that double the channel count? Presumably the first block of a stage (stride=2) is meant to halve H and W while increasing the channels, whereas the following blocks use stride=1 with an elementwise add, which keeps the channel count unchanged: the first 1×1 convolution reduces the dimensionality and the last 1×1 convolution restores it.
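The reasoning above can be checked with simple channel arithmetic (a sketch only; `branch_channels` is a hypothetical helper, and the 24→240 numbers assume the paper's g=3 configuration where the stem outputs 24 channels):

```python
def branch_channels(in_ch, out_ch, stride):
    """Channel bookkeeping for a ShuffleNet unit.
    stride=2: the shortcut is a 3x3 avg-pooled copy of the input, concatenated
              with the branch, so the branch produces out_ch - in_ch channels.
    stride=1: the shortcut is the identity, added elementwise, so the branch
              must output exactly in_ch channels (out_ch == in_ch)."""
    if stride == 2:
        return out_ch - in_ch  # concat: branch + shortcut = out_ch
    assert in_ch == out_ch     # add: shapes must match exactly
    return out_ch

# stage2 first block (g=3 config): 24 -> 240 via concat, stride 2
print(branch_channels(24, 240, 2))   # 216
# subsequent stage2 blocks: 240 -> 240 via add, stride 1
print(branch_channels(240, 240, 1))  # 240
```

So the concat is only used in the stride=2 downsampling block, where growing the channel count is exactly the point; all stride=1 blocks use add and keep the width fixed.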