ShuffleNet V1: arxiv.org/abs/1707.01…
(figure: ShuffleNet V1)
Note: the first 1×1 convolution in the first block of stage2 does not use group convolution, because the input feature map has very few channels, so grouping is unnecessary there.
- The first block of each stage uses stride=2
- The output channels of each stage are double those of the previous stage
- Within each block, the first 1×1 convolution and the DW convolution use 1/4 the channels of the second 1×1 convolution (similar to a ResNet bottleneck block)
The first building block in each stage is applied with stride = 2. Other hyper-parameters within a stage stay the same, and for the next stage the output channels are doubled. Similar to [9], we set the number of bottleneck channels to 1/4 of the output channels for each ShuffleNet unit.
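This channel plan can be sketched in a couple of lines; the concrete numbers below are the stage2/3/4 outputs of the paper's g=3, 1× configuration, and the helper names are mine:

```python
def stage_channels(stage2_out, num_stages=3):
    # Each stage doubles the output channels of the previous stage.
    return [stage2_out * (2 ** i) for i in range(num_stages)]

def bottleneck_channels(out_channels):
    # The first 1x1 conv and the DW conv use 1/4 of the unit's output channels.
    return out_channels // 4

print(stage_channels(240))       # g=3 config: [240, 480, 960]
print(bottleneck_channels(240))  # 60
```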
Network highlights:
(figure: ShuffleNet V1)
- A channel shuffle module is added after the group convolution
- The two 1×1 convolutions in each block are replaced with group convolutions
Network architecture
(figure: ShuffleNet V1)
channel shuffle
A modification to group convolution: the original group convolution does reduce parameter count and computation, but it can only fuse information within each group; there is no information exchange between groups. Channel shuffle interleaves the channels across groups so that the next group convolution sees inputs from every group.
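The shuffle itself is just a reshape–transpose–reshape; a minimal NumPy sketch (the function name is mine):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels so each output group mixes channels from every input group."""
    n, c, h, w = x.shape
    x = x.reshape(n, groups, c // groups, h, w)  # split channels into groups
    x = x.transpose(0, 2, 1, 3, 4)               # swap the group and per-group axes
    return x.reshape(n, c, h, w)                 # flatten back to (n, c, h, w)

x = np.arange(6).reshape(1, 6, 1, 1)             # channels [0, 1, 2, 3, 4, 5]
print(channel_shuffle(x, 2).flatten().tolist())  # [0, 3, 1, 4, 2, 5]
```

With 2 groups, channels {0,1,2} and {3,4,5} end up interleaved as [0, 3, 1, 4, 2, 5], so every subsequent group receives channels from both original groups.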
ShuffleNet Units
A modification to the 1×1 convolution: the original (dense) 1×1 convolution accounts for too large a share of the computation.
Among them, state-of-the-art networks such as Xception [3] and ResNeXt [40] introduce efficient depthwise separable convolutions or group convolutions into the building blocks to strike an excellent trade-off between representation capability and computational cost. However, we notice that both designs do not fully take the 1 × 1 convolutions (also called pointwise convolutions in [12]) into account, which require considerable complexity. For example, in ResNeXt [40] only 3 × 3 layers are equipped with group convolutions. As a result, for each residual unit in ResNeXt the pointwise convolutions occupy 93.4% multiplication-adds (cardinality = 32 as suggested in [40]). In tiny networks, expensive pointwise convolutions result in limited number of channels to meet the complexity constraint, which might significantly damage the accuracy.
Question: why concat here? Doesn't that double the channel dimension? Presumably the first block (stride=2) is meant to shrink H and W while enlarging the channel count; in the following blocks (stride=1) an element-wise add is used instead, which leaves the channel count unchanged, with the first 1×1 convolution reducing dimensions and the last 1×1 convolution restoring them.
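A shape-only sketch of that reading (NumPy stand-ins; the stage2 channel numbers 24 → 240 follow the g=3 table, the rest is assumption):

```python
import numpy as np

# Stride-2 unit: the shortcut is an avg-pooled copy of the input, and the
# branch output is *concatenated* onto it, so channels grow at little extra cost.
in_ch, out_ch = 24, 240                       # stage2 of the g=3 config
shortcut = np.zeros((1, in_ch, 28, 28))       # after 3x3 avg pool, stride 2
branch = np.zeros((1, out_ch - in_ch, 28, 28))
y = np.concatenate([shortcut, branch], axis=1)
print(y.shape)                                # (1, 240, 28, 28)

# Stride-1 unit: element-wise add, so channels stay at 240; the first 1x1
# conv reduces to 240 // 4 = 60 channels and the last 1x1 restores 240.
x = np.zeros((1, out_ch, 28, 28))
print((x + np.zeros_like(x)).shape)           # (1, 240, 28, 28)
```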