ShuffleNet V1: arxiv.org/abs/1707.01…
(figure: ShuffleNet V1)
Note: the first 1×1 convolution in the first block of stage2 does not use group convolution, because the input feature map has very few channels, so grouping is unnecessary there.
- The first block of each stage uses stride=2
- The output channels of each stage are double those of the previous stage
- Within each block, the first 1×1 convolution and the DW convolution use 1/4 the channels of the second 1×1 convolution (similar to a ResNet bottleneck block)
The first building block in each stage is applied with stride = 2. Other hyper-parameters within a stage stay the same, and for the next stage the output channels are doubled. Similar to [9], we set the number of bottleneck channels to 1/4 of the output channels for each ShuffleNet unit.
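This channel plan can be sketched in a couple of lines; the concrete numbers below are the stage2/3/4 outputs of the paper's g=3, 1× configuration, and the helper names are mine:

```python
def stage_channels(stage2_out, num_stages=3):
    # Each stage doubles the output channels of the previous stage.
    return [stage2_out * (2 ** i) for i in range(num_stages)]

def bottleneck_channels(out_channels):
    # The first 1x1 conv and the DW conv use 1/4 of the unit's output channels.
    return out_channels // 4

print(stage_channels(240))       # g=3 config: [240, 480, 960]
print(bottleneck_channels(240))  # 60
```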
Network highlights:
(figure: ShuffleNet V1)
- A channel shuffle module is added after the group convolution
- The two 1×1 convolutions in each block are replaced with group convolutions
Network architecture
(figure: ShuffleNet V1)
channel shuffle
A modification to group convolution: the original group convolution does reduce parameter count and computation, but it can only fuse information within each group; there is no information exchange between groups. Channel shuffle interleaves the channels across groups so that the next group convolution sees inputs from every group.
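The shuffle itself is just a reshape–transpose–reshape; a minimal NumPy sketch (the function name is mine):

```python
import numpy as np

def channel_shuffle(x, groups):
    """Interleave channels so each output group mixes channels from every input group."""
    n, c, h, w = x.shape
    x = x.reshape(n, groups, c // groups, h, w)  # split channels into groups
    x = x.transpose(0, 2, 1, 3, 4)               # swap the group and per-group axes
    return x.reshape(n, c, h, w)                 # flatten back to (n, c, h, w)

x = np.arange(6).reshape(1, 6, 1, 1)             # channels [0, 1, 2, 3, 4, 5]
print(channel_shuffle(x, 2).flatten().tolist())  # [0, 3, 1, 4, 2, 5]
```

With 2 groups, channels {0,1,2} and {3,4,5} end up interleaved as [0, 3, 1, 4, 2, 5], so every subsequent group receives channels from both original groups.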
ShuffleNet Units
A modification to the 1×1 convolution: the original (dense) 1×1 convolution accounts for too large a share of the computation.
Among them, state-of-the-art networks such as Xception [3] and ResNeXt [40] introduce efficient depthwise separable convolutions or group convolutions into the building blocks to strike an excellent trade-off between representation capability and computational cost. However, we notice that both designs do not fully take the 1 × 1 convolutions (also called pointwise convolutions in [12]) into account, which require considerable complexity. For example, in ResNeXt [40] only 3 × 3 layers are equipped with group convolutions. As a result, for each residual unit in ResNeXt the pointwise convolutions occupy 93.4% multiplication-adds (cardinality = 32 as suggested in [40]). In tiny networks, expensive pointwise convolutions result in limited number of channels to meet the complexity constraint, which might significantly damage the accuracy.
Question: why concat here? Doesn't that double the channel dimension? Presumably the first block (stride=2) is meant to shrink H and W while enlarging the channel count; in the following blocks (stride=1) an element-wise add is used instead, which leaves the channel count unchanged, with the first 1×1 convolution reducing dimensions and the last 1×1 convolution restoring them.
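A shape-only sketch of that reading (NumPy stand-ins; the stage2 channel numbers 24 → 240 follow the g=3 table, the rest is assumption):

```python
import numpy as np

# Stride-2 unit: the shortcut is an avg-pooled copy of the input, and the
# branch output is *concatenated* onto it, so channels grow at little extra cost.
in_ch, out_ch = 24, 240                       # stage2 of the g=3 config
shortcut = np.zeros((1, in_ch, 28, 28))       # after 3x3 avg pool, stride 2
branch = np.zeros((1, out_ch - in_ch, 28, 28))
y = np.concatenate([shortcut, branch], axis=1)
print(y.shape)                                # (1, 240, 28, 28)

# Stride-1 unit: element-wise add, so channels stay at 240; the first 1x1
# conv reduces to 240 // 4 = 60 channels and the last 1x1 restores 240.
x = np.zeros((1, out_ch, 28, 28))
print((x + np.zeros_like(x)).shape)           # (1, 240, 28, 28)
```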