Deep Learning: Paper-Reading Notes & Open-Source Code
Self-Attention Generative Adversarial Networks
Code:
PyTorch implementation: github.com/heykeetae/S…
TensorFlow implementation: github.com/taki0112/Se…
Abstract: In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN) which allows attention-driven, long-range dependency modeling for image generation tasks. Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps. In SAGAN, details can be generated using cues from all feature locations. Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other. Furthermore, recent work has shown that generator conditioning affects GAN performance. Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics. The proposed SAGAN achieves the state-of-the-art results, boosting the best published Inception score from 36.8 to 52.52 and reducing Fréchet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset. Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.
Key points:
Attention models: capture global dependencies.
Self-attention (also called intra-attention): computes the response at a position in a sequence by attending to all positions within the same sequence.
SAGAN efficiently finds global, long-range dependencies within the internal representations of images.
Convolution processes information in a local neighborhood, so using convolutional layers alone is computationally inefficient for modeling long-range dependencies in images.
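The self-attention idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not the SAGAN layer itself: the paper's 1×1-convolution projections are modeled here as plain matrix multiplies over flattened spatial positions, and `gamma` stands in for the learnable scale on the attention residual (all names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv, gamma=0.0):
    """Self-attention over a flattened feature map.

    x: (N, C) array, N = H*W spatial positions, C channels.
    Wq, Wk: (C, C_qk) query/key projections; Wv: (C, C) value projection.
    Each position's response is a weighted sum over ALL positions,
    which is how long-range dependencies enter the feature map.
    """
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T, axis=-1)  # (N, N): every position attends to every other
    out = attn @ v                    # cues gathered from all feature locations
    return x + gamma * out            # gamma = 0 falls back to the local features

rng = np.random.default_rng(0)
N, C, Cqk = 16, 8, 2                  # e.g. a 4x4 feature map with 8 channels
x = rng.standard_normal((N, C))
Wq, Wk = rng.standard_normal((C, Cqk)), rng.standard_normal((C, Cqk))
Wv = rng.standard_normal((C, C))
y = self_attention(x, Wq, Wk, Wv, gamma=0.1)
print(y.shape)  # (16, 8)
```

Note the attention map is N×N, i.e. quadratic in the number of spatial positions, which is why SAGAN applies it on intermediate-resolution feature maps rather than at full image resolution.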
Two tricks:
Spectral normalization (paper [16]): stabilizes GAN training by constraining the Lipschitz constant of the discriminator through restricting the spectral norm of each layer. Compared with other normalization techniques, spectral normalization needs no extra hyperparameter tuning (setting the spectral norm of every weight layer to 1 performs well in practice), and its computational cost is relatively small. It keeps parameter magnitudes from growing unchecked and thus avoids abnormal gradients.
Two-timescale update rule (TTUR): use separate learning rates for the generator and the discriminator.
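Spectral normalization divides a weight matrix by its largest singular value, which is cheaply estimated with power iteration rather than a full SVD; that is where the "small computational cost" comes from. A minimal NumPy sketch (the function name and iteration count are illustrative):

```python
import numpy as np

def spectral_norm(W, n_iters=100, rng=None):
    """Estimate the largest singular value sigma(W) by power iteration,
    the cheap estimator used in spectral normalization."""
    if rng is None:
        rng = np.random.default_rng(0)
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iters):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    return u @ W @ v  # Rayleigh-style estimate of sigma(W)

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 32))      # a weight matrix of one layer
W_sn = W / spectral_norm(W)            # normalized weight: spectral norm == 1
print(np.linalg.svd(W_sn, compute_uv=False)[0])  # top singular value, close to 1
```

TTUR itself needs no code: it is just two optimizers with different learning rates (the SAGAN paper reports 0.0001 for the generator and 0.0004 for the discriminator), letting the discriminator learn faster without extra update steps.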
Evaluation metrics:
Inception score [26]: computes the KL divergence between the conditional class distribution and the marginal class distribution; higher means better image quality. Limitation: it mainly checks that generated samples can be confidently recognized as belonging to a particular class and that the model generates samples from many classes, without assessing the realism of details or intra-class diversity.
Fréchet Inception distance (FID) [8]: computes the Wasserstein-2 distance between generated and real images in Inception feature space; lower is better. It agrees better with human judgment when evaluating the realism and variation of generated samples.
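The Inception score definition above is easy to make concrete: given the classifier's softmax outputs p(y|x) for a batch of generated images, IS = exp(E_x[KL(p(y|x) || p(y))]), where p(y) is the batch marginal. A small sketch, assuming a 10-class toy classifier (not the real Inception network):

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    """Inception score from a (n_samples, n_classes) matrix of softmax
    probabilities p(y|x): exp of the mean KL(p(y|x) || p(y))."""
    p_y = p_yx.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    kl = (p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# Confident AND diverse predictions score high; uniform predictions score 1,
# matching the "higher is better" reading in the notes above.
confident = np.eye(10)[np.arange(100) % 10]  # one-hot rows cycling over 10 classes
uniform = np.full((100, 10), 0.1)
print(inception_score(confident), inception_score(uniform))  # ~10.0 vs 1.0
```

The limitation in the notes is visible here: `confident` scores the maximum (10 for 10 classes) even if all samples of a class were identical, since intra-class diversity never enters the formula.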
FaceBoxes: A CPU Real-time Face Detector with High Accuracy
Code:
Caffe implementation: github.com/sfzhang15/F…
Although tremendous strides have been made in face detection, one of the remaining open challenges is to achieve real-time speed on the CPU as well as maintain high performance, since effective models for face detection tend to be computationally prohibitive. To address this challenge, we propose a novel face detector, named FaceBoxes, with superior performance on both speed and accuracy. Specifically, our method has a lightweight yet powerful network structure that consists of the Rapidly Digested Convolutional Layers (RDCL) and the Multiple Scale Convolutional Layers (MSCL). The RDCL is designed to enable FaceBoxes to achieve real-time speed on the CPU. The MSCL aims at enriching the receptive fields and discretizing anchors over different layers to handle faces of various scales. Besides, we propose a new anchor densification strategy to make different types of anchors have the same density on the image, which significantly improves the recall rate of small faces. As a consequence, the proposed detector runs at 20 FPS on a single CPU core and 125 FPS using a GPU for VGA-resolution images. Moreover, the speed of FaceBoxes is invariant to the number of faces. We comprehensively evaluate this method and present state-of-the-art detection performance on several face detection benchmark datasets, including the AFW, PASCAL face, and FDDB. Code is available at github.com/sfzhang15/F…
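The anchor densification strategy in the abstract can be sketched as follows. This is a hedged reconstruction: the idea is that an anchor type's density is its tiling count per unit area, so sparse small anchors are replaced by an n×n grid of offset copies within each tiling cell until all anchor types reach the same density. The function name and the 4× figure are illustrative, not taken from the paper:

```python
import numpy as np

def densify_centers(cx, cy, interval, n):
    """Replace one anchor center with an n x n grid of centers spread
    evenly over the tiling cell, multiplying density by n per dimension."""
    offsets = (np.arange(n) + 0.5) / n - 0.5  # evenly spaced in (-0.5, 0.5)
    xs = cx + offsets * interval
    ys = cy + offsets * interval
    return [(x, y) for y in ys for x in xs]

# Illustrative numbers: if small anchors are 4x sparser than large ones
# relative to their scale, densifying them 4x per dimension-cell evens
# out coverage and helps recall on small faces.
centers = densify_centers(64.0, 64.0, 32.0, 4)
print(len(centers))  # 16 anchors where there was 1
```

The grid is centered on the original anchor position, so densification changes coverage density without shifting the mean anchor location.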