CycleGAN 论文阅读笔记解决的痛点问题是配对的图像不好找，所以尝试找到一个映射函数 G，可以将 X 域上的图像映射

欢迎访问更多文章：xerrors.fun

解决的痛点问题是配对的图像不好找，所以尝试找到一个映射函数 G，可以将 X 域上的图像映射到 Y 域上，由于映射关系没有约束，很容易出现训练上的问题，所以训练了两个映射函数，另外一个映射函数将 F 域上的图像映射到 X 上，最终实现的效果就是 F(G(X)) 近似于 X。

原论文首先是提出了一个假设：

We assume there is some underlying relationship between the domains – for example, that they are two different renderings of the same underlying scene – and seek to learn that relationship.

我们假设在不同域之间存在某种潜在的关系（例如：它们是同一基本场景的两种不同的呈现形式），并试图了解这种关系。

所以作者想要训练一个映射关系 G，G 可以将 X 域映射到等同于 Y 分布的域 G(X)，但是问题出现了，没法保证这样的转换会将输入 x 和输出 y 匹配，同时也没有办法进行优化；这里的问题应该就是使用非配对数据都会面临的问题；同时在训练的过程中经常会出现「mode collapse」：会将所有的图像映射到一个输出上面。

为了保证每次的输出都能够跟输入有关系，这里可以理解为保留原本的「潜在关系」，所以作者引入了「循环一致性」原理，也就是要保证 x 经过映射关系 G 之后所得到的 G(x) 可以通过另外一个映射关系 F 映射回来，也就是要满足F(G(x)) = x,G(F(y))=y。

1. 相关工作

这一部分主要是介绍前人的工作和自己的理解，这里稍微记录一下；

作者认为 GANs 成功的关键在于「对抗性损失」的理念，同时作者也基于此提出了自己的「循环一致性损失」。

The key to GANs’ success is the idea of an adversarial loss that forces the generated images to be, in principle, indistinguishable from real photos.

GANs成功的关键在于“对抗性损失”的理念，这种理念迫使生成的图像原则上与真实的照片无法区分。

在总结前人在 image-to-image 上的工作的时候，说明自己的网络跟前人相比，不需要特定的与训练或者特定的匹配关系，同时也没有假定输入和输出的低维映射空间是一样的。（我觉得这跟他之前的假设不是相悖的吗？）

Unlike the above approaches, our formulation does not rely on any task-specific, predefined similarity function between the input and output, nor do we assume that the input and output have to lie in the same low-dimensional embedding space.

与上述方法不同，我们的方法不依赖于输入和输出之间任何针对特定任务定制的、预定义的相似性函数，也不假设输入和输出必须处于相同的低维嵌入空间。

在涉及到「循环一致性」的时候，作者说明，使用传递性作为优化方法的概念由来已久，也已经在很多领域比如翻译、3D模型匹配有所应用，同时使用也已经有人将循环一致性损失运用在模型训练中，知识跟我们所不同的是，他们知识利用传递性来监督 CNN 的训练。

Of these, Zhou et al. and Godard et al. are most similar to our work, as they use a cycle consistency loss as a way of using transitivity to supervise CNN training.

其中Zhou和Godard等人与我们的工作最为相似，他们将循环一致性损失作为一种利用传递性来监督CNN训练的方式。

同时也出现了一个巧合 DualGAN：

Concurrent with our work, in these same proceedings, Yi et al. independently use a similar objective for unpaired image-to-image translation, inspired by dual learning in machine translation.

与我们的工作同时，在同一篇论文中，受机器翻译中的双重学习启发，Yi等人独立地使用了一个类似的目标用于图像到图像的非配对翻译。

同时作者还对比了 Neural Style Transfer，尽管呈现结果相似，但是 Cycle GAN 所注重的是两个图像集之间的映射，所提取的是外观之外了更高级的特征。所以该模型也更容易应用到其他的任务上。

2. 数学理论

想要理解这里的数学概念，主要还是先了解这个图中的每个部分的功能；

网络中包含两个映射函数以及两个判别器，在训练映射关系 G 和判别器 DY 的时候，目标函数如下所示（F 和 DX 同理）：

\begin{aligned} \mathcal{L}_{\mathrm{GAN}}\left(G, D_{Y}, X, Y\right) &=\mathbb{E}_{y \sim p_{\text {data }}(y)}\left[\log D_{Y}(y)\right] \\ &+\mathbb{E}_{x \sim p_{\text {data }}(x)}\left[\log \left(1-D_{Y}(G(x))\right]\right. \end{aligned}

损失 cycle consistency loss：

\begin{aligned} \mathcal{L}_{\text {cyc }}(G, F) &=\mathbb{E}_{x \sim p_{\text {data }}(x)}\left[\|F(G(x))-x\|_{1}\right] \\ &+\mathbb{E}_{y \sim p_{\text {data }}(y)}\left[\|G(F(y))-y\|_{1}\right] \end{aligned}

结合在一起可以得到，目标函数如下，其中 $\lambda$ 所控制着这两个目标的相对重要性。：

\begin{aligned} \mathcal{L}\left(G, F, D_{X}, D_{Y}\right) &=\mathcal{L}_{\mathrm{GAN}}\left(G, D_{Y}, X, Y\right) \\ &+\mathcal{L}_{\mathrm{GAN}}\left(F, D_{X}, Y, X\right) \\ &+\lambda \mathcal{L}_{\mathrm{cyc}}(G, F) \end{aligned}

整个算法的就是为了解决下面这样一个公式：

G^{*}, F^{*}=\arg \min _{G, F} \max _{D_{x}, D_{Y}} \mathcal{L}\left(G, F, D_{X}, D_{Y}\right)

除此之外对于风格转换，还有一个损失函数 identity loss，作者发现引入额外的损失来激励生成器映射，可以很好的保留输入和输出之间的颜色成分：

\begin{aligned} & \mathcal{L}_{\text {identity }}(G, F)=\mathbb{E}_{y \sim p_{\text {data }}(y)}\left[\|G(y)-y\|_{1}\right]+\mathbb{E}_{x \sim p_{\text {data }}(x)}\left[\|F(x)-x\|_{1}\right] \end{aligned}

3. 网络结构

网络架构方面，看论文是没怎么看懂；倒是下面这张图片看的比较明白，图中展示的是一次单向训练的过程。

生成器由三个部分完成：

「编码-转换-解码」第一部分可以理解为特征提取（编码），提取原有图像的抽象特征，第二部分是转换，将特征从图像域 A 转换到图像域 B，之后通过还原（解码）变成域 B 上的图片。所以一般采用两个顶端相对的提醒来表示生成器的模型；

Generator

生成器

class Generator(nn.Module):
    def __init__(self, input_nc, output_nc, n_residual_blocks=9):
        super(Generator, self).__init__()

        # Initial convolution block (256x256x3 -> 256x256x64)
        ## https://zhuanlan.zhihu.com/p/66989411 详解 nn.ReflectionPad2d
        model = [   nn.ReflectionPad2d(3),
                    nn.Conv2d(input_nc, 64, 7),
                    nn.InstanceNorm2d(64),
                    nn.ReLU(inplace=True) ]

        # Downsampling, Encoding (256x256x64 -> 128x128x128 -> 64x64x256)
        ## 采用两个卷积层进行特征提取
        in_features = 64
        out_features = in_features*2
        for _ in range(2):
            model += [  nn.Conv2d(in_features, out_features, 3, stride=2, padding=1),
                        nn.InstanceNorm2d(out_features),
                        nn.ReLU(inplace=True) ]
            in_features = out_features
            out_features = in_features*2

        # Residual blocks, Transformation (64x64x256 -> 64x64x256)
        ## 添加 6 个（默认）残差模块进行风格转换
        for _ in range(n_residual_blocks):
            model += [ResidualBlock(in_features)]

        # Upsampling, Decoding (64x64x256 -> 128x128x128 -> 256x256x64)
        ## 将图像特征还原到图像域 B 上，使用两层逆卷积操作
        out_features = in_features//2
        for _ in range(2):
            model += [  nn.ConvTranspose2d(in_features, out_features, 3, stride=2, padding=1, output_padding=1),
                        nn.InstanceNorm2d(out_features),
                        nn.ReLU(inplace=True) ]
            in_features = out_features
            out_features = in_features//2

        # Output layer (2556x256x64 -> 256x256x)
        model += [  nn.ReflectionPad2d(3),
                    nn.Conv2d(64, output_nc, 7),
                    nn.Tanh() ]

        self.model = nn.Sequential(*model)

    def forward(self, x):
        return self.model(x)

残差模块

class ResidualBlock(nn.Module):
    def __init__(self, in_features):
        super(ResidualBlock, self).__init__()

        conv_block = [  nn.ReflectionPad2d(1),
                        nn.Conv2d(in_features, in_features, 3),
                        nn.InstanceNorm2d(in_features),
                        nn.ReLU(inplace=True),
                        nn.ReflectionPad2d(1),
                        nn.Conv2d(in_features, in_features, 3),
                        nn.InstanceNorm2d(in_features)  ]

        self.conv_block = nn.Sequential(*conv_block)

    def forward(self, x):
        return x + self.conv_block(x)

判别器

就是一个简单的分类器

discriminator

判别器：

class Discriminator(nn.Module):
    def __init__(self, input_nc):
        super(Discriminator, self).__init__()

        # A bunch of convolutions one after another
        model = [   nn.Conv2d(input_nc, 64, 4, stride=2, padding=1),
                    nn.LeakyReLU(0.2, inplace=True) ]

        model += [  nn.Conv2d(64, 128, 4, stride=2, padding=1),
                    nn.InstanceNorm2d(128), 
                    nn.LeakyReLU(0.2, inplace=True) ]

        model += [  nn.Conv2d(128, 256, 4, stride=2, padding=1),
                    nn.InstanceNorm2d(256), 
                    nn.LeakyReLU(0.2, inplace=True) ]

        model += [  nn.Conv2d(256, 512, 4, padding=1),
                    nn.InstanceNorm2d(512), 
                    nn.LeakyReLU(0.2, inplace=True) ]

        # FCN classification layer
        model += [nn.Conv2d(512, 1, 4, padding=1)]

        self.model = nn.Sequential(*model)

    def forward(self, x):
        x =  self.model(x)
        # Average pooling and flatten
        return F.avg_pool2d(x, x.size()[2:]).view(x.size()[0], -1)

4. 训练过程

对生成器进行判断的时候，使用了三种损失函数，GAN loss、Identity loss、Cycle loss；一图胜千言，我画的太棒了！

训练生成器的过程

从图中可以看到对于生成器的训练的输入是取自域 A 和域 B 的两个图片 Real A、Real B，整个生成器的训练分为了三个部分，分别是橙色的区域来计算 identity loss，黄色的区域用来计算 Cycle loss，紫色的其余来计算 GAN loss；对于两个生成器所生成的 6 个 loss，进行累加作为生成器的 loss，之后进行反向传播，后面的代码也能说明这一点。

判别器的训练方法：

训练判别器的过程

代码可以参考这个大佬实现的易读的 Pytorch 版本：PyTorch-CycleGAN/blob/master/train

CycleGAN 论文阅读笔记

1. 相关工作

2. 数学理论

3. 网络结构

生成器

判别器

4. 训练过程

参考资料