Deep Learning Face Representation by Joint Identification-Verification



Original paper: arXiv 1406.4773


Yi Sun¹

¹ Department of Information Engineering, The Chinese University of Hong Kong

² Department of Electronic Engineering, The Chinese University of Hong Kong

³ Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

sy011@ie.cuhk.edu.hk

Abstract


The key challenge of face recognition is to develop effective feature representations for reducing intra-personal variations while enlarging inter-personal differences. In this paper, we show that it can be well solved with deep learning and using both face identification and verification signals as supervision. The Deep IDentification-verification features (DeepID2) are learned with carefully designed deep convolutional networks. The face identification task increases the inter-personal variations by drawing DeepID2 extracted from different identities apart, while the face verification task reduces the intra-personal variations by pulling DeepID2 extracted from the same identity together, both of which are essential to face recognition. The learned DeepID2 features can be well generalized to new identities unseen in the training data. On the challenging LFW dataset [11], 99.15% face verification accuracy is achieved. Compared with the best deep learning result [21] on LFW, the error rate has been significantly reduced by 67%.


1 Introduction


Faces of the same identity can look very different when presented in different poses, illuminations, expressions, ages, and occlusions. Such variations within the same identity could overwhelm the variations due to identity differences and make face recognition challenging, especially in unconstrained conditions. Therefore, reducing the intra-personal variations while enlarging the inter-personal differences is an eternal topic in face recognition. It can be traced back to early subspace face recognition methods such as LDA [1], Bayesian face [17], and unified subspace [23, 24]. For example, LDA approximates inter- and intra-personal face variations by using two linear subspaces and finds the projection directions to maximize the ratio between them. More recent studies have also targeted the same goal, either explicitly or implicitly. For example, metric learning [6, 9, 15] maps faces to some feature representation such that faces of the same identity are close to each other while those of different identities stay apart. However, these models are greatly limited by their linear nature or shallow structures, while inter- and intra-personal variations are complex, highly nonlinear, and observed in high-dimensional image space.


In this work, we show that deep learning provides much more powerful tools to handle the two types of variations. Thanks to its deep architecture and large learning capacity, effective features for face recognition can be learned through hierarchical nonlinear mappings. We argue that it is essential to learn such features by using two supervisory signals simultaneously, i.e. the face identification and verification signals, and the learned features are referred to as Deep IDentification-verification features (DeepID2). Identification is to classify an input image into a large number of identity classes, while verification is to classify a pair of images as belonging to the same identity or not (i.e. binary classification). In the training stage, given an input face image with the identification signal, its DeepID2 features are extracted in the top hidden layer of the learned hierarchical nonlinear feature representation, and then mapped to one of a large number of identities through another function $g(\text{DeepID2})$. In the testing stage, the learned DeepID2 features can be generalized to other tasks (such as face verification) and new identities unseen in the training data. The identification supervisory signal tends to pull apart the DeepID2 features of different identities, since they have to be classified into different classes. Therefore, the learned features would have rich identity-related, or inter-personal, variations. However, the identification signal has a relatively weak constraint on DeepID2 extracted from the same identity, since dissimilar DeepID2 could be mapped to the same identity through the function $g(\cdot)$. This leads to problems when DeepID2 features are generalized to new tasks and new identities in test, where $g$ is no longer applicable. We solve this by using an additional face verification signal, which requires that every two DeepID2 vectors extracted from the same identity are close to each other while those extracted from different identities are kept apart. The strong per-element constraint on DeepID2 can effectively reduce the intra-personal variations. On the other hand, using the verification signal alone (i.e. only distinguishing a pair of DeepID2 at a time) is not as effective in extracting identity-related features as using the identification signal (i.e. distinguishing thousands of identities at a time). Therefore, the two supervisory signals emphasize different aspects in feature learning and should be employed together.


To characterize faces from different aspects, complementary DeepID2 features are extracted from various face regions and resolutions, and are concatenated to form the final feature representation after PCA dimension reduction. Since the learned DeepID2 features are diverse among different identities while consistent within the same identity, they make the subsequent face recognition easier. Using the learned feature representation and a recently proposed face verification model [3], we achieved the highest 99.15% face verification accuracy on the challenging and extensively studied LFW dataset [11]. This is the first time that a machine provided with only the face region achieves an accuracy on par with the 99.20% accuracy of humans, who are shown the entire LFW image, including both the face region and a large background area, when verifying.


In recent years, a great deal of effort has been made on face recognition with deep learning [5, 10, 20, 27, 8, 22, 21]. Among the deep learning works, [5, 20, 8] learned features or deep metrics with the verification signal, while [22, 21] learned features with the identification signal and achieved accuracies around 97.45% on LFW. Our approach significantly improves the state-of-the-art. The idea of jointly solving the classification and verification tasks was applied to general object recognition [16], with the focus on improving classification accuracy on fixed object classes rather than on hidden feature representations. Our work targets learning features which can be well generalized to new classes (identities) and to the verification task, while the classification accuracy on identities in the training set is not crucial for us.


2 Identification-verification guided deep feature learning


We learn features with variations of deep convolutional neural networks (deep ConvNets) [13]. The convolution and pooling operations in deep ConvNets are specially designed to extract visual features hierarchically, from local low-level features to global high-level ones. Our deep ConvNets take similar structures as in [21]. Each contains four convolutional layers, the first three of which are followed by max-pooling. To learn a diverse set of high-level features, we do not require weight-sharing over the entire feature map in the higher convolutional layers [10]. Specifically, in the third convolutional layer of our deep ConvNets, neuron weights are locally shared in every $2 \times 2$ local region. In the fourth convolutional layer, which is more appropriately called a locally-connected layer, weights are totally unshared between neurons. The ConvNet extracts a 160-dimensional DeepID2 vector at the last layer of its feature extraction cascade. The DeepID2 layer to be learned is fully connected to both the third and fourth convolutional layers. Since the fourth convolutional layer extracts more global features than the third, the DeepID2 layer takes multi-scale features as input, forming the so-called multi-scale ConvNets [19]. We use rectified linear units (ReLU) [18] for neurons in the convolutional layers and the DeepID2 layer. ReLU has better fitting abilities than sigmoid units for large training datasets [12]. An illustration of the ConvNet structure used to extract DeepID2 is shown in Fig. 1, given an RGB input of size $55 \times 47$. When the size of the input region changes, the map sizes in the following layers change accordingly. The DeepID2 extraction process is denoted as $f = \mathrm{Conv}(x, \theta_c)$, where $\mathrm{Conv}(\cdot)$ is the feature extraction function defined by the ConvNet, $x$ is the input face patch, $f$ is the extracted DeepID2 vector, and $\theta_c$ denotes the ConvNet parameters to be learned.

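To make the architecture concrete, below is a minimal PyTorch sketch of a DeepID2-style ConvNet under stated assumptions: the class name `DeepID2Net` is ours, the filter sizes (4, 3, 3, 2) and map counts (20, 40, 60, 80) follow the DeepID-style configuration and should be treated as illustrative, and the locally-connected fourth layer is approximated by an ordinary convolution for brevity. It is a sketch of the structure described above, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepID2Net(nn.Module):
    """Sketch of a DeepID2-style ConvNet for a 3x55x47 input patch.

    The paper's third layer shares weights only locally and its fourth
    layer is locally connected (unshared weights); both are approximated
    here by plain convolutions to keep the sketch short.
    """
    def __init__(self, feat_dim=160):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 20, kernel_size=4)   # 55x47 -> 52x44
        self.conv2 = nn.Conv2d(20, 40, kernel_size=3)  # 26x22 -> 24x20
        self.conv3 = nn.Conv2d(40, 60, kernel_size=3)  # 12x10 -> 10x8
        self.conv4 = nn.Conv2d(60, 80, kernel_size=2)  # 5x4   -> 4x3
        self.pool = nn.MaxPool2d(2)
        # The DeepID2 layer is fully connected to BOTH the (pooled) third
        # and the fourth convolutional layers: multi-scale input.
        self.fc = nn.Linear(60 * 5 * 4 + 80 * 4 * 3, feat_dim)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x3 = self.pool(F.relu(self.conv3(x)))   # (B, 60, 5, 4)
        x4 = F.relu(self.conv4(x3))             # (B, 80, 4, 3)
        multiscale = torch.cat([x3.flatten(1), x4.flatten(1)], dim=1)
        return F.relu(self.fc(multiscale))      # 160-d DeepID2 vector

f = DeepID2Net()(torch.randn(1, 3, 55, 47))    # f.shape == (1, 160)
```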

Figure 1: The ConvNet structure for DeepID2 extraction.


DeepID2 features are learned under two supervisory signals. The first is the face identification signal, which classifies each face image into one of $n$ (e.g., $n = 8192$) different identities. Identification is achieved by following the DeepID2 layer with an $n$-way softmax layer, which outputs a probability distribution over the $n$ classes. The network is trained to minimize the cross-entropy loss, which we call the identification loss. It is denoted as

$$\mathrm{Ident}(f, t, \theta_{id}) = -\sum_{i=1}^{n} p_i \log \hat{p}_i = -\log \hat{p}_t, \tag{1}$$


where $f$ is the DeepID2 vector, $t$ is the target class, and $\theta_{id}$ denotes the softmax layer parameters. $p_i$ is the target probability distribution, where $p_i = 0$ for all $i$ except $p_t = 1$ for the target class $t$. $\hat{p}_i$ is the predicted probability distribution. To correctly classify all the classes simultaneously, the DeepID2 layer must form discriminative identity-related features (i.e. features with large inter-personal variations). The second is the face verification signal, which encourages DeepID2 extracted from faces of the same identity to be similar. The verification signal directly regularizes DeepID2 and can effectively reduce the intra-personal variations. Commonly used constraints include the L1/L2 norm and cosine similarity. We adopt the following loss function based on the L2 norm, which was originally proposed by Hadsell et al. [7] for dimensionality reduction,

$$\mathrm{Verif}(f_i, f_j, y_{ij}, \theta_{ve}) = \begin{cases} \frac{1}{2}\left\lVert f_i - f_j \right\rVert_2^2 & \text{if } y_{ij} = 1 \\ \frac{1}{2}\max\!\left(0,\; m - \left\lVert f_i - f_j \right\rVert_2\right)^2 & \text{if } y_{ij} = -1 \end{cases} \tag{2}$$


where $f_i$ and $f_j$ are the DeepID2 vectors extracted from the two face images in comparison. $y_{ij} = 1$ means that $f_i$ and $f_j$ are from the same identity; in this case the loss minimizes the L2 distance between the two DeepID2 vectors. $y_{ij} = -1$ means different identities, and Eq. (2) requires their distance to be larger than a margin $m$. $\theta_{ve} = \{m\}$ is the parameter to be learned in the verification loss function. Loss functions based on the L1 norm could have similar formulations [16]. The cosine similarity was used in [18] as

$$\mathrm{Verif}(f_i, f_j, y_{ij}, \theta_{ve}) = \frac{1}{2}\left(y_{ij} - \sigma(wd + b)\right)^2, \tag{3}$$


where $d = \frac{f_i \cdot f_j}{\lVert f_i \rVert_2 \lVert f_j \rVert_2}$ is the cosine similarity between the DeepID2 vectors, $\theta_{ve} = \{w, b\}$ are learnable scaling and shifting parameters, $\sigma$ is the sigmoid function, and $y_{ij}$ is the binary target of whether the two compared face images belong to the same identity. All three loss functions are evaluated and compared in our experiments.

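As a concrete reference, here is a small NumPy sketch of the three losses. It is an illustrative rendering of Eqs. (1)-(3), not the authors' code: the function names and the softmax-layer parameters `W`, `b` (standing in for $\theta_{id}$) are ours, and `y` follows the conventions stated above.

```python
import numpy as np

def ident_loss(f, t, W, b):
    """Eq. (1): softmax cross-entropy over n identities.
    W, b are the n-way softmax-layer parameters (theta_id)."""
    logits = f @ W + b
    logits -= logits.max()                       # numerical stability
    p_hat = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p_hat[t])

def verif_loss_l2(fi, fj, y, m):
    """Eq. (2): contrastive L2 loss with margin m (theta_ve = {m});
    y = 1 for the same identity, y = -1 otherwise."""
    dist = np.linalg.norm(fi - fj)
    if y == 1:                                   # same identity: pull together
        return 0.5 * dist ** 2
    return 0.5 * max(0.0, m - dist) ** 2         # different: push beyond margin

def verif_loss_cosine(fi, fj, y, w, b):
    """Eq. (3): cosine-similarity loss (theta_ve = {w, b});
    y is the binary same/different target."""
    d = fi @ fj / (np.linalg.norm(fi) * np.linalg.norm(fj))
    sigma = 1.0 / (1.0 + np.exp(-(w * d + b)))
    return 0.5 * (y - sigma) ** 2
```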

Our goal is to learn the parameters $\theta_c$ of the feature extraction function $\mathrm{Conv}(\cdot)$, while $\theta_{id}$ and $\theta_{ve}$ are only parameters introduced to propagate the identification and verification signals during training. In the testing stage, only $\theta_c$ is used for feature extraction. The parameters are updated by stochastic gradient descent. The identification and verification gradients are weighted by a hyperparameter $\lambda$. The margin $m$ in Eq. (2) is a special case, which cannot be updated by gradient descent since its gradient would always be nonnegative. Instead, we adaptively update $m$ during training such that it is the threshold that gives the lowest verification error on recent training samples. Our learning algorithm is summarized in Tab. 1.


Table 1: The DeepID2 learning algorithm.

input: training set $\chi = \{(x_i, l_i)\}$, initialized parameters $\theta_c$, $\theta_{id}$, and $\theta_{ve}$, hyperparameter $\lambda$, learning rate $\eta(t)$, $t \leftarrow 0$

while not converge do

&nbsp;&nbsp;$t \leftarrow t + 1$; sample two training samples $(x_i, l_i)$ and $(x_j, l_j)$ from $\chi$

&nbsp;&nbsp;$f_i = \mathrm{Conv}(x_i, \theta_c)$ and $f_j = \mathrm{Conv}(x_j, \theta_c)$

&nbsp;&nbsp;$\nabla \theta_{id} = \frac{\partial \mathrm{Ident}(f_i, l_i, \theta_{id})}{\partial \theta_{id}} + \frac{\partial \mathrm{Ident}(f_j, l_j, \theta_{id})}{\partial \theta_{id}}$

&nbsp;&nbsp;$\nabla \theta_{ve} = \lambda \cdot \frac{\partial \mathrm{Verif}(f_i, f_j, y_{ij}, \theta_{ve})}{\partial \theta_{ve}}$, where $y_{ij} = 1$ if $l_i = l_j$, and $y_{ij} = -1$ otherwise

&nbsp;&nbsp;$\nabla f_i = \frac{\partial \mathrm{Ident}(f_i, l_i, \theta_{id})}{\partial f_i} + \lambda \cdot \frac{\partial \mathrm{Verif}(f_i, f_j, y_{ij}, \theta_{ve})}{\partial f_i}$

&nbsp;&nbsp;$\nabla f_j = \frac{\partial \mathrm{Ident}(f_j, l_j, \theta_{id})}{\partial f_j} + \lambda \cdot \frac{\partial \mathrm{Verif}(f_i, f_j, y_{ij}, \theta_{ve})}{\partial f_j}$

&nbsp;&nbsp;$\nabla \theta_c = \nabla f_i \cdot \frac{\partial \mathrm{Conv}(x_i, \theta_c)}{\partial \theta_c} + \nabla f_j \cdot \frac{\partial \mathrm{Conv}(x_j, \theta_c)}{\partial \theta_c}$

&nbsp;&nbsp;update $\theta_{id} = \theta_{id} - \eta(t) \cdot \nabla \theta_{id}$, $\theta_{ve} = \theta_{ve} - \eta(t) \cdot \nabla \theta_{ve}$, and $\theta_c = \theta_c - \eta(t) \cdot \nabla \theta_c$

end while

output $\theta_c$
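For concreteness, a minimal PyTorch sketch of one iteration of Tab. 1 follows, with autograd standing in for the hand-written gradients. All names here (`train_step`, `model`, `softmax_layer`, `lam`) are illustrative, `model` can be any DeepID2 feature extractor such as the one sketched above, and the adaptive update of the margin $m$ over recent samples is assumed to happen outside this function.

```python
import torch
import torch.nn.functional as F

def train_step(model, softmax_layer, optimizer, xi, li, xj, lj, m, lam=0.05):
    """One iteration of the Tab. 1 algorithm (illustrative sketch).

    model:         maps face patches to 160-d DeepID2 vectors (theta_c)
    softmax_layer: nn.Linear(160, n) producing identity logits (theta_id)
    m:             current verification margin, re-estimated periodically
                   as the distance threshold with lowest verification
                   error on recent training pairs
    lam:           the hyperparameter lambda weighting the two signals
    """
    fi, fj = model(xi), model(xj)
    # Identification loss on both samples (Eq. 1).
    ident = F.cross_entropy(softmax_layer(fi), li) + \
            F.cross_entropy(softmax_layer(fj), lj)
    # Verification loss (Eq. 2); y_ij = 1 iff the identities match.
    dist = (fi - fj).norm(dim=1)
    same = (li == lj).float()
    verif = 0.5 * (same * dist.pow(2) +
                   (1 - same) * (m - dist).clamp(min=0).pow(2)).sum()
    loss = ident + lam * verif
    optimizer.zero_grad()
    loss.backward()       # propagates both signals into theta_c
    optimizer.step()      # SGD update with learning rate eta(t)
    return loss.item(), dist.detach()
```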

Figure 2: Patches selected for feature extraction.



3 Face Verification


To evaluate the feature learning algorithm described in Sec. 2, DeepID2 features are embedded into the conventional face verification pipeline of face alignment, feature extraction, and face verification. We first use the recently proposed SDM algorithm [25] to detect 21 facial landmarks. The face images are then globally aligned by a similarity transformation according to the detected landmarks. We crop 400 face patches, which vary in position, scale, color channel, and horizontal flipping, according to the globally aligned faces and the positions of the facial landmarks. Accordingly, 400 DeepID2 vectors are extracted by a total of 200 deep ConvNets, each of which is trained to extract two 160-dimensional DeepID2 vectors from one particular face patch of each face and from its horizontally flipped counterpart.

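The global alignment step is straightforward to sketch. Below is a hedged scikit-image illustration, assuming SDM landmarks are already available as an array; the function name `align_face` and the canonical landmark `template` are ours, and the template coordinates are a design choice not specified by the paper.

```python
import numpy as np
from skimage.transform import SimilarityTransform, warp

def align_face(image, landmarks, template, out_shape=(55, 47)):
    """Globally align a face by a similarity transform (illustrative).

    landmarks: (21, 2) array of detected (x, y) points, e.g. from SDM [25]
    template:  (21, 2) array of canonical landmark positions in the
               output frame (an assumed design choice)
    """
    tform = SimilarityTransform()
    # Estimate the transform mapping output coordinates to input
    # coordinates, which is exactly what warp() expects as inverse map.
    tform.estimate(template, landmarks)
    return warp(image, tform, output_shape=out_shape)
```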

To reduce the redundancy among the large number of DeepID2 features and make our system practical, we use the forward-backward greedy algorithm [26] to select a small number of effective and complementary DeepID2 vectors (25 in our experiment), which saves most of the feature extraction time during testing. Fig. 2 shows all 25 selected patches, from which twenty-five 160-dimensional DeepID2 vectors are extracted and concatenated into a 4000-dimensional DeepID2 vector. The 4000-dimensional vector is further compressed by PCA for face verification.

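This concatenate-then-compress step can be sketched directly. Below is a hedged NumPy/scikit-learn illustration, assuming `patch_features` already holds the 25 selected 160-d DeepID2 vectors per face (the forward-backward greedy selection of [26] is not shown); the retained PCA dimension is a tunable assumption, not a value taken from the paper.

```python
import numpy as np
from sklearn.decomposition import PCA

# patch_features: (num_faces, 25, 160) array of DeepID2 vectors, one row
# of 25 selected patches per face; random data stands in for real features.
patch_features = np.random.randn(1000, 25, 160).astype(np.float32)

# Concatenate the 25 vectors into a single 4000-d DeepID2 representation.
concat = patch_features.reshape(len(patch_features), -1)  # (num_faces, 4000)

# Compress with PCA before face verification; the retained dimension is a
# hyperparameter (a few hundred dimensions is a reasonable starting point).
pca = PCA(n_components=200)
compact = pca.fit_transform(concat)                       # (num_faces, 200)
```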

We learn the Joint Bayesian model [3] for face verification based on the extracted DeepID2 features. Joint Bayesian has been successfully used to model the joint probability of two faces being the same or different persons [3, 4]. It models the feature representation $f$ of a face as the sum of inter- and intra-personal variations, i.e. $f = \mu + \epsilon$, where both $\mu$ and $\epsilon$ are modeled as Gaussian distributions estimated from the training data. Face verification is achieved through the log-likelihood ratio test $\log \frac{P(f_1, f_2 \mid H_{\text{inter}})}{P(f_1, f_2 \mid H_{\text{intra}})}$, where the numerator and denominator are the joint probabilities of the two faces given the inter- or intra-personal variation hypothesis, respectively.

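To illustrate the ratio test, here is a hedged NumPy/SciPy sketch, assuming the between-identity covariance `S_mu` and within-identity covariance `S_eps` have already been estimated (the EM estimation of [3] is not shown). The function name is ours, and the sign convention is chosen so that higher scores mean "same identity".

```python
import numpy as np
from scipy.stats import multivariate_normal

def joint_bayesian_score(f1, f2, S_mu, S_eps):
    """Log-likelihood ratio for the Joint Bayesian test (illustrative).

    Under f = mu + eps with mu ~ N(0, S_mu) and eps ~ N(0, S_eps), the
    joint covariance of (f1, f2) couples the two faces through S_mu when
    they share one identity, and not otherwise.
    """
    d = len(f1)
    joint = np.concatenate([f1, f2])
    # Same-identity hypothesis: f1 and f2 share a single mu.
    cov_intra = np.block([[S_mu + S_eps, S_mu],
                          [S_mu, S_mu + S_eps]])
    # Different-identity hypothesis: independent mus.
    cov_inter = np.block([[S_mu + S_eps, np.zeros((d, d))],
                          [np.zeros((d, d)), S_mu + S_eps]])
    return (multivariate_normal.logpdf(joint, mean=np.zeros(2 * d), cov=cov_intra)
            - multivariate_normal.logpdf(joint, mean=np.zeros(2 * d), cov=cov_inter))

# Usage: estimate S_mu and S_eps on training DeepID2 features, then
# threshold joint_bayesian_score(f1, f2, S_mu, S_eps) to decide
# same/different identity.
```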
