Bayesian Face Revisited: A Joint Formulation [Translation]


Original source: eccv_2012_bayesian.dvi

Bayesian Face Revisited: A Joint Formulation

Dong Chen$^1$, Xudong Cao$^3$, Liwei Wang$^2$, Fang Wen$^3$, and Jian Sun$^3$

$^1$ University of Science and Technology of China
chendong@mail.ustc.edu.cn

$^2$ The Chinese University of Hong Kong
lwwang@cse.cuhk.edu.hk

$^3$ Microsoft Research Asia, Beijing, China
{xudongca,fangwen,jiansun}@microsoft.com

Abstract. In this paper, we revisit the classical Bayesian face recognition method by Baback Moghaddam et al. and propose a new joint formulation. The classical Bayesian method models the appearance difference between two faces. We observe that this "difference" formulation may reduce the separability between classes. Instead, we model two faces jointly with an appropriate prior on the face representation. Our joint formulation leads to an EM-like model learning at training time and an efficient, closed-form computation at test time. In extensive experimental evaluations, our method is superior to the classical Bayesian face method and many other supervised approaches. Our method achieves 92.4% test accuracy on the challenging Labeled Faces in the Wild (LFW) dataset. Compared with the current best commercial system, we reduce the error rate by 10%.

1 Introduction

Face verification and face identification are two sub-problems of face recognition. The former verifies whether two given faces belong to the same person, while the latter answers the "who is who" question for a probe face set given a gallery face set. In this paper, we focus on the verification problem, which is more widely applicable and lays the foundation for the identification problem.

Bayesian face recognition [1] by Baback Moghaddam et al. is one of the representative and successful face verification methods. It formulates the verification task as a binary Bayesian decision problem. Let $H_I$ represent the intra-personal (same) hypothesis that two faces $x_1$ and $x_2$ belong to the same subject, and $H_E$ the extra-personal (not same) hypothesis that the two faces are from different subjects. The face verification problem then amounts to classifying the difference $\Delta = x_1 - x_2$ as intra-personal variation or extra-personal variation. Based on the MAP (Maximum a Posteriori) rule, the decision is made by testing a log likelihood ratio $r(x_1, x_2)$:

$$r(x_1, x_2) = \log \frac{P(\Delta \mid H_I)}{P(\Delta \mid H_E)}. \tag{1}$$

The above ratio can also be considered a probabilistic measure of similarity between $x_1$ and $x_2$ for the face verification problem. In [1], the two conditional probabilities in Eqn. (1) are modeled as Gaussians, and eigen-analysis is used for model learning and efficient computation.

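As a hedged sketch of this difference-based scoring (the covariances below are illustrative toy values, not the eigen-analysis models learned in [1]), the log likelihood ratio of Eqn. (1) with zero-mean Gaussian models can be computed as:

```python
import numpy as np

d = 4  # toy feature dimension (real systems use much higher dimensions)

# Hypothetical covariances: intra-personal differences are small,
# extra-personal differences are large (assumed values for illustration).
Sigma_I = 0.1 * np.eye(d)
Sigma_E = 2.0 * np.eye(d)

def gauss_logpdf(x, cov):
    """Log density of a zero-mean multivariate Gaussian."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet
                   + x @ np.linalg.solve(cov, x))

def r(x1, x2):
    """Eqn. (1): log P(Delta|H_I) - log P(Delta|H_E), with Delta = x1 - x2."""
    delta = x1 - x2
    return gauss_logpdf(delta, Sigma_I) - gauss_logpdf(delta, Sigma_E)

same = r(np.zeros(d), 0.1 * np.ones(d))  # small difference: intra-personal
diff = r(np.zeros(d), 2.0 * np.ones(d))  # large difference: extra-personal
assert same > 0 > diff
```

A positive ratio favors $H_I$ (same person), a negative one favors $H_E$; the decision threshold can be shifted to trade false accepts for false rejects.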
Because of the simplicity and competitive performance [2] of the Bayesian face method, further progress has been made along this research line. For example, Wang and Tang [3] propose a unified framework for subspace face recognition which decomposes the face difference into three components: intrinsic difference, transformation difference, and noise. By excluding the transformation difference and noise while retaining the intrinsic difference, better performance is obtained. In [4], a random subspace is introduced to handle the multi-modal and high-dimension problems. The appearance difference can also be computed in any feature space, such as the Gabor feature space [5]. Instead of using a naive Bayesian classifier, an SVM is trained in [6] to classify the difference face, which is projected and whitened in an intra-personal subspace.

However, all the above Bayesian face methods are based on the difference of a given face pair. As illustrated by the 2-D example in Fig. 1, modeling the difference is equivalent to first projecting all 2-D points onto a 1-D line (along $x - y$) and then performing classification in 1-D. While such a projection can capture the major discriminative information, it may reduce the separability. Therefore, the power of the Bayesian face framework may be limited by discarding discriminative information that is present when we view the two classes jointly in the original feature space.

Fig. 1. The 2-D data is projected to 1-D by $x - y$. The two classes, which are separable in the joint representation, become inseparable after the projection. "Class1" and "Class2" can be considered an intra-personal and an extra-personal hypothesis in face recognition.

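The separability loss sketched in Fig. 1 can be reproduced numerically; the class centers and noise level below are illustrative assumptions, chosen so the classes separate along $x + y$ but collapse under the difference projection $x - y$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Two tight 2-D classes: Class1 around (0, 0), Class2 around (3, 3).
c1 = rng.normal(0.0, 0.3, size=(n, 2)) + np.array([0.0, 0.0])
c2 = rng.normal(0.0, 0.3, size=(n, 2)) + np.array([3.0, 3.0])

# Joint view: the classes do not overlap along x + y.
s1, s2 = c1.sum(axis=1), c2.sum(axis=1)
assert s1.max() < s2.min()

# Difference view (project onto x - y): both classes collapse around 0
# and their ranges overlap, so no 1-D threshold separates them.
p1, p2 = c1[:, 0] - c1[:, 1], c2[:, 0] - c2[:, 1]
assert p1.min() < p2.max() and p2.min() < p1.max()
```

The same geometry holds in high dimensions: any classifier fed only $\Delta = x_1 - x_2$ cannot recover information that lives along the discarded direction.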
In this paper, we propose to directly model the joint distribution of $\{x_1, x_2\}$ for the face verification problem in the same Bayesian framework. We introduce an appropriate prior on the face representation: each face is the sum of two independent Gaussian latent variables, i.e., an intrinsic variable for identity, and an intra-personal variable for within-person variation. Based on this prior, we can effectively learn the parametric models of the two latent variables with an EM-like algorithm. Given the learned models, we can obtain the joint distributions of $\{x_1, x_2\}$ and derive a closed-form expression of the log likelihood ratio, which makes the computation efficient in the test phase.

We also find interesting connections between our joint Bayesian formulation and two other types of face verification methods: metric learning [7-10] and reference-based methods [11-14]. On one hand, the similarity metric derived from our joint formulation goes beyond the standard form of the Mahalanobis distance. The new similarity metric preserves the separability in the original feature space and leads to better performance. On the other hand, the joint Bayesian method can be viewed as a kind of reference model with a parametric form.

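To see why the joint score is not a function of the difference alone, here is a sketch assuming the covariance blocks implied by the shared-identity prior: under $H_I$ the two faces share the identity variable, so their cross-covariance is the identity covariance $S_\mu$; under $H_E$ they are independent. The toy covariances $S_\mu$, $S_\varepsilon$ are random assumptions, not learned models:

```python
import numpy as np

d = 3
rng = np.random.default_rng(1)
# Hypothetical identity and within-person covariances (random PSD toys).
A1 = rng.normal(size=(d, d)); S_mu = A1 @ A1.T + np.eye(d)
A2 = rng.normal(size=(d, d)); S_eps = A2 @ A2.T + np.eye(d)

# Joint covariance of [x1; x2]: same person shares mu, so the
# off-diagonal block is S_mu; different persons are independent.
Sigma_I = np.block([[S_mu + S_eps, S_mu],
                    [S_mu,         S_mu + S_eps]])
Sigma_E = np.block([[S_mu + S_eps, np.zeros((d, d))],
                    [np.zeros((d, d)), S_mu + S_eps]])

def log_ratio(x1, x2):
    z = np.concatenate([x1, x2])
    def logpdf(cov):  # 2*pi constant cancels in the ratio
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (logdet + z @ np.linalg.solve(cov, z))
    return logpdf(Sigma_I) - logpdf(Sigma_E)

# Two pairs with the *same* difference x1 - x2 get different scores,
# so the metric cannot be any function of the difference alone.
x1, x2 = rng.normal(size=d), rng.normal(size=d)
shift = 5.0 * np.ones(d)
r_a = log_ratio(x1, x2)
r_b = log_ratio(x1 + shift, x2 + shift)  # same difference, shifted pair
assert abs(r_a - r_b) > 1e-3
```

In particular, no Mahalanobis distance on $\Delta = x_1 - x_2$ can reproduce this score, which is the sense in which the joint metric is "beyond" the standard form.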
Many supervised approaches, including ours, need good training data containing sufficient intra-personal and extra-personal variations. Good training data should be both "wide" and "deep": having a large number of different subjects and having enough images of each subject. However, current large face datasets in the wild condition suffer from either small width (PubFig [11]) or small depth (Labeled Faces in the Wild (LFW) [15]). To address this issue, we introduce a new dataset, the Wide and Deep Reference dataset (WDRef), which is both wide (around 3,000 subjects) and deep (2,000+ subjects with over 15 images, 1,000+ subjects with more than 40 images). To facilitate further research and evaluation of supervised methods on the same test bed, we also share two kinds of extracted low-level features for this dataset. The whole dataset can be downloaded from our project website home.ustc.edu.cn/~chendong/J….

Our main contributions can be summarized as follows:

  • A joint formulation of the Bayesian face method with an appropriate prior on the face representation. The joint model can be effectively learned from large-scale, high-dimensional training data, and verification can be efficiently performed with the derived closed-form solution.

  • We demonstrate that our joint Bayesian face method outperforms state-of-the-art supervised methods, through comprehensive comparisons on LFW and WDRef. Our simple system achieves better average accuracy than the current best commercial system (face.com) [16]$^4$.

  • A large dataset (with annotations and extracted low-level features) which is both wide and deep is released.

2 Our Approach: A Joint Formulation

In this section, we first present a naive joint formulation and then introduce our core joint formulation and model learning algorithm.

2.1 A naive formulation

A straightforward joint formulation is to directly model the joint distribution of $\{x_1, x_2\}$ as a Gaussian. Thus, we have $P(x_1, x_2 \mid H_I) = N(0, \Sigma_I)$ and $P(x_1, x_2 \mid H_E) = N(0, \Sigma_E)$, where the covariance matrices $\Sigma_I$ and $\Sigma_E$ can be estimated from the intra-personal pairs and extra-personal pairs, respectively. The mean of all faces is subtracted in the preprocessing step. At test time, the log likelihood ratio between the two probabilities is used as the similarity metric. As will be seen in later experiments, this naive formulation is moderately better than the conventional Bayesian face method.

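A minimal sketch of this naive estimation on synthetic, zero-mean features (the data generator, dimensions, and noise levels are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_id = 2, 500
# Synthetic zero-mean features: identity component + within-person noise,
# two images per subject.
ids = rng.normal(0.0, 1.0, size=(n_id, d))
faces = ids[:, None, :] + rng.normal(0.0, 0.3, size=(n_id, 2, d))

# Intra-personal pairs (two images of the same subject) and
# extra-personal pairs (images of different subjects, via a cyclic shift).
intra = np.concatenate([faces[:, 0], faces[:, 1]], axis=1)              # (n_id, 2d)
extra = np.concatenate([faces[:, 0], np.roll(faces[:, 1], 1, axis=0)], axis=1)

# Naive estimates of the joint (2d x 2d) covariances; data is zero-mean.
Sigma_I = intra.T @ intra / n_id
Sigma_E = extra.T @ extra / n_id

def log_ratio(pair):
    """log P(x1,x2|H_I) - log P(x1,x2|H_E) for a stacked pair of shape (2d,)."""
    def logpdf(cov):
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (logdet + pair @ np.linalg.solve(cov, pair))
    return logpdf(Sigma_I) - logpdf(Sigma_E)

# On held-out synthetic pairs, same-person pairs score higher on average.
t_ids = rng.normal(0.0, 1.0, size=(200, d))
t = t_ids[:, None, :] + rng.normal(0.0, 0.3, size=(200, 2, d))
same = np.mean([log_ratio(np.concatenate([a, b])) for a, b in zip(t[:, 0], t[:, 1])])
diff = np.mean([log_ratio(np.concatenate([a, b]))
                for a, b in zip(t[:, 0], np.roll(t[:, 1], 1, axis=0))])
assert same > diff
```

Note that the estimate lives in the $2d$-dimensional pair space, which is exactly the reliability concern discussed next.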

$^4$ Leveraging an accurate 3D reconstruction and billions of training images. But the details have not been published.


In the above formulation, the two covariance matrices are directly estimated from the data statistics. Two factors may limit its performance. First, suppose the face is represented as a $d$-dimensional feature; in the naive formulation, we need to estimate the covariance matrix in the higher-dimensional ($2d$) feature space of $[x_1\ x_2]$. We have a higher chance of obtaining a less reliable statistic, since we do not have sufficient independent training samples. Second, since the collected training samples may not be completely independent, the estimated $\Sigma_E$ may not be block-wise diagonal. But in theory it should be, because $x_1$ and $x_2$ are statistically independent.

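One hedged illustration of the second issue: the off-diagonal blocks of an empirical $\Sigma_E$ are generally nonzero in finite samples, and the theoretical independence of $x_1$ and $x_2$ under $H_E$ can be enforced by zeroing them. This is a simple fix-up for illustration, not the model learning algorithm proposed in this paper:

```python
import numpy as np

d = 3
rng = np.random.default_rng(2)
# An empirical extra-personal covariance: even for truly independent
# samples, finite-sample noise leaves nonzero off-diagonal blocks.
A = rng.normal(size=(2 * d, 400))
Sigma_E_hat = A @ A.T / 400
assert not np.allclose(Sigma_E_hat[:d, d:], 0)  # spurious cross-covariance

# Under H_E, x1 and x2 are independent, so the off-diagonal blocks of
# the joint covariance must be zero; enforce this block structure.
Sigma_E = Sigma_E_hat.copy()
Sigma_E[:d, d:] = 0
Sigma_E[d:, :d] = 0

assert np.allclose(Sigma_E[:d, d:], 0)
assert np.allclose(Sigma_E[:d, :d], Sigma_E_hat[:d, :d])  # diagonal blocks kept
```

The joint formulation below avoids both issues by estimating only the $d \times d$ covariances of the two latent factors, from which all blocks of $\Sigma_I$ and $\Sigma_E$ follow.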
To deal with these issues, we next introduce a simple prior on the face representation to form a new joint Bayesian formulation. The resulting model can be more reliably and accurately learned.

Fig. 2. Prior on face representation: both the identity distribution (left) and the within-person variation (right) are modeled by Gaussians. Each face instance is represented by the sum of its identity and its variation.

2.2 A joint formulation

As already observed and used in previous works [17-20], the appearance of a face is influenced by two factors: identity and intra-personal variation, as shown in Fig. 2. A face is represented by the sum of two independent Gaussian variables:

$$x = \mu + \varepsilon, \tag{2}$$

where $x$ is the observed face with the mean of all faces subtracted, $\mu$ represents its identity, and $\varepsilon$ is the face variation (e.g., lighting, pose, and expression) within the same identity.

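The generative prior can be sketched by sampling: with hypothetical diagonal covariances (all numbers below are assumptions), faces drawn as $x = \mu + \varepsilon$ should have marginal variance $S_\mu + S_\varepsilon$, while two images of the same subject covary through the shared $\mu$, i.e., with cross-covariance $S_\mu$:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n_id, n_per = 2, 2000, 5
# Hypothetical diagonal covariances for the two latent factors.
S_mu_diag = np.array([1.0, 0.5])    # identity variance per dimension
S_eps_diag = np.array([0.2, 0.1])   # within-person variance per dimension

mu = rng.normal(0, np.sqrt(S_mu_diag), size=(n_id, 1, d))        # shared per subject
eps = rng.normal(0, np.sqrt(S_eps_diag), size=(n_id, n_per, d))  # per image
x = mu + eps                                                     # x = mu + eps

# Marginal variance of x is S_mu + S_eps (factors are independent).
var_x = x.reshape(-1, d).var(axis=0)
assert np.allclose(var_x, S_mu_diag + S_eps_diag, atol=0.15)

# Two images of the same subject covary exactly through the shared mu.
same_cov = (x[:, 0, :] * x[:, 1, :]).mean(axis=0)
assert np.allclose(same_cov, S_mu_diag, atol=0.15)
```

These two identities are what allow the joint covariances of a face pair to be written in closed form from $S_\mu$ and $S_\varepsilon$ alone.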