OpenCV-实践指南-三-OpenCV 实践指南（三）八、基于关键点的基本机器学习和对象检测 Abstract 在这

OpenCV 实践指南（三）

原文：Practical OpenCV

协议：CC BY-NC-SA 4.0

八、基于关键点的基本机器学习和对象检测

Abstract

在这激动人心的一章中，我计划讨论以下内容:

一些通用术语，包括关键点和关键点描述符的定义
OpenCV 引以为豪的一些最先进的关键点提取和描述算法，包括 SIFT、SURF、FAST、BRIEF 和 ORB
基本机器学习概念和支持向量机的工作原理(SVM)
我们将使用我们的知识来制作一个基于关键点的对象检测器应用，该应用使用机器学习来使用视觉单词包框架实时检测多个对象。这个应用将对场景中的光照不变性和杂波具有鲁棒性

关键点和关键点描述符:简介和术语

到目前为止，我们一直在开发的基于颜色的对象检测器应用还有很多不足之处。基于颜色(或灰度强度)检测对象有两个明显的缺陷:

仅适用于单色对象。你可以反投影一个纹理物体的色调直方图，但是这很可能包括很多颜色，这将导致大量的误报
会被相同颜色的不同物体所迷惑

但是基于颜色的目标检测非常快。这使得它非常适合环境受到严格控制的应用，但在未知环境中几乎没有用。我将给你们举一个我在宾夕法尼亚大学机器人足球世界杯人形足球队工作时的例子。该比赛是关于让一个由三个人形机器人组成的团队与其他类似的机器人团队进行 100%自主足球比赛。这些机器人需要快速的球、场、场线和球门柱检测，以便有效地比赛，由于规则要求所有这些对象都具有特定的唯一纯色，我们使用基于颜色的对象检测，通过一些逻辑检查进行验证(如球高度和球门柱高宽比)。基于颜色的对象检测在这里工作得很好，因为环境是受控的，对象都是唯一的纯色。但是，如果你想设计一个搜索和救援机器人的视觉系统，你显然不应该依赖颜色来检测物体，因为你不知道你的机器人的工作环境会是什么样子，你感兴趣的物体可以由任意多种颜色组成。有了这个动机，让我们来学习关键点和关键点描述符是什么意思！

泛称

在目标检测问题中，你通常有两组图像。训练集将用于向计算机显示所需对象的外观。这可以通过计算色调直方图(正如您已经知道的)或通过计算关键点和关键点描述符(正如您将很快了解的)来完成。显然，优选的是，训练图像或者被注释(感兴趣的对象的位置由边界框指定)，或者仅包含感兴趣的对象而没有其他内容。测试集包含您的算法将被使用的图像——简而言之，您的应用的用例。

基于关键点的方法是如何工作的？

这种方法的理念是，不应该让计算机“学习”整个对象模板的特征(像计算充满洪水的点的直方图)并在其他图像中寻找类似的实例。相反，应该在对象模板中找到某些“重要”点(关键点),并将关于这些关键点的邻域的信息(关键点描述符)存储为对象的描述。在其他测试图像中，应该找到整个图像中关键点的关键点描述符，并尝试使用某种相似性概念来“匹配”两个描述符集(一个来自对象模板，一个来自测试图像)，并查看有多少描述符匹配。对于包含对象模板中的对象实例的测试图像，您将获得许多匹配，并且这些匹配将具有规则的趋势。图 8-1 将帮助您更好地理解这一点。它显示了一个训练和测试图像，每个图像都有其 SIFT(尺度不变特征变换-一种著名的关键点提取和描述算法)关键点，以及两幅图像之间的匹配描述符。查看所有匹配是如何遵循一个趋势的。

图 8-1。

SIFT keypoint and feature matching

看起来很酷？下一节描述 SIFT 是如何工作的。

筛选关键点和描述符

SIFT 可以说是当今最著名和最广泛实现的关键点检测和描述算法。这就是我选择首先介绍它的原因，希望能够介绍一些与关键点检测和描述相关的其他概念。你可以在 David G. Lowe 的经典论文“来自尺度不变关键点的独特图像特征”中获得该算法的非常详细的描述。对于所有其他算法，我将把讨论限制在该方法的要点上，并提供它们所基于的论文的参考。

关键点描述符通常也称为特征。使用 SIFT 的目标检测是尺度和旋转不变的。这意味着该算法将检测以下对象:

与训练图像相比，在测试图像中具有相同的外观但是更大或更小的尺寸(比例不变性)
相对于训练对象旋转(大约垂直于图像的比例)
表现出这两个条件的组合(旋转不变性)

这是因为 SIFT 提取的关键点具有与其相关联的比例和方向。术语“比例”需要一些解释，因为它在计算机视觉文献中被广泛使用。比例指的是物体被看到时的大小，或者相机离物体有多远。规模也几乎总是相对的。更高的比例意味着对象看起来更小(因为相机移动得更远)，反之亦然。更高的比例通常通过使用高斯内核平滑图像来实现(还记得gaussianBlur()？)和下采样。通过平滑和上采样的类似过程来实现较低的比例。这就是为什么比例也通常由用于实现它的高斯核的方差来指定(对于当前比例σ = 1)。

关键点检测和方向估计

关键点可以像边角一样简单。然而，因为 SIFT 要求关键点具有与其相关联的比例和方向，并且因为一些一旦知道如何构造关键点描述符就将清楚的效率原因，SIFT 使用以下方法来计算图像中的关键点位置:

图 8-5。

Square patch around a keypoint location and gradient orientations with gradient magnitudes (From “Distinctive Image Features from Scale-Invariant Keypoints,” David G. Lowe. Reprinted by permission of Springer Science+Business Media.)

通过使用一些巧妙的数学方法(如果你阅读了论文的第四部分，你就会明白)检查它们是否具有足够的对比度并且它们不是边缘的一部分，从而进一步过滤这样计算出的关键点位置(具有标度)
为了给关键点分配方向，使用它们的比例来选择具有最接近比例的高斯平滑图像(所有这些我们已经计算并存储用于构建狗金字塔),从而以比例不变的方式执行所有计算。在图像(图 8-5 )中选择关键点周围的一个正方形区域，该区域中每一点的梯度方向由以下等式计算(L 为高斯平滑图像)

图 8-4。

Maxima and minima selection by comparison with 26 neighbors (From “Distinctive Image Features from Scale-Invariant Keypoints,” David G. Lowe. Reprinted by permission of Springer Science+Business Media.)

Lowe 在他的论文中已经用数学方法(通过热扩散方程的解)表明，应用 DoG 算子的输出与应用于图像的拉普拉斯高斯算子的输出的近似直接成比例。经历整个过程的要点在于，文献中已经示出，应用于图像的拉普拉斯高斯算子的输出的最大值和最小值是比通过传统的基于梯度的方法获得的那些更稳定的关键点。将高斯算子的拉普拉斯应用于图像涉及在与高斯核卷积之后的二阶微分，而这里我们在应用高斯核之后仅使用图像差异来获得狗金字塔(其与对数金字塔非常接近)。如图 8-4 所示，通过将一个像素与其在金字塔中的 26 个邻居进行比较，获得狗金字塔的最大值和最小值。仅当一个点高于或低于其所有 26 个相邻点时，该点才被选为关键点。这种检查的成本相当低，因为大多数点将在最初的几次检查中被消除。关键点的比例是两个高斯分布中较小的方差的平方根，这两个高斯分布用于产生狗金字塔的特定级别。

图 8-3。

Difference of Gaussians Pyramid (From “Distinctive Image Features from Scale-Invariant Keypoints,” David G. Lowe. Reprinted by permission of Springer Science+Business Media.)

这个金字塔中的连续图像彼此相减。生成的图像被认为是原始图像上高斯差分(DoG)算子的输出，如图 8-3 所示。

图 8-2。

Scale Pyramid (From “Distinctive Image Features from Scale-Invariant Keypoints,” David G. Lowe. Reprinted by permission of Springer Science+Business Media.)

它将图像与方差连续增加的高斯图像进行卷积，从而增加比例(并且每当比例增加一倍时，缩减采样因子为 2；即高斯的方差增加了 4 倍)。这样，就形成了一个“比例金字塔”，如图 8-2 所示。记住，方差为σ ² 的 2-D 高斯函数由下式给出:

使用具有 36 个 10 度仓的方向直方图来收集这些方向。梯度方向对直方图的贡献根据等于关键点尺度的 1.5 倍的σ的高斯核来加权。直方图中的峰值对应于关键点的方向。如果有多个峰值(最高峰值的 80%以内的值)，则在同一位置以多个方向创建多个关键点。

筛选关键点描述符

每个 SIFT 关键点都有一个与之关联的 128 元素描述符。图 8-5 所示的正方形区域被分成 16 个相等的块(图中只显示了四个块)。对于每个块，梯度方向被编入八个仓中，贡献等于梯度幅度和高斯核的乘积，σ等于正方形宽度的一半。为了实现旋转不变性，正方形区域中的点的坐标和这些点处的梯度方向相对于关键点的方向旋转。图 8-6 显示了四个区块的四个这样的八面元直方图(箭头的长度表示相应面元的值)。因此，16 个八箱直方图构成了 128 元素的 SIFT 描述符。最后，128 元素向量被归一化为单位长度，以提供照明不变性。

图 8-6。

Gradient orientation histograms for keypoint description in SIFT (From “Distinctive Image Features from Scale-Invariant Keypoints,” David G. Lowe. Reprinted by permission of Springer Science+Business Media.)

匹配 SIFT 描述符

如果两个 128 元素 SIFT 描述符之间的欧几里德(又名 L2)距离很低，则认为它们匹配。Lowe 还提出了一个过滤这些匹配的条件，以使匹配更加健壮。假设您有两组描述符，一组来自训练图像，另一组来自测试图像，您希望找到鲁棒的匹配。通常，称为“最近邻搜索”的特殊算法被用于找出在欧几里得意义上最接近每个训练描述符的测试描述符。假设您使用最近邻搜索来获得所有训练描述符的 2 个最近邻。如果从训练描述符到其第一最近邻居的距离大于阈值乘以到第二最近邻居的距离，则描述符匹配被消除。这个阈值显然小于 1，通常设置为 0.8。这迫使匹配是唯一的和高度“有区别的”通过降低该阈值的值，可以使该要求更加严格。

OpenCV 有一组丰富的函数(作为类实现),用于各种关键点检测、描述符提取和描述符匹配。这样做的好处是，您可以使用一种算法提取关键点，使用另一种算法提取这些关键点周围的描述符，这样就可以根据您的需要来定制这个过程。

FeatureDetector是所有特定关键点检测类(如 SIFT、SURF、ORB 等)的基类。)都是遗传的。这个父类中的detect()方法接收一个图像并检测关键点，将它们返回到关键点对象的 STL 向量中。KeyPoint是一个通用类，用于存储关键点(位置、方向、比例、过滤器响应和其他一些信息)。SiftFeatureDetector是从FeatureDetector继承来的类，它使用我之前概述的算法从灰度图像中提取关键点。

与FeatureDetector类似，DescriptorExtractor是 OpenCV 中实现的各种描述符提取器的基类。它有一个名为compute()的方法，接受图像和关键点作为输入，并给出一个 Mat，其中每一行都是相应关键点的描述符。对于 SIFT，可以使用SiftDescriptorExtractor。

所有描述符匹配算法都继承自DescriptorMatcher类，该类有一些非常有用的方法。典型的流程是用add()方法将训练描述符放入匹配器对象，用train()方法初始化最近邻搜索数据结构。然后，您可以分别使用match()或knnMatch()方法为查询测试描述符找到最接近的匹配或最近的邻居。有两种方法可以得到最近的邻居:进行强力搜索(遍历每个测试点的所有训练点，BFmatcher())或者使用 OpenCV 包装器，在FlannBasedMatcher()类中实现快速近似最近邻居库(FLANN)。清单 8-1 非常简单:它使用 SIFT 关键点、SIFT 描述符，并使用蛮力方法获得最近邻。一定要仔细阅读它的语法细节。图 8-7 展示了我们第一个基于关键点的目标检测器！

清单 8-1。程序说明 SIFT 关键点和描述符提取，并使用蛮力匹配

// Program to illustrate SIFT keypoint and descriptor extraction, and matching using brute force

// Author: Samarth Manoj Brahmbhatt, University of Pennsylvania

#include <opencv2/opencv.hpp>

#include <opencv2/highgui/highgui.hpp>

#include <opencv2/nonfree/features2d.hpp>

#include <opencv2/features2d/features2d.hpp>

using namespace cv;

using namespace std;

int main() {

Mat train = imread("template.jpg"), train_g;

cvtColor(train, train_g, CV_BGR2GRAY);

//detect SIFT keypoints and extract descriptors in the train image

vector<KeyPoint> train_kp;

Mat train_desc;

SiftFeatureDetector featureDetector;

featureDetector.detect(train_g, train_kp);

SiftDescriptorExtractor featureExtractor;

featureExtractor.compute(train_g, train_kp, train_desc);

// Brute Force based descriptor matcher object

BFMatcher matcher;

vector<Mat> train_desc_collection(1, train_desc);

matcher.add(train_desc_collection);

matcher.train();

// VideoCapture object

VideoCapture cap(0);

unsigned int frame_count = 0;

while(char(waitKey(1)) != 'q') {

double t0 = getTickCount();

Mat test, test_g;

cap >> test;

if(test.empty())

continue;

cvtColor(test, test_g, CV_BGR2GRAY);

//detect SIFT keypoints and extract descriptors in the test image

vector<KeyPoint> test_kp;

Mat test_desc;

featureDetector.detect(test_g, test_kp);

featureExtractor.compute(test_g, test_kp, test_desc);

// match train and test descriptors, getting 2 nearest neighbors for all test descriptors

vector<vector<DMatch> > matches;

matcher.knnMatch(test_desc, matches, 2);

// filter for good matches according to Lowe's algorithm

vector<DMatch> good_matches;

for(int i = 0; i < matches.size(); i++) {

if(matches[i][0].distance < 0.6 * matches[i][1].distance)

good_matches.push_back(matches[i][0]);

}

Mat img_show;

drawMatches(test, test_kp, train, train_kp, good_matches, img_show);

imshow("Matches", img_show);

cout << "Frame rate = " << getTickFrequency() / (getTickCount() - t0) << endl;

}

return 0;

}

图 8-7。

SIFT keypoint based object detector using the Brute Force matcher

注意使用非常方便的drawMatches()函数来可视化检测和我用来测量帧速率的方法。您可以看到，当没有对象实例时，我们得到的帧速率约为 2.1 fps，当有实例时，我们得到的帧速率为 2.1 fps，如果您考虑的是实时应用，这不是很好。

清单 8-2 和图 8-8 显示，使用基于 FLANN 的匹配器将帧速率提高到 2.2 fps 和 1.8 fps。

清单 8-2。程序说明 SIFT 关键点和描述符提取，并使用 FLANN 匹配

// Program to illustrate SIFT keypoint and descriptor extraction, and matching using FLANN

// Author: Samarth Manoj Brahmbhatt, University of Pennsylvania

#include <opencv2/opencv.hpp>

#include <opencv2/highgui/highgui.hpp>

#include <opencv2/nonfree/features2d.hpp>

#include <opencv2/features2d/features2d.hpp>

using namespace cv;

using namespace std;

int main() {

Mat train = imread("template.jpg"), train_g;

cvtColor(train, train_g, CV_BGR2GRAY);

//detect SIFT keypoints and extract descriptors in the train image

vector<KeyPoint> train_kp;

Mat train_desc;

SiftFeatureDetector featureDetector;

featureDetector.detect(train_g, train_kp);

SiftDescriptorExtractor featureExtractor;

featureExtractor.compute(train_g, train_kp, train_desc);

// FLANN based descriptor matcher object

FlannBasedMatcher matcher;

vector<Mat> train_desc_collection(1, train_desc);

matcher.add(train_desc_collection);

matcher.train();

// VideoCapture object

VideoCapture cap(0);

unsigned int frame_count = 0;

while(char(waitKey(1)) != 'q') {

double t0 = getTickCount();

Mat test, test_g;

cap >> test;

if(test.empty())

continue;

cvtColor(test, test_g, CV_BGR2GRAY);

//detect SIFT keypoints and extract descriptors in the test image

vector<KeyPoint> test_kp;

Mat test_desc;

featureDetector.detect(test_g, test_kp);

featureExtractor.compute(test_g, test_kp, test_desc);

// match train and test descriptors, getting 2 nearest neighbors for all test descriptors

vector<vector<DMatch> > matches;

matcher.knnMatch(test_desc, matches, 2);

// filter for good matches according to Lowe's algorithm

vector<DMatch> good_matches;

for(int i = 0; i < matches.size(); i++) {

if(matches[i][0].distance < 0.6 * matches[i][1].distance)

good_matches.push_back(matches[i][0]);

}

Mat img_show;

drawMatches(test, test_kp, train, train_kp, good_matches, img_show);

imshow("Matches", img_show);

cout << "Frame rate = " << getTickFrequency() / (getTickCount() - t0) << endl;

}

return 0;

}

图 8-8。

SIFT keypoint based object detector using the FLANN based matcher

浏览要点和描述符

Herbert Bay 等人在他们的论文“SURF:加速的鲁棒特征”中提出了一种关键点检测和描述算法，该算法与 SIFT 一样鲁棒且可重复，但速度是 SIFT 的两倍多。在这里，我将概述这个算法的要点。(如果你想要更详细的信息，请参考论文。)

冲浪关键点检测

SURF 之所以快，是因为它对复杂的连续实值函数使用矩形离散化整数近似。SURF 使用 Hessian 矩阵行列式的最大值。现在，Hessian 矩阵被定义为:

其中:

是高斯二阶导数与点x处的图像I的卷积。

图 8-9 显示了二阶高斯导数及其 SURF 矩形整数近似。

图 8-9。

Second order Gaussian derivatives in the Y and XY direction (left); their box-filter approximations (right). (From “SURF: Speeded Up Robust Features,” by Herbert Bay et al. Reprinted by permission of Springer Science+Business Media)

为了理解为什么 Hessian 矩阵行列式极值表示角状关键点，请记住卷积是关联的。因此，在用高斯滤波器对图像进行平滑之后，Hessian 矩阵的元素也可以被认为是图像的二阶空间导数。如果有突然的强度变化，图像的二阶空间导数将有一个峰值。然而，边缘也可以算作突然的强度变化。矩阵的行列式有助于我们区分边和角。现在，Hessian 矩阵的行列式由下式给出:

从滤波器的结构可以清楚地看出，Lxx 和 Lyy 分别响应于垂直和水平边缘，而 Lxy 最适合于由对角边缘形成的拐角。因此，当有一对水平和垂直边相交时(形成一个角)，或者当有一个由对角边构成的角时，行列式将具有高值。这就是我们想要的。

SURF 使用的矩形近似也称为箱式过滤器。盒式滤波器的使用使得能够使用积分图像，这是一种将卷积运算加速几个数量级的巧妙结构。

积分图像的概念及其在用盒滤波器加速卷积中的应用值得讨论，因为它在图像处理中被广泛使用。所以让我们绕一小段路。

对于任何图像，其在一个像素的积分图像是其从原点(左上角)开始直到该像素的累积和。数学上，如果I是一幅图像，H是积分图像，H的像素(x, y)由下式给出:

可以使用递归方程在线性时间内从原始图像计算积分图像:

积分图像最有趣的特性之一是，一旦你有了它，你就可以用它来得到原始图像中任意大小像素的总和，只需要 4 次运算，使用这个等式(也见图 8-10 ):

图 8-10。

Using integral image to sum up sum across a rectangular region

与核的卷积只是像素值与核元素的逐元素乘法，然后求和。如果内核元素不变，事情就变得简单多了。我们可以将核下的像素值相加，然后将和乘以常数核值。现在你明白为什么箱式过滤器(在矩形区域中具有常数元素的核)和积分图像(帮助你非常快速地合计矩形区域中的像素值的图像)是如此伟大的一对了吧？

为了构建尺度空间金字塔，SURF 增加了高斯滤波器的大小，而不是减小图像的大小。在构建之后，它通过比较金字塔中的一个点与其 26 个邻居来寻找不同尺度下的 Hessian 矩阵行列式值的极值，就像 SIFT 一样。这给了冲浪关键点和它们的比例。

关键点方向通过选择关键点周围半径为关键点比例 6 倍的圆形邻域来确定。在这个邻域的每一点，水平和垂直盒式滤波器(称为 Haar 小波，如图 8-11 所示)的响应被记录。

图 8-11。

Vertical (top) and horizontal (bottom) box filters used for orientation assignment

一旦用高斯函数(σ = 2.5 倍标度)对响应进行加权，它们就被表示为空间中的向量，水平响应强度沿 x 轴，垂直响应强度沿 y 轴。角度为 60 度的滑动弧扫过该空间一圈(图 8-12 )。

图 8-12。

Sliding orientation windows used in SURF

对窗口内的所有响应求和，以给出新的向量；这些向量的数量与滑动窗口的迭代次数一样多。这些向量中最大的一个将其方向借给关键点。

冲浪描述符

获得定向关键点后，在计算 SURF 描述符时会涉及以下步骤:

这个区域被分成 16 个正方形的子区域。在每个次区域中，在 5 x 5 个规则间隔的网格点上计算出一组四个特征。这些特征包括水平和垂直方向上的 Harr 小波响应及其绝对值
这 4 个特征在每个单独的子区域上被总结，并构成每个子区域的四元素描述符。十六个这样的子区域构成了关键点的 64 元素描述符

图 8-13。

Oriented square patches around SURF keypoints

在关键点周围构造一个正方形区域，边长等于关键点比例的 20 倍，方向由关键点的方向决定，如图 8-13 所示。

使用与 SIFT 相同的最近邻距离比策略来匹配 SURF 描述符。

在不牺牲性能的情况下，SURF 的速度至少是 SIFT 的两倍。清单 8-3 显示了基于关键点的对象检测器应用的 SURF 版本，它使用了 SurfFeatureDetector 和 SurfDescriptorExtractor 类。图 8-14 显示 SURF 和 SIFT 一样精确，同时给我们高达 6 fps 的帧速率。

清单 8-3。说明 SURF 关键点和描述符提取以及使用 FLANN 进行匹配的程序

// Program to illustrate SURF keypoint and descriptor extraction, and matching using FLANN

// Author: Samarth Manoj Brahmbhatt, University of Pennsylvania

#include <opencv2/opencv.hpp>

#include <opencv2/highgui/highgui.hpp>

#include <opencv2/nonfree/features2d.hpp>

#include <opencv2/features2d/features2d.hpp>

using namespace cv;

using namespace std;

int main() {

Mat train = imread("template.jpg"), train_g;

cvtColor(train, train_g, CV_BGR2GRAY);

//detect SIFT keypoints and extract descriptors in the train image

vector<KeyPoint> train_kp;

Mat train_desc;

SurfFeatureDetector featureDetector(100);

featureDetector.detect(train_g, train_kp);

SurfDescriptorExtractor featureExtractor;

featureExtractor.compute(train_g, train_kp, train_desc);

// FLANN based descriptor matcher object

FlannBasedMatcher matcher;

vector<Mat> train_desc_collection(1, train_desc);

matcher.add(train_desc_collection);

matcher.train();

// VideoCapture object

VideoCapture cap(0);

unsigned int frame_count = 0;

while(char(waitKey(1)) != 'q') {

double t0 = getTickCount();

Mat test, test_g;

cap >> test;

if(test.empty())

continue;

cvtColor(test, test_g, CV_BGR2GRAY);

//detect SIFT keypoints and extract descriptors in the test image

vector<KeyPoint> test_kp;

Mat test_desc;

featureDetector.detect(test_g, test_kp);

featureExtractor.compute(test_g, test_kp, test_desc);

// match train and test descriptors, getting 2 nearest neighbors for all test descriptors

vector<vector<DMatch> > matches;

matcher.knnMatch(test_desc, matches, 2);

// filter for good matches according to Lowe's algorithm

vector<DMatch> good_matches;

for(int i = 0; i < matches.size(); i++) {

if(matches[i][0].distance < 0.6 * matches[i][1].distance)

good_matches.push_back(matches[i][0]);

}

Mat img_show;

drawMatches(test, test_kp, train, train_kp, good_matches, img_show);

imshow("Matches", img_show);

cout << "Frame rate = " << getTickFrequency() / (getTickCount() - t0) << endl;

}

return 0;

}

图 8-14。

SURF-based object detector using FLANN matching

请注意，SIFT 和 SURF 算法都在美国获得了专利(可能在其他一些国家也是如此)，因此您将无法在商业应用中使用它们。

ORB(定向快速旋转简报)

ORB 是一种关键点检测和描述技术，它的流行程度正赶上 SIFT 和 SURF。它以极快的运算速度而闻名，同时几乎没有牺牲性能精度。ORB 是比例和旋转不变的，对噪声和仿射变换具有鲁棒性，并且仍然能够提供 25 fps 的帧速率！

该算法实际上是快速(来自加速段测试的特征)关键点检测与添加到该算法中的定向以及被修改以处理定向关键点的 BRIEF(二进制鲁棒独立基本特征)关键点描述符算法的组合。在进一步描述 ORB 之前，您可以阅读以下文章，以获得关于这些算法的详细信息:

“ORB:筛选或冲浪的有效替代方案”，作者 Ethan Rublee 等人。
“更快更好:角点检测的机器学习方法”，Edward Rosten 等人。
“简介:二进制鲁棒独立基本特征”，迈克尔·科兰德等人。

定向快速关键点

最初的快速关键点检测器在围绕一个像素的一个圆圈中测试 16 个像素。如果中心像素比 16 个像素中的阈值数量更暗或更亮，则确定为拐角。为了使这个过程更快，使用机器学习方法来决定检查 16 个像素的有效顺序。ORB 中 FAST 的自适应通过制作图像的比例金字塔来检测多个比例的角点，并通过找到强度质心来为这些角点添加方向。像素块的强度质心由下式给出:其中

补片的方向是连接补片中心和强度质心的向量的方向。具体来说:

简要描述

BRIEF 的基本原理是，图像中的关键点可以通过围绕该关键点的一系列二进制像素强度测试来充分描述。这是通过选取关键点周围的像素对(根据随机或非随机采样模式)然后比较两个强度来完成的。如果第一个像素的亮度高于第二个像素的亮度，测试返回 1，否则返回 0。由于所有这些输出都是二进制的，所以可以将它们打包成字节，以便有效地存储。一个很大的优点是，由于描述符是二进制的，两个描述符之间的距离度量是汉明的，而不是欧几里得的。两个相同长度的二进制字符串之间的汉明距离是它们之间不同的位数。通过在两个描述符之间进行逐位 XOR 运算，然后计算 1 的数量，可以非常有效地实现汉明距离。

ORB 中的简短实现使用机器学习算法来获得一个配对挑选模式，以挑选 256 个将捕获最多信息的配对，看起来有点像图 8-15 。

图 8-15。

Typical pair-picking pattern for BRIEF

为了补偿关键点的方向，在挑选对并执行 256 二进制测试之前，围绕关键点的补片的坐标被旋转该方向。

清单 8-4 使用 ORB 关键点和特性实现了基于关键点的对象检测器，图 8-16 显示帧速率上升到了 28 fps，而性能没有受到太大影响。我们还使用局部敏感哈希(LSH)算法来执行基于 FLANN 的搜索，这进一步加快了基于汉明距离的最近邻搜索。

清单 8-4。程序说明 ORB 关键点和描述符提取，并使用弗兰恩 LSH 匹配

// Program to illustrate ORB keypoint and descriptor extraction, and matching using FLANN-LSH

// Author: Samarth Manoj Brahmbhatt, University of Pennsylvania

#include <opencv2/opencv.hpp>

#include <opencv2/highgui/highgui.hpp>

#include <opencv2/nonfree/features2d.hpp>

#include <opencv2/features2d/features2d.hpp>

using namespace cv;

using namespace std;

int main() {

Mat train = imread("template.jpg"), train_g;

cvtColor(train, train_g, CV_BGR2GRAY);

//detect SIFT keypoints and extract descriptors in the train image

vector<KeyPoint> train_kp;

Mat train_desc;

OrbFeatureDetector featureDetector;

featureDetector.detect(train_g, train_kp);

OrbDescriptorExtractor featureExtractor;

featureExtractor.compute(train_g, train_kp, train_desc);

cout << "Descriptor depth " << train_desc.depth() << endl;

// FLANN based descriptor matcher object

flann::Index flannIndex(train_desc, flann::LshIndexParams(12, 20, 2), cvflann::FLANN_DIST_HAMMING);

// VideoCapture object

VideoCapture cap(0);

unsigned int frame_count = 0;

while(char(waitKey(1)) != 'q') {

double t0 = getTickCount();

Mat test, test_g;

cap >> test;

if(test.empty())

continue;

cvtColor(test, test_g, CV_BGR2GRAY);

//detect SIFT keypoints and extract descriptors in the test image

vector<KeyPoint> test_kp;

Mat test_desc;

featureDetector.detect(test_g, test_kp);

featureExtractor.compute(test_g, test_kp, test_desc);

// match train and test descriptors, getting 2 nearest neighbors for all test descriptors

Mat match_idx(test_desc.rows, 2, CV_32SC1), match_dist(test_desc.rows, 2, CV_32FC1);

flannIndex.knnSearch(test_desc, match_idx, match_dist, 2, flann::SearchParams());

// filter for good matches according to Lowe's algorithm

vector<DMatch> good_matches;

for(int i = 0; i < match_dist.rows; i++) {

if(match_dist.at<float>(i, 0) < 0.6 * match_dist.at<float>(i, 1)) {

DMatch dm(i, match_idx.at<int>(i, 0), match_dist.at<float>(i, 0));

good_matches.push_back(dm);

}

Mat img_show;

drawMatches(test, test_kp, train, train_kp, good_matches, img_show);

imshow("Matches", img_show);

cout << "Frame rate = " << getTickFrequency() / (getTickCount() - t0) << endl;

}

return 0;

}

图 8-16。

ORB keypoint based object detecor

基础机器学习

在本节中，我们将讨论一些入门级的机器学习概念，我们将在下一节中使用这些概念来制作我们的终极对象检测器应用！机器学习(ML)主要研究让机器(计算机)从经验中学习的技术。机器可以学习解决两种类型的问题:

分类:预测一个数据样本属于哪个类别。因此，分类算法的输出是一个类别，它是离散的，而不是连续的。例如，我们将在本章稍后处理的机器学习问题是确定图像中存在哪个对象(来自一系列不同的对象)。这是一个分类问题，因为输出是对象类别，这是一个离散的类别号
回归:预测一个很难用方程表达的函数的输出。例如，学习从以前的数据预测股票价格的算法就是回归算法。注意，输出是一个连续的数字——它可以取任何值

ML 算法的训练输入由两部分组成:

特征:人们认为与手头问题相关的参数。例如，对于我们之前讨论的对象分类问题，特征可以是图像的 SURF 描述符。典型地，ML 算法被给予来自不是一个而是许多数据实例的特征，以“教导”它许多可能的输入条件
标签:与数据实例相关联的每组特性都有相应的“正确响应”这叫做标签。例如，对象分类问题中的标签可以是训练图像中存在的对象类别。分类问题会有离散标签，而回归算法会有连续标签。

特征和标签一起构成了 ML 算法所需的“经验”。

有太多的 ML 算法了。朴素贝叶斯分类器、逻辑回归器、支持向量机和隐马尔可夫模型是一些例子。每种算法都有自己的优缺点。我们将在这里简要讨论支持向量机(SVM)的工作原理，然后继续在我们的 object detector 应用中使用 OpenCV 实现支持向量机。

支持向量机

由于其通用性和易用性，支持向量机是当今使用最广泛的 ML 算法。典型的 SVM 是一种分类算法，但一些研究人员已经将其修改为执行回归。基本上，给定一组带标签的数据点，SVM 试图找到一个能最好地正确分离数据的超平面。超平面是给定维数的线性结构。例如，如果数据是二维的，则它是一条线，三维的是一个平面，四维的是一个“四维平面”，依此类推。所谓“最佳分离”，我的意思是从超平面到超平面两侧最近点的距离必须相等，并且尽可能最大。图 8-17 中显示了一个二维的简单例子，其中直线是超平面。

图 8-17。

SVM classifier hyperplane

敏锐的读者可能会说，并不是所有的数据点排列都能被超平面正确地分离。例如，如果你有四个带标签的点，如图 8-18 所示，没有一条线可以正确地将它们全部分开。

图 8-18。

SVM XOR problem

为了解决这个问题，支持向量机使用了核的概念。内核是将数据从一种配置映射到另一种配置的函数，可能会改变维度。例如，可以解决图 8-18 中问题的内核将是一个可以围绕x1 = x2线“折叠”平面的内核(图 8-18 )。如果使用核，SVM 算法会在由核变换的特征空间中找到一个用于正确分类的超平面。然后通过核变换每个测试输入，并使用训练好的超平面进行分类。超平面在原始特征空间中可能不保持线性，但是在变换后的特征空间中是线性的。

在 OpenCV 中，CvSVM类及其train()和predict()方法可以分别用于训练使用不同内核的 SVM 和使用训练好的 SVM 进行预测。

对象分类

在本节中，我们将开发一个对象检测器应用，它使用关键点的 SURF 描述符作为特征，并使用支持向量机来预测图像中存在的对象的类别。对于这个应用，我们将使用 CMake 构建系统，这使得配置代码项目并将其链接到各种库变得轻而易举。我们还将使用 Boost C++库的filesystem组件来遍历我们的数据文件夹和 STL 数据结构map和multimap，以一种易于访问的方式存储和组织数据。编写代码中的注释是为了使所有内容都易于理解，但是鼓励您在继续之前先了解一些关于 CMake 和 STL 数据结构映射和重映射的基础知识。

战略

在这里，我概述了我们将使用的策略，它是由 Gabriella Csurka 等人在他们的论文“用关键点包进行视觉分类”中提出的，并在 OpenCV features2d模块中得到了很好的实现。

在所有模板中的关键点处计算(冲浪)描述符，并将它们汇集在一起
将相似的描述符分组到任意数量的簇中。这里，相似性由 64 元素 SURF 描述符之间的欧几里德距离决定，分组由BOWKMeansTrainer类的cluster()方法完成。这些集群在论文中被称为“关键点包”或“视觉单词”，它们共同代表了程序的“词汇”。每个聚类都有一个聚类中心，它可以被认为是属于该聚类的所有描述符的代表描述符
现在，计算 SVM 分类器的训练数据。这由以下人员完成
- 为每个训练图像计算(SURF)描述符
- 通过到聚类中心的欧几里德距离将这些描述符中的每一个与词汇表中的一个聚类相关联
- 根据这种关联制作直方图。该直方图具有与词汇表中的聚类一样多的仓。每个箱计数训练图像中有多少描述符与对应于该箱的聚类相关联。直观地说，这个直方图描述了图像“词汇”中的“视觉单词”，被称为图像的“视觉单词包描述符”。这是通过使用BOWImageDescriptorExtractor类的compute()方法来完成的
之后，使用训练数据为每一类对象训练一个一对一 SVM。详细信息:
- 对于每个类别，正数据示例是其训练图像的 BOW 描述符，而负数据示例是所有其他类别的训练图像的 BOW 描述符
- 正例标为 1，反例标为 0
- 这两个都给了CvSVM类的train()方法来训练一个 SVM。因此，每个类别都有一个 SVM
- 支持向量机也能够进行多类分类，但是直观上(和数学上)分类器更容易决定一个数据样本是否属于一个类，而不是决定一个数据样本属于多个类中的哪个类
现在，使用训练好的支持向量机进行分类。详细信息:
- 从相机中捕捉一个图像并计算弓描述符，同样使用BOWImageDescriptorExtractor类的compute()方法
- 在使用CvSVM类的predict()方法时，将此描述提供给 SVM 进行预测
- 对于每个类别，相应的 SVM 将告诉我们描述符所描述的图像是否属于该类别，以及它对其决定的信心程度。这个指标越小，SVM 对自己的决定越有信心
- 选择度量值最小的类别作为检测到的类别

组织

项目文件夹应如图 8-19 所示进行组织。

图 8-19。

Project root folder organization

图 8-20 显示了“数据”文件夹的结构。它包含两个名为“模板”和“训练图像”的文件夹。“模板”文件夹包含显示我们要根据类别名称分类的对象的图像。如图 8-20 所示，我的类别被称为“a”、“b”和“c”。“train _ images”文件夹中的文件夹与模板一样多，同样以对象类别命名。每个文件夹都包含用于训练 SVM 进行分类的图像，图像本身的名称并不重要。

图 8-20。

Organization of the two folders in the “data” folder—“templates” and “train_images”

图 8-21 显示了我的三个类别的模板和训练图像。请注意，模板被裁剪为只显示对象，不显示其他内容。

图 8-21。

(Top to bottom) Templates and training images for categories ”a,” “b,” and “c”

现在来看看文件CmakeLists.txt和Config.h.in，分别如清单 8-5 和 8-6 所示。CmakeLists.txt是主文件，在我们的例子中，它被我的 CMake 用来设置项目的所有属性和链接库(OpenCV 和 Boost)的位置。Config.h.in设置我们的文件夹路径配置。这将在“include”文件夹中自动生成文件Config.h，该文件将包含在我们的源 CPP 文件中，并将在变量TEMPLATE_FOLDER和TRAIN_FOLDER中分别定义存储我们的模板和训练图像的文件夹的路径。

清单 8-5: CmakeLists.txt

# Minimum required CMake version

cmake_minimum_required(VERSION 2.8)

# Project name

project(object_categorization)

# Find the OpenCV installation

find_package(OpenCV REQUIRED)

# Find the Boost installation, specifically the components 'system' and 'filesystem'

find_package(Boost COMPONENTS system filesystem REQUIRED)

# ${PROJECT_SOURCE_DIR} is the name of the root directory of the project

# TO_NATIVE_PATH converts the path ${PROJECT_SOURCE_DIR}/data/ to a full path and the file() command stores it in DATA_FOLDER

file(TO_NATIVE_PATH "${PROJECT_SOURCE_DIR}/data/" DATA_FOLDER)

# set TRAIN_FOLDER to DATA_FOLDER/train_images - this is where we will put our templates for constructing the vocabulary

set(TRAIN_FOLDER "${DATA_FOLDER}train_images/")

# set TEMPLATE_FOLDER to DATA_FOLDER/templates - this is where we will put our traininfg images, in folders organized by category

set(TEMPLATE_FOLDER "${DATA_FOLDER}templates/")

# set the configuration input file to ${PROJECT_SOURCE_DIR}/Config.h.in and the includable header file holding configuration information to ${PROJECT_SOURCE_DIR}/include/Config.h

configure_file("${PROJECT_SOURCE_DIR}/Config.h.in" "${PROJECT_SOURCE_DIR}/include/Config.h")

# Other directories where header files for linked libraries can be found

include_directories(${OpenCV_INCLUDE_DIRS} "${PROJECT_SOURCE_DIR}/include" ${Boost_INCLUDE_DIRS})

# executable produced as a result of compilation

add_executable(code8-5 src/code8-5.cpp)

# libraries to be linked with this executable - OpenCV and Boost (system and filesystem components)

target_link_libraries(code8-5 ${OpenCV_LIBS} ${Boost_SYSTEM_LIBRARY} ${Boost_FILESYSTEM_LIBRARY})

代码 8-6:配置文件

// Preprocessor directives to set variables from values in the CMakeLists.txt files

#define DATA_FOLDER "@DATA_FOLDER@"

#define TRAIN_FOLDER "@TRAIN_FOLDER@"

#define TEMPLATE_FOLDER "@TEMPLATE_FOLDER@"

文件夹的这种组织方式，结合配置文件，将允许我们自动有效地管理和读取我们的数据集，并使整个算法可扩展到任何数量的对象类别，当您通读主代码时，您会发现这一点。

主代码应该放在名为“src”的文件夹中，并根据CMakeLists.txt文件中的add_executable()命令中提到的文件名命名，如清单 8-7 所示。它被大量注释以帮助你理解正在发生的事情。我将在后面讨论代码的一些特性，但是和往常一样，我们鼓励您在在线 OpenCV 文档中查找函数！

清单 8-7。演示 BOW 对象分类的程序

// Program to illustrate BOW object categorization

// Author: Samarth Manoj Brahmbhatt, University of Pennsylvania

#include <opencv2/opencv.hpp>

#include <opencv2/highgui/highgui.hpp>

#include <opencv2/features2d/features2d.hpp>

#include <opencv2/nonfree/features2d.hpp>

#include <opencv2/ml/ml.hpp>

#include <boost/filesystem.hpp>

#include "Config.h"

using namespace cv;

using namespace std;

using namespace boost::filesystem3;

class categorizer {

private:

map<string, Mat> templates, objects, positive_data, negative_data; //maps from category names to data

multimap<string, Mat> train_set; //training images, mapped by category name

map<string, CvSVM> svms; //trained SVMs, mapped by category name

vector<string> category_names; //names of the categories found in TRAIN_FOLDER

int categories; //number of categories

int clusters; //number of clusters for SURF features to build vocabulary

Mat vocab; //vocabulary

// Feature detectors and descriptor extractors

Ptr<FeatureDetector> featureDetector;

Ptr<DescriptorExtractor> descriptorExtractor;

Ptr<BOWKMeansTrainer> bowtrainer;

Ptr<BOWImgDescriptorExtractor> bowDescriptorExtractor;

Ptr<FlannBasedMatcher> descriptorMatcher;

void make_train_set(); //function to build the training set multimap

void make_pos_neg(); //function to extract BOW features from training images and organize them into positive and negative samples

string remove_extension(string); //function to remove extension from file name, used for organizing templates into categories

public:

categorizer(int); //constructor

void build_vocab(); //function to build the BOW vocabulary

void train_classifiers(); //function to train the one-vs-all SVM classifiers for all categories

void categorize(VideoCapture); //function to perform real-time object categorization on camera frames

};

string categorizer::remove_extension(string full) {

int last_idx = full.find_last_of(".");

string name = full.substr(0, last_idx);

return name;

}

categorizer::categorizer(int _clusters) {

clusters = _clusters;

// Initialize pointers to all the feature detectors and descriptor extractors

featureDetector = (new SurfFeatureDetector());

descriptorExtractor = (new SurfDescriptorExtractor());

bowtrainer = (new BOWKMeansTrainer(clusters));

descriptorMatcher = (new FlannBasedMatcher());

bowDescriptorExtractor = (new BOWImgDescriptorExtractor(descriptorExtractor, descriptorMatcher));

// Organize the object templates by category

// Boost::filesystem directory iterator

for(directory_iterator i(TEMPLATE_FOLDER), end_iter; i != end_iter; i++) {

// Prepend full path to the file name so we can imread() it

string filename = string(TEMPLATE_FOLDER) + i->path().filename().string();

// Get category name by removing extension from name of file

string category = remove_extension(i->path().filename().string());

Mat im = imread(filename), templ_im;

objects[category] = im;

cvtColor(im, templ_im, CV_BGR2GRAY);

templates[category] = templ_im;

}

cout << "Initialized" << endl;

// Organize training images by category

make_train_set();

}

void categorizer::make_train_set() {

string category;

// Boost::filesystem recursive directory iterator to go through all contents of TRAIN_FOLDER

for(recursive_directory_iterator i(TRAIN_FOLDER), end_iter; i != end_iter; i++) {

// Level 0 means a folder, since there are only folders in TRAIN_FOLDER at the zeroth level

if(i.level() == 0) {

// Get category name from name of the folder

category = (i -> path()).filename().string();

category_names.push_back(category);

}

// Level 1 means a training image, map that by the current category

else {

// File name with path

string filename = string(TRAIN_FOLDER) + category + string("/") + (i -> path()).filename().string();

// Make a pair of string and Mat to insert into multimap

pair<string, Mat> p(category, imread(filename, CV_LOAD_IMAGE_GRAYSCALE));

train_set.insert(p);

}

// Number of categories

categories = category_names.size();

cout << "Discovered " << categories << " categories of objects" << endl;

}

void categorizer::make_pos_neg() {

// Iterate through the whole training set of images

for(multimap<string, Mat>::iterator i = train_set.begin(); i != train_set.end(); i++) {

// Category name is the first element of each entry in train_set

string category = (*i).first;

// Training image is the second elemnt

Mat im = (*i).second, feat;

// Detect keypoints, get the image BOW descriptor

vector<KeyPoint> kp;

featureDetector -> detect(im, kp);

bowDescriptorExtractor -> compute(im, kp, feat);

// Mats to hold the positive and negative training data for current category

Mat pos, neg;

for(int cat_index = 0; cat_index < categories; cat_index++) {

string check_category = category_names[cat_index];

// Add BOW feature as positive sample for current category ...

if(check_category.compare(category) == 0)

positive_data[check_category].push_back(feat);

//... and negative sample for all other categories

else

negative_data[check_category].push_back(feat);

}

// Debug message

for(int i = 0; i < categories; i++) {

string category = category_names[i];

cout << "Category " << category << ": " << positive_data[category].rows << " Positives, " << negative_data[category].rows << " Negatives" << endl;

}

void categorizer::build_vocab() {

// Mat to hold SURF descriptors for all templates

Mat vocab_descriptors;

// For each template, extract SURF descriptors and pool them into vocab_descriptors

for(map<string, Mat>::iterator i = templates.begin(); i != templates.end(); i++) {

vector<KeyPoint> kp; Mat templ = (*i).second, desc;

featureDetector -> detect(templ, kp);

descriptorExtractor -> compute(templ, kp, desc);

vocab_descriptors.push_back(desc);

}

// Add the descriptors to the BOW trainer to cluster

bowtrainer -> add(vocab_descriptors);

// cluster the SURF descriptors

vocab = bowtrainer->cluster();

// Save the vocabulary

FileStorage fs(DATA_FOLDER "vocab.xml", FileStorage::WRITE);

fs << "vocabulary" << vocab;

fs.release();

cout << "Built vocabulary" << endl;

}

void categorizer::train_classifiers() {

// Set the vocabulary for the BOW descriptor extractor

bowDescriptorExtractor -> setVocabulary(vocab);

// Extract BOW descriptors for all training images and organize them into positive and negative samples for each category

make_pos_neg();

for(int i = 0; i < categories; i++) {

string category = category_names[i];

// Postive training data has labels 1

Mat train_data = positive_data[category], train_labels = Mat::ones(train_data.rows, 1, CV_32S);

// Negative training data has labels 0

train_data.push_back(negative_data[category]);

Mat m = Mat::zeros(negative_data[category].rows, 1, CV_32S);

train_labels.push_back(m);

// Train SVM!

svms[category].train(train_data, train_labels);

// Save SVM to file for possible reuse

string svm_filename = string(DATA_FOLDER) + category + string("SVM.xml");

svms[category].save(svm_filename.c_str());

cout << "Trained and saved SVM for category " << category << endl;

}

void categorizer::categorize(VideoCapture cap) {

cout << "Starting to categorize objects" << endl;

namedWindow("Image");

while(char(waitKey(1)) != 'q') {

Mat frame, frame_g;

cap >> frame;

imshow("Image", frame);

cvtColor(frame, frame_g, CV_BGR2GRAY);

// Extract frame BOW descriptor

vector<KeyPoint> kp;

Mat test;

featureDetector -> detect(frame_g, kp);

bowDescriptorExtractor -> compute(frame_g, kp, test);

// Predict using SVMs for all catgories, choose the prediction with the most negative signed distance measure

float best_score = 777;

string predicted_category;

for(int i = 0; i < categories; i++) {

string category = category_names[i];

float prediction = svms[category].predict(test, true);

//cout << category << " " << prediction << " ";

if(prediction < best_score) {

best_score = prediction;

predicted_category = category;

}

//cout << endl;

// Pull up the object template for the detected category and show it in a separate window

imshow("Detected object", objects[predicted_category]);

}

int main() {

// Number of clusters for building BOW vocabulary from SURF features

int clusters = 1000;

categorizer c(clusters);

c.build_vocab();

c.train_classifiers();

VideoCapture cap(0);

namedWindow("Detected object");

c.categorize(cap);

return 0;

}

我想在代码中强调的唯一语法概念是 STL 数据结构map和multimap的使用。它们允许您以(键，值)对的形式存储数据，其中键和值几乎可以是任何数据类型。通过用键索引映射，可以访问与键关联的值。在这里，我们将关键字设置为对象类别名称，这样我们就可以使用类别名称作为映射的索引，轻松地访问特定于类别的模板、训练图像和 SVM。我们必须使用multimap来组织训练图像，因为一个类别可以有多个训练图像。这个技巧允许程序很容易地使用任意数量的对象类别！

摘要

这是本书的主要章节之一，原因有二。首先，我希望，它向您展示了 OpenCV、STL 数据结构和面向对象编程方法在一起使用时的全部威力。第二，您现在拥有了能够完成任何中等物体探测器应用的工具。在许多机器人应用中，基于 SIFT、SURF 和 ORB 关键点的对象检测器是非常常见的基本级对象检测器。我们看到 SIFT 是三者中最慢的(但也是匹配最准确的，能够提取最多数量的有意义的关键点)，而计算和匹配 ORB 描述符非常快(但 ORB 往往会错过一些关键点)。SURF 介于两者之间，但更倾向于准确性而不是速度。一个有趣的事实是，由于 OpenCV 的features2d模块的结构，你可以使用一种关键点提取器和另一种描述符提取器和匹配器。例如，可以使用 SURF 关键点来提高速度，然后在这些关键点处计算和匹配 SIFT 描述符来提高匹配精度。

我还讨论了基本的机器学习，这是我认为每个计算机视觉科学家都必须具备的技能，因为你的视觉程序越自动化和智能化，你的机器人就越酷！

CMake 构建系统也是大型项目中简化构建管理的标准，现在您已经知道了使用它的基本知识！

下一章将讨论如何使用关键点描述符匹配的知识来找出从不同视角观察到的物体图像之间的对应关系。你还将看到这些投影关系如何让你从一堆图像中制作出美丽的无缝全景图！

Footnotes 1

http://www.cs.ubc.ca/∼lowe/papers/ijcv04.pdf

九、仿射和透视变换及其在图像全景中的应用

Abstract

在这一章中，你将学习两种重要的几何图像变换——仿射和透视——以及如何在你的代码中用矩阵来表示和使用它们。这将作为下一章的基础知识，下一章涉及立体视觉和大量的 3D 图像几何。

几何图像变换只是遵循几何规则的图像变换。最简单的几何变换是旋转和缩放图像。还可以有其他更复杂的几何变换。这些变换的另一个特性是它们都是线性的，因此可以表示为矩阵，图像的变换相当于矩阵乘法。可以想象，给定两幅图像(一幅是原始的，另一幅是变换的)，如果两幅图像之间有足够的点对应，就可以恢复变换矩阵。您将学习如何通过使用用户点击的图像之间的点对应来恢复仿射和透视变换。稍后，您还将学习如何通过在关键点匹配描述符来自动完成寻找对应的过程。哦，通过学习如何使用 OpenCV 的优秀拼接模块将一堆图像拼接在一起制作美丽的全景图，你将能够运用所有这些知识！

仿射变换

仿射变换是在变换后保持线条“平行性”的任何线性变换。它还保留点作为点、直线作为直线以及点沿直线的距离比。它不保持直线之间的角度。仿射变换包括图像的所有类型的旋转、平移和镜像。现在让我们看看仿射变换是如何用矩阵表示的。

设(x, y)是原始图像中某点的坐标，而(x', y')是变换后该点在变换图像中的坐标。不同的转换包括:

缩放:x' = a*x, y' = b*y
翻转 X 和 Y 坐标:x' = -x, y' = -y
绕原点逆时针旋转角度θ: x' = x*cos( θ )—y*sin( θ ), y' = x*sin( θ ) + y*cos( θ )

因为所有的几何变换都是线性的，我们可以通过一个 2x2 矩阵M的矩阵乘法将(x', y')与(x, y)联系起来:

(x', y') = M * (x, y)

对于上述三种变换，矩阵 M 采用以下形式:

缩放:，其中a是 X 坐标的缩放因子，而b是 Y 坐标的缩放因子
翻转 X 和 Y 坐标:
围绕原点逆时针旋转角度θ:

除了翻转矩阵，所有仿射变换矩阵的 2×2 部分的行列式必须是+1。

应用仿射变换

在 OpenCV 中，很容易构建仿射变换矩阵并将该变换应用于图像。让我们首先看看应用仿射变换的函数，以便我们可以更好地理解 OpenCV 仿射变换矩阵的结构。函数warpAffine()获取一幅源图像和一个 2×3 矩阵M，并给出一幅变换后的输出图像。假设 M 的形式为:

warpAffine()应用以下变换:

手动构造要给warpAffine()的矩阵时，一个潜在的错误来源是 OpenCV 将原点放在图像的左上角。这一事实不会影响缩放变换，但会影响翻转和旋转变换。具体来说，为了成功翻转，warpAffine()的M输入必须为:

OpenCV 函数getRotationMatrix2D()给出一个进行旋转的 2×3 仿射变换矩阵。它将旋转角度(从水平轴逆时针测量)和旋转中心作为输入。对于正常旋转，您可能希望旋转的中心位于图像的中心。清单 9-1 展示了如何使用getRotationMatrix2D()获得一个旋转矩阵，并使用warpAffine()将其应用到一幅图像上。图 9-1 显示了原始图像和仿射变换图像。

清单 9-1。程序来说明一个简单的仿射变换

//Program to illustrate a simple affine transform

//Author: Samarth Manoj Brahmbhatt, University of Pennsyalvania

#include <opencv2/opencv.hpp>

#include <opencv2/stitching/stitcher.hpp>

#include <opencv2/stitching/warpers.hpp>

#include "Config.h"

using namespace std;

using namespace cv;

int main() {

Mat im = imread(DATA_FOLDER_1 + string("/image.jpg")), im_transformed;

imshow("Original", im);

int rotation_degrees = 30;

// Construct Affine rotation matrix

Mat M = getRotationMatrix2D(Point(im.cols/2, im.rows/2), rotation_degrees, 1);

cout << M << endl;

// Apply Affine transform

warpAffine(im, im_transformed, M, im.size(), INTER_LINEAR);

imshow("Transformed", im_transformed);

while(char(waitKey(1)) != 'q') {}

return 0;

}

图 9-1。

Applying simple Affine transforms

估计仿射变换

有时，您知道一个图像通过仿射(或近似仿射)变换与另一个图像相关，并且您想要获得仿射变换矩阵以用于一些其他计算(例如，估计相机的旋转)。OpenCV 函数getAffineTransform()对于这样的应用来说很方便。这个想法是，如果你在两幅图像中有三对对应点，你可以使用简单的数学方法恢复它们之间的仿射变换。这是因为每一对给你两个方程(一个与 X 坐标相关，一个与 Y 坐标相关)。因此，你需要三个这样的对来求解 2×3 仿射变换矩阵的所有六个元素。getAffineTransform()通过引入两个各含三个点 2f 的向量来为您解方程——一个是原始点，一个是变换点。在清单 9-2 中，用户被要求点击两幅图像中相应的点。这些点用于恢复仿射变换。为了验证恢复的变换是正确的，还向用户显示原始变换图像和由恢复的仿射变换变换的未变换图像之间的差异。图 9-2 显示恢复的仿射变换实际上是正确的(差分图像几乎全是零——黑色)。

清单 9-2。说明仿射变换恢复的程序

//Program to illustrate affine transform recovery

//Author: Samarth Manoj Brahmbhatt, University of Pennsyalvania

#include <opencv2/opencv.hpp>

#include <opencv2/stitching/stitcher.hpp>

#include <opencv2/stitching/warpers.hpp>

#include <opencv2/highgui/highgui.hpp>

#include "Config.h"

using namespace std;

using namespace cv;

// Mouse callback function

void on_mouse(int event, int x, int y, int, void* _p) {

Point2f* p = (Point2f *)_p;

if (event == CV_EVENT_LBUTTONUP) {

p->x = x;

p->y = y;

}

class affine_transformer {

private:

Mat im, im_transformed, im_affine_transformed, im_show, im_transformed_show;

vector<Point2f> points, points_transformed;

Mat M; // Estimated Affine transformation matrix

Point2f get_click(string, Mat);

public:

affine_transformer(); //constructor

void estimate_affine();

void show_diff();

};

affine_transformer::affine_transformer() {

im = imread(DATA_FOLDER_2 + string("/image.jpg"));

im_transformed = imread(DATA_FOLDER_2 + string("/transformed.jpg"));

}

// Function to get location clicked by user on a specific window

Point2f affine_transformer::get_click(string window_name, Mat im) {

Point2f p(-1, -1);

setMouseCallback(window_name, on_mouse, (void *)&p);

while(p.x == -1 && p.y == -1) {

imshow(window_name, im);

waitKey(20);

}

return p;

}

void affine_transformer::estimate_affine() {

imshow("Original", im);

imshow("Transformed", im_transformed);

cout << "To estimate the Affine transform between the original and transformed images you will have to click on 3 matching pairs of points" << endl;

im_show = im.clone();

im_transformed_show = im_transformed.clone();

Point2f p;

// Get 3 pairs of matching points from user

for(int i = 0; i < 3; i++) {

cout << "Click on a distinguished point in the ORIGINAL image" << endl;

p = get_click("Original", im_show);

cout << p << endl;

points.push_back(p);

circle(im_show, p, 2, Scalar(0, 0, 255), -1);

imshow("Original", im_show);

cout << "Click on a distinguished point in the TRANSFORMED image" << endl;

p = get_click("Transformed", im_transformed_show);

cout << p << endl;

points_transformed.push_back(p);

circle(im_transformed_show, p, 2, Scalar(0, 0, 255), -1);

imshow("Transformed", im_transformed_show);

}

// Estimate Affine transform

M = getAffineTransform(points, points_transformed);

cout << "Estimated Affine transform = " << M << endl;

// Apply estimates Affine transfrom to check its correctness

warpAffine(im, im_affine_transformed, M, im.size());

imshow("Estimated Affine transform", im_affine_transformed);

}

void affine_transformer::show_diff() {

imshow("Difference", im_transformed - im_affine_transformed);

}

int main() {

affine_transformer a;

a.estimate_affine();

cout << "Press 'd' to show difference, 'q' to end" << endl;

if(char(waitKey(-1)) == 'd') {

a.show_diff();

cout << "Press 'q' to end" << endl;

if(char(waitKey(-1)) == 'q') return 0;

}

else

return 0;

}

图 9-2。

Affine transform recovery using three pairs of matching points

透视变换

透视变换比仿射变换更普遍。它们不一定保持线条的“平行性”。但是因为它们更通用，它们也更实用——日常图像中遇到的几乎所有变换都是透视变换。有没有想过为什么两条铁轨似乎在远处相遇？这是因为你眼睛的图像平面以一种透视方式观察它们，并且透视变换不一定保持平行线平行。如果你从上面看这些铁轨，它们似乎根本不会相交。

给定 3×3 透视变换矩阵M，warpPerspective()应用以下变换:

注意，透视变换矩阵的左上角 2×2 部分的行列式不必是+1。此外，由于前面显示的变换中的除法，将透视变换矩阵的所有元素乘以一个常数不会对所表示的变换产生任何影响。因此，通常要计算透视变换矩阵，使得 M33 = 1。这给我们留下了 M 中的八个自由数，因此四对对应点足以恢复两幅图像之间的透视变换。OpenCV 函数findHomography()会帮你做到这一点。有趣的是，如果您在调用此函数时指定了标志 CV_RANSAC(参见在线文档)，它甚至可以接受四个以上的点，并使用 RANSAC 算法从所有这些点稳健地估计变换。RANSAC 使得变换估计过程不受噪声“错误”对应的影响。清单 9-3 读取两幅图像(通过透视变换关联)，要求用户点击八对点，使用 RANSAC 稳健地估计透视变换，并显示原始和新透视变换图像之间的差异以验证估计的变换。同样，差异图像在相关区域中主要是黑色的，这意味着估计的变换是正确的。

清单 9-3。程序演示了一个简单的透视变换恢复和应用

//Program to illustrate a simple perspective transform recovery and application

//Author: Samarth Manoj Brahmbhatt, University of Pennsyalvania

#include <opencv2/opencv.hpp>

#include <opencv2/stitching/stitcher.hpp>

#include <opencv2/stitching/warpers.hpp>

#include <opencv2/highgui/highgui.hpp>

#include <opencv2/calib3d/calib3d.hpp>

#include "Config.h"

using namespace std;

using namespace cv;

void on_mouse(int event, int x, int y, int, void* _p) {

Point2f* p = (Point2f *)_p;

if (event == CV_EVENT_LBUTTONUP) {

p->x = x;

p->y = y;

}

class perspective_transformer {

private:

Mat im, im_transformed, im_perspective_transformed, im_show, im_transformed_show;

vector<Point2f> points, points_transformed;

Mat M;

Point2f get_click(string, Mat);

public:

perspective_transformer();

void estimate_perspective();

void show_diff();

};

perspective_transformer::perspective_transformer() {

im = imread(DATA_FOLDER_3 + string("/image.jpg"));

im_transformed = imread(DATA_FOLDER_3 + string("/transformed.jpg"));

cout << DATA_FOLDER_3 + string("/transformed.jpg") << endl;

}

Point2f perspective_transformer::get_click(string window_name, Mat im) {

Point2f p(-1, -1);

setMouseCallback(window_name, on_mouse, (void *)&p);

while(p.x == -1 && p.y == -1) {

imshow(window_name, im);

waitKey(20);

}

return p;

}

void perspective_transformer::estimate_perspective() {

imshow("Original", im);

imshow("Transformed", im_transformed);

cout << "To estimate the Perspective transform between the original and transformed images you will have to click on 8 matching pairs of points" << endl;

im_show = im.clone();

im_transformed_show = im_transformed.clone();

Point2f p;

for(int i = 0; i < 8; i++) {

cout << "POINT " << i << endl;

cout << "Click on a distinguished point in the ORIGINAL image" << endl;

p = get_click("Original", im_show);

cout << p << endl;

points.push_back(p);

circle(im_show, p, 2, Scalar(0, 0, 255), -1);

imshow("Original", im_show);

cout << "Click on a distinguished point in the TRANSFORMED image" << endl;

p = get_click("Transformed", im_transformed_show);

cout << p << endl;

points_transformed.push_back(p);

circle(im_transformed_show, p, 2, Scalar(0, 0, 255), -1);

imshow("Transformed", im_transformed_show);

}

// Estimate perspective transform

M = findHomography(points, points_transformed, CV_RANSAC, 2);

cout << "Estimated Perspective transform = " << M << endl;

// Apply estimated perspecive trasnform

warpPerspective(im, im_perspective_transformed, M, im.size());

imshow("Estimated Perspective transform", im_perspective_transformed);

}

void perspective_transformer::show_diff() {

imshow("Difference", im_transformed - im_perspective_transformed);

}

int main() {

perspective_transformer a;

a.estimate_perspective();

cout << "Press 'd' to show difference, 'q' to end" << endl;

if(char(waitKey(-1)) == 'd') {

a.show_diff();

cout << "Press 'q' to end" << endl;

if(char(waitKey(-1)) == 'q') return 0;

}

else

return 0;

}

图 9-3。

Perspective transform recovery by clicking matching points

到目前为止，您一定已经意识到，通过使用高距离阈值匹配两幅图像之间的图像特征，整个配对过程也可以实现自动化。这正是清单 9-4 所做的。它计算 ORB 关键点和描述符(我们在第八章中学到了这一点)，匹配它们，并使用匹配来稳健地估计图像之间的透视变换。图 9-4 显示了运行中的代码。请注意 RANSAC 如何使变换估计过程对错误的 ORB 特征匹配具有鲁棒性。差异图像几乎是黑色的，这意味着估计的变换是正确的。

清单 9-4。通过匹配 ORB 特征来说明透视变换恢复的程序

//Program to illustrate perspective transform recovery by matching ORB features

//Author: Samarth Manoj Brahmbhatt, University of Pennsyalvania

#include <opencv2/opencv.hpp>

#include <opencv2/stitching/stitcher.hpp>

#include <opencv2/stitching/warpers.hpp>

#include <opencv2/highgui/highgui.hpp>

#include <opencv2/calib3d/calib3d.hpp>

#include "Config.h"

using namespace std;

using namespace cv;

class perspective_transformer {

private:

Mat im, im_transformed, im_perspective_transformed;

vector<Point2f> points, points_transformed;

Mat M;

public:

perspective_transformer();

void estimate_perspective();

void show_diff();

};

perspective_transformer::perspective_transformer() {

im = imread(DATA_FOLDER_3 + string("/image.jpg"));

im_transformed = imread(DATA_FOLDER_3 + string("/transformed.jpg"));

}

void perspective_transformer::estimate_perspective() {

// Match ORB features to point correspondences between the images

vector<KeyPoint> kp, t_kp;

Mat desc, t_desc, im_g, t_im_g;

cvtColor(im, im_g, CV_BGR2GRAY);

cvtColor(im_transformed, t_im_g, CV_BGR2GRAY);

OrbFeatureDetector featureDetector;

OrbDescriptorExtractor featureExtractor;

featureDetector.detect(im_g, kp);

featureDetector.detect(t_im_g, t_kp);

featureExtractor.compute(im_g, kp, desc);

featureExtractor.compute(t_im_g, t_kp, t_desc);

flann::Index flannIndex(desc, flann::LshIndexParams(12, 20, 2), cvflann::FLANN_DIST_HAMMING);

Mat match_idx(t_desc.rows, 2, CV_32SC1), match_dist(t_desc.rows, 2, CV_32FC1);

flannIndex.knnSearch(t_desc, match_idx, match_dist, 2, flann::SearchParams());

vector<DMatch> good_matches;

for(int i = 0; i < match_dist.rows; i++) {

if(match_dist.at<float>(i, 0) < 0.6 * match_dist.at<float>(i, 1)) {

DMatch dm(i, match_idx.at<int>(i, 0), match_dist.at<float>(i, 0));

good_matches.push_back(dm);

points.push_back((kp[dm.trainIdx]).pt);

points_transformed.push_back((t_kp[dm.queryIdx]).pt);

}

Mat im_show;

drawMatches(im_transformed, t_kp, im, kp, good_matches, im_show);

imshow("ORB matches", im_show);

M = findHomography(points, points_transformed, CV_RANSAC, 2);

cout << "Estimated Perspective transform = " << M << endl;

warpPerspective(im, im_perspective_transformed, M, im.size());

imshow("Estimated Perspective transform", im_perspective_transformed);

}

void perspective_transformer::show_diff() {

imshow("Difference", im_transformed - im_perspective_transformed);

}

int main() {

perspective_transformer a;

a.estimate_perspective();

cout << "Press 'd' to show difference, 'q' to end" << endl;

if(char(waitKey(-1)) == 'd') {

a.show_diff();

cout << "Press 'q' to end" << endl;

if(char(waitKey(-1)) == 'q') return 0;

}

else

return 0;

}

图 9-4。

Perspective transform recovery by matching ORB features

全景照片

制作全景图是自动恢复透视变换的主要应用之一。先前讨论的技术可以用于估计由旋转/回转(但不是平移)相机捕获的一组图像之间的透视变换。然后，人们可以通过在一个大的空白“画布”图像上“排列”所有这些图像来构建一个全景。根据估计的透视变换完成排列。尽管这是最常用于制作全景图的高级算法，但为了制作无缝的全景图，还需要注意一些小细节:

估计的透视变换很可能不完美。因此，如果仅通过估计的变换来在画布上排列图像，则在两幅图像重叠的区域中会观察到小的不连续。因此，在估计成对变换之后，必须进行第二次“全局”估计，这将干扰各个变换，以使所有变换彼此很好地一致
必须实施某种形式的接缝混合来消除重叠区域中的不连续性。大多数现代相机都有自动曝光设置。因此，不同的图像可能是在不同的曝光下拍摄的，因此它们可能比全景图中的相邻图像更暗或更亮。曝光的差异必须在所有相邻的图像中被中和

OpenCV stitching模块出色地内置了所有这些功能。它使用图 9-5 中概述的高级算法将图像拼接成视觉上正确的全景图。

图 9-5。

OpenCV image stitching pipeline, taken from OpenCV online documentation

从制作全景图的角度来看，stitching模块使用起来非常简单，只需创建一个stitching对象，并传递给它一个包含你想要拼接的图像的Mat向量。清单 9-5 显示了用于从图 9-6 所示的六幅图像中生成美丽全景图的简单代码。注意，这段代码要求图像出现在名为DATA_FOLDER_1的位置，并在Config.h头文件中定义。它使用CMake将可执行文件链接到 Boost 文件系统库。你可以使用第八章的末尾解释的架构和CMake组织来编译代码。

清单 9-5。从图像集合创建全景的代码

//Code to create a panorama from a collection of images

//Author: Samarth Manoj Brahmbhatt, University of Pennsyalvania

#include <opencv2/opencv.hpp>

#include <opencv2/stitching/stitcher.hpp>

#include <opencv2/stitching/warpers.hpp>

#include "Config.h"

#include <boost/filesystem.hpp>

using namespace std;

using namespace cv;

using namespace boost::filesystem;

int main() {

vector<Mat> images;

// Read images

for(directory_iterator i(DATA_FOLDER_5), end_iter; i != end_iter; i++) {

string im_name = i->path().filename().string();

string filename = string(DATA_FOLDER_5) + im_name;

Mat im = imread(filename);

if(!im.empty())

images.push_back(im);

}

cout << "Read " << images.size() << " images" << endl << "Now making panorama..." << endl;

Mat panorama;

Stitcher stitcher = Stitcher::createDefault();

stitcher.stitch(images, panorama);

namedWindow("Panorama", CV_WINDOW_NORMAL);

imshow("Panorama", panorama);

while(char(waitKey(1)) != 'q') {}

return 0;

}

图 9-6。

6 images (top) used to generate the Golden Gate panorama (bottom)

全景代码也可以很好地扩展。图 9-7 显示了使用相同代码从 23 幅图像生成的全景图。

图 9-7。

Panorama made by stitching 23 images

位于 http://docs.opencv.org/modules/stitching/doc/stitching.html 的拼接模块的在线文档显示，流水线的不同部分有很多选项。例如，您可以:

使用 SURF 或 ORB 作为您选择的图像特征
平面、球形或圆柱形作为全景图的形状(排列所有图像的画布的形状)
作为寻找需要混合的接缝区域的方法的图切割或 Voronoi 图

清单 9-6 展示了如何使用stitching类的各种“setter”函数插入和拔出流水线的不同模块。它将全景的形状从默认的平面更改为圆柱形。图 9-8 显示了这样得到的柱面全景图。

清单 9-6。从图像集合中创建具有圆柱形扭曲的全景图的代码

//Code to create a panorama with cylindrical warping from a collection of images

//Author: Samarth Manoj Brahmbhatt, University of Pennsyalvania

#include <opencv2/opencv.hpp>

#include <opencv2/stitching/stitcher.hpp>

#include <opencv2/stitching/warpers.hpp>

#include "Config.h"

#include <boost/filesystem.hpp>

using namespace std;

using namespace cv;

using namespace boost::filesystem;

int main() {

vector<Mat> images;

for(directory_iterator i(DATA_FOLDER_5), end_iter; i != end_iter; i++) {

string im_name = i->path().filename().string();

string filename = string(DATA_FOLDER_5) + im_name;

Mat im = imread(filename);

if(!im.empty())

images.push_back(im);

}

cout << "Read " << images.size() << " images" << endl << "Now making panorama..." << endl;

Mat panorama;

Stitcher stitcher = Stitcher::createDefault();

CylindricalWarper* warper = new CylindricalWarper();

stitcher.setWarper(warper);

// Estimate perspective transforms between images

Stitcher::Status status = stitcher.estimateTransform(images);

if (status != Stitcher::OK) {

cout << "Can't stitch images, error code = " << int(status) << endl;

return -1;

}

// Make panorama

status = stitcher.composePanorama(panorama);

if (status != Stitcher::OK) {

cout << "Can't stitch images, error code = " << int(status) << endl;

return -1;

}

namedWindow("Panorama", CV_WINDOW_NORMAL);

imshow("Panorama", panorama);

while(char(waitKey(1)) != 'q') {}

return 0;

}

图 9-8。

Cylindrical panorama made from seven images

摘要

几何图像变换是处理真实世界的所有计算机视觉程序的重要部分，因为在世界和相机的图像平面之间以及相机的两个位置的图像平面之间总是存在透视变换。他们也很酷，因为他们可以用来制作全景图！在本章中，您学习了如何编写实现透视变换的代码，以及如何恢复两个给定图像之间的变换，这在许多实际的计算机视觉项目中是一项有用的技能。下一章是关于立体视觉的。我们将使用透视变换矩阵的知识来表示立体摄像机的左右摄像机之间的变换，这将是学习如何校准立体摄像机的重要一步。