Machine Learning Mastery Computer Vision Tutorials (Part 6)

Original: Machine Learning Mastery

License: CC BY-NC-SA 4.0

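How to Perform Face Recognition With VGGFace2 in Keras

Original: machinelearningmastery.com/how-to-perf…

Last updated on August 24, 2020

Face recognition is a computer vision task of identifying and verifying a person based on a photograph of their face.

Recently, deep learning convolutional neural networks have surpassed classical methods and are achieving state-of-the-art results on standard face recognition datasets. One example of a state-of-the-art model is the VGGFace and VGGFace2 family developed by researchers at the Visual Geometry Group at Oxford.

Although the model can be challenging to implement and resource intensive to train, it can be used easily in standard deep learning libraries such as Keras through freely available pre-trained models and third-party open source libraries.

In this tutorial, you will discover how to develop face recognition systems for face identification and verification using the VGGFace2 deep learning model.

After completing this tutorial, you will know:

About the VGGFace and VGGFace2 models for face recognition and how to install the keras_vggface library to make use of these models in Python with Keras.
How to develop a face identification system to predict the name of celebrities in given photographs.
How to develop a face verification system to confirm the identity of a person given a photograph of their face.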


Let's get started.

**Update Nov/2019:** Updated for TensorFlow v2.0, VGGFace v0.6, and MTCNN v0.1.0.


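Tutorial Overview

This tutorial is divided into six parts; they are:

Face Recognition
VGGFace and VGGFace2 Models
How to Install the keras-vggface Library
How to Detect Faces for Face Recognition
How to Perform Face Identification With VGGFace2
How to Perform Face Verification With VGGFace2

Face Recognition

Face recognition is the general task of identifying and verifying people from photographs of their face.

The 2011 book on face recognition titled "Handbook of Face Recognition" describes two main modes for face recognition, as follows:

Face Verification. A one-to-one mapping of a given face against a known identity (e.g. is this the person?).
Face Identification. A one-to-many mapping of a given face against a database of known faces (e.g. who is this person?).

A face recognition system is expected to identify faces present in images and videos automatically. It can operate in either or both of two modes: (1) face verification (or authentication), and (2) face identification (or recognition).

- Page 1, Handbook of Face Recognition, 2011.

We will explore both of these face recognition tasks in this tutorial.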

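VGGFace and VGGFace2 Models

VGGFace refers to a series of models developed for face recognition and demonstrated on benchmark computer vision datasets by members of the Visual Geometry Group (VGG) at the University of Oxford.

At the time of writing, there are two main VGG models for face recognition; they are VGGFace and VGGFace2. Let's take a closer look at each in turn.

VGGFace Model

The VGGFace model, as it was later named, was described by Omkar Parkhi et al. in the 2015 paper titled "Deep Face Recognition."

One contribution of the paper was a description of how to develop a very large training dataset, which is required to train modern convolutional-neural-network-based face recognition systems and to compete with the large datasets used to train models at Facebook and Google.

… [we] propose a procedure to create a reasonably large face dataset whilst requiring only a limited amount of person-power for annotation. To this end we propose a method for collecting face data using knowledge sources available on the web (Section 3). We employ this procedure to build a dataset with over two million faces, and will make this freely available to the research community.

- Deep Face Recognition, 2015.

This dataset is then used as the basis for developing deep CNNs for face recognition tasks such as face identification and verification. Specifically, models are trained on the very large dataset, then evaluated on benchmark face recognition datasets, demonstrating that the model is effective at generating generalized features from faces.

They describe the process of first training a face classifier that uses a softmax activation function in the output layer to classify faces as people. This layer is then removed so that the output of the network is a vector feature representation of the face, called a face embedding. The model is then further trained, via fine-tuning, so that the Euclidean distance between vectors generated for the same identity is made smaller and the distance between vectors generated for different identities is made larger. This is achieved using a triplet loss function.

Triplet-loss training aims at learning score vectors that perform well in the final application, i.e. identity verification by comparing face descriptors in Euclidean space. […] A triplet (a, p, n) contains an anchor face image as well as a positive p != a and negative n examples of the anchor's identity. The projection W' is learned on target datasets

- Deep Face Recognition, 2015.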
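The triplet idea described above can be sketched in a few lines of plain Python. This is a minimal illustration on made-up 2-D "embeddings", not the authors' training code; the function names and the margin value of 0.2 are assumptions chosen for the example.

```python
# Minimal sketch of a hinge-style triplet loss on toy embeddings.
# The embeddings and the margin are illustrative assumptions.

def squared_distance(a, b):
    # squared Euclidean distance between two equal-length vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    # loss is zero once the negative is farther from the anchor
    # than the positive by at least the margin
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)

anchor = [0.0, 1.0]    # face of identity A
positive = [0.1, 0.9]  # another face of identity A
negative = [1.0, 0.0]  # face of a different identity

# well-separated triplet: nothing left to learn
loss_easy = triplet_loss(anchor, positive, negative)
# swapped roles: the "positive" is far away, so the loss is large
loss_hard = triplet_loss(anchor, negative, positive)
print(loss_easy, loss_hard)
```

Minimizing this quantity over many triplets pulls embeddings of the same identity together and pushes different identities apart, which is what makes a simple distance threshold usable for verification later on.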

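A deep convolutional neural network architecture is used in the VGG style, with blocks of convolutional layers with small kernels and ReLU activations followed by max pooling layers, and the use of fully connected layers in the classifier end of the network.

VGGFace2 Model

Qiong Cao et al. from the VGG describe a follow-up work in their 2017 paper titled "VGGFace2: A Dataset for Recognising Faces across Pose and Age."

They describe VGGFace2 as a much larger dataset that they collected in order to train and evaluate more effective face recognition models.

In this paper, we introduce a new large-scale face dataset named VGGFace2. The dataset contains 3.31 million images of 9131 subjects, with an average of 362.6 images for each subject. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession (e.g. actors, athletes, politicians).

- VGGFace2: A Dataset for Recognising Faces across Pose and Age, 2017.

The paper focuses on how this dataset was collected and curated, and how images were prepared prior to modeling. Nevertheless, VGGFace2 has become a name that refers to the pre-trained face recognition models that were trained on this dataset and made available.

Models are trained on the dataset, specifically a ResNet-50 and a Squeeze-and-Excitation ResNet-50 model (called SE-ResNet-50 or SENet), and it is variations of these models that have been made available by the authors, along with the associated code. The models are evaluated on standard face recognition datasets, demonstrating state-of-the-art performance.

… we demonstrate that deep models (ResNet-50 and SENet) trained on VGGFace2 achieve state-of-the-art performance on […] benchmarks.

- VGGFace2: A Dataset for Recognising Faces across Pose and Age, 2017.

Specifically, the SENet-based models provide better performance overall.

The comparison between ResNet-50 and SENet, both learned from scratch, shows that SENet has consistently superior performance on both verification and identification. […] In addition, the performance of SENet can be further improved by training on the two datasets VGGFace2 and MS1M, exploiting the different advantages that each offers.

- VGGFace2: A Dataset for Recognising Faces across Pose and Age, 2017.

A face embedding is predicted by a given model as a 2,048-length vector. The length of the vector is then normalized, e.g. to a length of 1 or unit norm using the L2 vector norm (Euclidean distance from the origin). This is referred to as the "face descriptor." The distance between face descriptors (or groups of face descriptors called a "subject template") is calculated using the cosine similarity.

The face descriptor is extracted from the layer adjacent to the classifier layer. This leads to a 2048-dimensional descriptor, which is then L2 normalized

- VGGFace2: A Dataset for Recognising Faces across Pose and Age, 2017.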
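The normalization and comparison steps just described are small enough to write out by hand. The sketch below uses short toy vectors in place of real 2,048-element descriptors; the function names are illustrative and are not part of any library used in this tutorial.

```python
# Sketch: L2-normalize a descriptor and compare two descriptors by
# cosine similarity. Toy vectors stand in for 2,048-element embeddings.
import math

def l2_normalize(vector):
    # scale the vector so that its Euclidean length is 1
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def cosine_similarity(a, b):
    # after L2 normalization, cosine similarity is just a dot product
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

descriptor = l2_normalize([3.0, 4.0])
print(descriptor)
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Because the descriptors have unit length, only their direction matters when they are compared, which is why cosine similarity (or the equivalent cosine distance, 1 minus the similarity) is a natural measure here.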

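How to Install the keras-vggface Library

The authors of VGGFace2 provide the source code for their models, as well as pre-trained models that can be downloaded for standard deep learning frameworks such as Caffe and PyTorch, although there are no examples for TensorFlow or Keras.

We could convert the provided models to TensorFlow or Keras format and develop a model definition in order to load and use these pre-trained models. Thankfully, this work has already been done and can be used directly through third-party projects and libraries.

Perhaps the best third-party library for using the VGGFace2 (and VGGFace) models in Keras is the keras-vggface project and library by Refik Can Malli.

Given that this is a third-party open-source project and subject to change, I have created a fork of the project here.

This library can be installed via pip; for example:

sudo pip install git+https://github.com/rcmalli/keras-vggface.git

After successful installation, you should see a message like the following:

Successfully installed keras-vggface-0.6

You can confirm that the library was installed correctly by querying the installed package:

pip show keras-vggface

This will summarize the details of the package; for example: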

Name: keras-vggface
Version: 0.6
Summary: VGGFace implementation with Keras framework
Home-page: https://github.com/rcmalli/keras-vggface
Author: Refik Can MALLI
Author-email: mallir@itu.edu.tr
License: MIT
Location: ...
Requires: numpy, scipy, h5py, pillow, keras, six, pyyaml
Required-by:

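You can also confirm that the library loads correctly by importing it in a script and printing the current version; for example:

# check version of keras_vggface
import keras_vggface
# print version
print(keras_vggface.__version__)

Running the example will load the library and print the current version.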

0.6

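How to Detect Faces for Face Recognition

Before we can perform face recognition, we need to detect faces.

Face detection is the process of automatically locating faces in a photograph and localizing them by drawing a bounding box around their extent.

In this tutorial, we will use the Multi-Task Cascaded Convolutional Neural Network, or MTCNN, for face detection, e.g. finding and extracting faces from photos. This is a state-of-the-art deep learning model for face detection, described in the 2016 paper titled "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks."

We will use the implementation provided by Iván de Paz Centeno in the ipazc/mtcnn project. This can also be installed via pip as follows:

sudo pip install mtcnn

We can confirm that the library was installed correctly by importing the library and printing the version; for example: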

# confirm mtcnn was installed correctly
import mtcnn
# print version
print(mtcnn.__version__)

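Running the example prints the current version of the library.

0.1.0

We can use the mtcnn library to create a face detector and extract faces for use with the VGGFace models in subsequent sections.

The first step is to load an image as a NumPy array, which we can achieve using the Matplotlib imread() function.

# load image from file
pixels = pyplot.imread(filename)

Next, we can create an MTCNN face detector class and use it to detect all faces in the loaded photograph.

# create the detector, using default weights
detector = MTCNN()
# detect faces in the image
results = detector.detect_faces(pixels)

The result is a list of bounding boxes, where each bounding box defines the top-left corner of the box (in image coordinates, where y increases downward), as well as the width and the height.

If we assume there is only one face in the photo for our experiments, we can determine the pixel coordinates of the bounding box as follows.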

# extract the bounding box from the first face
x1, y1, width, height = results[0]['box']
x2, y2 = x1 + width, y1 + height
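A detector can report a box that extends slightly past the image edge, in which case x1 or y1 may be negative and array slicing would silently return the wrong region. The helper below is a hypothetical addition (it is not part of the mtcnn library) that clamps a box given as x, y, width, height to the image bounds before cropping.

```python
# Sketch: convert an (x, y, width, height) box into clamped corner
# coordinates. box_to_corners is a hypothetical helper for illustration.

def box_to_corners(box, image_width, image_height):
    x, y, width, height = box
    # clamp the top-left corner to the image
    x1, y1 = max(0, x), max(0, y)
    # clamp the bottom-right corner to the image
    x2 = min(image_width, x + width)
    y2 = min(image_height, y + height)
    return x1, y1, x2, y2

# a box that starts 5 pixels to the left of a 100x100 image
print(box_to_corners((-5, 10, 50, 60), 100, 100))
# a box that runs past the bottom-right edge
print(box_to_corners((80, 90, 50, 50), 100, 100))
```

With corners clamped in this way, a slice such as pixels[y1:y2, x1:x2] always refers to pixels that exist in the image.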

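We can use these coordinates to extract the face.

# extract the face
face = pixels[y1:y2, x1:x2]

We can then use the PIL library to resize this small image of the face to the required size; specifically, the model expects square input faces with the shape 224×224.

# resize pixels to the model size
image = Image.fromarray(face)
image = image.resize((224, 224))
face_array = asarray(image)

Tying all of this together, the function extract_face() will load a photograph from the given filename and return the extracted face.

It assumes that the photo contains one face and will return the first face detected.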

# extract a single face from a given photograph
def extract_face(filename, required_size=(224, 224)):
	# load image from file
	pixels = pyplot.imread(filename)
	# create the detector, using default weights
	detector = MTCNN()
	# detect faces in the image
	results = detector.detect_faces(pixels)
	# extract the bounding box from the first face
	x1, y1, width, height = results[0]['box']
	x2, y2 = x1 + width, y1 + height
	# extract the face
	face = pixels[y1:y2, x1:x2]
	# resize pixels to the model size
	image = Image.fromarray(face)
	image = image.resize(required_size)
	face_array = asarray(image)
	return face_array

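We can test this function with a photograph.

Download a photograph of Sharon Stone taken in 2013 from Wikipedia, released under a permissive license.

Download the photograph and place it in your current working directory with the filename 'sharon_stone1.jpg'.

Photograph of Sharon Stone (sharon_stone1.jpg), from Wikipedia.

The complete example of loading the photograph of Sharon Stone, extracting the face, and plotting the result is listed below.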

# example of face detection with mtcnn
from matplotlib import pyplot
from PIL import Image
from numpy import asarray
from mtcnn.mtcnn import MTCNN

# extract a single face from a given photograph
def extract_face(filename, required_size=(224, 224)):
	# load image from file
	pixels = pyplot.imread(filename)
	# create the detector, using default weights
	detector = MTCNN()
	# detect faces in the image
	results = detector.detect_faces(pixels)
	# extract the bounding box from the first face
	x1, y1, width, height = results[0]['box']
	x2, y2 = x1 + width, y1 + height
	# extract the face
	face = pixels[y1:y2, x1:x2]
	# resize pixels to the model size
	image = Image.fromarray(face)
	image = image.resize(required_size)
	face_array = asarray(image)
	return face_array

# load the photo and extract the face
pixels = extract_face('sharon_stone1.jpg')
# plot the extracted face
pyplot.imshow(pixels)
# show the plot
pyplot.show()

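Running the example loads the photograph, extracts the face, and plots the result.

We can see that the face was correctly detected and extracted.

The results suggest that we can use the developed extract_face() function as the basis for the examples with the VGGFace face recognition model in subsequent sections.

Face Detected From a Photograph of Sharon Stone Using an MTCNN Model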

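How to Perform Face Identification With VGGFace2

In this section, we will use the VGGFace2 model to perform face identification with photographs of celebrities from Wikipedia.

A VGGFace model can be created using the VGGFace() constructor and specifying the type of model to create via the 'model' argument.

model = VGGFace(model='...')

The keras-vggface library provides three pre-trained VGGFace models: a VGGFace1 model via model='vgg16' (the default), and two VGGFace2 models, 'resnet50' and 'senet50'.

The example below creates a 'resnet50' VGGFace2 model and summarizes the shape of the inputs and outputs.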

# example of creating a face embedding
from keras_vggface.vggface import VGGFace
# create a vggface2 model
model = VGGFace(model='resnet50')
# summarize input and output shape
print('Inputs: %s' % model.inputs)
print('Outputs: %s' % model.outputs)

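The first time that a model is created, the library will download the model weights and save them in the .keras/models/vggface/ directory in your home directory. The weights for the resnet50 model are about 158 megabytes, so the download may take a few minutes depending on the speed of your internet connection.

Running the example prints the shape of the input and output tensors of the model.

We can see that the model expects input color images of faces with the shape 224×224 and that the output will be a class prediction over 8,631 people. This makes sense given that the pre-trained model was trained on the 8,631 identities of the VGGFace2 training set (listed in this CSV file).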

Inputs: [<tf.Tensor 'input_1:0' shape=(?, 224, 224, 3) dtype=float32>]
Outputs: [<tf.Tensor 'classifier/Softmax:0' shape=(?, 8631) dtype=float32>]

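This Keras model can be used directly to predict the probability of a given face belonging to one or more of more than eight thousand known celebrities; for example:

# perform prediction
yhat = model.predict(samples)

Once a prediction is made, the class integers can be mapped to the names of the celebrities, and the top five names with the highest probability can be retrieved.

This behavior is provided by the decode_predictions() function in the keras-vggface library.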

# convert prediction into names
results = decode_predictions(yhat)
# display most likely results
for result in results[0]:
	print('%s: %.3f%%' % (result[0], result[1]*100))

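Before we can make a prediction with a face, the pixel values must be scaled in the same way that the data was prepared when the VGGFace model was fit. Specifically, the pixel values must be centered on each channel using the mean from the training dataset.

This can be achieved using the preprocess_input() function provided in the keras-vggface library and specifying 'version=2' so that the images are scaled using the mean values used to train the VGGFace2 models instead of the VGGFace1 models (the default).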

# convert one face into samples
pixels = pixels.astype('float32')
samples = expand_dims(pixels, axis=0)
# prepare the face for the model, e.g. center pixels
samples = preprocess_input(samples, version=2)
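Per-channel centering itself is a one-line NumPy operation, sketched below. The mean values are placeholders assumed for illustration only, and the library's own function may do more than subtract means (for example, reorder channels), so in practice preprocess_input() should be used rather than this sketch.

```python
# Sketch of per-channel mean centering with NumPy broadcasting.
# ASSUMED_CHANNEL_MEANS holds placeholder values for illustration;
# it is not taken from the keras-vggface source.
import numpy as np

ASSUMED_CHANNEL_MEANS = np.array([91.5, 103.9, 131.1], dtype='float32')

def center_channels(samples, means=ASSUMED_CHANNEL_MEANS):
    # samples has shape (n, height, width, 3); the means broadcast
    # over the last (channel) axis, and a new array is returned
    samples = np.asarray(samples, dtype='float32')
    return samples - means

batch = np.full((1, 224, 224, 3), 128.0, dtype='float32')
centered = center_channels(batch)
print(centered.shape, centered[0, 0, 0])
```

The point of centering is that the network sees inputs with the same statistics it saw during training; skipping it typically degrades predictions even though the code still runs.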

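We can tie all of this together and predict the identity of the Sharon Stone photograph downloaded in the previous section, specifically 'sharon_stone1.jpg'.

The complete example is listed below.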

# Example of face detection with a vggface2 model
from numpy import expand_dims
from matplotlib import pyplot
from PIL import Image
from numpy import asarray
from mtcnn.mtcnn import MTCNN
from keras_vggface.vggface import VGGFace
from keras_vggface.utils import preprocess_input
from keras_vggface.utils import decode_predictions

# extract a single face from a given photograph
def extract_face(filename, required_size=(224, 224)):
	# load image from file
	pixels = pyplot.imread(filename)
	# create the detector, using default weights
	detector = MTCNN()
	# detect faces in the image
	results = detector.detect_faces(pixels)
	# extract the bounding box from the first face
	x1, y1, width, height = results[0]['box']
	x2, y2 = x1 + width, y1 + height
	# extract the face
	face = pixels[y1:y2, x1:x2]
	# resize pixels to the model size
	image = Image.fromarray(face)
	image = image.resize(required_size)
	face_array = asarray(image)
	return face_array

# load the photo and extract the face
pixels = extract_face('sharon_stone1.jpg')
# convert one face into samples
pixels = pixels.astype('float32')
samples = expand_dims(pixels, axis=0)
# prepare the face for the model, e.g. center pixels
samples = preprocess_input(samples, version=2)
# create a vggface model
model = VGGFace(model='resnet50')
# perform prediction
yhat = model.predict(samples)
# convert prediction into names
results = decode_predictions(yhat)
# display most likely results
for result in results[0]:
	print('%s: %.3f%%' % (result[0], result[1]*100))

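Running the example loads the photograph, extracts the single face that we know is present, and then predicts the identity of the face.

The five names with the highest probability are then displayed.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

We can see that the model correctly identifies the face as belonging to Sharon Stone with a likelihood of 99.642%.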

b' Sharon_Stone': 99.642%
b' Noelle_Reno': 0.085%
b' Elisabeth_R\xc3\xb6hm': 0.033%
b' Anita_Lipnicka': 0.026%
b' Tina_Maze': 0.019%

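We can test the model with another celebrity, in this case a male, Channing Tatum.

A photograph of Channing Tatum taken in 2017 is available on Wikipedia under a permissive license.

Download the photograph and save it in your current working directory with the filename 'channing_tatum.jpg'.

Photograph of Channing Tatum, from Wikipedia.

Change the code to load the photograph of Channing Tatum instead; for example: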

pixels = extract_face('channing_tatum.jpg')

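Running the example with the new photograph, we can see that the model correctly identifies the face as belonging to Channing Tatum with a likelihood of 94.432%.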

b' Channing_Tatum': 94.432%
b' Eoghan_Quigg': 0.146%
b' Les_Miles': 0.113%
b' Ibrahim_Afellay': 0.072%
b' Tovah_Feldshuh': 0.070%

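You might like to try this example with other photographs of celebrities taken from Wikipedia. Try a diverse set of genders, races, and ages. You will discover that the model is not perfect, but for those celebrities that it does know well, it can be effective.

You might like to try other versions of the model, such as 'vgg16' and 'senet50', then compare the results. For example, I found that with a photograph of Oscar Isaac, the 'vgg16' model was effective, but the VGGFace2 models were not.

The model could be used to identify new faces. One approach would be to re-train the model, perhaps just the classifier part of the model, with a new face dataset.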

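How to Perform Face Verification With VGGFace2

A VGGFace2 model can be used for face verification.

This involves calculating a face embedding for a new given face and comparing it to the embedding for the single example of the face known to the system.

A face embedding is a vector that represents the features extracted from the face. This can then be compared with the vectors generated for other faces. For example, another vector that is close (by some measure) may be the same person, whereas another vector that is far (by some measure) may be a different person.

Typical measures such as Euclidean distance and cosine distance are calculated between two embeddings, and faces are said to match or verify if the distance is below a predefined threshold, often tuned for a specific dataset or application.

First, we can load the VGGFace model without the classifier by setting the 'include_top' argument to 'False', specifying the shape of the input via 'input_shape', and setting 'pooling' to 'avg' so that the filter maps at the output end of the model are reduced to a vector using global average pooling.

# create a vggface model
model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg')

This model can then be used to make a prediction, which will return a face embedding for one or more faces provided as input.

# perform prediction
yhat = model.predict(samples)

We can define a new function that, given a list of filenames for photos containing a face, will extract one face from each photo via the extract_face() function developed in a prior section, pre-process the faces as required by the VGGFace2 model by calling preprocess_input(), and then predict a face embedding for each.

The get_embeddings() function below implements this, returning an array containing the embedding of one face for each provided photograph filename.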

# extract faces and calculate face embeddings for a list of photo files
def get_embeddings(filenames):
	# extract faces
	faces = [extract_face(f) for f in filenames]
	# convert into an array of samples
	samples = asarray(faces, 'float32')
	# prepare the face for the model, e.g. center pixels
	samples = preprocess_input(samples, version=2)
	# create a vggface model
	model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg')
	# perform prediction
	yhat = model.predict(samples)
	return yhat

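We can take the photograph of Sharon Stone used previously (e.g. sharon_stone1.jpg) as our definition of the identity of Sharon Stone by calculating and storing the face embedding for the face in that photograph.

We can then calculate embeddings for faces in other photographs of Sharon Stone and test whether we can effectively verify her identity. We can also use faces from photographs of other people to confirm that they are not verified as Sharon Stone.

Verification can be performed by calculating the cosine distance between the embedding for the known identity and the embeddings of candidate faces. This can be achieved using the cosine() SciPy function. The distance is 0.0 for two embeddings that point in the same direction and grows as they diverge; 1.0 is given as the maximum distance, which holds when the embedding values are non-negative. A common cut-off value used for face identity is between 0.4 and 0.6, such as 0.5, although this should be tuned for an application.

The is_match() function below implements this, calculating the distance between two embeddings and interpreting the result.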

# determine if a candidate face is a match for a known face
def is_match(known_embedding, candidate_embedding, thresh=0.5):
	# calculate distance between embeddings
	score = cosine(known_embedding, candidate_embedding)
	if score <= thresh:
		print('>face is a Match (%.3f <= %.3f)' % (score, thresh))
	else:
		print('>face is NOT a Match (%.3f > %.3f)' % (score, thresh))

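We can test some positive examples by downloading more photos of Sharon Stone from Wikipedia.

Specifically, a photograph taken in 2002 (download and save as 'sharon_stone2.jpg') and a photograph taken in 2017 (download and save as 'sharon_stone3.jpg').

We will test these two positive cases and use the Channing Tatum photo from the previous section as a negative example.

The complete code example of face verification is listed below.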

# face verification with the VGGFace2 model
from matplotlib import pyplot
from PIL import Image
from numpy import asarray
from scipy.spatial.distance import cosine
from mtcnn.mtcnn import MTCNN
from keras_vggface.vggface import VGGFace
from keras_vggface.utils import preprocess_input

# extract a single face from a given photograph
def extract_face(filename, required_size=(224, 224)):
	# load image from file
	pixels = pyplot.imread(filename)
	# create the detector, using default weights
	detector = MTCNN()
	# detect faces in the image
	results = detector.detect_faces(pixels)
	# extract the bounding box from the first face
	x1, y1, width, height = results[0]['box']
	x2, y2 = x1 + width, y1 + height
	# extract the face
	face = pixels[y1:y2, x1:x2]
	# resize pixels to the model size
	image = Image.fromarray(face)
	image = image.resize(required_size)
	face_array = asarray(image)
	return face_array

# extract faces and calculate face embeddings for a list of photo files
def get_embeddings(filenames):
	# extract faces
	faces = [extract_face(f) for f in filenames]
	# convert into an array of samples
	samples = asarray(faces, 'float32')
	# prepare the face for the model, e.g. center pixels
	samples = preprocess_input(samples, version=2)
	# create a vggface model
	model = VGGFace(model='resnet50', include_top=False, input_shape=(224, 224, 3), pooling='avg')
	# perform prediction
	yhat = model.predict(samples)
	return yhat

# determine if a candidate face is a match for a known face
def is_match(known_embedding, candidate_embedding, thresh=0.5):
	# calculate distance between embeddings
	score = cosine(known_embedding, candidate_embedding)
	if score <= thresh:
		print('>face is a Match (%.3f <= %.3f)' % (score, thresh))
	else:
		print('>face is NOT a Match (%.3f > %.3f)' % (score, thresh))

# define filenames
filenames = ['sharon_stone1.jpg', 'sharon_stone2.jpg',
	'sharon_stone3.jpg', 'channing_tatum.jpg']
# get embeddings file filenames
embeddings = get_embeddings(filenames)
# define sharon stone
sharon_id = embeddings[0]
# verify known photos of sharon
print('Positive Tests')
is_match(embeddings[0], embeddings[1])
is_match(embeddings[0], embeddings[2])
# verify known photos of other people
print('Negative Tests')
is_match(embeddings[0], embeddings[3])

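The first photo is taken as the template for Sharon Stone, and the remaining photos in the list are positive and negative photos used to test verification.

Running the example, we can see that the system correctly verified the two positive cases, given photos of Sharon Stone taken both earlier and later in time.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

We can also see that the photo of Channing Tatum is, correctly, not verified as Sharon Stone. It would be an interesting extension to explore the verification of other negative photos, such as photos of other female celebrities.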

Positive Tests
>face is a Match (0.418 <= 0.500)
>face is a Match (0.295 <= 0.500)
Negative Tests
>face is NOT a Match (0.709 > 0.500)

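Note: the embeddings generated by the model are not specific to the photos of celebrities used to train the model. The model is believed to generate useful embeddings for any face; perhaps try it out with photographs of yourself compared to photographs of relatives and friends.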
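Since embeddings are useful for arbitrary faces, the same distance test can be extended from one-to-one verification to one-to-many identification against a small gallery of known people. The sketch below is an assumption-laden illustration: the gallery, names, and 3-element vectors are invented, and identify() is a hypothetical helper rather than part of any library used above.

```python
# Sketch: identify a face by finding the closest known embedding and
# accepting it only if the cosine distance is under a threshold.
# Gallery contents and the 0.5 threshold are illustrative assumptions.
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def identify(candidate, gallery, thresh=0.5):
    # find the known identity with the smallest distance
    best_name, best_score = None, None
    for name, embedding in gallery.items():
        score = cosine_distance(candidate, embedding)
        if best_score is None or score < best_score:
            best_name, best_score = name, score
    # reject the match if even the closest identity is too far away
    if best_score is not None and best_score <= thresh:
        return best_name, best_score
    return None, best_score

gallery = {'alice': [1.0, 0.0, 0.0], 'bob': [0.0, 1.0, 0.0]}
print(identify([0.9, 0.1, 0.0], gallery))  # close to alice
print(identify([0.0, 0.0, 1.0], gallery))  # unknown face
```

With real data, the gallery values would be embeddings returned by get_embeddings(), and the threshold would need tuning on examples from the intended application.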

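Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Books

Handbook of Face Recognition, Second Edition, 2011.

Summary

In this tutorial, you discovered how to develop face recognition systems for face identification and verification using the VGGFace2 deep learning model.

Specifically, you learned:

About the VGGFace and VGGFace2 models for face recognition and how to install the keras_vggface library to make use of these models in Python with Keras.
How to develop a face identification system to predict the name of celebrities in given photographs.
How to develop a face verification system to confirm the identity of a person given a photograph of their face.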


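How to Use Mask R-CNN in Keras for Object Detection in Photographs

Original: machinelearningmastery.com/how-to-perf…

Last updated on September 2, 2020

Object detection is a task in computer vision that involves identifying the presence, location, and type of one or more objects in a given photograph.

It is a challenging problem that involves building upon methods for object recognition (e.g. where are they), object localization (e.g. what are their extent), and object classification (e.g. what are they).

In recent years, deep learning techniques have achieved state-of-the-art results for object detection, such as on standard benchmark datasets and in computer vision competitions. Most notable is the R-CNN, or Region-Based Convolutional Neural Network, and the more recent technique called Mask R-CNN, which is capable of achieving state-of-the-art results on a range of object detection tasks.

In this tutorial, you will discover how to use the Mask R-CNN model to detect objects in new photographs.

After completing this tutorial, you will know:

The region-based convolutional neural network family of models for object detection and the most recent variation, called Mask R-CNN.
The best-of-breed open source library implementation of Mask R-CNN for the Keras deep learning library.
How to use a pre-trained Mask R-CNN to perform object localization and detection on new photographs.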


Let's get started.


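Tutorial Overview

This tutorial is divided into three parts; they are:

R-CNN and Mask R-CNN
Matterport Mask R-CNN Project
Object Detection With Mask R-CNN

Note: This tutorial requires TensorFlow version 1.15.3 and Keras version 2.2.4. It does not work with TensorFlow 2.0+ or Keras 2.2.5+ because the third-party library had not been updated at the time of writing.

You can install these specific versions of the libraries as follows:

sudo pip install --no-deps tensorflow==1.15.3
sudo pip install --no-deps keras==2.2.4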

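Mask R-CNN for Object Detection

Object detection is a computer vision task that involves both localizing one or more objects within an image and classifying each object in the image.

It is a challenging computer vision task that requires both successful object localization, in order to locate and draw a bounding box around each object in an image, and object classification, to predict the correct class of the object that was localized.

An extension of object detection involves marking the specific pixels in the image that belong to each detected object instead of using coarse bounding boxes during object localization. This harder version of the problem is generally referred to as object segmentation or semantic segmentation; when each detected object receives its own mask, as with Mask R-CNN, it is also called instance segmentation.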
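The relationship between a pixel mask and a coarse bounding box can be shown on a toy example. The sketch below represents a mask as a nested list of 0/1 values and derives the tightest box around it; mask_to_box is a hypothetical helper, and the (y1, x1, y2, x2) ordering is chosen to mirror the box ordering used later in this tutorial.

```python
# Sketch: derive a bounding box (y1, x1, y2, x2) from a binary mask.
# The mask is a small nested list for illustration; real masks are
# full-resolution arrays.

def mask_to_box(mask):
    # rows that contain at least one object pixel
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # empty mask, no object
    # columns that contain at least one object pixel
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    # the end coordinates are exclusive, as with array slicing
    return rows[0], cols[0], rows[-1] + 1, cols[-1] + 1

mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_box(mask))
```

The box covers eight cells here while the object occupies only five of them, which illustrates the extra precision a mask provides over a bounding box.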

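The Region-Based Convolutional Neural Network, or R-CNN, is a family of convolutional neural network models designed for object detection, developed by Ross Girshick et al.

There are perhaps four main variations of the approach, resulting in the current pinnacle called Mask R-CNN. The salient aspects of each variation can be summarized as follows:

R-CNN: Bounding boxes are proposed by the "selective search" algorithm; each is stretched and has features extracted via a deep convolutional neural network, such as AlexNet, before a final set of object classifications is made with linear SVMs.
Fast R-CNN: A simplified design with a single model; bounding boxes are still specified as input, but a region-of-interest pooling layer is used after the deep CNN to consolidate regions, and the model predicts both class labels and regions of interest directly.
Faster R-CNN: Adds a Region Proposal Network that interprets features extracted from the deep CNN and learns to propose regions of interest directly.
Mask R-CNN: An extension of Faster R-CNN that adds an output model for predicting a mask for each detected object.

The Mask R-CNN model, introduced in the 2018 paper titled "Mask R-CNN," is the most recent variation of the family of models and supports both object detection and object segmentation. The paper provides a nice summary of the model lineage to that point:

The Region-based CNN (R-CNN) approach to bounding-box object detection is to attend to a manageable number of candidate object regions and evaluate convolutional networks independently on each RoI. R-CNN was extended to allow attending to RoIs on feature maps using RoIPool, leading to fast speed and better accuracy. Faster R-CNN advanced this stream by learning the attention mechanism with a Region Proposal Network (RPN). Faster R-CNN is flexible and robust to many follow-up improvements, and is the current leading framework in several benchmarks.

- Mask R-CNN, 2018.

The family of methods may be among the most effective for object detection, achieving state-of-the-art results on computer vision benchmark datasets. Although accurate, the models can be slow when making a prediction compared to alternative models such as YOLO, which may be less accurate but are designed for real-time prediction.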

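Matterport Mask R-CNN Project

Mask R-CNN is a sophisticated model to implement, especially as compared to a simple or even state-of-the-art deep convolutional neural network model.

Source code is available for each version of the R-CNN model, provided in separate GitHub repositories, with prototype models based on the Caffe deep learning framework.

Instead of developing an implementation of the R-CNN or Mask R-CNN model from scratch, we can use a reliable third-party implementation built on top of the Keras deep learning framework.

The best-of-breed third-party implementation of Mask R-CNN is the Mask R-CNN project developed by Matterport. The project is open source, released under a permissive license (i.e. the MIT license), and the code has been widely used on a variety of projects and Kaggle competitions.

Nevertheless, it is an open source project, subject to the whims of the project developers. As such, I have a fork of the project available, just in case there are major changes to the API in the future.

The project is light on API documentation, although it does provide a number of examples in the form of Python notebooks that you can use to understand how to use the library.

There are perhaps three main use cases for the Mask R-CNN model with the Matterport library; they are:

Object Detection Application: Use a pre-trained model for object detection on new images.
New Model via Transfer Learning: Use a pre-trained model as a starting point in developing a model for a new object detection dataset.
New Model from Scratch: Develop a new model from scratch for an object detection dataset.

In order to get familiar with the model and the library, we will look at the first use case in the next section.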

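Object Detection With Mask R-CNN

In this section, we will use the Matterport Mask R-CNN library to perform object detection on arbitrary photographs.

Much like using a pre-trained deep CNN for image classification, e.g. VGG-16 trained on the ImageNet dataset, we can use a pre-trained Mask R-CNN model to detect objects in new photographs. In this case, we will use a Mask R-CNN trained on the MS COCO object detection problem.

Mask R-CNN Installation

The first step is to install the library.

At the time of writing, there is no distributed version of the library, so we have to install it manually. The good news is that this is very easy.

Installation involves cloning the GitHub repository and running the installation script on your workstation. If you are having trouble, see the installation instructions in the library's readme file.

Step 1. Clone the Mask R-CNN GitHub Repository

This is as simple as running the following command from your command line:

git clone https://github.com/matterport/Mask_RCNN.git

This will create a new local directory with the name Mask_RCNN that looks as follows: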

Mask_RCNN
├── assets
├── build
│   ├── bdist.macosx-10.13-x86_64
│   └── lib
│       └── mrcnn
├── dist
├── images
├── mask_rcnn.egg-info
├── mrcnn
└── samples
    ├── balloon
    ├── coco
    ├── nucleus
    └── shapes

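Step 2. Install the Mask R-CNN Library

The library can be installed by running its setup script.

Change directory into the Mask_RCNN directory and run the installation script.

From the command line, type the following:

cd Mask_RCNN
python setup.py install

On Linux or MacOS, you may need to install the software with sudo permissions; for example, you may see an error such as:

error: can't create or remove files in install directory

In that case, install the software with sudo:

sudo python setup.py install

The library will then install directly, and you will see a lot of successful installation messages ending with the following: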

...
Finished processing dependencies for mask-rcnn==2.1

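This confirms that you installed the library successfully and that you have the latest version, which at the time of writing is version 2.1.

Step 3. Confirm the Library Was Installed

It is always a good idea to confirm that the library was installed correctly.

You can confirm that the library was installed correctly by querying it via the pip command; for example:

pip show mask-rcnn

You should see output informing you of the version and installation location; for example: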

Name: mask-rcnn
Version: 2.1
Summary: Mask R-CNN for object detection and instance segmentation
Home-page: https://github.com/matterport/Mask_RCNN
Author: Matterport
Author-email: waleed.abdulla@gmail.com
License: MIT
Location: ...
Requires:
Required-by:

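We are now ready to use the library.

Example of Object Localization

We are going to use a pre-trained Mask R-CNN model to detect objects in a new photograph.

Step 1. Download Model Weights

First, download the weights for the pre-trained model, specifically a Mask R-CNN trained on the MS COCO dataset.

The weights are available from the GitHub project, and the file is about 250 megabytes. Download the model weights to a file with the name 'mask_rcnn_coco.h5' in your current working directory.

Step 2. Download Sample Photograph

We also need a photograph in which to detect objects.

We will use a photograph from Flickr released under a permissive license, specifically a photograph of an elephant taken by Mandy Goldberg.

Download the photograph to your current working directory with the filename 'elephant.jpg'.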


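Step 3. Load Model and Make Prediction

First, the model must be defined via an instance of the MaskRCNN class.

This class requires a configuration object as a parameter. The configuration object defines how the model might be used during training or inference.

In this case, the configuration will only specify the number of images per batch, which will be one, and the number of classes to predict.

You can see the full extent of the configuration object and the properties that you can override in the config.py file.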

# define the test configuration
class TestConfig(Config):
     NAME = "test"
     GPU_COUNT = 1
     IMAGES_PER_GPU = 1
     NUM_CLASSES = 1 + 80

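We can now define the MaskRCNN instance.

We will define the model as type 'inference', indicating that we are interested in making predictions and not training. We must also specify a directory where any log messages could be written, which in this case will be the current working directory.

# define the model
rcnn = MaskRCNN(mode='inference', model_dir='./', config=TestConfig())

The next step is to load the weights that we downloaded.

# load coco model weights
rcnn.load_weights('mask_rcnn_coco.h5', by_name=True)

Now we can make a prediction for our image. First, we can load the image and convert it to a NumPy array.

# load photograph
img = load_img('elephant.jpg')
img = img_to_array(img)

We can then make a prediction with the model. Instead of calling predict() as we would on a normal Keras model, we will call the detect() function and pass it the single image.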

# make prediction
results = rcnn.detect([img], verbose=0)

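The result contains a dictionary for each image that we passed into the detect() function, in this case a list with a single dictionary for the one image.

The dictionary has keys for the bounding boxes, masks, and so on, and each key points to a list for the multiple possible objects detected in the image.

The keys of the dictionary of note are as follows:

'rois': The bounding boxes or regions of interest (ROI) for detected objects.
'masks': The masks for the detected objects.
'class_ids': The class integers for the detected objects.
'scores': The probability or confidence for each predicted class.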
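Because the per-object entries under these keys line up by position, they can be filtered together. The sketch below keeps only detections above a confidence threshold; it runs on a hand-written mock result with plain lists (real results hold NumPy arrays, and 'masks' is omitted here for brevity), and filter_detections is a hypothetical helper, not part of the library.

```python
# Sketch: keep only detections whose score meets a threshold.
# mock_result imitates the dictionary layout described above using
# made-up values and plain Python lists.

def filter_detections(result, min_score=0.9):
    # indices of the detections that are confident enough
    keep = [i for i, score in enumerate(result['scores']) if score >= min_score]
    return {
        'rois': [result['rois'][i] for i in keep],
        'class_ids': [result['class_ids'][i] for i in keep],
        'scores': [result['scores'][i] for i in keep],
    }

mock_result = {
    'rois': [[10, 20, 200, 300], [50, 60, 80, 90]],  # y1, x1, y2, x2
    'class_ids': [21, 1],
    'scores': [0.99, 0.55],
}
confident = filter_detections(mock_result, min_score=0.9)
print(confident)
```

The same index-based selection applies to the real arrays, which is convenient when a photograph produces many low-confidence boxes.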

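We can draw each box detected in the image by first getting the dictionary for the first image (e.g. results[0]) and then retrieving the list of bounding boxes (e.g. ['rois']).

boxes = results[0]['rois']

Each bounding box is defined in terms of the top-left and bottom-right coordinates of the box in the image (in image coordinates, where y increases downward).

y1, x1, y2, x2 = boxes[0]

We can use these coordinates to create a Rectangle() from the matplotlib API and draw each rectangle over the top of our image.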

# get coordinates
y1, x1, y2, x2 = box
# calculate width and height of the box
width, height = x2 - x1, y2 - y1
# create the shape
rect = Rectangle((x1, y1), width, height, fill=False, color='red')
# draw the box
ax.add_patch(rect)

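To keep things neat, we can create a function to do this that takes the filename of the photograph and the list of bounding boxes to draw, and shows the photo with the boxes.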

# draw an image with detected objects
def draw_image_with_boxes(filename, boxes_list):
     # load the image
     data = pyplot.imread(filename)
     # plot the image
     pyplot.imshow(data)
     # get the context for drawing boxes
     ax = pyplot.gca()
     # plot each box
     for box in boxes_list:
          # get coordinates
          y1, x1, y2, x2 = box
          # calculate width and height of the box
          width, height = x2 - x1, y2 - y1
          # create the shape
          rect = Rectangle((x1, y1), width, height, fill=False, color='red')
          # draw the box
          ax.add_patch(rect)
     # show the plot
     pyplot.show()

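We can now tie all of this together: load the pre-trained model, use it to detect objects in our photograph of an elephant, and then draw the photograph with all detected objects.

The complete example is listed below.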

# example of inference with a pre-trained coco model
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from matplotlib import pyplot
from matplotlib.patches import Rectangle

# draw an image with detected objects
def draw_image_with_boxes(filename, boxes_list):
     # load the image
     data = pyplot.imread(filename)
     # plot the image
     pyplot.imshow(data)
     # get the context for drawing boxes
     ax = pyplot.gca()
     # plot each box
     for box in boxes_list:
          # get coordinates
          y1, x1, y2, x2 = box
          # calculate width and height of the box
          width, height = x2 - x1, y2 - y1
          # create the shape
          rect = Rectangle((x1, y1), width, height, fill=False, color='red')
          # draw the box
          ax.add_patch(rect)
     # show the plot
     pyplot.show()

# define the test configuration
class TestConfig(Config):
     NAME = "test"
     GPU_COUNT = 1
     IMAGES_PER_GPU = 1
     NUM_CLASSES = 1 + 80

# define the model
rcnn = MaskRCNN(mode='inference', model_dir='./', config=TestConfig())
# load coco model weights
rcnn.load_weights('mask_rcnn_coco.h5', by_name=True)
# load photograph
img = load_img('elephant.jpg')
img = img_to_array(img)
# make prediction
results = rcnn.detect([img], verbose=0)
# visualize the results
draw_image_with_boxes('elephant.jpg', results[0]['rois'])

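Running the example loads the model and performs object detection. More accurately, we have performed object localization, only drawing bounding boxes around detected objects.

In this case, we can see that the model has correctly located the single object in the photo, the elephant, and drawn a red box around it.

Photograph of an Elephant With All Objects Localized With a Bounding Box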

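Example of Object Detection

Now that we know how to load the model and use it to make a prediction, let's update the example to perform real object detection.

That is, in addition to localizing objects, we want to know what they are.

The Mask_RCNN API provides a function called display_instances() that will take the array of pixel values for the loaded image and the aspects of the prediction dictionary, such as the bounding boxes, scores, and class labels, and will plot the photo with all of these annotations.

One of the arguments is the list of predicted class identifiers available in the 'class_ids' key of the dictionary. The function also needs a mapping of ids to class labels. The pre-trained model was fit with a dataset that had 80 class labels (81 including background), helpfully provided as a list in the Mask R-CNN Demo notebook tutorial and reproduced below.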

# define 81 classes that the coco model knowns about
class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
               'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
               'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
               'teddy bear', 'hair drier', 'toothbrush']

We can then provide the details of the prediction for the elephant photo to the display_instances() function; for example:

# get dictionary for first prediction
r = results[0]
# show photo with bounding boxes, masks, class labels and scores
display_instances(img, r['rois'], r['masks'], r['class_ids'], class_names, r['scores'])

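The display_instances() function is flexible, allowing you to draw only the masks or only the bounding boxes. You can learn more about this function in the visualize.py source file.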
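If only the text labels are needed, without any plotting, the id-to-name mapping can be applied directly. The following sketch uses a truncated class list and invented predictions; describe_detections is a hypothetical helper and not part of the Mask_RCNN API.

```python
# Sketch: turn predicted class ids and scores into readable captions.
# The class list is truncated and the predictions are made up.

def describe_detections(class_ids, scores, class_names):
    # look up each class id in the list and attach its score
    return ['%s %.3f' % (class_names[class_id], score)
            for class_id, score in zip(class_ids, scores)]

short_class_names = ['BG', 'person', 'bicycle']  # index 0 is background
captions = describe_detections([2, 1], [0.98, 0.75], short_class_names)
print(captions)
```

Index 0 is reserved for the background class, which is why the list has 81 entries for an 80-class dataset.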

The complete example with this change using the display_instances() function is listed below.

# example of inference with a pre-trained coco model
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from mrcnn.visualize import display_instances
from mrcnn.config import Config
from mrcnn.model import MaskRCNN

# define 81 classes that the coco model knowns about
class_names = ['BG', 'person', 'bicycle', 'car', 'motorcycle', 'airplane',
               'bus', 'train', 'truck', 'boat', 'traffic light',
               'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird',
               'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant', 'bear',
               'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie',
               'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
               'kite', 'baseball bat', 'baseball glove', 'skateboard',
               'surfboard', 'tennis racket', 'bottle', 'wine glass', 'cup',
               'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
               'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza',
               'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',
               'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote',
               'keyboard', 'cell phone', 'microwave', 'oven', 'toaster',
               'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
               'teddy bear', 'hair drier', 'toothbrush']

# define the test configuration
class TestConfig(Config):
     NAME = "test"
     GPU_COUNT = 1
     IMAGES_PER_GPU = 1
     NUM_CLASSES = 1 + 80

# define the model
rcnn = MaskRCNN(mode='inference', model_dir='./', config=TestConfig())
# load coco model weights
rcnn.load_weights('mask_rcnn_coco.h5', by_name=True)
# load photograph
img = load_img('elephant.jpg')
img = img_to_array(img)
# make prediction
results = rcnn.detect([img], verbose=0)
# get dictionary for first prediction
r = results[0]
# show photo with bounding boxes, masks, class labels and scores
display_instances(img, r['rois'], r['masks'], r['class_ids'], class_names, r['scores'])

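Running the example shows the photograph of the elephant with the annotations predicted by the Mask R-CNN model, specifically:

Bounding Box. A dotted bounding box around each detected object.
Class Label. The class label assigned to each detected object, written in the top-left corner of the bounding box.
Prediction Confidence. The confidence of the class label prediction for each detected object, written in the top-left corner of the bounding box.
Object Mask Outline. A polygon outline for the mask of each detected object.
Object Mask. A polygon fill for the mask of each detected object.

The result is very impressive and sparks many ideas for how such a powerful pre-trained model could be used in practice.

Photograph of an Elephant With All Objects Detected With a Bounding Box and Mask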

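Further Reading

This section provides more resources on the topic if you are looking to go deeper.

API

matplotlib.patches.Rectangle API

Resources

R-CNN code repository

Summary

In this tutorial, you discovered how to use the Mask R-CNN model to detect objects in new photographs.

Specifically, you learned:

The region-based convolutional neural network family of models for object detection and the most recent variation, called Mask R-CNN.
The best-of-breed open source library implementation of Mask R-CNN for the Keras deep learning library.
How to use a pre-trained Mask R-CNN to perform object localization and detection on new photographs.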


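How to Perform Object Detection With YOLOv3 in Keras

Original: machinelearningmastery.com/how-to-perf…

Last updated on October 8, 2019

Object detection is a task in computer vision that involves identifying the presence, location, and type of one or more objects in a given photograph.

In recent years, deep learning techniques have achieved state-of-the-art results for object detection, such as on standard benchmark datasets and in computer vision competitions. Notable is the "You Only Look Once," or YOLO, family of convolutional neural networks, which achieves near state-of-the-art results with a single end-to-end model that can perform object detection in real time.

In this tutorial, you will discover how to develop a YOLOv3 model for object detection on new photographs.

After completing this tutorial, you will know:

The YOLO-based convolutional neural network family of models for object detection and the most recent variation, called YOLOv3.
The best-of-breed open source library implementation of YOLOv3 for the Keras deep learning library.
How to use a pre-trained YOLOv3 to perform object localization and detection on new photographs.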


Let's get started.

Update Oct/2019: Updated and tested for the Keras 2.3.0 API and TensorFlow 2.0.0.


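Tutorial Overview

This tutorial is divided into three parts; they are:

YOLO for Object Detection
Experiencor YOLO3 Project
Object Detection With YOLOv3

YOLO for Object Detection

Object detection is a computer vision task that involves both localizing one or more objects within an image and classifying each object in the image.

The "You Only Look Once," or YOLO, family of models is a series of end-to-end deep learning models designed for fast object detection, developed by Joseph Redmon et al. and first described in the 2015 paper titled "You Only Look Once: Unified, Real-Time Object Detection."

The approach involves a single deep convolutional neural network (originally a version of GoogLeNet, later updated and called DarkNet, based on VGG) that splits the input into a grid of cells, where each cell directly predicts a bounding box and an object classification. The result is a large number of candidate bounding boxes that are consolidated into a final prediction by a post-processing step.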
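The text above does not name the post-processing step; one technique commonly used for this purpose is non-maximum suppression, which keeps the highest-scoring box and discards other boxes that overlap it heavily. The sketch below is a simplified, single-class version under that assumption, with made-up boxes and a 0.5 overlap threshold chosen for illustration.

```python
# Sketch: intersection-over-union (IoU) and a simple non-maximum
# suppression for one class. Boxes are (x1, y1, x2, y2); the boxes,
# scores, and threshold are illustrative assumptions.

def iou(a, b):
    # overlap rectangle between the two boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    # visit boxes from highest to lowest score
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # keep a box only if it does not overlap a kept box too much
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))
```

In a full detector this would be run separately for each class, since two heavily overlapping boxes of different classes may both be valid.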

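There are three main variations of the approach at the time of writing; they are YOLOv1, YOLOv2, and YOLOv3. The first version proposed the general architecture, the second version refined the design and made use of predefined anchor boxes to improve bounding box proposals, and the third version further refined the model architecture and training process.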
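To give a feel for how anchor boxes are used, the sketch below decodes one raw cell prediction into a normalized box, following the parameterization described in the YOLOv2 and YOLOv3 papers: the center is a sigmoid offset within the grid cell, and the size is the anchor scaled by an exponential. The grid size, anchor dimensions, and input size are illustrative assumptions, and decode_box is a hypothetical helper rather than code from any project discussed here.

```python
# Sketch: decode a raw (tx, ty, tw, th) prediction for one grid cell
# and one anchor into a box normalized to the range 0..1.
# Grid, anchor, and network sizes are illustrative assumptions.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, col, row, grid_w, grid_h,
               anchor_w, anchor_h, net_w, net_h):
    # center: cell index plus a 0..1 offset, normalized by grid size
    center_x = (col + sigmoid(tx)) / grid_w
    center_y = (row + sigmoid(ty)) / grid_h
    # size: anchor (in pixels) scaled by exp, normalized by input size
    width = anchor_w * math.exp(tw) / net_w
    height = anchor_h * math.exp(th) / net_h
    return center_x, center_y, width, height

# a zero prediction in the middle cell of a 13x13 grid
box = decode_box(0.0, 0.0, 0.0, 0.0, col=6, row=6, grid_w=13, grid_h=13,
                 anchor_w=116, anchor_h=90, net_w=416, net_h=416)
print(box)
```

A zero prediction therefore lands at the center of its cell with exactly the anchor's size, so the network only has to learn small corrections relative to each anchor.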

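Although the accuracy of the models is close to but not as good as that of region-based convolutional neural networks (R-CNNs), they are popular for object detection because of their detection speed, often demonstrated in real time on video or camera feed input.

A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.

- You Only Look Once: Unified, Real-Time Object Detection, 2015.

In this tutorial, we will focus on using YOLOv3.

Experiencor YOLO3 for Keras Project

Source code for each version of YOLO is available, as well as pre-trained models.

The official DarkNet GitHub repository contains the source code for the YOLO versions mentioned in the papers, written in C. The repository provides a step-by-step tutorial on how to use the code for object detection.

It is a challenging model to implement from scratch, especially for beginners, as it requires the development of many customized model elements for training and for prediction. For example, even using a pre-trained model directly requires sophisticated code to extract and interpret the predicted bounding boxes output by the model.

Instead of developing this code from scratch, we can use a third-party implementation. There are many third-party implementations designed for using YOLO with Keras, but none appear to be standardized and designed to be used as a library.

The YAD2K project was a de facto standard for YOLOv2 and provided scripts to convert the pre-trained weights into Keras format, use the pre-trained model to make predictions, and extract and interpret the predicted bounding boxes. Many other third-party developers have used this code as a starting point and updated it to support YOLOv3.

Perhaps the most widely used project for working with pre-trained YOLO models is called "keras-yolo3: Training and Detecting Objects with YOLO3" by Huynh Ngoc Anh, or experiencor. The code in the project has been made available under a permissive MIT open source license. Like YAD2K, it provides scripts to load and use pre-trained YOLO models, as well as to develop YOLOv3 models on new datasets via transfer learning.

He also has a keras-yolo2 project that provides similar code for YOLOv2, as well as detailed tutorials on how to use the code in the repository. The keras-yolo3 project appears to be an updated version of that project.

Interestingly, experiencor has used the model as the basis for some experiments and trained versions of YOLOv3 on standard object detection problems such as a kangaroo dataset, a raccoon dataset, red blood cell detection, and others. He has listed model performance, provided the model weights for download, and provided YouTube videos of model behavior. For example:

Raccoon Detection using YOLO 3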

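How to Train an Object Detection Model With Keras

Original: machinelearningmastery.com/how-to-trai…

Last updated on September 2, 2020

Object detection is a challenging computer vision task that involves predicting both where the objects are in the image and what type of objects were detected.

The Mask Region-based Convolutional Neural Network, or Mask R-CNN, model is one of the state-of-the-art approaches for object recognition tasks. The Matterport Mask R-CNN project provides a library that allows you to develop and train Mask R-CNN Keras models for your own object detection tasks. Using the library can be tricky for beginners and requires careful preparation of the dataset, although it allows fast training via transfer learning with top-performing models trained on challenging object detection tasks, such as MS COCO.

In this tutorial, you will discover how to develop a Mask R-CNN model for kangaroo object detection in photographs.

After completing this tutorial, you will know:

How to prepare an object detection dataset ready for modeling with an R-CNN.
How to use transfer learning to train an object detection model on a new dataset.
How to evaluate a fit Mask R-CNN model on a test dataset and make predictions on new photos.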

用我的新书计算机视觉深度学习启动你的项目，包括分步教程和所有示例的 Python 源代码文件。

我们开始吧。

How to Train an Object Detection Model to Find Kangaroos in Photographs (R-CNN with Keras)

教程概述

本教程分为五个部分；它们是:

如何为 Keras 安装口罩
如何准备用于对象检测的数据集
如何训练一个用于袋鼠检测的掩蔽模型
如何评估一个面具
如何在新照片中检测袋鼠

注:本教程需要 TensorFlow 1 . 15 . 3 版和 Keras 2.2.4 版。它不适用于 TensorFlow 2.0+或 Keras 2.2.5+，因为在编写本文时，第三方库尚未更新。

您可以按如下方式安装这些特定版本的库:

sudo pip install --no-deps tensorflow==1.15.3
sudo pip install --no-deps keras==2.2.4

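How to Install Mask R-CNN for Keras

Object detection is a task in computer vision that involves identifying the presence, location, and type of one or more objects in a given image.

The Region-Based Convolutional Neural Network, or R-CNN, is a family of convolutional neural network models designed for object detection, developed by Ross Girshick et al. There are perhaps four main variations of the approach, culminating in the current pinnacle called Mask R-CNN. Mask R-CNN, introduced in the 2018 paper titled "Mask R-CNN," is the most recent variation of the family and supports both object detection and object segmentation. Object segmentation not only involves localizing objects in the image, but also specifies a mask for the image that indicates exactly which pixels belong to the object.

Mask R-CNN is a sophisticated model to implement, especially compared to a simple or even state-of-the-art deep convolutional neural network model. Instead of developing an implementation of the R-CNN or Mask R-CNN model from scratch, we can use a reliable third-party implementation built on top of the Keras deep learning framework.

The best third-party implementation of Mask R-CNN is the Mask R-CNN project developed by Matterport. The project is open source, released under a permissive license (the MIT license), and the code has been widely used on a variety of projects and Kaggle competitions.

The first step is to install the library.

At the time of writing, there is no distributed version of the library, so we have to install it manually. The good news is that this is very easy.

Installation involves cloning the GitHub repository and running the installation script on your workstation. If you are having trouble, see the installation instructions in the library's README file.

Step 1. Clone the Mask R-CNN GitHub Repository

This is as simple as running the following command from your command line: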

git clone https://github.com/matterport/Mask_RCNN.git

This will create a new local directory named Mask_RCNN that looks as follows:

Mask_RCNN
├── assets
├── build
│   ├── bdist.macosx-10.13-x86_64
│   └── lib
│       └── mrcnn
├── dist
├── images
├── mask_rcnn.egg-info
├── mrcnn
└── samples
    ├── balloon
    ├── coco
    ├── nucleus
    └── shapes

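Step 2. Install the Mask R-CNN Library

The library can be installed directly from the cloned source using its setup script.

Change directory into the Mask_RCNN directory and run the installation script.

From the command line, type the following: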

cd Mask_RCNN
python setup.py install

On Linux or MacOS, you may need to install the software with sudo permissions; for example, you may see an error such as:

error: can't create or remove files in install directory

In that case, install the software with sudo:

sudo python setup.py install

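If you are using a Python virtual environment (virtualenv), such as on an EC2 Deep Learning AMI instance (recommended for this tutorial), you can install Mask_RCNN into your environment as follows: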

sudo ~/anaconda3/envs/tensorflow_p36/bin/python setup.py install

The library will then install directly and you will see a lot of successful installation messages ending with the following:

...
Finished processing dependencies for mask-rcnn==2.1

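This confirms that you installed the library successfully and that you have the latest version, which at the time of writing is version 2.1.

Step 3: Confirm the Library Was Installed

It is always a good idea to confirm that the library was installed correctly.

You can confirm that the library was installed correctly by querying it via the pip command; for example: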

pip show mask-rcnn

You should see output informing you of the version and installation location; for example:

Name: mask-rcnn
Version: 2.1
Summary: Mask R-CNN for object detection and instance segmentation
Home-page: https://github.com/matterport/Mask_RCNN
Author: Matterport
Author-email: waleed.abdulla@gmail.com
License: MIT
Location: ...
Requires:
Required-by:
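
The same check can also be made from inside Python, which confirms that the interpreter you plan to use can actually find the package. The sketch below is one way to do it with the standard library; the helper name module_location() is hypothetical, and it only reports where the module would be imported from, not its version.

```python
# minimal sketch: report where a module would be imported from, if anywhere
from importlib import util

def module_location(module_name):
	# find_spec() returns None when a top-level module cannot be found
	spec = util.find_spec(module_name)
	return None if spec is None else spec.origin

# prints a file path if the Mask R-CNN library is importable, otherwise None
print(module_location('mrcnn'))
```

If this prints None while pip reports the package as installed, the package was most likely installed for a different Python interpreter or environment.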

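We are now ready to use the library.

How to Prepare a Dataset for Object Detection

Next, we need a dataset to model.

In this tutorial, we will use the kangaroo dataset made available by Huynh Ngoc Anh (experiencor). The dataset is comprised of 183 photographs that contain kangaroos, and XML annotation files that provide bounding boxes for the kangaroos in each photograph.

Mask R-CNN is designed to learn to predict both bounding boxes for objects and masks for the detected objects, and the kangaroo dataset does not provide masks. As such, we will use the dataset to learn a kangaroo object detection task, ignore the masks, and not focus on the image segmentation capabilities of the model.

A few steps are required to prepare this dataset for modeling, and we will work through each in turn in this section: downloading the dataset, parsing the annotation files, developing a KangarooDataset object that can be used by the Mask_RCNN library, and then testing the dataset object to confirm that we are loading images and annotations correctly.

Install Dataset

The first step is to download the dataset into your current working directory.

This can be achieved by cloning the GitHub repository directly, as follows: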

git clone https://github.com/experiencor/kangaroo.git

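This will create a new directory called "kangaroo" with a subdirectory called "images/" that contains all of the JPEG photos of kangaroos and a subdirectory called "annots/" that contains all of the XML files that describe the locations of kangaroos in each photo.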

kangaroo
├── annots
└── images

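Looking in each subdirectory, you can see that the photos and annotation files use a consistent naming convention, with filenames using a 5-digit zero-padded numbering system; for example: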

images/00001.jpg
images/00002.jpg
images/00003.jpg
...
annots/00001.xml
annots/00002.xml
annots/00003.xml
...

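This makes matching photographs and annotation files together very easy.

We can also see that the numbering system is not contiguous and that some photos are missing, e.g. there is no "00007" JPG or XML.

This means that we should focus on loading the list of actual files in the directory rather than relying on the numbering system.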
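
As a minimal sketch of that idea, the snippet below builds the list of photo and annotation pairs from the files that actually exist. The helper name list_image_annotation_pairs() is hypothetical, and the demonstration runs on a throwaway directory that only mimics the layout of the kangaroo dataset.

```python
# minimal sketch: pair photos with annotations based on the files on disk
import tempfile
from pathlib import Path

def list_image_annotation_pairs(dataset_dir):
	root = Path(dataset_dir)
	pairs = list()
	# sort so that the order does not depend on the file system
	for img_path in sorted((root / 'images').glob('*.jpg')):
		ann_path = root / 'annots' / (img_path.stem + '.xml')
		# keep only photos that have a matching annotation file
		if ann_path.is_file():
			pairs.append((img_path.stem, str(img_path), str(ann_path)))
	return pairs

# demonstrate on a temporary directory with a gap in the numbering
with tempfile.TemporaryDirectory() as tmp:
	for sub in ('images', 'annots'):
		(Path(tmp) / sub).mkdir()
	for name in ('00001', '00003'):
		(Path(tmp) / 'images' / (name + '.jpg')).touch()
		(Path(tmp) / 'annots' / (name + '.xml')).touch()
	# a photo without an annotation file is skipped
	(Path(tmp) / 'images' / '00004.jpg').touch()
	demo_ids = [image_id for image_id, _, _ in list_image_annotation_pairs(tmp)]
print(demo_ids)
```

With the real dataset, the call would be list_image_annotation_pairs('kangaroo').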

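Parse Annotation File

The next step is to figure out how to load the annotation files.

First, open the first annotation file (annots/00001.xml) and take a look; you should see: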

<annotation>
	<folder>Kangaroo</folder>
	<filename>00001.jpg</filename>
	<path>...</path>
	<source>
		<database>Unknown</database>
	</source>
	<size>
		<width>450</width>
		<height>319</height>
		<depth>3</depth>
	</size>
	<segmented>0</segmented>
	<object>
		<name>kangaroo</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>233</xmin>
			<ymin>89</ymin>
			<xmax>386</xmax>
			<ymax>262</ymax>
		</bndbox>
	</object>
	<object>
		<name>kangaroo</name>
		<pose>Unspecified</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>134</xmin>
			<ymin>105</ymin>
			<xmax>341</xmax>
			<ymax>253</ymax>
		</bndbox>
	</object>
</annotation>

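We can see that the annotation file contains a "size" element that describes the shape of the photograph, and one or more "object" elements that describe the bounding boxes for the kangaroo objects in the photograph.

The size and the bounding boxes are the minimum information that we require from each annotation file. We could write some careful XML parsing code to process these annotation files, which would be a good idea for a production system. Instead, we will short-cut development and use XPath queries to directly extract the data that we need from each file, e.g. a //size query to extract the size element and a //object or a //bndbox query to extract the bounding box elements.

Python provides the ElementTree API, which can be used to load and parse an XML file, and we can use the find() and findall() functions to perform XPath queries on a loaded document.

First, the annotation file must be loaded and parsed as an ElementTree object.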

# load and parse the file
tree = ElementTree.parse(filename)

Once loaded, we can retrieve the root element of the document, from which we can perform our XPath queries.

# get the root of the document
root = tree.getroot()

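We can use the findall() function with a query for './/bndbox' to find all 'bndbox' elements, then enumerate each to extract the x and y, min and max values that define each bounding box.

The element text can also be parsed to integer values.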

# extract each bounding box
for box in root.findall('.//bndbox'):
	xmin = int(box.find('xmin').text)
	ymin = int(box.find('ymin').text)
	xmax = int(box.find('xmax').text)
	ymax = int(box.find('ymax').text)
	coors = [xmin, ymin, xmax, ymax]

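We can then collect the definition of each bounding box into a list.

The dimensions of the image may also be helpful, and these can be queried directly.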

# extract image dimensions
width = int(root.find('.//size/width').text)
height = int(root.find('.//size/height').text)

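We can tie all of this together into a function that takes the annotation filename as an argument, extracts the bounding box and image dimension details, and returns them for use.

The extract_boxes() function below implements this behavior.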

# function to extract bounding boxes from an annotation file
def extract_boxes(filename):
	# load and parse the file
	tree = ElementTree.parse(filename)
	# get the root of the document
	root = tree.getroot()
	# extract each bounding box
	boxes = list()
	for box in root.findall('.//bndbox'):
		xmin = int(box.find('xmin').text)
		ymin = int(box.find('ymin').text)
		xmax = int(box.find('xmax').text)
		ymax = int(box.find('ymax').text)
		coors = [xmin, ymin, xmax, ymax]
		boxes.append(coors)
	# extract image dimensions
	width = int(root.find('.//size/width').text)
	height = int(root.find('.//size/height').text)
	return boxes, width, height

We can test this function on our annotation files, for example, on the first annotation file in the directory.

The complete example is listed below.

# example of extracting bounding boxes from an annotation file
from xml.etree import ElementTree

# function to extract bounding boxes from an annotation file
def extract_boxes(filename):
	# load and parse the file
	tree = ElementTree.parse(filename)
	# get the root of the document
	root = tree.getroot()
	# extract each bounding box
	boxes = list()
	for box in root.findall('.//bndbox'):
		xmin = int(box.find('xmin').text)
		ymin = int(box.find('ymin').text)
		xmax = int(box.find('xmax').text)
		ymax = int(box.find('ymax').text)
		coors = [xmin, ymin, xmax, ymax]
		boxes.append(coors)
	# extract image dimensions
	width = int(root.find('.//size/width').text)
	height = int(root.find('.//size/height').text)
	return boxes, width, height

# extract details from annotation file
boxes, w, h = extract_boxes('kangaroo/annots/00001.xml')
# summarize extracted details
print(boxes, w, h)

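Running the example returns a list that contains the details of each bounding box in the annotation file, as well as two integers for the width and height of the photograph.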

[[233, 89, 386, 262], [134, 105, 341, 253]] 450 319

Now that we know how to load the annotation file, we can look at using this functionality to develop a Dataset object.
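
For a production system, the parsing could be made more defensive. The following is a minimal sketch of what that might look like, not a replacement for the tutorial's function: the name extract_boxes_checked() and the validation rules (skip boxes with a missing coordinate, zero or negative area, or coordinates outside the image) are assumptions chosen for illustration. It parses an inline XML string so that it runs without the dataset.

```python
# minimal sketch: extract bounding boxes and skip ones that look invalid
from xml.etree import ElementTree

def extract_boxes_checked(root):
	# image dimensions are required; int() raises an error if they are missing
	width = int(root.findtext('.//size/width'))
	height = int(root.findtext('.//size/height'))
	boxes = list()
	for box in root.findall('.//bndbox'):
		values = [box.findtext(tag) for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
		# skip boxes where a coordinate element is missing
		if any(value is None for value in values):
			continue
		xmin, ymin, xmax, ymax = (int(value) for value in values)
		# keep only boxes with positive area that lie inside the image
		if 0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height:
			boxes.append([xmin, ymin, xmax, ymax])
	return boxes, width, height

# one valid box, one with xmin > xmax, and one with a missing ymax
sample = """<annotation>
	<size><width>450</width><height>319</height></size>
	<object><bndbox><xmin>233</xmin><ymin>89</ymin><xmax>386</xmax><ymax>262</ymax></bndbox></object>
	<object><bndbox><xmin>400</xmin><ymin>10</ymin><xmax>390</xmax><ymax>50</ymax></bndbox></object>
	<object><bndbox><xmin>5</xmin><ymin>5</ymin><xmax>20</xmax></bndbox></object>
</annotation>"""
checked = extract_boxes_checked(ElementTree.fromstring(sample))
print(checked)
```

Only the first box survives the checks. Whether to skip bad boxes silently or to raise an error is a design choice that depends on how much you trust the annotations.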

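Develop KangarooDataset Object

The mask-rcnn library requires that train, validation, and test datasets be managed by a mrcnn.utils.Dataset object.

This means that a new class must be defined that extends the mrcnn.utils.Dataset class. It must define a function to load the dataset, with any name you like such as load_dataset(), and override two functions: one for loading a mask called load_mask() and one for loading an image reference (path or URL) called image_reference().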

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		...  # implemented below

	# load the masks for an image
	def load_mask(self, image_id):
		...  # implemented below

	# load an image reference
	def image_reference(self, image_id):
		...  # implemented below

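To use a Dataset object, it is instantiated, then your custom load function must be called, and finally the built-in prepare() function is called.

For example, we will create a new class called KangarooDataset that will be used as follows: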

# prepare the dataset
train_set = KangarooDataset()
train_set.load_dataset(...)
train_set.prepare()

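The custom load function, e.g. load_dataset(), is responsible for defining both the classes and the images in the dataset.

Classes are defined by calling the built-in add_class() function and specifying the 'source' (the name of the dataset), the 'class_id' or integer for the class (e.g. 1 for the first class, as 0 is reserved for the background class), and the 'class_name' (e.g. 'kangaroo').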

# define one class
self.add_class("dataset", 1, "kangaroo")

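Images are defined by calling the built-in add_image() function and specifying the 'source' (the name of the dataset), a unique 'image_id' (e.g. the filename without the file extension, like '00001'), and the path from which the image can be loaded (e.g. 'kangaroo/images/00001.jpg').

This will define an "image info" dictionary for the image that can be retrieved later via the index or order in which the image was added to the dataset. You can also specify other arguments that will be added to the image info dictionary, such as an 'annotation' to define the annotation path.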

# add to dataset
self.add_image('dataset', image_id='00001', path='kangaroo/images/00001.jpg', annotation='kangaroo/annots/00001.xml')

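For example, we can implement a load_dataset() function that takes the path to the dataset directory and loads all images in the dataset.

Note that testing revealed an issue with image number '00090', so we will exclude it from the dataset.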

# load the dataset definitions
def load_dataset(self, dataset_dir):
	# define one class
	self.add_class("dataset", 1, "kangaroo")
	# define data locations
	images_dir = dataset_dir + '/images/'
	annotations_dir = dataset_dir + '/annots/'
	# find all images
	for filename in listdir(images_dir):
		# extract image id
		image_id = filename[:-4]
		# skip bad images
		if image_id in ['00090']:
			continue
		img_path = images_dir + filename
		ann_path = annotations_dir + image_id + '.xml'
		# add to dataset
		self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

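We can go one step further and add one more argument to the function to define whether the Dataset instance is for training or for test/validation. We have about 160 photos, so we can use about 20%, or the last 32 photos, as a test or validation dataset and the first 131, or 80%, as the training dataset.

This division can be made using the integer in the filename, where all photos numbered before 150 are used for training and those numbered 150 or later are used for testing. The updated load_dataset() with support for train and test datasets is provided below.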

# load the dataset definitions
def load_dataset(self, dataset_dir, is_train=True):
	# define one class
	self.add_class("dataset", 1, "kangaroo")
	# define data locations
	images_dir = dataset_dir + '/images/'
	annotations_dir = dataset_dir + '/annots/'
	# find all images
	for filename in listdir(images_dir):
		# extract image id
		image_id = filename[:-4]
		# skip bad images
		if image_id in ['00090']:
			continue
		# skip all images after 150 if we are building the train set
		if is_train and int(image_id) >= 150:
			continue
		# skip all images before 150 if we are building the test/val set
		if not is_train and int(image_id) < 150:
			continue
		img_path = images_dir + filename
		ann_path = annotations_dir + image_id + '.xml'
		# add to dataset
		self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)
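
The split rule can be checked in isolation, without loading any images. The sketch below applies the same logic to a plain list of image ids; the function name split_image_ids() and the example ids are made up for illustration.

```python
# minimal sketch: apply the train/test split rule to a list of image ids
def split_image_ids(image_ids, boundary=150, bad_ids=('00090',)):
	train_ids, test_ids = list(), list()
	for image_id in sorted(image_ids):
		# skip known bad images
		if image_id in bad_ids:
			continue
		# ids below the boundary go to train, the rest to test/val
		if int(image_id) < boundary:
			train_ids.append(image_id)
		else:
			test_ids.append(image_id)
	return train_ids, test_ids

# a few made-up ids on either side of the boundary
example_ids = ['00001', '00090', '00149', '00150', '00183']
train_ids, test_ids = split_image_ids(example_ids)
print(train_ids, test_ids)
```

Because the split depends only on the number in the filename, it is deterministic and the two subsets can never overlap.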

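Next, we need to define the load_mask() function for loading the mask for a given 'image_id'.

In this case, the 'image_id' is the integer index for an image in the dataset, assigned based on the order in which the image was added via a call to add_image() when loading the dataset. The function must return an array of one or more masks for the photo associated with the image_id, and the class for each mask.

We don't have masks, but we do have bounding boxes. We can load the bounding boxes for a given photo and return them as masks. The library will then infer bounding boxes from our "masks," which will be the same size.

First, we must load the annotation file for the image_id. This involves retrieving the "image info" dict for the image_id, then retrieving the annotation path that we stored for the image via our prior call to add_image(). We can then use the path in a call to extract_boxes(), developed in the previous section, to get the list of bounding boxes and the dimensions of the image.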

# get details of image
info = self.image_info[image_id]
# define box file location
path = info['annotation']
# load XML
boxes, w, h = self.extract_boxes(path)

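We can now define a mask for each bounding box, and an associated class.

A mask is a two-dimensional array with the same dimensions as the photograph, with all zero values where the object isn't and all one values where the object is in the photograph.

We can achieve this by creating a NumPy array with all zero values for the known size of the image and one channel for each bounding box.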

# create one array for all masks, each on a different channel
masks = zeros([h, w, len(boxes)], dtype='uint8')

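Each bounding box is defined by the min and max, x and y coordinates of the box.

These can be used directly to define row and column ranges in the array that can then be marked as 1.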

# create masks
for i in range(len(boxes)):
	box = boxes[i]
	row_s, row_e = box[1], box[3]
	col_s, col_e = box[0], box[2]
	masks[row_s:row_e, col_s:col_e, i] = 1
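
To see why a rectangular "mask" lets the library recover the original bounding box, the round trip can be sketched on a tiny example. This version uses plain nested lists rather than NumPy so that it runs anywhere; the function names are hypothetical, and the convention that the max coordinates are exclusive mirrors the slicing used above rather than any documented guarantee of the library.

```python
# minimal sketch: turn a box into a rectangular mask and back again
def box_to_mask(box, width, height):
	xmin, ymin, xmax, ymax = box
	# rows are y, columns are x; max coordinates are exclusive, as in slicing
	return [[1 if (ymin <= row < ymax and xmin <= col < xmax) else 0
		for col in range(width)] for row in range(height)]

def mask_to_box(mask):
	# indexes of rows and columns that contain at least one marked pixel
	rows = [r for r, row in enumerate(mask) if any(row)]
	cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
	if not rows:
		return None
	return [cols[0], rows[0], cols[-1] + 1, rows[-1] + 1]

# a made-up box in an 8 x 4 pixel image
tiny_mask = box_to_mask([2, 1, 5, 3], width=8, height=4)
recovered = mask_to_box(tiny_mask)
print(recovered)
```

The recovered box matches the input because a filled rectangle has the same extent as the box it was drawn from.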

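All objects in this dataset have the same class. We can retrieve the class index via the 'class_names' list, then add it to a list to be returned alongside the masks.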

self.class_names.index('kangaroo')

Tying this together, the complete load_mask() function is listed below.

# load the masks for an image
def load_mask(self, image_id):
	# get details of image
	info = self.image_info[image_id]
	# define box file location
	path = info['annotation']
	# load XML
	boxes, w, h = self.extract_boxes(path)
	# create one array for all masks, each on a different channel
	masks = zeros([h, w, len(boxes)], dtype='uint8')
	# create masks
	class_ids = list()
	for i in range(len(boxes)):
		box = boxes[i]
		row_s, row_e = box[1], box[3]
		col_s, col_e = box[0], box[2]
		masks[row_s:row_e, col_s:col_e, i] = 1
		class_ids.append(self.class_names.index('kangaroo'))
	return masks, asarray(class_ids, dtype='int32')

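Finally, we must implement the image_reference() function.

This function is responsible for returning the path or URL for a given 'image_id', which we know is just the 'path' property on the "image info" dict.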

# load an image reference
def image_reference(self, image_id):
	info = self.image_info[image_id]
	return info['path']

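And that's it. We have successfully defined a Dataset object for the mask-rcnn library for our kangaroo dataset.

The complete listing of the class, along with the creation of a train and a test dataset, is provided below.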

# split into train and test set
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# train set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))

# test/val set
test_set = KangarooDataset()
test_set.load_dataset('kangaroo', is_train=False)
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))

Running the example successfully loads and prepares the train and test datasets and prints the number of images in each.

Train: 131
Test: 32

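Now that we have defined the dataset, let's confirm that the images, masks, and bounding boxes are handled correctly.

Test KangarooDataset Object

The first useful test is to confirm that the images and masks can be loaded correctly.

We can test this by creating a dataset and loading an image via a call to the load_image() function with an image_id, then loading the mask for the image via a call to the load_mask() function with the same image_id.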

# load an image
image_id = 0
image = train_set.load_image(image_id)
print(image.shape)
# load image mask
mask, class_ids = train_set.load_mask(image_id)
print(mask.shape)

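Next, we can plot the photograph using the Matplotlib API, then plot the first mask over the top with an alpha value so that the photograph underneath can still be seen.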

# plot image
pyplot.imshow(image)
# plot mask
pyplot.imshow(mask[:, :, 0], cmap='gray', alpha=0.5)
pyplot.show()

The complete example is listed below.

# plot one photograph and mask
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from matplotlib import pyplot

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# train set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
# load an image
image_id = 0
image = train_set.load_image(image_id)
print(image.shape)
# load image mask
mask, class_ids = train_set.load_mask(image_id)
print(mask.shape)
# plot image
pyplot.imshow(image)
# plot mask
pyplot.imshow(mask[:, :, 0], cmap='gray', alpha=0.5)
pyplot.show()

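Running the example first prints the shape of the photograph and mask NumPy arrays.

We can confirm that both arrays have the same width and height and differ only in the number of channels. We can also see that the first photograph (e.g. image_id=0) in this case has only one mask.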

(626, 899, 3)
(626, 899, 1)

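A plot of the photograph is also created with the first mask overlaid.

In this case, we can see that one kangaroo is present in the photo and that the mask correctly bounds the kangaroo.

Photograph of Kangaroo With Object Detection Mask Overlaid

We can repeat this for the first nine photos in the dataset, plotting each photo in one figure as a subplot and plotting all masks for each photo.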

# plot first few images
for i in range(9):
	# define subplot
	pyplot.subplot(330 + 1 + i)
	# plot raw pixel data
	image = train_set.load_image(i)
	pyplot.imshow(image)
	# plot all masks
	mask, _ = train_set.load_mask(i)
	for j in range(mask.shape[2]):
		pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
# show the figure
pyplot.show()

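Running the example shows that the photos are loaded correctly and that those photos with multiple objects have separate masks correctly defined.

Plot of First Nine Photos of Kangaroos in the Training Dataset With Object Detection Masks

Another useful debugging step might be to load all of the "image info" objects in the dataset and print them to the console.

This can help to confirm that all of the calls to the add_image() function in the load_dataset() function worked as expected.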

# enumerate all images in the dataset
for image_id in train_set.image_ids:
	# load image info
	info = train_set.image_info[image_id]
	# display on the console
	print(info)

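Running this code on the loaded training dataset will then show all of the "image info" dictionaries, showing the path and id for each image in the dataset.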

{'id': '00132', 'source': 'dataset', 'path': 'kangaroo/images/00132.jpg', 'annotation': 'kangaroo/annots/00132.xml'}
{'id': '00046', 'source': 'dataset', 'path': 'kangaroo/images/00046.jpg', 'annotation': 'kangaroo/annots/00046.xml'}
{'id': '00052', 'source': 'dataset', 'path': 'kangaroo/images/00052.jpg', 'annotation': 'kangaroo/annots/00052.xml'}
...

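Finally, the mask-rcnn library provides utilities for displaying images and masks. We can use some of these built-in functions to confirm that the Dataset is operating correctly.

For example, the mask-rcnn library provides the mrcnn.visualize.display_instances() function that will show a photograph with bounding boxes, masks, and class labels. This requires that the bounding boxes are extracted from the masks via the extract_bboxes() function.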

# define image id
image_id = 1
# load the image
image = train_set.load_image(image_id)
# load the masks and the class ids
mask, class_ids = train_set.load_mask(image_id)
# extract bounding boxes from the masks
bbox = extract_bboxes(mask)
# display image with masks and bounding boxes
display_instances(image, bbox, mask, class_ids, train_set.class_names)

For completeness, the full code listing is provided below.

# display image with masks and bounding boxes
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from mrcnn.visualize import display_instances
from mrcnn.utils import extract_bboxes

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# train set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
# define image id
image_id = 1
# load the image
image = train_set.load_image(image_id)
# load the masks and the class ids
mask, class_ids = train_set.load_mask(image_id)
# extract bounding boxes from the masks
bbox = extract_bboxes(mask)
# display image with masks and bounding boxes
display_instances(image, bbox, mask, class_ids, train_set.class_names)

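Running the example creates a plot showing the photograph with the mask for each object in a separate color.

The bounding boxes match the masks exactly, by design, and are shown with dotted outlines. Finally, each object is marked with the class label, which in this case is 'kangaroo'.

Photograph Showing Object Detection Masks, Bounding Boxes, and Class Labels

Now that we are confident that our dataset is being loaded correctly, we can use it to fit a Mask R-CNN model.

How to Train a Mask R-CNN Model for Kangaroo Detection

A Mask R-CNN model can be fit from scratch, although like other computer vision applications, time can be saved and performance can be improved by using transfer learning.

The Mask R-CNN model pre-fit on the MS COCO object detection dataset can be used as a starting point and then tailored to the specific dataset, in this case, the kangaroo dataset.

The first step is to download the model file (architecture and weights) for the pre-fit Mask R-CNN model. The weights are available from the GitHub project and the file is about 250 megabytes.

Download the model weights to a file with the name 'mask_rcnn_coco.h5' in your current working directory.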

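Download Weights (mask_rcnn_coco.h5) 246M

Next, a configuration object for the model must be defined.

This is a new class that extends the mrcnn.config.Config class and defines properties of both the prediction problem (such as the name and the number of classes) and the algorithm for training the model (such as the learning rate).

The configuration must define the name of the configuration via the 'NAME' attribute, e.g. 'kangaroo_cfg', which will be used to save details and models to file during the run. The configuration must also define the number of classes in the prediction problem via the 'NUM_CLASSES' attribute. In this case, we only have one object type of kangaroo, although there is always an additional class for the background.

Finally, we must define the number of samples (photos) used in each training epoch. This will be the number of photos in the training dataset, in this case, 131.

Tying this together, our custom KangarooConfig class is defined below.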

# define a configuration for the model
class KangarooConfig(Config):
	# Give the configuration a recognizable name
	NAME = "kangaroo_cfg"
	# Number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# Number of training steps per epoch
	STEPS_PER_EPOCH = 131

# prepare config
config = KangarooConfig()
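
One detail worth checking is how a training "step" relates to a photo. In the Matterport implementation, as far as I can tell, each step processes a batch of GPU_COUNT x IMAGES_PER_GPU images, and the base Config class appears to default to one GPU with two images per GPU. Under that assumption, 131 steps would draw roughly 262 images per epoch rather than 131. The sketch below only does the arithmetic; the function name is hypothetical and the default values in its signature are assumptions, so confirm them against the output of config.display().

```python
# minimal sketch: number of steps needed for one pass over the dataset
import math

def steps_for_one_pass(num_images, gpu_count=1, images_per_gpu=2):
	# batch size is assumed to be GPU_COUNT * IMAGES_PER_GPU
	batch_size = gpu_count * images_per_gpu
	return math.ceil(num_images / batch_size)

# with the assumed defaults (batch of 2 images)
print(steps_for_one_pass(131))
# with one image per step
print(steps_for_one_pass(131, gpu_count=1, images_per_gpu=1))
```

Using more steps than one pass requires is not an error; it simply means each epoch shows the model some photos more than once.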

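Next, we can define our model.

This is achieved by creating an instance of the mrcnn.model.MaskRCNN class and specifying that the model will be used for training by setting the 'mode' argument to 'training'.

The 'config' argument must also be specified with an instance of our KangarooConfig class.

Finally, a directory is needed where configuration files can be saved and where checkpoint models can be saved at the end of each epoch. We will use the current working directory.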

# define the model
model = MaskRCNN(mode='training', model_dir='./', config=config)

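Next, the pre-defined model architecture and weights can be loaded. This can be achieved by calling the load_weights() function on the model and specifying the path to the downloaded 'mask_rcnn_coco.h5' file.

The model will be used as-is, although the class-specific output layers will be removed so that new output layers can be defined and trained. This can be done by specifying the 'exclude' argument and listing all of the output layers to exclude or remove from the model after it is loaded. This includes the output layers for the classification label, bounding boxes, and masks.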

# load weights (mscoco)
model.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",  "mrcnn_bbox", "mrcnn_mask"])

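Next, the model can be fit on the training dataset by calling the train() function and passing in both the training dataset and the validation dataset. We can also specify the learning rate as the default learning rate in the configuration (0.001).

We can also specify which layers to train. In this case, we will only train the heads, that is, the output layers of the model.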

# train weights (output layers or 'heads')
model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')

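We could follow this training with further epochs that fine-tune all of the weights in the model. This could be achieved by using a smaller learning rate and changing the 'layers' argument from 'heads' to 'all'.

The complete example of training a Mask R-CNN on the kangaroo dataset is listed below.

This may take some time to execute on a CPU, even with modern hardware. I recommend running the code with a GPU, such as on Amazon EC2, where it will finish in about five minutes on P3 type hardware.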

# fit a mask rcnn on the kangaroo dataset
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from mrcnn.utils import Dataset
from mrcnn.config import Config
from mrcnn.model import MaskRCNN

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# define a configuration for the model
class KangarooConfig(Config):
	# define the name of the configuration
	NAME = "kangaroo_cfg"
	# number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# number of training steps per epoch
	STEPS_PER_EPOCH = 131

# prepare train set
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))
# prepare test/val set
test_set = KangarooDataset()
test_set.load_dataset('kangaroo', is_train=False)
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))
# prepare config
config = KangarooConfig()
config.display()
# define the model
model = MaskRCNN(mode='training', model_dir='./', config=config)
# load weights (mscoco) and exclude the output layers
model.load_weights('mask_rcnn_coco.h5', by_name=True, exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",  "mrcnn_bbox", "mrcnn_mask"])
# train weights (output layers or 'heads')
model.train(train_set, test_set, learning_rate=config.LEARNING_RATE, epochs=5, layers='heads')

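Running the example will report progress using the standard Keras progress bars.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

We can see that many different train and test loss scores are reported for each of the output heads of the network. It can be quite confusing as to which loss to pay attention to.

In this example, where we are interested in object detection rather than object segmentation, I recommend paying attention to the loss for the classification output on the train and validation datasets (mrcnn_class_loss and val_mrcnn_class_loss), as well as the loss for the bounding box output on the train and validation datasets (mrcnn_bbox_loss and val_mrcnn_bbox_loss).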

Epoch 1/5
131/131 [==============================] - 106s 811ms/step - loss: 0.8491 - rpn_class_loss: 0.0044 - rpn_bbox_loss: 0.1452 - mrcnn_class_loss: 0.0420 - mrcnn_bbox_loss: 0.2874 - mrcnn_mask_loss: 0.3701 - val_loss: 1.3402 - val_rpn_class_loss: 0.0160 - val_rpn_bbox_loss: 0.7913 - val_mrcnn_class_loss: 0.0092 - val_mrcnn_bbox_loss: 0.2263 - val_mrcnn_mask_loss: 0.2975
Epoch 2/5
131/131 [==============================] - 69s 526ms/step - loss: 0.4774 - rpn_class_loss: 0.0025 - rpn_bbox_loss: 0.1159 - mrcnn_class_loss: 0.0170 - mrcnn_bbox_loss: 0.1134 - mrcnn_mask_loss: 0.2285 - val_loss: 0.6261 - val_rpn_class_loss: 8.9502e-04 - val_rpn_bbox_loss: 0.1624 - val_mrcnn_class_loss: 0.0197 - val_mrcnn_bbox_loss: 0.2148 - val_mrcnn_mask_loss: 0.2282
Epoch 3/5
131/131 [==============================] - 67s 515ms/step - loss: 0.4471 - rpn_class_loss: 0.0029 - rpn_bbox_loss: 0.1153 - mrcnn_class_loss: 0.0234 - mrcnn_bbox_loss: 0.0958 - mrcnn_mask_loss: 0.2097 - val_loss: 1.2998 - val_rpn_class_loss: 0.0144 - val_rpn_bbox_loss: 0.6712 - val_mrcnn_class_loss: 0.0372 - val_mrcnn_bbox_loss: 0.2645 - val_mrcnn_mask_loss: 0.3125
Epoch 4/5
131/131 [==============================] - 66s 502ms/step - loss: 0.3934 - rpn_class_loss: 0.0026 - rpn_bbox_loss: 0.1003 - mrcnn_class_loss: 0.0171 - mrcnn_bbox_loss: 0.0806 - mrcnn_mask_loss: 0.1928 - val_loss: 0.6709 - val_rpn_class_loss: 0.0016 - val_rpn_bbox_loss: 0.2012 - val_mrcnn_class_loss: 0.0244 - val_mrcnn_bbox_loss: 0.1942 - val_mrcnn_mask_loss: 0.2495
Epoch 5/5
131/131 [==============================] - 65s 493ms/step - loss: 0.3357 - rpn_class_loss: 0.0024 - rpn_bbox_loss: 0.0804 - mrcnn_class_loss: 0.0193 - mrcnn_bbox_loss: 0.0616 - mrcnn_mask_loss: 0.1721 - val_loss: 0.8878 - val_rpn_class_loss: 0.0030 - val_rpn_bbox_loss: 0.4409 - val_mrcnn_class_loss: 0.0174 - val_mrcnn_bbox_loss: 0.1752 - val_mrcnn_mask_loss: 0.2513

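A model file is created and saved at the end of each epoch in a subdirectory that starts with 'kangaroo_cfg' followed by random characters.

A model must be selected for use; in this case, the loss for the bounding boxes continues to decrease on each epoch, so we will use the final model at the end of the run ('mask_rcnn_kangaroo_cfg_0005.h5').

Copy the model file from the config directory into your current working directory. We will use it in the following sections to evaluate the model and make predictions.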
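
Locating the last checkpoint can also be scripted. The sketch below searches subdirectories whose names start with the configuration name and returns the checkpoint with the highest epoch number from the most recently named run. The function name is hypothetical, and the file name pattern is inferred from the 'mask_rcnn_kangaroo_cfg_0005.h5' example above, so treat it as an assumption; the demonstration uses empty placeholder files in a temporary directory.

```python
# minimal sketch: find the checkpoint file with the highest epoch number
import re
import tempfile
from pathlib import Path

def latest_checkpoint(model_dir, config_name='kangaroo_cfg'):
	pattern = re.compile(r'mask_rcnn_' + re.escape(config_name) + r'_(\d+)\.h5$')
	candidates = list()
	# look inside run directories that start with the configuration name
	for path in Path(model_dir).glob(config_name + '*/*.h5'):
		match = pattern.match(path.name)
		if match:
			# order by run directory name first, then by epoch number
			candidates.append((path.parent.name, int(match.group(1)), path))
	return max(candidates)[2] if candidates else None

# demonstrate with placeholder files in a made-up run directory
with tempfile.TemporaryDirectory() as tmp:
	run_dir = Path(tmp) / 'kangaroo_cfg_example_run'
	run_dir.mkdir()
	for epoch in (1, 2, 5):
		(run_dir / ('mask_rcnn_kangaroo_cfg_%04d.h5' % epoch)).touch()
	found = latest_checkpoint(tmp).name
print(found)
```

The returned path could then be copied into the working directory, for example with shutil.copy(). Keep in mind that the last epoch is not always the best model; comparing the validation losses first is a sensible precaution.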

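The results suggest that more training epochs might be useful, as might fine-tuning all of the layers in the model; this could make an interesting extension to the tutorial.

Next, let's look at how to evaluate the performance of this model.

How to Evaluate a Mask R-CNN Model

The performance of a model on an object recognition task is often evaluated using the mean average precision, or mAP.

We are predicting bounding boxes, so we can judge whether a bounding box prediction is good by how well the predicted and actual bounding boxes overlap. This is calculated by dividing the area of the overlap by the total area covered by both bounding boxes, that is, the intersection divided by the union, referred to as "intersection over union," or IoU. A perfect bounding box prediction has an IoU of 1.

It is standard to treat a bounding box prediction as positive if its IoU is greater than 0.5, i.e. the overlap is more than half of the combined area of the two boxes.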
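To make the IoU calculation concrete, here is a minimal sketch for two axis-aligned boxes. The function name box_iou is an illustrative choice, and the (y1, x1, y2, x2) ordering is assumed to match the 'rois' entries used later in this tutorial; this is not the library's own implementation.

```python
def box_iou(box_a, box_b):
    """Return the intersection over union of two (y1, x1, y2, x2) boxes."""
    ay1, ax1, ay2, ax2 = box_a
    by1, bx1, by2, bx2 = box_b
    # height and width of the overlapping region (zero if there is no overlap)
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    intersection = inter_h * inter_w
    area_a = (ay2 - ay1) * (ax2 - ax1)
    area_b = (by2 - by1) * (bx2 - bx1)
    # area covered by either box, counting the overlap only once
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# identical boxes overlap perfectly
print(box_iou((0, 0, 10, 10), (0, 0, 10, 10)))
# a box shifted sideways by half its width
print(box_iou((0, 0, 10, 10), (0, 5, 10, 15)))
```

The second call shows why 0.5 is a fairly demanding threshold: a box shifted by half its width shares half of its own area with the target, yet its IoU is only one third, so it would not count as a positive prediction.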

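Precision is the percentage of all predicted bounding boxes that are correct (IoU > 0.5). Recall is the percentage of all objects in the photo for which a bounding box was correctly predicted (IoU > 0.5).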
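The two definitions can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's method: the names box_iou and precision_recall are hypothetical, boxes are (y1, x1, y2, x2), predictions are assumed to be ordered from most to least confident, and each ground truth box may be matched by at most one prediction.

```python
def box_iou(box_a, box_b):
    """Return the intersection over union of two (y1, x1, y2, x2) boxes."""
    ay1, ax1, ay2, ax2 = box_a
    by1, bx1, by2, bx2 = box_b
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    intersection = inter_h * inter_w
    union = (ay2 - ay1) * (ax2 - ax1) + (by2 - by1) * (bx2 - bx1) - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(predicted, actual, threshold=0.5):
    """Greedily match predictions to ground truth boxes, one to one."""
    matched = set()
    correct = 0
    for pred in predicted:
        best_index, best_iou = None, threshold
        for index, truth in enumerate(actual):
            if index in matched:
                continue  # this object has already been claimed
            iou = box_iou(pred, truth)
            if iou > best_iou:
                best_index, best_iou = index, iou
        if best_index is not None:
            matched.add(best_index)
            correct += 1
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(actual) if actual else 0.0
    return precision, recall


actual = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted = [(0, 0, 10, 10), (0, 1, 10, 11), (50, 50, 60, 60)]
print(precision_recall(predicted, actual))
```

In this example, one of three predictions is correct and one of two objects is found. The second prediction overlaps the first object well, but that object is already matched, so the duplicate counts against precision.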

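As we make more predictions, recall will increase, but precision will drop or become erratic as we start to make false positive predictions. Recall (x) can be plotted against precision (y) for each number of predictions to create a curve or line. We can take the maximum precision at each point on this line and average the precision across the recall values; this is the average precision, or AP.

Note: there are variations in how AP is calculated; for example, the widely used PASCAL VOC and MS COCO datasets calculate it differently.

The average of the average precision (AP) across all of the images in a dataset is called the mean average precision, or mAP.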
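The calculation described above can be sketched as follows. This is one common variant (the area under the precision-recall curve after replacing each precision with the maximum precision to its right), chosen here as an assumption; it is not guaranteed to match the library or any particular benchmark exactly. The function name and the example inputs are hypothetical.

```python
def average_precision(is_correct, num_objects):
    """AP for one image.

    is_correct: one boolean per prediction, ordered from the most to the
    least confident prediction (True means IoU > 0.5 with an object).
    num_objects: number of ground truth objects in the image.
    """
    if num_objects == 0:
        return 0.0
    precisions, recalls = [], []
    correct = 0
    for count, hit in enumerate(is_correct, start=1):
        correct += int(hit)
        precisions.append(correct / count)
        recalls.append(correct / num_objects)
    # pad so the curve starts at recall 0 and ends at recall 1
    precisions = [0.0] + precisions + [0.0]
    recalls = [0.0] + recalls + [1.0]
    # replace each precision with the maximum precision to its right
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # add up precision over every step where recall increases
    ap = 0.0
    for i in range(1, len(recalls)):
        ap += (recalls[i] - recalls[i - 1]) * precisions[i]
    return ap


# hypothetical per-image results: two objects in each image
aps = [
    average_precision([True, True], 2),
    average_precision([True, False, True], 2),
]
# mAP is the mean of the per-image AP values
print(sum(aps) / len(aps))
```

A false positive ranked between two correct predictions lowers the AP for the second image, because the second object is only found at a precision of two in three.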

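The mask-rcnn library provides the mrcnn.utils.compute_ap function to calculate the AP and other metrics for a given image. These AP scores can be collected across a dataset and their mean calculated to give an idea of how good the model is at detecting objects in the dataset.

First, we must define a new Config object to use for making predictions instead of training. We could extend the kangaroo configuration we defined previously to reuse its parameters. Instead, we will define a new object with the same values to keep the code compact. The config must change some of the defaults around using the GPU for inference, which differ from those set for training a model (regardless of whether you are running on the GPU or CPU).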

# define the prediction configuration
class PredictionConfig(Config):
	# define the name of the configuration
	NAME = "kangaroo_cfg"
	# number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# simplify GPU config
	GPU_COUNT = 1
	IMAGES_PER_GPU = 1

Next, we can define the model with the config, setting the 'mode' argument to 'inference' instead of 'training'.

# create config
cfg = PredictionConfig()
# define the model
model = MaskRCNN(mode='inference', model_dir='./', config=cfg)

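Next, we can load the weights from our saved model.

We can do that by specifying the path to the model file. In this case, the model file is 'mask_rcnn_kangaroo_cfg_0005.h5' in the current working directory.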

# load model weights
model.load_weights('mask_rcnn_kangaroo_cfg_0005.h5', by_name=True)

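Next, we can evaluate the model. This involves enumerating the images in a dataset, making a prediction for each, and calculating the AP for that prediction, before taking the mean AP across all images.

First, the image and ground truth mask can be loaded from the dataset for a given image_id. This can be achieved using the load_image_gt() convenience function.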

# load image, bounding boxes and masks for the image id
image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)

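Next, the pixel values of the loaded image must be scaled in the same way as the training data, e.g. centered. This can be achieved using the mold_image() convenience function.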

# convert pixel values (e.g. center)
scaled_image = mold_image(image, cfg)

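The dimensions of the image then need to be expanded so that it becomes a single sample in a dataset, which can be used as input to make a prediction with the model.

# convert image into one sample
sample = expand_dims(scaled_image, 0)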
# make prediction
yhat = model.detect(sample, verbose=0)
# extract results for first sample
r = yhat[0]
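The two preparation steps before the call to detect() can be illustrated without the library. The sketch below is a stand-in built on assumptions: it uses a synthetic image, and it assumes that centering amounts to subtracting a per-channel mean. The mean values shown are, to my knowledge, the library's default MEAN_PIXEL setting, but treat them as illustrative.

```python
import numpy as np

# hypothetical 4x6 RGB image with every pixel value set to 128
image = np.full((4, 6, 3), 128, dtype=np.uint8)

# assumed per-channel mean; in the library this would come from the config
mean_pixel = np.array([123.7, 116.8, 103.9])

# centering: convert to floating point and subtract the per-channel mean
scaled_image = image.astype(np.float32) - mean_pixel

# add a leading sample dimension:
# (height, width, channels) -> (1, height, width, channels)
sample = np.expand_dims(scaled_image, 0)

print(image.shape, sample.shape)
```

The model expects a batch of images, which is why a single photo still needs the extra leading dimension.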

Next, the prediction can be compared to the ground truth and metrics calculated using the compute_ap() function.

# calculate statistics, including AP
AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])

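The AP values can be added to a list, and the mean value then calculated.

Tying this together, the evaluate_model() function below implements this and calculates the mAP given a dataset, model, and configuration.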

# calculate the mAP for a model on a given dataset
def evaluate_model(dataset, model, cfg):
	APs = list()
	for image_id in dataset.image_ids:
		# load image, bounding boxes and masks for the image id
		image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
		# convert pixel values (e.g. center)
		scaled_image = mold_image(image, cfg)
		# convert image into one sample
		sample = expand_dims(scaled_image, 0)
		# make prediction
		yhat = model.detect(sample, verbose=0)
		# extract results for first sample
		r = yhat[0]
		# calculate statistics, including AP
		AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])
		# store
		APs.append(AP)
	# calculate the mean AP across all images
	mAP = mean(APs)
	return mAP

We can now calculate the mAP for the model on the train and test datasets.

# evaluate model on training dataset
train_mAP = evaluate_model(train_set, model, cfg)
print("Train mAP: %.3f" % train_mAP)
# evaluate model on test dataset
test_mAP = evaluate_model(test_set, model, cfg)
print("Test mAP: %.3f" % test_mAP)

The full code listing is provided below for completeness.

# evaluate the mask rcnn model on the kangaroo dataset
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from numpy import expand_dims
from numpy import mean
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from mrcnn.utils import Dataset
from mrcnn.utils import compute_ap
from mrcnn.model import load_image_gt
from mrcnn.model import mold_image

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# extract bounding boxes from an annotation file
	def extract_boxes(self, filename):
		# load and parse the file
		tree = ElementTree.parse(filename)
		# get the root of the document
		root = tree.getroot()
		# extract each bounding box
		boxes = list()
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# define the prediction configuration
class PredictionConfig(Config):
	# define the name of the configuration
	NAME = "kangaroo_cfg"
	# number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# simplify GPU config
	GPU_COUNT = 1
	IMAGES_PER_GPU = 1

# calculate the mAP for a model on a given dataset
def evaluate_model(dataset, model, cfg):
	APs = list()
	for image_id in dataset.image_ids:
		# load image, bounding boxes and masks for the image id
		image, image_meta, gt_class_id, gt_bbox, gt_mask = load_image_gt(dataset, cfg, image_id, use_mini_mask=False)
		# convert pixel values (e.g. center)
		scaled_image = mold_image(image, cfg)
		# convert image into one sample
		sample = expand_dims(scaled_image, 0)
		# make prediction
		yhat = model.detect(sample, verbose=0)
		# extract results for first sample
		r = yhat[0]
		# calculate statistics, including AP
		AP, _, _, _ = compute_ap(gt_bbox, gt_class_id, gt_mask, r["rois"], r["class_ids"], r["scores"], r['masks'])
		# store
		APs.append(AP)
	# calculate the mean AP across all images
	mAP = mean(APs)
	return mAP

# load the train dataset
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))
# load the test dataset
test_set = KangarooDataset()
test_set.load_dataset('kangaroo', is_train=False)
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))
# create config
cfg = PredictionConfig()
# define the model
model = MaskRCNN(mode='inference', model_dir='./', config=cfg)
# load model weights
model.load_weights('mask_rcnn_kangaroo_cfg_0005.h5', by_name=True)
# evaluate model on training dataset
train_mAP = evaluate_model(train_set, model, cfg)
print("Train mAP: %.3f" % train_mAP)
# evaluate model on test dataset
test_mAP = evaluate_model(test_set, model, cfg)
print("Test mAP: %.3f" % test_mAP)

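Running the example makes a prediction for each image in the train and test datasets and calculates the mAP for each dataset.

Note: your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

A mAP above 90% or 95% is a good score. We can see that the mAP score is good on both datasets, and perhaps slightly better on the test dataset than on the train dataset.

This may be because the dataset is very small, and/or because the model could benefit from further training.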

Train mAP: 0.929
Test mAP: 0.958

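Now that we have some confidence that the model is sensible, we can use it to make some predictions.

How to Detect Kangaroos in New Photos

We can use the trained model to detect kangaroos in new photographs, specifically in photos that we expect to contain kangaroos.

First, we need a new photo of a kangaroo.

We could go to Flickr and find a random photo of a kangaroo. Alternatively, we can use any of the photos in the test dataset that were not used to train the model.

We have already seen in the previous section how to make a prediction with an image. Specifically, we scale the pixel values and call model.detect(). For example: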

# example of making a prediction
...
# load image
image = ...
# convert pixel values (e.g. center)
scaled_image = mold_image(image, cfg)
# convert image into one sample
sample = expand_dims(scaled_image, 0)
# make prediction
yhat = model.detect(sample, verbose=0)
...

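Let's take it one step further and make predictions for a number of images in a dataset, then plot each photo with its ground truth bounding boxes side by side with the same photo and its predicted bounding boxes. This will provide a visual guide to how good the model is at making predictions.

The first step is to load the image and mask from the dataset.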

# load the image and mask
image = dataset.load_image(image_id)
mask, _ = dataset.load_mask(image_id)

Next, we can make a prediction for the image.

# convert pixel values (e.g. center)
scaled_image = mold_image(image, cfg)
# convert image into one sample
sample = expand_dims(scaled_image, 0)
# make prediction
yhat = model.detect(sample, verbose=0)[0]

Next, we can create a subplot for the ground truth and plot the image with the known bounding boxes.

# define subplot
pyplot.subplot(n_images, 2, i*2+1)
# plot raw pixel data
pyplot.imshow(image)
pyplot.title('Actual')
# plot masks
for j in range(mask.shape[2]):
	pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)

We can then create a second subplot beside the first and plot the photo again, this time with the predicted bounding boxes drawn in red.

# get the context for drawing boxes
pyplot.subplot(n_images, 2, i*2+2)
# plot raw pixel data
pyplot.imshow(image)
pyplot.title('Predicted')
ax = pyplot.gca()
# plot each box
for box in yhat['rois']:
	# get coordinates
	y1, x1, y2, x2 = box
	# calculate width and height of the box
	width, height = x2 - x1, y2 - y1
	# create the shape
	rect = Rectangle((x1, y1), width, height, fill=False, color='red')
	# draw the box
	ax.add_patch(rect)
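The coordinate handling in the loop above is easy to get wrong, because each predicted box is ordered (y1, x1, y2, x2) while a matplotlib Rectangle takes an (x, y) corner followed by a width and a height. The helper below is a small illustrative sketch; the name roi_to_rectangle is hypothetical and not part of either library.

```python
def roi_to_rectangle(roi):
    """Convert a (y1, x1, y2, x2) box to ((x, y), width, height)."""
    y1, x1, y2, x2 = roi
    # the corner is given as (x, y), so the first two values swap places
    return (x1, y1), x2 - x1, y2 - y1


# a box spanning rows 10 to 110 and columns 20 to 220
print(roi_to_rectangle((10, 20, 110, 220)))
```

The returned values can be passed straight to Rectangle as the corner, width, and height arguments.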

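We can tie all of this together into a function that takes a dataset, model, and config and creates a plot of the first five photos in the dataset with ground truth and predicted bounding boxes.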

# plot a number of photos with ground truth and predictions
def plot_actual_vs_predicted(dataset, model, cfg, n_images=5):
	# load image and mask
	for i in range(n_images):
		# load the image and mask
		image = dataset.load_image(i)
		mask, _ = dataset.load_mask(i)
		# convert pixel values (e.g. center)
		scaled_image = mold_image(image, cfg)
		# convert image into one sample
		sample = expand_dims(scaled_image, 0)
		# make prediction
		yhat = model.detect(sample, verbose=0)[0]
		# define subplot
		pyplot.subplot(n_images, 2, i*2+1)
		# plot raw pixel data
		pyplot.imshow(image)
		pyplot.title('Actual')
		# plot masks
		for j in range(mask.shape[2]):
			pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
		# get the context for drawing boxes
		pyplot.subplot(n_images, 2, i*2+2)
		# plot raw pixel data
		pyplot.imshow(image)
		pyplot.title('Predicted')
		ax = pyplot.gca()
		# plot each box
		for box in yhat['rois']:
			# get coordinates
			y1, x1, y2, x2 = box
			# calculate width and height of the box
			width, height = x2 - x1, y2 - y1
			# create the shape
			rect = Rectangle((x1, y1), width, height, fill=False, color='red')
			# draw the box
			ax.add_patch(rect)
	# show the figure
	pyplot.show()

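The complete example of loading the trained model and making a prediction for the first few images in the train and test datasets is listed below.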

# detect kangaroos in photos with mask rcnn model
from os import listdir
from xml.etree import ElementTree
from numpy import zeros
from numpy import asarray
from numpy import expand_dims
from matplotlib import pyplot
from matplotlib.patches import Rectangle
from mrcnn.config import Config
from mrcnn.model import MaskRCNN
from mrcnn.model import mold_image
from mrcnn.utils import Dataset

# class that defines and loads the kangaroo dataset
class KangarooDataset(Dataset):
	# load the dataset definitions
	def load_dataset(self, dataset_dir, is_train=True):
		# define one class
		self.add_class("dataset", 1, "kangaroo")
		# define data locations
		images_dir = dataset_dir + '/images/'
		annotations_dir = dataset_dir + '/annots/'
		# find all images
		for filename in listdir(images_dir):
			# extract image id
			image_id = filename[:-4]
			# skip bad images
			if image_id in ['00090']:
				continue
			# skip all images after 150 if we are building the train set
			if is_train and int(image_id) >= 150:
				continue
			# skip all images before 150 if we are building the test/val set
			if not is_train and int(image_id) < 150:
				continue
			img_path = images_dir + filename
			ann_path = annotations_dir + image_id + '.xml'
			# add to dataset
			self.add_image('dataset', image_id=image_id, path=img_path, annotation=ann_path)

	# load all bounding boxes for an image
	def extract_boxes(self, filename):
		# load and parse the file
		root = ElementTree.parse(filename)
		boxes = list()
		# extract each bounding box
		for box in root.findall('.//bndbox'):
			xmin = int(box.find('xmin').text)
			ymin = int(box.find('ymin').text)
			xmax = int(box.find('xmax').text)
			ymax = int(box.find('ymax').text)
			coors = [xmin, ymin, xmax, ymax]
			boxes.append(coors)
		# extract image dimensions
		width = int(root.find('.//size/width').text)
		height = int(root.find('.//size/height').text)
		return boxes, width, height

	# load the masks for an image
	def load_mask(self, image_id):
		# get details of image
		info = self.image_info[image_id]
		# define box file location
		path = info['annotation']
		# load XML
		boxes, w, h = self.extract_boxes(path)
		# create one array for all masks, each on a different channel
		masks = zeros([h, w, len(boxes)], dtype='uint8')
		# create masks
		class_ids = list()
		for i in range(len(boxes)):
			box = boxes[i]
			row_s, row_e = box[1], box[3]
			col_s, col_e = box[0], box[2]
			masks[row_s:row_e, col_s:col_e, i] = 1
			class_ids.append(self.class_names.index('kangaroo'))
		return masks, asarray(class_ids, dtype='int32')

	# load an image reference
	def image_reference(self, image_id):
		info = self.image_info[image_id]
		return info['path']

# define the prediction configuration
class PredictionConfig(Config):
	# define the name of the configuration
	NAME = "kangaroo_cfg"
	# number of classes (background + kangaroo)
	NUM_CLASSES = 1 + 1
	# simplify GPU config
	GPU_COUNT = 1
	IMAGES_PER_GPU = 1

# plot a number of photos with ground truth and predictions
def plot_actual_vs_predicted(dataset, model, cfg, n_images=5):
	# load image and mask
	for i in range(n_images):
		# load the image and mask
		image = dataset.load_image(i)
		mask, _ = dataset.load_mask(i)
		# convert pixel values (e.g. center)
		scaled_image = mold_image(image, cfg)
		# convert image into one sample
		sample = expand_dims(scaled_image, 0)
		# make prediction
		yhat = model.detect(sample, verbose=0)[0]
		# define subplot
		pyplot.subplot(n_images, 2, i*2+1)
		# plot raw pixel data
		pyplot.imshow(image)
		pyplot.title('Actual')
		# plot masks
		for j in range(mask.shape[2]):
			pyplot.imshow(mask[:, :, j], cmap='gray', alpha=0.3)
		# get the context for drawing boxes
		pyplot.subplot(n_images, 2, i*2+2)
		# plot raw pixel data
		pyplot.imshow(image)
		pyplot.title('Predicted')
		ax = pyplot.gca()
		# plot each box
		for box in yhat['rois']:
			# get coordinates
			y1, x1, y2, x2 = box
			# calculate width and height of the box
			width, height = x2 - x1, y2 - y1
			# create the shape
			rect = Rectangle((x1, y1), width, height, fill=False, color='red')
			# draw the box
			ax.add_patch(rect)
	# show the figure
	pyplot.show()

# load the train dataset
train_set = KangarooDataset()
train_set.load_dataset('kangaroo', is_train=True)
train_set.prepare()
print('Train: %d' % len(train_set.image_ids))
# load the test dataset
test_set = KangarooDataset()
test_set.load_dataset('kangaroo', is_train=False)
test_set.prepare()
print('Test: %d' % len(test_set.image_ids))
# create config
cfg = PredictionConfig()
# define the model
model = MaskRCNN(mode='inference', model_dir='./', config=cfg)
# load model weights
model_path = 'mask_rcnn_kangaroo_cfg_0005.h5'
model.load_weights(model_path, by_name=True)
# plot predictions for train dataset
plot_actual_vs_predicted(train_set, model, cfg)
# plot predictions for test dataset
plot_actual_vs_predicted(test_set, model, cfg)

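Running the example first creates a figure showing five photos from the training dataset with the ground truth bounding boxes, with the same photo and the predicted bounding boxes alongside.

Note: your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

We can see that the model has done well on these examples, finding all of the kangaroos, even in the case where there are two or three in one photo. The second photo down (in the right column) does show a slip-up, where the model has predicted a bounding box around the same kangaroo twice.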

Plot of Photos of Kangaroos From the Training Dataset With Ground Truth and Predicted Bounding Boxes

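A second figure is created showing five photos from the test dataset with ground truth bounding boxes and predicted bounding boxes.

These are images not seen during training, and again, in each photo, the model has detected the kangaroo. We can see that in the case of the second last photo, a minor mistake was made. Specifically, the same kangaroo was detected multiple times.

These differences can probably be ironed out with more training, perhaps with a larger dataset and/or data augmentation, to encourage the model to treat people as background and to detect a given kangaroo only once.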

Plot of Photos of Kangaroos From the Test Dataset With Ground Truth and Predicted Bounding Boxes

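Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

Mask R-CNN, 2017.

Projects

Kangaroo Dataset, GitHub.
Mask R-CNN Project, GitHub.

APIs

Articles

Summary

In this tutorial, you discovered how to develop a Mask R-CNN model for kangaroo object detection in photographs.

Specifically, you learned:

How to prepare an object detection dataset ready for modeling with an R-CNN.
How to use transfer learning to train an object detection model on a new dataset.
How to evaluate a fit Mask R-CNN model on a test dataset and make predictions on new photos.

Do you have any questions? Ask your questions in the comments below and I will do my best to answer.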