Machine Learning Mastery Computer Vision Tutorials (Part 4)

Original: Machine Learning Mastery

License: CC BY-NC-SA 4.0

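How to Develop a Face Recognition System Using FaceNet in Keras

Original: machinelearningmastery.com/how-to-deve…

Last updated on August 24, 2020

Face recognition is a computer vision task of identifying and verifying a person based on a photograph of their face.

FaceNet is a face recognition system developed in 2015 by researchers at Google that achieved then state-of-the-art results on a range of face recognition benchmark datasets. Thanks to multiple third-party open source implementations of the model and the availability of pre-trained models, the FaceNet system can be used broadly.

The FaceNet system can be used to extract high-quality features from faces, called face embeddings, which can then be used to train a face identification system.

In this tutorial, you will discover how to develop a face recognition system using FaceNet and an SVM classifier to identify people from photographs.

After completing this tutorial, you will know:

About the FaceNet face recognition system developed by Google, and the open source implementations and pre-trained models.
How to prepare a face dataset, first extracting faces with a face detection system and then extracting face features via face embeddings.
How to fit, evaluate, and demonstrate an SVM model to predict identities from face embeddings.

Kick-start your project with my new book Deep Learning for Computer Vision, including step-by-step tutorials and the Python source code files for all examples.

Let's get started.

**Updated Nov/2019:** Updated for TensorFlow v2.0 and MTCNN v0.1.0.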

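Tutorial Overview

This tutorial is divided into five parts; they are:

Face Recognition
FaceNet Model
How to Load a FaceNet Model in Keras
How to Detect Faces for Face Recognition
How to Develop a Face Classification System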

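Face Recognition

Face recognition is the general task of identifying and verifying people from photographs of their face.

The 2011 book on face recognition titled "Handbook of Face Recognition" describes two main modes for face recognition, as follows:

Face Verification. A one-to-one mapping of a given face against a known identity (e.g. is this the person?).
Face Identification. A one-to-many mapping for a given face against a database of known faces (e.g. who is this person?).

A face recognition system is expected to identify faces present in images and videos automatically. It can operate in either or both of two modes: (1) face verification (or authentication), and (2) face identification (or recognition).

- Page 1, Handbook of Face Recognition, 2011.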
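
The two modes can be sketched in a few lines, assuming each face has already been reduced to a feature vector. The function names, the toy two-element vectors, and the distance threshold of 1.0 are illustrative assumptions, not part of the handbook or of any library.

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(a, dtype="float64") - np.asarray(b, dtype="float64")))

def verify(probe, reference, threshold=1.0):
    """One-to-one: is the probe close enough to a single known identity?"""
    return euclidean(probe, reference) <= threshold

def identify(probe, gallery):
    """One-to-many: return the name in the gallery whose vector is closest."""
    return min(gallery, key=lambda name: euclidean(probe, gallery[name]))

# hypothetical gallery of known people (toy vectors, not real embeddings)
gallery = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}
probe = [0.1, 0.9]
print(verify(probe, gallery["alice"]))
print(identify(probe, gallery))
```

Verification answers a yes/no question against one identity, while identification always returns the nearest known identity, so a real system would usually combine it with a threshold to reject unknown faces.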

We will focus on the face identification task in this tutorial.

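FaceNet Model

FaceNet is a face recognition system described by Florian Schroff, et al. at Google in their 2015 paper titled "FaceNet: A Unified Embedding for Face Recognition and Clustering."

It is a system that, given a picture of a face, will extract high-quality features from the face and predict a 128-element vector representation of these features, called a face embedding.

FaceNet, that directly learns a mapping from face images to a compact Euclidean space where distances directly correspond to a measure of face similarity.

- FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.

The model is a deep convolutional neural network trained via a triplet loss function that encourages vectors for the same identity to become more similar (smaller distance), whereas vectors for different identities are expected to become less similar (larger distance). The focus on training a model to create embeddings directly (rather than extracting them from an intermediate layer of a model) was an important innovation in this work.

Our method uses a deep convolutional network trained to directly optimize the embedding itself, rather than an intermediate bottleneck layer as in previous deep learning approaches.

- FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.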
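
To make the triplet idea concrete, here is a minimal NumPy sketch of the loss for a single triplet of embeddings. It is an illustration of the general technique, not the paper's training code; the function name and the toy vectors are assumptions, and the margin of 0.2 is used here only as an example value.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge on squared distances: same-identity pairs should be closer
    than different-identity pairs by at least the margin."""
    anchor, positive, negative = (
        np.asarray(v, dtype="float64") for v in (anchor, positive, negative)
    )
    pos_dist = np.sum((anchor - positive) ** 2)  # anchor vs. same identity
    neg_dist = np.sum((anchor - negative) ** 2)  # anchor vs. other identity
    return float(max(pos_dist - neg_dist + margin, 0.0))

# toy 2-element "embeddings" for illustration only
easy = triplet_loss([0, 0], [0, 0.1], [1, 0])    # negative is far away
hard = triplet_loss([0, 0], [0, 0.1], [0, 0.2])  # negative is almost as close
print(easy, hard)
```

The loss is zero once the negative is further from the anchor than the positive by at least the margin, so only the hard triplets contribute to learning.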

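These face embeddings were then used as the basis for training classifier systems on standard face recognition benchmark datasets, achieving then state-of-the-art results.

Our system cuts the error rate in comparison to the best published result by 30% …

- FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.

The paper also explores other uses of the embeddings, such as clustering to group similar faces based on their extracted features.

It is a robust and effective face recognition system, and the general nature of the extracted face embeddings lends the approach to a range of applications.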

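How to Load a FaceNet Model in Keras

There are a number of projects that provide tools to train FaceNet-based models and make use of pre-trained models.

Perhaps the most prominent is called OpenFace, which provides FaceNet models built and trained using the Torch deep learning framework. There is a port of OpenFace to Keras, called Keras OpenFace, but at the time of writing, the models appear to require Python 2, which is quite limiting.

Another prominent project is FaceNet by David Sandberg, which provides FaceNet models built and trained using TensorFlow. The project looks mature, although at the time of writing it does not provide a library-based installation or a clean API. Usefully, David's project provides a number of high-performing pre-trained FaceNet models, and there are a number of projects that port or convert these models for use in Keras.

A notable example is Keras FaceNet by Hiroki Taniai. His project provides a script for converting the Inception ResNet v1 model from TensorFlow to Keras. He also provides a pre-trained Keras model ready for use.

We will use the pre-trained Keras FaceNet model provided by Hiroki Taniai in this tutorial. It was trained on the MS-Celeb-1M dataset and expects input images to be color, to have their pixel values whitened (standardized across all three channels), and to have a square shape of 160×160 pixels.

The model can be downloaded from here:

Keras FaceNet Pre-Trained Model (88 megabytes)

Download the model file and place it in your current working directory with the filename 'facenet_keras.h5'.

We can load the model directly in Keras using the load_model() function; for example: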

# example of loading the keras facenet model
from keras.models import load_model
# load the model
model = load_model('facenet_keras.h5')
# summarize input and output shape
print(model.inputs)
print(model.outputs)

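Running the example loads the model and prints the shape of the input and output tensors.

We can see that the model indeed expects square color images with the shape 160×160 as input, and will output a face embedding as a 128-element vector.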

# [<tf.Tensor 'input_1:0' shape=(?, 160, 160, 3) dtype=float32>]
# [<tf.Tensor 'Bottleneck_BatchNorm/cond/Merge:0' shape=(?, 128) dtype=float32>]

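Now that we have a FaceNet model, we can explore using it.

How to Detect Faces for Face Recognition

Before we can perform face recognition, we need to detect faces.

Face detection is the process of automatically locating faces in a photograph and localizing them by drawing a bounding box around their extent.

In this tutorial, we will use the Multi-Task Cascaded Convolutional Neural Network, or MTCNN, for face detection, e.g. finding and extracting faces from photos. This is a state-of-the-art deep learning model for face detection, described in the 2016 paper titled "Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks."

We will use the implementation provided by Iván de Paz Centeno in the ipazc/mtcnn project. This can be installed via pip as follows: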

sudo pip install mtcnn

We can confirm that the library was installed correctly by importing it and printing the version; for example:

# confirm mtcnn was installed correctly
import mtcnn
# print version
print(mtcnn.__version__)

Running the example prints the current version of the library.

0.1.0

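We can use the mtcnn library to create a face detector and extract faces for use with the FaceNet model in subsequent sections.

The first step is to load an image as a NumPy array, which we can achieve using the PIL library and the open() function. We will also convert the image to RGB, in case the image has an alpha channel or is black and white.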

# load image from file
image = Image.open(filename)
# convert to RGB, if needed
image = image.convert('RGB')
# convert to array
pixels = asarray(image)

Next, we can create an MTCNN face detector class and use it to detect all faces in the loaded photograph.

# create the detector, using default weights
detector = MTCNN()
# detect faces in the image
results = detector.detect_faces(pixels)

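The result is a list of bounding boxes, where each bounding box gives the x and y pixel coordinates of one corner of the box (the top-left corner in image coordinates), as well as the width and height.

If we assume there is only one face in the photo for our experiments, we can determine the pixel coordinates of the bounding box as follows. Sometimes the library will return a negative pixel index, which I think is a bug. We can work around this by taking the absolute value of the coordinates.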

# extract the bounding box from the first face
x1, y1, width, height = results[0]['box']
# bug fix
x1, y1 = abs(x1), abs(y1)
x2, y2 = x1 + width, y1 + height

We can use these coordinates to extract the face.

# extract the face
face = pixels[y1:y2, x1:x2]
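
As an aside, taking the absolute value moves a box that starts slightly outside the image further into it. A possible alternative, sketched below under the assumption that a box is an (x, y, width, height) tuple with x and y measured from the top-left of the image, is to clamp the box to the image bounds instead. The helper name crop_box() is hypothetical and is not part of the mtcnn library or of the tutorial's listings.

```python
import numpy as np

def crop_box(pixels, box):
    """Crop a (x, y, width, height) box from an image array,
    clamping the box to the image bounds instead of using abs()."""
    x, y, width, height = box
    x1, y1 = max(0, x), max(0, y)
    x2 = min(pixels.shape[1], x + width)   # columns are the x axis
    y2 = min(pixels.shape[0], y + height)  # rows are the y axis
    return pixels[y1:y2, x1:x2]

# a blank 100x80 RGB image and a box that starts 5 pixels left of the image
image = np.zeros((100, 80, 3), dtype="uint8")
face = crop_box(image, (-5, 10, 30, 40))
print(face.shape)
```

With clamping, the part of the box that falls outside the image is simply dropped, so the crop stays aligned with the detected face.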

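We can then use the PIL library to resize this small image of the face to the required size; specifically, the model expects square input faces with the shape 160×160.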

# resize pixels to the model size
image = Image.fromarray(face)
image = image.resize((160, 160))
face_array = asarray(image)

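Tying all of this together, the function extract_face() will load a photograph from the given filename and return the extracted face. It assumes that the photo contains one face and will return the first face detected.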

# function for face detection with mtcnn
from PIL import Image
from numpy import asarray
from mtcnn.mtcnn import MTCNN

# extract a single face from a given photograph
def extract_face(filename, required_size=(160, 160)):
	# load image from file
	image = Image.open(filename)
	# convert to RGB, if needed
	image = image.convert('RGB')
	# convert to array
	pixels = asarray(image)
	# create the detector, using default weights
	detector = MTCNN()
	# detect faces in the image
	results = detector.detect_faces(pixels)
	# extract the bounding box from the first face
	x1, y1, width, height = results[0]['box']
	# bug fix
	x1, y1 = abs(x1), abs(y1)
	x2, y2 = x1 + width, y1 + height
	# extract the face
	face = pixels[y1:y2, x1:x2]
	# resize pixels to the model size
	image = Image.fromarray(face)
	image = image.resize(required_size)
	face_array = asarray(image)
	return face_array

# load the photo and extract the face
pixels = extract_face('...')

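We can use this function to extract faces as needed in the next section, and those faces can be provided as input to the FaceNet model.

How to Develop a Face Classification System

In this section, we will develop a face classification system to predict the identity of a given face.

The model will be trained and tested using the "5 Celebrity Faces Dataset," which contains many photographs of five different celebrities.

We will use an MTCNN model for face detection, the FaceNet model will be used to create a face embedding for each detected face, and then we will develop a linear Support Vector Machine (SVM) classifier model to predict the identity of a given face.

5 Celebrity Faces Dataset

The 5 Celebrity Faces Dataset is a small dataset that contains photographs of celebrities.

It includes photos of Ben Affleck, Elton John, Jerry Seinfeld, Madonna, and Mindy Kaling.

The dataset was prepared and made available by Dan Becker and is provided for free download on Kaggle. Note that a Kaggle account is required to download the dataset.

5 Celebrity Faces Dataset, Kaggle.

Download the dataset (this may require a Kaggle login), data.zip (2.5 megabytes), and unzip it in your local directory with the folder name '5-celebrity-faces-dataset'.

You should now have a directory with the following structure (note that there are spelling mistakes in some directory names, and they were left as-is in this example):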

5-celebrity-faces-dataset
├── train
│   ├── ben_afflek
│   ├── elton_john
│   ├── jerry_seinfeld
│   ├── madonna
│   └── mindy_kaling
└── val
    ├── ben_afflek
    ├── elton_john
    ├── jerry_seinfeld
    ├── madonna
    └── mindy_kaling

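We can see that there is a training dataset and a validation or test dataset.

Looking at some of the photos in the directories, we can see that they show faces with a range of orientations, lighting, and sizes. Importantly, each photo contains one face of the person.

We will use this dataset as the basis for our classifier, trained on the 'train' dataset only and classifying faces in the 'val' dataset. You can use this same structure to develop a classifier with your own photographs.

Detect Faces

The first step is to detect the face in each photograph and reduce the dataset to a series of faces only.

Let's test out the face detector function defined in the previous section, specifically extract_face().

Looking in the '5-celebrity-faces-dataset/train/ben_afflek/' directory, we can see that there are 14 photographs of Ben Affleck in the training dataset. We can detect the face in each photograph and create a plot with 14 faces, arranged as two rows of seven images.

The complete example is listed below.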

# demonstrate face detection on 5 Celebrity Faces Dataset
from os import listdir
from PIL import Image
from numpy import asarray
from matplotlib import pyplot
from mtcnn.mtcnn import MTCNN

# extract a single face from a given photograph
def extract_face(filename, required_size=(160, 160)):
	# load image from file
	image = Image.open(filename)
	# convert to RGB, if needed
	image = image.convert('RGB')
	# convert to array
	pixels = asarray(image)
	# create the detector, using default weights
	detector = MTCNN()
	# detect faces in the image
	results = detector.detect_faces(pixels)
	# extract the bounding box from the first face
	x1, y1, width, height = results[0]['box']
	# bug fix
	x1, y1 = abs(x1), abs(y1)
	x2, y2 = x1 + width, y1 + height
	# extract the face
	face = pixels[y1:y2, x1:x2]
	# resize pixels to the model size
	image = Image.fromarray(face)
	image = image.resize(required_size)
	face_array = asarray(image)
	return face_array

# specify folder to plot
folder = '5-celebrity-faces-dataset/train/ben_afflek/'
i = 1
# enumerate files
for filename in listdir(folder):
	# path
	path = folder + filename
	# get face
	face = extract_face(path)
	print(i, face.shape)
	# plot
	pyplot.subplot(2, 7, i)
	pyplot.axis('off')
	pyplot.imshow(face)
	i += 1
pyplot.show()

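Running the example takes a moment and reports the progress of each loaded photograph along the way, together with the shape of the NumPy array containing the face pixel data.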

1 (160, 160, 3)
2 (160, 160, 3)
3 (160, 160, 3)
4 (160, 160, 3)
5 (160, 160, 3)
6 (160, 160, 3)
7 (160, 160, 3)
8 (160, 160, 3)
9 (160, 160, 3)
10 (160, 160, 3)
11 (160, 160, 3)
12 (160, 160, 3)
13 (160, 160, 3)
14 (160, 160, 3)

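A figure is created containing the faces detected in the Ben Affleck directory.

We can see that each face was correctly detected and that we have a range of lighting, skin tones, and orientations in the detected faces.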

Plot of 14 Faces of Ben Affleck Detected From the Training Dataset of the 5 Celebrity Faces Dataset

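So far, so good.

Next, we can extend this example to step over each subdirectory for a given dataset (e.g. 'train' or 'val'), extract the faces, and prepare a dataset with the name as the output label for each detected face.

The load_faces() function below will load all of the faces into a list for a given directory, e.g. '5-celebrity-faces-dataset/train/ben_afflek/'.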

# load images and extract faces for all images in a directory
def load_faces(directory):
	faces = list()
	# enumerate files
	for filename in listdir(directory):
		# path
		path = directory + filename
		# get face
		face = extract_face(path)
		# store
		faces.append(face)
	return faces

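We can call the load_faces() function for each subdirectory in the 'train' or 'val' folders. Each face has one label, the name of the celebrity, which we can take from the directory name.

The load_dataset() function below takes a directory name such as '5-celebrity-faces-dataset/train/', detects faces for each subdirectory (celebrity), and assigns a label to each detected face.

It returns the X and y elements of the dataset as NumPy arrays.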

# load a dataset that contains one subdir for each class that in turn contains images
def load_dataset(directory):
	X, y = list(), list()
	# enumerate folders, on per class
	for subdir in listdir(directory):
		# path
		path = directory + subdir + '/'
		# skip any files that might be in the dir
		if not isdir(path):
			continue
		# load all faces in the subdirectory
		faces = load_faces(path)
		# create labels
		labels = [subdir for _ in range(len(faces))]
		# summarize progress
		print('>loaded %d examples for class: %s' % (len(faces), subdir))
		# store
		X.extend(faces)
		y.extend(labels)
	return asarray(X), asarray(y)

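We can then call this function for the 'train' and 'val' folders to load all of the data, then save the results in a single compressed NumPy array file via the savez_compressed() function.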

# load train dataset
trainX, trainy = load_dataset('5-celebrity-faces-dataset/train/')
print(trainX.shape, trainy.shape)
# load test dataset
testX, testy = load_dataset('5-celebrity-faces-dataset/val/')
print(testX.shape, testy.shape)
# save arrays to one file in compressed format
savez_compressed('5-celebrity-faces-dataset.npz', trainX, trainy, testX, testy)

Tying all of this together, the complete example of detecting all of the faces in the 5 Celebrity Faces Dataset is listed below.

# face detection for the 5 Celebrity Faces Dataset
from os import listdir
from os.path import isdir
from PIL import Image
from matplotlib import pyplot
from numpy import savez_compressed
from numpy import asarray
from mtcnn.mtcnn import MTCNN

# extract a single face from a given photograph
def extract_face(filename, required_size=(160, 160)):
	# load image from file
	image = Image.open(filename)
	# convert to RGB, if needed
	image = image.convert('RGB')
	# convert to array
	pixels = asarray(image)
	# create the detector, using default weights
	detector = MTCNN()
	# detect faces in the image
	results = detector.detect_faces(pixels)
	# extract the bounding box from the first face
	x1, y1, width, height = results[0]['box']
	# bug fix
	x1, y1 = abs(x1), abs(y1)
	x2, y2 = x1 + width, y1 + height
	# extract the face
	face = pixels[y1:y2, x1:x2]
	# resize pixels to the model size
	image = Image.fromarray(face)
	image = image.resize(required_size)
	face_array = asarray(image)
	return face_array

# load images and extract faces for all images in a directory
def load_faces(directory):
	faces = list()
	# enumerate files
	for filename in listdir(directory):
		# path
		path = directory + filename
		# get face
		face = extract_face(path)
		# store
		faces.append(face)
	return faces

# load a dataset that contains one subdir for each class that in turn contains images
def load_dataset(directory):
	X, y = list(), list()
	# enumerate folders, on per class
	for subdir in listdir(directory):
		# path
		path = directory + subdir + '/'
		# skip any files that might be in the dir
		if not isdir(path):
			continue
		# load all faces in the subdirectory
		faces = load_faces(path)
		# create labels
		labels = [subdir for _ in range(len(faces))]
		# summarize progress
		print('>loaded %d examples for class: %s' % (len(faces), subdir))
		# store
		X.extend(faces)
		y.extend(labels)
	return asarray(X), asarray(y)

# load train dataset
trainX, trainy = load_dataset('5-celebrity-faces-dataset/train/')
print(trainX.shape, trainy.shape)
# load test dataset
testX, testy = load_dataset('5-celebrity-faces-dataset/val/')
print(testX.shape, testy.shape)
# save arrays to one file in compressed format
savez_compressed('5-celebrity-faces-dataset.npz', trainX, trainy, testX, testy)

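Running the example may take a moment.

First, all of the photos in the 'train' dataset are loaded and the faces are extracted, resulting in 93 samples with square face input and a class label string as output. Then the 'val' dataset is loaded, providing 25 samples that can be used as a test dataset.

Both datasets are then saved to a compressed NumPy array file called '5-celebrity-faces-dataset.npz' that is about three megabytes and is stored in the current working directory.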

>loaded 14 examples for class: ben_afflek
>loaded 19 examples for class: madonna
>loaded 17 examples for class: elton_john
>loaded 22 examples for class: mindy_kaling
>loaded 21 examples for class: jerry_seinfeld
(93, 160, 160, 3) (93,)
>loaded 5 examples for class: ben_afflek
>loaded 5 examples for class: madonna
>loaded 5 examples for class: elton_john
>loaded 5 examples for class: mindy_kaling
>loaded 5 examples for class: jerry_seinfeld
(25, 160, 160, 3) (25,)

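This dataset is ready to be provided to a face recognition model.

Create Face Embeddings

The next step is to create a face embedding.

A face embedding is a vector that represents the features extracted from the face. It can then be compared with the vectors generated for other faces. For example, another vector that is close (by some measure) may be the same person, whereas another vector that is far (by some measure) may be a different person.

The classifier model that we want to develop will take a face embedding as input and predict the identity of the face. The FaceNet model will generate this embedding for a given image of a face.

The FaceNet model can be used as part of the classifier itself, or we can use the FaceNet model to pre-process a face to create a face embedding that can be stored and used as input to our classifier model. The latter approach is preferred, as the FaceNet model is both large and slow at creating a face embedding.

We can, therefore, pre-compute the face embeddings for all faces in the train and test (formally 'val') sets in our 5 Celebrity Faces Dataset.

First, we can load our detected faces dataset using the load() NumPy function.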

# load the face dataset
data = load('5-celebrity-faces-dataset.npz')
trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
print('Loaded: ', trainX.shape, trainy.shape, testX.shape, testy.shape)
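
The keys 'arr_0' to 'arr_3' are the default names NumPy assigns to arrays saved positionally. If you prefer self-describing keys, savez_compressed() also accepts keyword arguments. The sketch below is an optional variation, not what the tutorial's listings do; the temporary file, the tiny placeholder arrays, and the key names are assumptions for illustration.

```python
import os
import tempfile
import numpy as np

# tiny placeholder arrays standing in for the real face data
trainX = np.zeros((4, 160, 160, 3), dtype="uint8")
trainy = np.array(["a", "a", "b", "b"])

# save with explicit names instead of the default arr_0, arr_1, ...
path = os.path.join(tempfile.mkdtemp(), "faces-demo.npz")
np.savez_compressed(path, trainX=trainX, trainy=trainy)

# load by name; the context manager closes the file afterwards
with np.load(path) as data:
    names = sorted(data.files)
    loaded_X, loaded_y = data["trainX"], data["trainy"]
print(names, loaded_X.shape, loaded_y.shape)
```

Named keys make the loading code independent of the order in which the arrays were saved.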

Next, we can load our FaceNet model, ready for converting faces into face embeddings.

# load the facenet model
model = load_model('facenet_keras.h5')
print('Loaded Model')

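We can then enumerate each face in the train and test datasets to predict an embedding.

To predict an embedding, the pixel values of the image first need to be prepared to meet the expectations of the FaceNet model. This specific implementation of the FaceNet model expects the pixel values to be standardized.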

# scale pixel values
face_pixels = face_pixels.astype('float32')
# standardize pixel values across channels (global)
mean, std = face_pixels.mean(), face_pixels.std()
face_pixels = (face_pixels - mean) / std

To make a prediction for one example in Keras, we must expand the dimensions so that the face array is one sample.

# transform face into one sample
samples = expand_dims(face_pixels, axis=0)

We can then use the model to make a prediction and extract the embedding vector.

# make prediction to get embedding
yhat = model.predict(samples)
# get embedding
embedding = yhat[0]

The get_embedding() function defined below implements these behaviors and returns a face embedding given a single image of a face and the loaded FaceNet model.

# get the face embedding for one face
def get_embedding(model, face_pixels):
	# scale pixel values
	face_pixels = face_pixels.astype('float32')
	# standardize pixel values across channels (global)
	mean, std = face_pixels.mean(), face_pixels.std()
	face_pixels = (face_pixels - mean) / std
	# transform face into one sample
	samples = expand_dims(face_pixels, axis=0)
	# make prediction to get embedding
	yhat = model.predict(samples)
	return yhat[0]

Tying all of this together, the complete example of converting each face in the train and test datasets into a face embedding is listed below.

# calculate a face embedding for each face in the dataset using facenet
from numpy import load
from numpy import expand_dims
from numpy import asarray
from numpy import savez_compressed
from keras.models import load_model

# get the face embedding for one face
def get_embedding(model, face_pixels):
	# scale pixel values
	face_pixels = face_pixels.astype('float32')
	# standardize pixel values across channels (global)
	mean, std = face_pixels.mean(), face_pixels.std()
	face_pixels = (face_pixels - mean) / std
	# transform face into one sample
	samples = expand_dims(face_pixels, axis=0)
	# make prediction to get embedding
	yhat = model.predict(samples)
	return yhat[0]

# load the face dataset
data = load('5-celebrity-faces-dataset.npz')
trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
print('Loaded: ', trainX.shape, trainy.shape, testX.shape, testy.shape)
# load the facenet model
model = load_model('facenet_keras.h5')
print('Loaded Model')
# convert each face in the train set to an embedding
newTrainX = list()
for face_pixels in trainX:
	embedding = get_embedding(model, face_pixels)
	newTrainX.append(embedding)
newTrainX = asarray(newTrainX)
print(newTrainX.shape)
# convert each face in the test set to an embedding
newTestX = list()
for face_pixels in testX:
	embedding = get_embedding(model, face_pixels)
	newTestX.append(embedding)
newTestX = asarray(newTestX)
print(newTestX.shape)
# save arrays to one file in compressed format
savez_compressed('5-celebrity-faces-embeddings.npz', newTrainX, trainy, newTestX, testy)

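Running the example reports progress along the way.

We can see that the face dataset was loaded correctly and so was the model. The train dataset was then transformed into 93 face embeddings, each comprised of a 128-element vector. The 25 examples in the test dataset were also suitably converted to face embeddings.

The resulting datasets were then saved to a compressed NumPy array that is about 50 kilobytes, with the name '5-celebrity-faces-embeddings.npz' in the current working directory.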

Loaded:  (93, 160, 160, 3) (93,) (25, 160, 160, 3) (25,)
Loaded Model
(93, 128)
(25, 128)

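We are now ready to develop our face classifier system.

Perform Face Classification

In this section, we will develop a model to classify face embeddings as one of the known celebrities in the 5 Celebrity Faces Dataset.

First, we must load the face embeddings dataset.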

# load dataset
data = load('5-celebrity-faces-embeddings.npz')
trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
print('Dataset: train=%d, test=%d' % (trainX.shape[0], testX.shape[0]))

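Next, the data requires some minor preparation prior to modeling.

First, it is good practice to normalize the face embedding vectors, because the vectors are often compared to each other using a distance metric.

In this context, vector normalization means scaling the values until the length or magnitude of the vector is 1, or unit length. This can be achieved using the Normalizer class in scikit-learn. It might even be more convenient to perform this step when the face embeddings are created in the previous step.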

# normalize input vectors
in_encoder = Normalizer(norm='l2')
trainX = in_encoder.transform(trainX)
testX = in_encoder.transform(testX)
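
For intuition, L2 normalization simply divides each row vector by its Euclidean length. The NumPy sketch below illustrates that arithmetic on toy two-element vectors; the helper name l2_normalize() is an assumption for illustration and is not a scikit-learn function.

```python
import numpy as np

def l2_normalize(vectors):
    """Scale each row to unit Euclidean length (rows of zeros are left as zeros)."""
    vectors = np.asarray(vectors, dtype="float64")
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero
    return vectors / norms

# toy vectors, not real face embeddings
embeddings = np.array([[3.0, 4.0], [0.0, 2.0]])
unit = l2_normalize(embeddings)
print(unit)
print(np.linalg.norm(unit, axis=1))
```

Because every row ends up with length 1, distances between rows depend only on the direction of the embeddings, not on their magnitude.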

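Next, the string target variables for each celebrity name need to be converted to integers.

This can be achieved via the LabelEncoder class in scikit-learn.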

# label encode targets
out_encoder = LabelEncoder()
out_encoder.fit(trainy)
trainy = out_encoder.transform(trainy)
testy = out_encoder.transform(testy)
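
Conceptually, label encoding maps each distinct class name to an integer index and back again. The pure-Python sketch below shows the idea using a sorted list of class names; the function names are hypothetical stand-ins, not the scikit-learn API.

```python
def fit_labels(labels):
    """Collect the distinct class names in sorted order."""
    return sorted(set(labels))

def encode(labels, classes):
    """Map each class name to its integer index."""
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels]

def decode(codes, classes):
    """Map integer indexes back to class names."""
    return [classes[i] for i in codes]

train_labels = ["madonna", "ben_afflek", "madonna", "elton_john"]
classes = fit_labels(train_labels)
codes = encode(train_labels, classes)
print(classes)
print(codes)
print(decode(codes, classes))
```

The mapping is learned from the training labels only, which is why the tutorial fits the encoder on trainy and then applies it to both trainy and testy.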

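Next, we can fit a model.

It is common to use a linear Support Vector Machine (SVM) when working with normalized face embedding inputs. This is because the method is very effective at separating the face embedding vectors. We can fit a linear SVM to the training data using the SVC class in scikit-learn, setting the 'kernel' attribute to 'linear'. We may also want probabilities later when making predictions, which can be configured by setting 'probability' to 'True'.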

# fit model
model = SVC(kernel='linear', probability=True)
model.fit(trainX, trainy)

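Next, we can evaluate the model.

This can be achieved by using the fit model to make a prediction for each example in the train and test datasets and then calculating the classification accuracy.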

# predict
yhat_train = model.predict(trainX)
yhat_test = model.predict(testX)
# score
score_train = accuracy_score(trainy, yhat_train)
score_test = accuracy_score(testy, yhat_test)
# summarize
print('Accuracy: train=%.3f, test=%.3f' % (score_train*100, score_test*100))

Tying all of this together, the complete example of fitting a linear SVM on the face embeddings for the 5 Celebrity Faces Dataset is listed below.

# develop a classifier for the 5 Celebrity Faces Dataset
from numpy import load
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import Normalizer
from sklearn.svm import SVC
# load dataset
data = load('5-celebrity-faces-embeddings.npz')
trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
print('Dataset: train=%d, test=%d' % (trainX.shape[0], testX.shape[0]))
# normalize input vectors
in_encoder = Normalizer(norm='l2')
trainX = in_encoder.transform(trainX)
testX = in_encoder.transform(testX)
# label encode targets
out_encoder = LabelEncoder()
out_encoder.fit(trainy)
trainy = out_encoder.transform(trainy)
testy = out_encoder.transform(testy)
# fit model
model = SVC(kernel='linear', probability=True)
model.fit(trainX, trainy)
# predict
yhat_train = model.predict(trainX)
yhat_test = model.predict(testX)
# score
score_train = accuracy_score(trainy, yhat_train)
score_test = accuracy_score(testy, yhat_test)
# summarize
print('Accuracy: train=%.3f, test=%.3f' % (score_train*100, score_test*100))

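Running the example first confirms that the number of samples in the train and test datasets is as we expect.

Next, the model is evaluated on the train and test datasets, showing perfect classification accuracy. This is not surprising given the size of the dataset and the power of the face detection and face recognition models used.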

Dataset: train=93, test=25
Accuracy: train=100.000, test=100.000

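We can make this more interesting by plotting the original face alongside the prediction.

First, we need to load the face dataset, specifically the faces in the test dataset. We could also load the original photos to make it even more interesting.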

# load faces
data = load('5-celebrity-faces-dataset.npz')
testX_faces = data['arr_2']

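The rest of the example is the same up until we fit the model.

First, we need to select a random example from the test set, then get the embedding, the face pixels, the expected class prediction, and the corresponding name for the class.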

# test model on a random example from the test dataset
selection = choice([i for i in range(testX.shape[0])])
random_face_pixels = testX_faces[selection]
random_face_emb = testX[selection]
random_face_class = testy[selection]
random_face_name = out_encoder.inverse_transform([random_face_class])

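Next, we can use the face embedding as input to make a single prediction with the fit model.

We can predict both the class integer and the probability of the prediction.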

# prediction for the face
samples = expand_dims(random_face_emb, axis=0)
yhat_class = model.predict(samples)
yhat_prob = model.predict_proba(samples)

We can then get the name for the predicted class integer, and the probability for this prediction.

# get name
class_index = yhat_class[0]
class_probability = yhat_prob[0,class_index] * 100
predict_names = out_encoder.inverse_transform(yhat_class)

We can then print this information.

print('Predicted: %s (%.3f)' % (predict_names[0], class_probability))
print('Expected: %s' % random_face_name[0])

We can also plot the face pixels along with the predicted name and probability.

# plot for fun
pyplot.imshow(random_face_pixels)
title = '%s (%.3f)' % (predict_names[0], class_probability)
pyplot.title(title)
pyplot.show()

Tying all of this together, the complete example of predicting the identity for a given unseen photo in the test dataset is listed below.

# develop a classifier for the 5 Celebrity Faces Dataset
from random import choice
from numpy import load
from numpy import expand_dims
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import Normalizer
from sklearn.svm import SVC
from matplotlib import pyplot
# load faces
data = load('5-celebrity-faces-dataset.npz')
testX_faces = data['arr_2']
# load face embeddings
data = load('5-celebrity-faces-embeddings.npz')
trainX, trainy, testX, testy = data['arr_0'], data['arr_1'], data['arr_2'], data['arr_3']
# normalize input vectors
in_encoder = Normalizer(norm='l2')
trainX = in_encoder.transform(trainX)
testX = in_encoder.transform(testX)
# label encode targets
out_encoder = LabelEncoder()
out_encoder.fit(trainy)
trainy = out_encoder.transform(trainy)
testy = out_encoder.transform(testy)
# fit model
model = SVC(kernel='linear', probability=True)
model.fit(trainX, trainy)
# test model on a random example from the test dataset
selection = choice([i for i in range(testX.shape[0])])
random_face_pixels = testX_faces[selection]
random_face_emb = testX[selection]
random_face_class = testy[selection]
random_face_name = out_encoder.inverse_transform([random_face_class])
# prediction for the face
samples = expand_dims(random_face_emb, axis=0)
yhat_class = model.predict(samples)
yhat_prob = model.predict_proba(samples)
# get name
class_index = yhat_class[0]
class_probability = yhat_prob[0,class_index] * 100
predict_names = out_encoder.inverse_transform(yhat_class)
print('Predicted: %s (%.3f)' % (predict_names[0], class_probability))
print('Expected: %s' % random_face_name[0])
# plot for fun
pyplot.imshow(random_face_pixels)
title = '%s (%.3f)' % (predict_names[0], class_probability)
pyplot.title(title)
pyplot.show()

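A different random example from the test dataset will be selected each time the code is run.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, a photo of Jerry Seinfeld is selected and correctly predicted.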

Predicted: jerry_seinfeld (88.476)
Expected: jerry_seinfeld

A plot of the chosen face is also created, showing the predicted name and probability in the image title.

Detected Face of Jerry Seinfeld, Correctly Identified by the SVM Classifier

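Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

FaceNet: A Unified Embedding for Face Recognition and Clustering, 2015.

Books

Handbook of Face Recognition, 2011.

Summary

In this tutorial, you discovered how to develop a face recognition system using FaceNet and an SVM classifier to identify people from photographs.

Specifically, you learned:

About the FaceNet face recognition system developed by Google, and the open source implementations and pre-trained models.
How to prepare a face dataset, first extracting faces with a face detection system and then extracting face features via face embeddings.
How to fit, evaluate, and demonstrate an SVM model to predict identities from face embeddings.

Do you have any questions? Ask your questions in the comments below and I will do my best to answer.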

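How to Develop Competence With Deep Learning for Computer Vision

Original: machinelearningmastery.com/how-to-deve…

Last updated on July 5, 2019

Computer vision is perhaps the one area that has been most impacted by developments in deep learning.

It can be difficult to both develop and demonstrate competence with deep learning for problems in the field of computer vision. It is not clear how to get started, what the most important techniques are, and which types of problems and projects can best highlight the value that deep learning can bring to the field.

One approach is to systematically develop, and at the same time demonstrate, competence with data handling, modeling techniques, and application domains, and to present your results in a public portfolio of completed projects. This approach allows you to compound your skills from project to project. It also provides the basis for real projects that can be presented to and discussed with prospective employers in order to demonstrate your capabilities.

In this post, you will discover how to develop and demonstrate competence in deep learning applied to problems in computer vision.

After reading this post, you will know:

Developing a portfolio of completed small projects can both be leveraged on new projects in the future and demonstrate your competence with deep learning for computer vision projects.
Projects can be kept small in scope, although they can still demonstrate a systematic approach to problem-solving and the development of skillful models.
A three-level competence framework can be followed that includes data handling competence, technique competence, and application competence.

Let's get started.

Overview

This tutorial is divided into three parts; they are:

Deep Learning for Computer Vision
Develop a Portfolio of Small Projects
Deep Learning for Computer Vision Competence Framework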

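Deep Learning for Computer Vision

Perhaps one of the areas that has been most impacted by developments in deep learning is computer vision.

Computer vision is a subfield of artificial intelligence concerned with understanding data in images, such as photos and videos.

Computer vision tasks, such as recognizing handwritten digits and objects in photographs, were some of the early case studies demonstrating the capability of modern deep learning techniques to achieve state-of-the-art results.

As a practitioner, you may wish to develop and demonstrate your skills with deep learning for computer vision.

This does assume a few things, such as:

You are familiar with applied machine learning, meaning that you are able to work through a predictive modeling project end-to-end and deliver a skillful model.
You are familiar with deep learning techniques, meaning that you know the difference between the main methods and when to use them.

This does not mean that you are an expert, only that you have a working knowledge and are able to work through problems systematically.

As a machine learning or even deep learning practitioner, how can you show competence with computer vision applications?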

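Develop a Portfolio of Small Projects

Competence with deep learning for computer vision can be developed and demonstrated using a project-based approach.

Specifically, the skills can be built up and demonstrated incrementally by completing and presenting small projects that apply deep learning techniques to computer vision problems.

This requires that you develop a portfolio of completed projects. A portfolio helps you in two specific ways:

Skill Development: The code and findings from the projects in the portfolio can be leveraged on future projects, accelerating your progress and allowing you to take on larger and more challenging projects.
Skill Demonstration: The public presentation of the projects demonstrates your capabilities, providing the basis for a discussion of APIs, model selection, and design decisions with prospective employers.

Projects can be focused on standard and publicly available computer vision datasets, such as those developed and hosted by academics or those used in machine learning competitions.

Projects can be completed in a systematic manner, including aspects such as a clear problem definition, a review of relevant literature and models, model development and tuning, and the presentation of results and findings in a report, notebook, or even slideshow format.

Projects are small, meaning that they can be completed in a workday, perhaps spread across a number of nights and weekends. This is important, as it limits the scope of the project to the workflow and to delivering a skillful result, rather than to developing a state-of-the-art result.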

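Deep Learning for Computer Vision Competence Framework

Projects can be chosen carefully, both for how challenging or complex they are and for how much leverage or skill development they offer.

Below is a three-level framework for developing and demonstrating competence with deep learning for computer vision, intended for practitioners already familiar with the basics of applied machine learning and the basics of deep learning:

Level 1: Data Handling Competence. You know how to load and manipulate image data.
Level 2: Technique Competence. You know how to define, fit, and tune convolutional neural networks.
Level 3: Application Competence. You can develop skillful deep learning models for common computer vision problems.

Level 1: Data Handling Competence

Data handling competence refers to the ability to load and transform data.

This includes basic data I/O operations, such as loading and saving image or video data.

Most importantly, it involves using standard APIs to manipulate image data in ways that may be useful when preparing data for modeling with deep learning neural networks.

Examples include:

Image resizing and interpolation.
Image blurring and sharpening.
Image affine transforms.
Image whitening and thresholding.

Data handling can be demonstrated with any one of the many image-processing APIs. It may also include the basic data handling capabilities of machine learning and deep learning libraries.

What are your favorite image processing APIs in Python? Let me know in the comments below.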

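Level 2: Technique Competence

Technique competence refers to the ability to use specific deep learning models and methods for computer vision problems.

At a high level, this includes three broad classes of methods:

Multilayer Perceptrons, or MLPs.
Convolutional Neural Networks, or CNNs.
Recurrent Neural Networks, such as Long Short-Term Memory networks, or LSTMs.

More specifically, this requires demonstrating strong skills in how to configure and get the most from the layers used in CNNs, such as:

Convolutional layers.
Pooling layers.
Patterns for using layers.

It may also include skills with some general classes of effective models, such as:

ImageNet CNNs such as AlexNet, VGG, ResNet, Inception, etc.
CNN-LSTMs, LSTM-CNNs, etc.
R-CNNs, YOLO, etc.

What are your favorite deep learning techniques for computer vision? Let me know in the comments below.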

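Level 3: Application Competence

Application competence refers to the ability to work through a specific computer vision problem and use deep learning methods to deliver a skillful model.

A skillful model is a model that makes predictions that perform better than a naive baseline method. It does not mean achieving state-of-the-art results or replicating the model and results from a paper, although these are fine project ideas if they fit within the scope of a small project.

Projects should be completed systematically, including most, if not all, of the following steps:

Problem Description. Describe the predictive modeling problem, including the domain and relevant background.
Literature Review. Describe standard or common approaches to solving the problem using deep learning methods, as described in seminal and/or recent research papers.
Summarize Data. Describe the available data, including statistical summaries and data visualization.
Evaluate Models. Spot-check a suite of model types, configurations, data preparation schemes, and more in order to narrow down what works well on the problem.
Improve Performance. Improve the performance of the model or models that work well, using hyperparameter tuning and perhaps ensemble methods.
Present Results. Present the findings of the project.

A step before this process, a step zero, might be to choose a public dataset appropriate for the project.

The bedrock of deep learning for computer vision is image classification, often referred to as image recognition or object detection. This involves predicting a class label for a given image, often a photograph.

Problems of this type should be the focus.

Standard computer vision datasets of this type include:

Classifying handwritten digits (e.g. MNIST and SVHN).
Classifying photos of objects (e.g. CIFAR-10 and CIFAR-100).
Classifying photos of faces (e.g. VGGFace2).

A related computer vision task is identifying the location of one or more objects within photographs, also called object recognition, object localization, or segmentation.

Object recognition and localization (e.g. COCO).

There are also tasks that involve a mixture of computer vision and natural language processing, for example:

Photo captioning (e.g. Flickr8k).

Finally, there are computer vision tasks that can be performed using manipulations of existing standard datasets or catalogs of photos, for example:

Photo colorization.
Photo reconstruction.
Photo super-resolution.
Photo synthesis (e.g. deepfakes).

What are your favorite deep learning for computer vision applications? Let me know in the comments below.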

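Summary

In this post, you discovered how to develop and demonstrate competence in deep learning applied to problems in computer vision.

Specifically, you learned:

Developing a portfolio of completed small projects can both be leveraged on new projects in the future and demonstrate your competence with deep learning for computer vision projects.
Projects can be kept small in scope, although they can still demonstrate a systematic approach to problem-solving and the development of skillful models.
A three-level competence framework can be followed that includes data handling competence, technique competence, and application competence.

Do you have any questions? Ask your questions in the comments below and I will do my best to answer.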

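How to Evaluate Pixel Scaling Methods for Image Classification With CNNs

Original: machinelearningmastery.com/how-to-eval…

Last updated on August 28, 2020

Image data must be prepared before it can be used as the basis for modeling in image classification tasks.

One aspect of preparing image data is scaling pixel values, such as normalizing the values to the range 0-1, centering, standardization, and more.

How do you choose a good, or even the best, pixel scaling method for your image classification or computer vision modeling task?

In this tutorial, you will discover how to choose a pixel scaling method for image classification with deep learning methods.

After completing this tutorial, you will know:

A procedure for choosing a pixel scaling method using experimentation and empirical results on a specific dataset.
How to implement standard pixel scaling methods for preparing image data for modeling.
How to work through a case study for choosing a pixel scaling method for a standard image classification problem.

Let's get started.

Tutorial Overview

This tutorial is divided into six parts; they are:

Procedure for Choosing a Pixel Scaling Method
Choose Dataset: MNIST Image Classification
Choose Model: Convolutional Neural Network
Choose Pixel Scaling Methods
Run Experiment
Analyze Results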

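Procedure for Choosing a Pixel Scaling Method

Given a new image classification task, what pixel scaling method should be used?

There are many ways to answer this question; for example:

Use techniques reportedly used for similar problems in research papers.
Use heuristics from blog posts, courses, or books.
Use your favorite technique.
Use the simplest technique.
…

Instead, I recommend using experimentation to discover what works best for your specific dataset.

This can be achieved using the following process:

Step 1: Choose Dataset. This may be the entire training dataset or a small subset. The idea is to complete the experiments quickly and get a result.
Step 2: Choose Model. Design a model that is skillful, but not necessarily the best model for the problem. Some parallel prototyping of models may be required.
Step 3: Choose Pixel Scaling Methods. List three to five data preparation schemes to evaluate on your problem.
Step 4: Run Experiment. Run the experiments in such a way that the results are robust and representative, ideally repeating each experiment multiple times.
Step 5: Analyze Results. Compare methods both in terms of the speed of learning and the mean performance across repeated experiments.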
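
The bookkeeping for Steps 4 and 5 can be sketched as follows: collect one score per repeat for each scheme, then compare the mean and standard deviation. The scheme names and the scores below are invented placeholders that only exercise the code; they are not measurements from any experiment, and the helper name summarize() is an assumption.

```python
from statistics import mean, stdev

def summarize(results):
    """Return {scheme: (mean score, standard deviation)} across repeats."""
    return {name: (mean(scores), stdev(scores)) for name, scores in results.items()}

# placeholder scores, made up purely to demonstrate the calculation
results = {
    "scheme_a": [0.90, 0.92, 0.91],
    "scheme_b": [0.89, 0.90, 0.91],
}
summary = summarize(results)
for name, (avg, spread) in summary.items():
    print("%s: mean=%.3f std=%.3f" % (name, avg, spread))

# the scheme with the highest mean score
best = max(summary, key=lambda name: summary[name][0])
print(best)
```

Looking at the spread as well as the mean helps to judge whether a difference between two schemes is larger than the run-to-run noise.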

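The experimental approach will use a non-optimized model and perhaps a subset of the training data, both of which will likely add noise to the decision you must make.

Therefore, you are looking for a signal that one data preparation scheme for your images is clearly better than the others; if this is not the case for your dataset, then the simplest (least computationally complex) technique should be used, such as pixel normalization.

A clear signal of a superior pixel scaling method may be seen in one of two ways:

Faster Learning. Learning curves clearly show that a model learns faster with a given data preparation scheme.
Better Accuracy. Mean model performance clearly shows better accuracy with a given data preparation scheme.

Now that we have a procedure for choosing a pixel scaling method for image data, let's look at an example. We will use the MNIST image classification task fit with a CNN and evaluate a range of standard pixel scaling methods.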

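Step 1. Choose Dataset: MNIST Image Classification

The MNIST problem, or MNIST for short, is an image classification problem comprised of 70,000 images of handwritten digits.

The goal of the problem is to classify a given image of a handwritten digit as an integer from 0 to 9. As such, it is a multiclass image classification problem.

It is a standard dataset for evaluating machine learning and deep learning algorithms. Best results for the dataset are about 99.79% accurate, or an error rate of about 0.21% (i.e. less than 1%).

The dataset is provided as part of the Keras library and can be automatically downloaded (if needed) and loaded into memory by a call to the keras.datasets.mnist.load_data() function.

The function returns two tuples: one for the training inputs and outputs and one for the test inputs and outputs. For example: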

# example of loading the MNIST dataset
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

We can load the MNIST dataset and summarize it.

The complete example is listed below.

# load and summarize the MNIST dataset
from keras.datasets import mnist
# load dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# summarize dataset shape
print('Train', train_images.shape, train_labels.shape)
print('Test', test_images.shape, test_labels.shape)
# summarize pixel values
print('Train', train_images.min(), train_images.max(), train_images.mean(), train_images.std())
print('Test', test_images.min(), test_images.max(), test_images.mean(), test_images.std())

Running the example first loads the dataset into memory. Then the shape of the train and test datasets is reported.

We can see that all images are 28×28 pixels with a single channel for grayscale images. There are 60,000 images in the training dataset and 10,000 in the test dataset.

We can also see that pixel values are integer values between 0 and 255, and that the mean and standard deviation of the pixel values are similar between the two datasets.

Train (60000, 28, 28) (60000,)
Test (10000, 28, 28) (10000,)
Train 0 255 33.318421449829934 78.56748998339798
Test 0 255 33.791224489795916 79.17246322228644

The dataset is relatively small; we will use the entire train and test dataset.

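Now that we are familiar with MNIST and how to load the dataset, let's choose a model for evaluating the pixel scaling methods.

Step 2. Choose Model: Convolutional Neural Network

We will use a convolutional neural network model to evaluate the different pixel scaling methods.

A CNN is expected to perform very well on this problem, although the model chosen for this experiment does not have to perform well or best for the problem. Instead, it must be skillful (better than random) and must allow the impact of different data preparation schemes to differentiate the models in terms of speed of learning and/or model performance.

As such, the model must have sufficient capacity to learn the problem.

We will demonstrate a baseline model on the MNIST problem.

First, the dataset must be loaded and the shape of the train and test datasets expanded to add a channel dimension, set to one as we only have a single black and white channel.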

# load dataset
(trainX, trainY), (testX, testY) = mnist.load_data()
# reshape dataset to have a single channel
width, height, channels = trainX.shape[1], trainX.shape[2], 1
trainX = trainX.reshape((trainX.shape[0], width, height, channels))
testX = testX.reshape((testX.shape[0], width, height, channels))

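Next, we will normalize the pixel values for this example and one hot encode the target values, as required for multiclass classification.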

# normalize pixel values
trainX = trainX.astype('float32') / 255
testX = testX.astype('float32') / 255
# one hot encode target values
trainY = to_categorical(trainY)
testY = to_categorical(testY)
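
To see what one hot encoding does to the integer targets, the NumPy sketch below builds the same kind of matrix by hand: one row per label, with a single 1 in the column of the class. The helper name one_hot() is an assumption for illustration and is not the Keras function.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a (samples, num_classes) matrix with a 1 in each label's column."""
    labels = np.asarray(labels, dtype="int64")
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# three digit labels encoded over the ten MNIST classes
encoded = one_hot([5, 0, 9], num_classes=10)
print(encoded)
```

Each row sums to 1 and the position of the 1 recovers the original label, which is the form the softmax output layer and categorical cross entropy loss expect.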

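The model is defined as a convolutional layer followed by a max pooling layer; this combination is repeated once more, then the filter maps are flattened, interpreted by a fully connected layer, and followed by an output layer.

The ReLU activation function is used for the hidden layers and the softmax activation function is used for the output layer. Enough filter maps and nodes are specified to provide sufficient capacity to learn the problem.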

# define model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(width, height, channels)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
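
As a sanity check on the capacity of this model, the feature map sizes and the number of trainable parameters can be worked out by hand. The sketch below does the arithmetic in plain Python, assuming the Keras defaults for these layers (no padding and a stride of 1 for the convolutions, and a pooling stride equal to the pool size); the helper names are hypothetical.

```python
def conv_out(size, kernel):
    """Output width/height of an unpadded, stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output width/height of non-overlapping max pooling."""
    return size // pool

def conv_params(kernel, in_channels, filters):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units

size = 28                 # MNIST images are 28x28
size = conv_out(size, 3)  # first 3x3 convolution
size = pool_out(size, 2)  # first 2x2 max pooling
size = conv_out(size, 3)  # second 3x3 convolution
size = pool_out(size, 2)  # second 2x2 max pooling
flat = size * size * 64   # flattened length with 64 filter maps

total = (conv_params(3, 1, 32) + conv_params(3, 32, 64)
         + dense_params(flat, 64) + dense_params(64, 10))
print(size, flat, total)
```

Most of the parameters sit in the first fully connected layer, which is typical for small CNNs that flatten their feature maps.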

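The Adam variation of stochastic gradient descent is used to find the model weights. The categorical cross entropy loss function is used, as required for multiclass classification, and classification accuracy is monitored during training.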

# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

The model is fit for five training epochs and a large batch size of 128 images is used.

# fit model
model.fit(trainX, trainY, epochs=5, batch_size=128)

Once fit, the model is evaluated on the test dataset.

# evaluate model
_, acc = model.evaluate(testX, testY, verbose=0)
print(acc)

The complete example is listed below and will easily run on the CPU in about a minute.

# baseline cnn model for the mnist problem
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
# load dataset
(trainX, trainY), (testX, testY) = mnist.load_data()
# reshape dataset to have a single channel
width, height, channels = trainX.shape[1], trainX.shape[2], 1
trainX = trainX.reshape((trainX.shape[0], width, height, channels))
testX = testX.reshape((testX.shape[0], width, height, channels))
# normalize pixel values
trainX = trainX.astype('float32') / 255
testX = testX.astype('float32') / 255
# one hot encode target values
trainY = to_categorical(trainY)
testY = to_categorical(testY)
# define model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(width, height, channels)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))
# compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# fit model
model.fit(trainX, trainY, epochs=5, batch_size=128)
# evaluate model
_, acc = model.evaluate(testX, testY, verbose=0)
print(acc)

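Running the example shows that the model learns the problem quickly and well.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In fact, on this run the model reached about 99% accuracy on the test dataset, or a 1% error rate. This is not state of the art (by design), but it is not far from it either.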

Epoch 1/5
60000/60000 [==============================] - 13s 220us/step - loss: 0.2321 - acc: 0.9323
Epoch 2/5
60000/60000 [==============================] - 12s 204us/step - loss: 0.0628 - acc: 0.9810
Epoch 3/5
60000/60000 [==============================] - 13s 208us/step - loss: 0.0446 - acc: 0.9861
Epoch 4/5
60000/60000 [==============================] - 13s 209us/step - loss: 0.0340 - acc: 0.9895
Epoch 5/5
60000/60000 [==============================] - 12s 208us/step - loss: 0.0287 - acc: 0.9908
0.99

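Step 3. Choose Pixel Scaling Methods

Neural network models often cannot be trained on raw pixel values, such as pixel values in the range 0 to 255.

The reason is that the network uses a weighted sum of its inputs, and for the network to be both stable and train effectively, the weights should be kept small.

Instead, the pixel values must be scaled prior to training. There are perhaps three main approaches to scaling pixel values; they are:

Normalization: pixel values are scaled to the range 0-1.
Centering: the mean pixel value is subtracted from each pixel value, resulting in a distribution of pixel values centered on a mean of zero.
Standardization: pixel values are scaled to a standard Gaussian with a mean of zero and a standard deviation of one.

Traditionally, sigmoid activation functions were used and inputs with a zero mean were preferred. This may or may not still be the case with the widespread adoption of ReLU and similar activation functions.

Further, in centering and standardization, the mean (or the mean and standard deviation) can be calculated across a channel, an image, a mini-batch, or the entire training dataset. This adds further variations on a chosen scaling method that may be evaluated.

Normalization is often the default approach, as we can assume pixel values are always in the range 0-255, making the procedure very simple and efficient to implement.

Centering is often promoted as the preferred approach because it was used in many popular papers. Note, however, that the mean can be calculated per image (global) or per channel (local), and across a batch of images or the entire training dataset, and the procedures described in papers often do not specify exactly which variation was used.

We will experiment with the three approaches listed above, namely normalization, centering, and standardization. The mean for centering, and the mean and standard deviation for standardization, will be calculated across the entire training dataset.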

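Other variations you could explore include:

Calculating statistics for each channel (for color images).
Calculating statistics for each image.
Calculating statistics for each batch.
Normalizing after centering or standardizing.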
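
As a rough sketch of the first two variations, the snippet below centers a synthetic batch of color images per channel and per image using NumPy. The random batch and its shape are assumptions made for illustration (MNIST itself is grayscale, so per-channel statistics only matter for datasets with color images).

```python
# per-channel and per-image centering on a synthetic batch of color images
from numpy.random import default_rng

# hypothetical batch: 8 RGB images of 4x4 pixels with values in 0-255
rng = default_rng(1)
images = rng.integers(0, 256, size=(8, 4, 4, 3)).astype('float32')

# per-channel: one mean per color channel, computed across all images and pixels
channel_means = images.mean(axis=(0, 1, 2))
per_channel = images - channel_means

# per-image: one mean per image, computed across its pixels and channels
image_means = images.mean(axis=(1, 2, 3), keepdims=True)
per_image = images - image_means

# confirm the shape of the statistics and that the chosen means are now near zero
print(channel_means.shape, image_means.shape)
print(per_channel.mean(axis=(0, 1, 2)))
print(per_image.mean(axis=(1, 2, 3)))
```

The only difference between the variations is the set of axes that the statistics are computed over; broadcasting takes care of the subtraction.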

The example below implements the three chosen pixel scaling methods and demonstrates their effect on the MNIST dataset.

# demonstrate pixel scaling methods on mnist dataset
from keras.datasets import mnist

# normalize images
def prep_normalize(train, test):
	# convert from integers to floats
	train_norm = train.astype('float32')
	test_norm = test.astype('float32')
	# normalize to range 0-1
	train_norm = train_norm / 255.0
	test_norm = test_norm / 255.0
	# return normalized images
	return train_norm, test_norm

# center images
def prep_center(train, test):
	# convert from integers to floats
	train_cent = train.astype('float32')
	test_cent = test.astype('float32')
	# calculate statistics
	m = train_cent.mean()
	# center datasets
	train_cent = train_cent - m
	test_cent = test_cent - m
	# return centered images
	return train_cent, test_cent

# standardize images
def prep_standardize(train, test):
	# convert from integers to floats
	train_stan = train.astype('float32')
	test_stan = test.astype('float32')
	# calculate statistics
	m = train_stan.mean()
	s = train_stan.std()
	# standardize datasets
	train_stan = (train_stan - m) / s
	test_stan = (test_stan - m) / s
	# return standardized images
	return train_stan, test_stan

# load dataset
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
# normalize
trainX, testX = prep_normalize(train_images, test_images)
print('normalization')
print('Train', trainX.min(), trainX.max(), trainX.mean(), trainX.std())
print('Test', testX.min(), testX.max(), testX.mean(), testX.std())
# center
trainX, testX = prep_center(train_images, test_images)
print('center')
print('Train', trainX.min(), trainX.max(), trainX.mean(), trainX.std())
print('Test', testX.min(), testX.max(), testX.mean(), testX.std())
# standardize
trainX, testX = prep_standardize(train_images, test_images)
print('standardize')
print('Train', trainX.min(), trainX.max(), trainX.mean(), trainX.std())
print('Test', testX.min(), testX.max(), testX.mean(), testX.std())

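Running the example first normalizes the dataset and reports the min, max, mean, and standard deviation of the train and test datasets.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

This is then repeated for the centering and standardization data preparation schemes. The results provide evidence that the scaling procedures are indeed implemented correctly.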

normalization
Train 0.0 1.0 0.13066062 0.30810776
Test 0.0 1.0 0.13251467 0.31048027

center
Train -33.318447 221.68155 -1.9512918e-05 78.567444
Test -33.318447 221.68155 0.47278798 79.17245

standardize
Train -0.42407447 2.8215446 -3.4560264e-07 0.9999998
Test -0.42407447 2.8215446 0.0060174568 1.0077008
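
One way to read this output is to check that the three reports agree with one another. The short sketch below does that arithmetic on the training set figures printed above; the variable names are ours, and the small discrepancies it shows are what you would expect from float32 rounding.

```python
# cross-check the reported training set statistics against one another
# (values copied from the output above)
norm_mean, norm_std = 0.13066062, 0.30810776
cent_min, cent_max, cent_std = -33.318447, 221.68155, 78.567444
stan_min, stan_max = -0.42407447, 2.8215446

# undoing the division by 255 recovers the mean and std of the raw pixels
raw_mean = norm_mean * 255.0
raw_std = norm_std * 255.0

# centering shifts the raw range 0-255 down by the mean pixel value
print(raw_mean, -cent_min, 255.0 - cent_max)
# centering shifts values but does not change their spread
print(raw_std, cent_std)
# standardizing divides the centered values by the standard deviation
print(cent_min / cent_std, stan_min)
print(cent_max / cent_std, stan_max)
```

The test set values are scaled with statistics from the training set, which is why its centered mean is close to, but not exactly, zero.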

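Step 4. Run Experiment

Now that we have defined the dataset, the model, and the data preparation schemes to evaluate, we are ready to define and run the experiment.

Each model takes about one minute to run on a CPU, so we don't want the experiment to take too long. We will evaluate each of the three data preparation schemes 10 times, meaning the experiment will take about 30 minutes to complete on modern hardware.

We can define a function to load the dataset afresh when needed.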

# load train and test dataset
def load_dataset():
	# load dataset
	(trainX, trainY), (testX, testY) = mnist.load_data()
	# reshape dataset to have a single channel
	width, height, channels = trainX.shape[1], trainX.shape[2], 1
	trainX = trainX.reshape((trainX.shape[0], width, height, channels))
	testX = testX.reshape((testX.shape[0], width, height, channels))
	# one hot encode target values
	trainY = to_categorical(trainY)
	testY = to_categorical(testY)
	return trainX, trainY, testX, testY

We can also define a function to define and compile our model, ready to fit on the problem.

# define cnn model
def define_model():
	model = Sequential()
	model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
	model.add(MaxPooling2D((2, 2)))
	model.add(Conv2D(64, (3, 3), activation='relu'))
	model.add(MaxPooling2D((2, 2)))
	model.add(Flatten())
	model.add(Dense(64, activation='relu'))
	model.add(Dense(10, activation='softmax'))
	# compile model
	model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
	return model

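We already have the functions for preparing the pixel data for the train and test datasets.

Finally, we can define a function named repeated_evaluation() that takes the data preparation function to call. It loads the dataset, then repeatedly defines the model, prepares the dataset, and fits and evaluates the model. It returns a list of accuracy scores that can be used to summarize how well the model performed under the chosen data preparation scheme.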

# repeated evaluation of model with data prep scheme
def repeated_evaluation(datapre_func, n_repeats=10):
	# prepare data
	trainX, trainY, testX, testY = load_dataset()
	# repeated evaluation
	scores = list()
	for i in range(n_repeats):
		# define model
		model = define_model()
		# prepare data
		prep_trainX, prep_testX = datapre_func(trainX, testX)
		# fit model
		model.fit(prep_trainX, trainY, epochs=5, batch_size=64, verbose=0)
		# evaluate model
		_, acc = model.evaluate(prep_testX, testY, verbose=0)
		# store result
		scores.append(acc)
		print('> %d: %.3f' % (i, acc * 100.0))
	return scores

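The repeated_evaluation() function can then be called for each of the three data preparation schemes, and the mean and standard deviation of model performance under each scheme can be reported.

We can also create a box and whisker plot to summarize and compare the distribution of accuracy scores for each scheme.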

all_scores = list()
# normalization
scores = repeated_evaluation(prep_normalize)
print('Normalization: %.3f (%.3f)' % (mean(scores), std(scores)))
all_scores.append(scores)
# center
scores = repeated_evaluation(prep_center)
print('Centered: %.3f (%.3f)' % (mean(scores), std(scores)))
all_scores.append(scores)
# standardize
scores = repeated_evaluation(prep_standardize)
print('Standardized: %.3f (%.3f)' % (mean(scores), std(scores)))
all_scores.append(scores)
# box and whisker plots of results
pyplot.boxplot(all_scores, labels=['norm', 'cent', 'stan'])
pyplot.show()

Tying all of this together, the complete example of running the experiment to compare pixel scaling methods on the MNIST dataset is listed below.

# comparison of training-set based pixel scaling methods on MNIST
from numpy import mean
from numpy import std
from matplotlib import pyplot
from keras.datasets import mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten

# load train and test dataset
def load_dataset():
	# load dataset
	(trainX, trainY), (testX, testY) = mnist.load_data()
	# reshape dataset to have a single channel
	width, height, channels = trainX.shape[1], trainX.shape[2], 1
	trainX = trainX.reshape((trainX.shape[0], width, height, channels))
	testX = testX.reshape((testX.shape[0], width, height, channels))
	# one hot encode target values
	trainY = to_categorical(trainY)
	testY = to_categorical(testY)
	return trainX, trainY, testX, testY

# define cnn model
def define_model():
	model = Sequential()
	model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
	model.add(MaxPooling2D((2, 2)))
	model.add(Conv2D(64, (3, 3), activation='relu'))
	model.add(MaxPooling2D((2, 2)))
	model.add(Flatten())
	model.add(Dense(64, activation='relu'))
	model.add(Dense(10, activation='softmax'))
	# compile model
	model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
	return model

# normalize images
def prep_normalize(train, test):
	# convert from integers to floats
	train_norm = train.astype('float32')
	test_norm = test.astype('float32')
	# normalize to range 0-1
	train_norm = train_norm / 255.0
	test_norm = test_norm / 255.0
	# return normalized images
	return train_norm, test_norm

# center images
def prep_center(train, test):
	# convert from integers to floats
	train_cent = train.astype('float32')
	test_cent = test.astype('float32')
	# calculate statistics
	m = train_cent.mean()
	# center datasets
	train_cent = train_cent - m
	test_cent = test_cent - m
	# return centered images
	return train_cent, test_cent

# standardize images
def prep_standardize(train, test):
	# convert from integers to floats
	train_stan = train.astype('float32')
	test_stan = test.astype('float32')
	# calculate statistics
	m = train_stan.mean()
	s = train_stan.std()
	# standardize datasets
	train_stan = (train_stan - m) / s
	test_stan = (test_stan - m) / s
	# return standardized images
	return train_stan, test_stan

# repeated evaluation of model with data prep scheme
def repeated_evaluation(datapre_func, n_repeats=10):
	# prepare data
	trainX, trainY, testX, testY = load_dataset()
	# repeated evaluation
	scores = list()
	for i in range(n_repeats):
		# define model
		model = define_model()
		# prepare data
		prep_trainX, prep_testX = datapre_func(trainX, testX)
		# fit model
		model.fit(prep_trainX, trainY, epochs=5, batch_size=64, verbose=0)
		# evaluate model
		_, acc = model.evaluate(prep_testX, testY, verbose=0)
		# store result
		scores.append(acc)
		print('> %d: %.3f' % (i, acc * 100.0))
	return scores

all_scores = list()
# normalization
scores = repeated_evaluation(prep_normalize)
print('Normalization: %.3f (%.3f)' % (mean(scores), std(scores)))
all_scores.append(scores)
# center
scores = repeated_evaluation(prep_center)
print('Centered: %.3f (%.3f)' % (mean(scores), std(scores)))
all_scores.append(scores)
# standardize
scores = repeated_evaluation(prep_standardize)
print('Standardized: %.3f (%.3f)' % (mean(scores), std(scores)))
all_scores.append(scores)
# box and whisker plots of results
pyplot.boxplot(all_scores, labels=['norm', 'cent', 'stan'])
pyplot.show()

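Running the example may take about 30 minutes on a CPU.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

The accuracy is reported for each repeated evaluation of the model, and the mean and standard deviation of the accuracy scores are reported at the end of each run.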

> 0: 98.930
> 1: 98.960
> 2: 98.910
> 3: 99.050
> 4: 99.040
> 5: 98.800
> 6: 98.880
> 7: 99.020
> 8: 99.100
> 9: 99.050
Normalization: 0.990 (0.001)
> 0: 98.570
> 1: 98.530
> 2: 98.230
> 3: 98.110
> 4: 98.840
> 5: 98.720
> 6: 9.800
> 7: 98.170
> 8: 98.710
> 9: 10.320
Centered: 0.808 (0.354)
> 0: 99.150
> 1: 98.820
> 2: 99.000
> 3: 98.850
> 4: 99.140
> 5: 99.050
> 6: 99.120
> 7: 99.100
> 8: 98.940
> 9: 99.110
Standardized: 0.990 (0.001)

Box and Whisker Plot of CNN Performance on MNIST With Different Pixel Scaling Methods

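Step 5. Analyze Results

For brevity, we will only look at model performance in the comparison of data preparation schemes. An extension to this study would also look at the rate of learning under each pixel scaling method.

The results of the experiment show little or no difference (at the chosen precision) between pixel normalization and standardization on the MNIST dataset with the chosen model.

From these results, I would use normalization over standardization on this dataset with this model because of the good results and because normalization is simpler than standardization.

These are useful results in that they show that the default heuristic of centering pixel values prior to modeling would not be good advice for this dataset: two of the ten centered runs failed to learn at all, scoring about 10% accuracy.

Sadly, the box and whisker plot does not make it easy to compare the distributions of accuracy scores, because those outliers for the centering method squash the other distributions.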
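
When a few failed runs dominate the mean and standard deviation, a median and a count of failures can be a more informative summary. The sketch below applies this to the ten centered scores printed above using only the standard library; the 50% threshold used to flag a failed run is an arbitrary assumption.

```python
# summarize the centered accuracy scores in a way that is robust to outliers
from statistics import mean, median, pstdev

# accuracy scores (percent) copied from the centered runs reported above
centered = [98.570, 98.530, 98.230, 98.110, 98.840, 98.720, 9.800, 98.170, 98.710, 10.320]

# mean and (population) standard deviation, as reported by the experiment
print('mean=%.3f std=%.3f' % (mean(centered), pstdev(centered)))
# the median ignores the magnitude of the outliers
print('median=%.3f' % median(centered))
# count runs that failed to learn; the 50% threshold is an arbitrary choice
failed = [s for s in centered if s < 50.0]
print('failed runs: %d of %d' % (len(failed), len(centered)))
```

The mean and standard deviation match the reported 0.808 (0.354) once expressed as fractions, while the median shows that a typical centered run is only slightly worse than the other two schemes.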

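Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

Batch-Wise Scaling. Update the study to calculate scaling statistics per batch instead of across the entire training dataset and see if that makes a difference to the choice of scaling method.
Learning Curves. Update the study to collect learning curves for each data scaling method and compare the speed of learning.
CIFAR. Repeat the study on the CIFAR-10 dataset and add pixel scaling methods that support global (scale across all channels) and local (scale per channel) approaches.

If you explore any of these extensions, I'd love to know. Post your findings in the comments below.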

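Summary

In this tutorial, you discovered how to choose a pixel scaling method for image classification with deep learning methods.

Specifically, you learned:

A procedure for choosing a pixel scaling method using experimentation and empirical results on a specific dataset.
How to implement standard pixel scaling methods for preparing image data for modeling.
How to work through a case study for choosing a pixel scaling method for a standard image classification problem.

Do you have any questions? Ask your questions in the comments below and I will do my best to answer.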

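How to Get Started With Deep Learning for Computer Vision (7-Day Mini-Course)

Original: machinelearningmastery.com/how-to-get-…

Last updated on April 2, 2020

Deep Learning for Computer Vision Crash Course.

Bring deep learning methods to your computer vision project in 7 days.

We are awash in digital images from photos, videos, Instagram, YouTube, and increasingly live video streams.

Working with image data is hard because it requires drawing on knowledge from diverse domains such as digital signal processing, machine learning, statistical methods, and these days, deep learning.

Deep learning methods are out-competing the classical and statistical methods on some challenging computer vision problems, and they are doing so with single, simpler models.

In this crash course, you will discover how to get started and confidently develop deep learning for computer vision problems using Python in seven days.

Note: This is a big and important post. You might want to bookmark it.

Let's get started.

**Updated Nov/2019:** Updated for TensorFlow v2.0 and MTCNN v0.1.0.

How to Get Started With Deep Learning for Computer Vision (7-Day Mini-Course). Photo by oliver.dodd, some rights reserved.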

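Who Is This Crash Course For?

Before we get started, let's make sure you are in the right place.

The list below provides some general guidelines as to who this course was designed for.

Don't panic if you don't match these points exactly; you might just need to brush up in one area or another to keep up.

You need to know:

You need to know your way around basic Python, NumPy, and Keras for deep learning.

You do NOT need to be:

You do not need to be a math wiz!
You do not need to be a deep learning expert!
You do not need to be a computer vision researcher!

This crash course will take you from a developer who knows a little machine learning to a developer who can bring deep learning methods to your own computer vision project.

Note: This crash course assumes you have a working Python 2 or 3 SciPy environment with at least NumPy, Pandas, scikit-learn, and Keras 2 installed. If you need help with your environment, you can follow the step-by-step tutorial here:

How to Set Up a Python Environment for Machine Learning and Deep Learning

Crash Course Overview

This crash course is broken down into seven lessons.

You could complete one lesson per day (recommended) or complete all of the lessons in one day (hardcore). It really depends on the time you have available and your level of enthusiasm.

Below are the seven lessons that will get you started and productive with deep learning for computer vision in Python:

Lesson 01: Deep Learning and Computer Vision
Lesson 02: Preparing Image Data
Lesson 03: Convolutional Neural Networks
Lesson 04: Image Classification
Lesson 05: Train Image Classification Model
Lesson 06: Image Augmentation
Lesson 07: Face Detection

Each lesson could take you anywhere from 60 seconds to 30 minutes. Take your time and complete the lessons at your own pace. Ask questions and even post results in the comments below.

The lessons might expect you to go off and find out how to do things. I will give you hints, but part of the point of each lesson is to force you to learn where to go to look for help on deep learning, computer vision, and the best-of-breed tools in Python (hint: I have all of the answers on this blog; just use the search box).

Post your results in the comments; I'll cheer you on!

Hang in there. Don't give up.

Note: This is just a crash course. For a lot more detail and fleshed-out tutorials, see my book titled "Deep Learning for Computer Vision."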

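Lesson 01: Deep Learning and Computer Vision

In this lesson, you will discover the promise of deep learning methods for computer vision.

Computer Vision

Computer vision, or CV for short, is broadly defined as helping computers to "see" or extract meaning from digital images such as photographs and videos.

Researchers have been working on the problem of helping computers see for more than 50 years, and some great successes have been achieved, such as the face detection available in modern cameras and smartphones.

The problem of understanding images is not solved, and may never be. This is primarily because the world is complex and messy. There are few rules. And yet, we can easily and effortlessly recognize objects, people, and context.

Deep Learning

Deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks.

A property of deep learning is that the performance of this type of model improves by training it with more examples and by increasing its depth or representational capacity.

In addition to scalability, another often-cited benefit of deep learning models is their ability to perform automatic feature extraction from raw data, also called feature learning.

Promise of Deep Learning for Computer Vision

Deep learning methods are popular for computer vision, primarily because they are delivering on their promise.

Some of the first large demonstrations of the power of deep learning were in computer vision, specifically image classification. More recent demonstrations have been in object detection and face recognition.

The three key promises of deep learning for computer vision are as follows:

The Promise of Feature Learning. That is, deep learning methods can automatically learn the features from image data required by the model, rather than requiring that the feature detectors be handcrafted and specified by an expert.
The Promise of Continued Improvement. That is, the performance of deep learning in computer vision is based on real results, and the improvements appear to be continuing and perhaps speeding up.
The Promise of End-to-End Models. That is, large end-to-end deep learning models can be fit on large datasets of images or videos, offering a more general and better-performing approach.

Computer vision is not solved, but deep learning is required to get you to the state of the art on many challenging problems in the field.

Your Task

For this lesson, you must research and list five impressive applications of deep learning methods in the field of computer vision. Bonus points if you can link to a research paper that demonstrates the example.

Post your answer in the comments below. I would love to see what you discover.

In the next lesson, you will discover how to prepare image data for modeling.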

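Lesson 02: Preparing Image Data

In this lesson, you will discover how to prepare image data for modeling.

Images are comprised of matrices of pixel values.

Pixel values are often unsigned integers in the range between 0 and 255. Although these pixel values can be presented directly to neural network models in their raw format, this can result in challenges during modeling, such as slower than expected training of the model.

Instead, there can be great benefit in preparing the image pixel values prior to modeling, ranging from simply scaling pixel values to the range 0-1, to centering and even standardizing the values.

Scaling pixel values to the range 0-1 is called normalization and can be performed directly on the loaded image. The example below uses the Pillow (PIL) library, the standard image handling library in Python, to load an image and normalize its pixel values.

First, confirm that you have the Pillow library installed; it is installed with most SciPy environments, but you can learn more here:

PIL/Pillow Installation Instructions

Next, download a photograph of Bondi Beach in Sydney, Australia, taken by Isabell Schulz and released under a permissive license. Save the image in your current working directory with the filename "bondi_beach.jpg".

Download a Photograph of Bondi Beach (bondi_beach.jpg)

Next, we can use the Pillow library to load the photo, confirm the min and max pixel values, normalize the values, and confirm the normalization was performed.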

# example of pixel normalization
from numpy import asarray
from PIL import Image
# load image
image = Image.open('bondi_beach.jpg')
pixels = asarray(image)
# confirm pixel range is 0-255
print('Data Type: %s' % pixels.dtype)
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))
# convert from integers to floats
pixels = pixels.astype('float32')
# normalize to the range 0-1
pixels /= 255.0
# confirm the normalization
print('Min: %.3f, Max: %.3f' % (pixels.min(), pixels.max()))

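Your Task

Your task in this lesson is to run the example code on the provided photograph and report the min and max pixel values before and after the normalization.

For bonus points, you can update the example to standardize the pixel values.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will discover information about convolutional neural network models.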

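Lesson 03: Convolutional Neural Networks

In this lesson, you will discover how to construct a convolutional neural network using a convolutional layer, pooling layer, and fully connected output layer.

Convolutional Layers

A convolution is the simple application of a filter to an input that results in an activation. Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image.

A convolutional layer can be created by specifying both the number of filters to learn and the fixed size of each filter, often called the kernel shape.

Pooling Layers

Pooling layers provide an approach to downsampling feature maps by summarizing the presence of features in patches of the feature map.

Maximum pooling, or max pooling, is a pooling operation that calculates the maximum value in each patch of each feature map.

Classifier Layer

Once the features have been extracted, they can be interpreted and used to make a prediction, such as classifying the type of object in a photograph.

This can be achieved by first flattening the two-dimensional feature maps and then adding a fully connected output layer. For a binary classification problem, the output layer has a single node that predicts a value between 0 and 1 for the two classes.

Convolutional Neural Network

The example below creates a convolutional neural network that expects square grayscale images of 256×256 pixels, with one convolutional layer with 32 filters, each with a size of 3×3 pixels, a max pooling layer, and a binary classification output layer.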

# cnn with single convolutional, pooling and output layer
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
# create model
model = Sequential()
# add convolutional layer
model.add(Conv2D(32, (3,3), input_shape=(256, 256, 1)))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))
model.summary()
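
The shapes reported by model.summary() can also be worked out by hand, which is a handy check when designing a model. The sketch below does the arithmetic for the model above, assuming the Keras defaults of 'valid' padding and a stride of 1 for Conv2D, and a 2×2 pool with a matching stride for MaxPooling2D; the helper names `conv_output` and `pool_output` are made up for this example.

```python
# work out the layer output shapes and parameter counts of the small cnn by hand

# hypothetical helper: output size of a convolution with 'valid' padding
def conv_output(size, kernel, stride=1):
	# the filter must fit entirely inside the input
	return (size - kernel) // stride + 1

# hypothetical helper: output size of max pooling with stride equal to the pool size
def pool_output(size, pool=2):
	return (size - pool) // pool + 1

size, channels, n_filters = 256, 1, 32
after_conv = conv_output(size, 3)
after_pool = pool_output(after_conv)
flat = after_pool * after_pool * n_filters
# each filter has 3x3 weights per input channel plus one bias
conv_params = 3 * 3 * channels * n_filters + n_filters
# the single output node has one weight per flattened input plus one bias
dense_params = flat * 1 + 1
print('conv2d output: (%d, %d, %d), params: %d' % (after_conv, after_conv, n_filters, conv_params))
print('max pooling output: (%d, %d, %d)' % (after_pool, after_pool, n_filters))
print('flatten output: %d' % flat)
print('dense params: %d' % dense_params)
```

Almost all of the weights in this model sit in the output layer, because the flattened feature maps are so large.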

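Your Task

Your task in this lesson is to run the example and describe how the convolutional and pooling layers change the shape of the input image.

For extra points, you can try adding more convolutional or pooling layers and describe the effect they have on the image as it flows through the model.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will learn how to use a deep convolutional neural network to classify photographs of objects.

Lesson 04: Image Classification

In this lesson, you will discover how to use a pre-trained model to classify photographs of objects.

Deep convolutional neural network models may take days, or even weeks, to train on very large datasets.

A way to short-cut this process is to reuse the model weights from pre-trained models that were developed for standard computer vision benchmark datasets, such as the ImageNet image recognition tasks.

The example below uses the VGG-16 pre-trained model to classify photographs of objects into one of 1,000 known classes.

Download this photograph of a dog, taken by Justin Morgan and released under a permissive license. Save it in your current working directory with the filename "dog.jpg".

Download a Photograph of a Dog (dog.jpg)

The example below will load the photograph and output a prediction, classifying the object in the photograph.

Note: The first time you run the example, the pre-trained model will need to be downloaded. It is a few hundred megabytes and will take a few minutes, depending on the speed of your internet connection.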

# example of using a pre-trained model as a classifier
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.applications.vgg16 import preprocess_input
from keras.applications.vgg16 import decode_predictions
from keras.applications.vgg16 import VGG16
# load an image from file
image = load_img('dog.jpg', target_size=(224, 224))
# convert the image pixels to a numpy array
image = img_to_array(image)
# reshape data for the model
image = image.reshape((1, image.shape[0], image.shape[1], image.shape[2]))
# prepare the image for the VGG model
image = preprocess_input(image)
# load the model
model = VGG16()
# predict the probability across all output classes
yhat = model.predict(image)
# convert the probabilities to class labels
label = decode_predictions(yhat)
# retrieve the most likely result, e.g. highest probability
label = label[0][0]
# print the classification
print('%s (%.2f%%)' % (label[1], label[2]*100))

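Your Task

Your task in this lesson is to run the example and report the result.

For bonus points, try running the example on another photograph of a common object.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will discover how to fit and evaluate a model for image classification.

Lesson 05: Train Image Classification Model

In this lesson, you will discover how to train and evaluate a convolutional neural network for image classification.

The Fashion-MNIST clothing classification problem is a new standard dataset used in computer vision and deep learning.

It is a dataset comprised of 60,000 small square 28×28 pixel grayscale images of items of 10 types of clothing, such as shoes, t-shirts, dresses, and more.

The example below loads the dataset, scales the pixel values, then fits a convolutional neural network on the training dataset and evaluates the performance of the network on the test dataset.

The example will run in just a few minutes on a modern CPU; no GPU is required.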

# fit a cnn on the fashion mnist dataset
from keras.datasets import fashion_mnist
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dense
from keras.layers import Flatten
# load dataset
(trainX, trainY), (testX, testY) = fashion_mnist.load_data()
# reshape dataset to have a single channel
trainX = trainX.reshape((trainX.shape[0], 28, 28, 1))
testX = testX.reshape((testX.shape[0], 28, 28, 1))
# convert from integers to floats
trainX, testX = trainX.astype('float32'), testX.astype('float32')
# normalize to range 0-1
trainX,testX  = trainX / 255.0, testX / 255.0
# one hot encode target values
trainY, testY = to_categorical(trainY), to_categorical(testY)
# define model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', input_shape=(28, 28, 1)))
model.add(MaxPooling2D())
model.add(Flatten())
model.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# fit model
model.fit(trainX, trainY, epochs=10, batch_size=32, verbose=2)
# evaluate model
loss, acc = model.evaluate(testX, testY, verbose=0)
print(loss, acc)

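Your Task

Your task in this lesson is to run the example and report the performance of the model on the test dataset.

For bonus points, try varying the configuration of the model, or try saving the model and later loading it and using it to make a prediction on new grayscale photographs of clothing.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will discover how to use image augmentation on training data.

Lesson 06: Image Augmentation

In this lesson, you will discover how to use image augmentation.

Image data augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of images in the dataset.

Training deep learning neural network models on more data can result in more skillful models, and the augmentation techniques can create variations of the images that can improve the ability of the fit models to generalize what they have learned to new images.

The Keras deep learning neural network library provides the capability to fit models using image data augmentation via the ImageDataGenerator class.

Download a photograph of a bird, taken by AndYaDontStop and released under a permissive license. Save it into your current working directory with the name "bird.jpg".

Download a Photograph of a Bird (bird.jpg)

The example below loads the photograph as a dataset and uses image augmentation to create flipped and rotated versions of the image that can be used to train a convolutional neural network model.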

# example using image augmentation
from numpy import expand_dims
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import ImageDataGenerator
from matplotlib import pyplot
# load the image
img = load_img('bird.jpg')
# convert to numpy array
data = img_to_array(img)
# expand dimension to one sample
samples = expand_dims(data, 0)
# create image data augmentation generator
datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True, rotation_range=90)
# prepare iterator
it = datagen.flow(samples, batch_size=1)
# generate samples and plot
for i in range(9):
     # define subplot
     pyplot.subplot(330 + 1 + i)
     # generate batch of images
     batch = next(it)
     # convert to unsigned integers for viewing
     image = batch[0].astype('uint8')
     # plot raw pixel data
     pyplot.imshow(image)
# show the figure
pyplot.show()
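
To see what the flip and rotate operations do to the pixel array itself, the sketch below applies them to a tiny array with NumPy. This is only an illustration under simplifying assumptions: ImageDataGenerator chooses flips at random and rotates by a random angle within the specified range, filling in any missing pixels, rather than making exact quarter turns.

```python
# illustrate flips and a quarter-turn rotation on a tiny array of "pixels"
from numpy import arange, fliplr, flipud, rot90

# stand-in 2x3 image so that each pixel is easy to track
image = arange(6).reshape((2, 3))
print(image)
# horizontal flip reverses the order of the columns
print(fliplr(image))
# vertical flip reverses the order of the rows
print(flipud(image))
# rotate 90 degrees counter-clockwise; note the height and width swap
print(rot90(image))
```

Running the sketch shows that each operation rearranges pixels without changing their values, which is why the label of the image can be kept unchanged.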

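Your Task

Your task in this lesson is to run the example and report the effect that the image augmentation has had on the original image.

For bonus points, try additional types of image augmentation supported by the ImageDataGenerator class.

Post your findings in the comments below. I would love to see what you discover.

In the next lesson, you will discover how to use a deep convolutional network to detect faces in photographs.

Lesson 07: Face Detection

In this lesson, you will discover how to use a convolutional neural network for face detection.

Face detection is a trivial problem for humans to solve and has been solved reasonably well by classical feature-based techniques, such as the cascade classifier.

More recently, deep learning methods have achieved state-of-the-art results on standard face detection datasets. One example is the Multi-task Cascade Convolutional Neural Network, or MTCNN for short.

The ipazc/MTCNN project provides an open-source implementation of MTCNN that can be installed easily as follows:

sudo pip install mtcnn

Download a photograph of a person on the street, taken by Holland and released under a permissive license. Save it into your current working directory with the name "street.jpg".

Download a Photograph of a Person on the Street (street.jpg)

The example below will load the photograph and use the MTCNN model to detect faces. It will then plot the photograph and draw a box around the first detected face.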

# face detection with mtcnn on a photograph
from matplotlib import pyplot
from matplotlib.patches import Rectangle
from mtcnn.mtcnn import MTCNN
# load image from file
pixels = pyplot.imread('street.jpg')
# create the detector, using default weights
detector = MTCNN()
# detect faces in the image
faces = detector.detect_faces(pixels)
# plot the image
pyplot.imshow(pixels)
# get the context for drawing boxes
ax = pyplot.gca()
# get coordinates from the first face
x, y, width, height = faces[0]['box']
# create the shape
rect = Rectangle((x, y), width, height, fill=False, color='red')
# draw the box
ax.add_patch(rect)
# show the plot
pyplot.show()

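Your Task

Your task in this lesson is to run the example and describe the result.

For bonus points, try the model on another photograph with multiple faces and update the code example to draw a box around each detected face.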
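
As a starting point for the bonus task, the sketch below loops over every detection rather than only the first. It assumes each detection is a dictionary with a 'box' entry of [x, y, width, height], as used in the example above, and it runs on made-up detections so that it does not need the model. The helper name `face_corners` is hypothetical, and clamping negative coordinates to zero is a defensive assumption in case a box extends past the image edge.

```python
# convert every detected bounding box into corner coordinates

# hypothetical helper: list of (x1, y1, x2, y2) tuples, one per detected face
def face_corners(faces):
	corners = list()
	for face in faces:
		# each box is assumed to be [x, y, width, height]
		x, y, width, height = face['box']
		# clamp the top-left corner so it does not fall outside the image
		x1, y1 = max(x, 0), max(y, 0)
		corners.append((x1, y1, x + width, y + height))
	return corners

# made-up detections standing in for the output of detector.detect_faces()
faces = [{'box': [10, 20, 50, 60]}, {'box': [-3, 5, 40, 40]}]
for x1, y1, x2, y2 in face_corners(faces):
	print('face from (%d, %d) to (%d, %d)' % (x1, y1, x2, y2))
```

In the plotting example, the same loop over faces could create one Rectangle per box and add each to the axes with ax.add_patch().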

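Post your findings in the comments below. I would love to see what you discover.

The End!

(Look How Far You Have Come)

You made it. Well done!

Take a moment and look back at how far you have come.

You discovered:

What computer vision is and the promise and impact that deep learning is having on the field.
How to scale the pixel values of image data in order to make them ready for modeling.
How to develop a convolutional neural network model from scratch.
How to use a pre-trained model to classify photographs of objects.
How to train a model from scratch to classify photographs of clothing.
How to use image augmentation to create modified copies of photographs in your training dataset.
How to use a pre-trained deep learning model to detect people's faces in photographs.

This is just the beginning of your journey with deep learning for computer vision. Keep practicing and developing your skills.

Take the next step and check out my book on deep learning for computer vision.

Summary

How did you do with the mini-course? Did you enjoy this crash course?

Do you have any questions? Were there any sticking points? Let me know. Leave a comment below.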

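How to Develop VGG, Inception and ResNet Modules from Scratch in Keras

Original: machinelearningmastery.com/how-to-impl…

Last updated on July 5, 2019

There are discrete architectural elements from milestone models that you can use in the design of your own convolutional neural networks.

Specifically, models that have achieved state-of-the-art results for tasks like image classification use discrete architecture elements repeated multiple times, such as the VGG block in the VGG models, the inception module in GoogLeNet, and the residual module in ResNet.

Once you are able to implement parameterized versions of these architecture elements, you can use them in the design of your own models for computer vision and other applications.

In this tutorial, you will discover how to implement the key architecture elements from milestone convolutional neural network models, from scratch.

After completing this tutorial, you will know:

How to implement a VGG module used in the VGG-16 and VGG-19 convolutional neural network models.
How to implement the naive and optimized inception module used in the GoogLeNet model.
How to implement the identity residual module used in the ResNet model.

Kick-start your project with my new book Deep Learning for Computer Vision, including step-by-step tutorials and the Python source code files for all examples.

Let's get started.

How to Implement Major Architecture Innovations for Convolutional Neural Networks. Photo by daveynin, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

How to Implement VGG Blocks
How to Implement the Inception Module
How to Implement the Residual Module

How to Implement VGG Blocks

The VGG convolutional neural network architecture, named for the Visual Geometry Group at Oxford, was an important milestone in the use of deep learning methods for computer vision.

The architecture was described in the 2014 paper titled "Very Deep Convolutional Networks for Large-Scale Image Recognition" by Karen Simonyan and Andrew Zisserman and achieved top results in the ILSVRC-2014 computer vision competition.

The key innovation in this architecture was the definition and repetition of what we will refer to as VGG blocks. These are groups of convolutional layers that use small filters (e.g. 3×3 pixels) followed by a max pooling layer.

The image is passed through a stack of convolutional (conv.) layers, where we use filters with a very small receptive field: 3 × 3 (which is the smallest size to capture the notion of left/right, up/down, center). […] Max-pooling is performed over a 2 × 2 pixel window, with stride 2.

- Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014.

A convolutional neural network with VGG blocks is a sensible starting point when developing a new model from scratch, as it is easy to understand, easy to implement, and very effective at extracting features from images.

We can generalize the specification of a VGG block as one or more convolutional layers with the same number of filters, a filter size of 3×3, a stride of 1×1, same padding so that the output size matches the input size for each filter, and a rectified linear activation function. These layers are then followed by a max pooling layer with a size of 2×2 and a stride of the same dimensions.

We can define a function to create a VGG block using the Keras functional API with a given number of convolutional layers and a given number of filters per layer.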

# function for creating a vgg block
def vgg_block(layer_in, n_filters, n_conv):
	# add convolutional layers
	for _ in range(n_conv):
		layer_in = Conv2D(n_filters, (3,3), padding='same', activation='relu')(layer_in)
	# add max pooling layer
	layer_in = MaxPooling2D((2,2), strides=(2,2))(layer_in)
	return layer_in

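To use the function, you pass in the layer prior to the block and receive the layer at the end of the block, which can then be used to integrate the block into the model.

For example, the first layer might be an input layer, which could be passed into the function as an argument. The function then returns a reference to the final layer in the block, the pooling layer, which could be connected to a flatten layer and subsequent dense layers for making a classification prediction.

We can demonstrate how to use this function by defining a small model that expects square color images as input and adds a single VGG block to the model with two convolutional layers, each with 64 filters.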

# Example of creating a CNN model with a VGG block
from keras.models import Model
from keras.layers import Input
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.utils import plot_model

# function for creating a vgg block
def vgg_block(layer_in, n_filters, n_conv):
	# add convolutional layers
	for _ in range(n_conv):
		layer_in = Conv2D(n_filters, (3,3), padding='same', activation='relu')(layer_in)
	# add max pooling layer
	layer_in = MaxPooling2D((2,2), strides=(2,2))(layer_in)
	return layer_in

# define model input
visible = Input(shape=(256, 256, 3))
# add vgg module
layer = vgg_block(visible, 64, 2)
# create model
model = Model(inputs=visible, outputs=layer)
# summarize model
model.summary()
# plot model architecture
plot_model(model, show_shapes=True, to_file='vgg_block.png')

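Running the example creates the model and summarizes the structure.

We can see that, as intended, the model added a single VGG block with two convolutional layers, each with 64 filters, followed by a max pooling layer.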

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 256, 256, 3)       0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 256, 256, 64)      1792
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 256, 256, 64)      36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 128, 128, 64)      0
=================================================================
Total params: 38,720
Trainable params: 38,720
Non-trainable params: 0
_________________________________________________________________

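A plot of the model architecture is also created, which may help to make the model layout more concrete.

Note, creating the plot assumes that you have pydot and pygraphviz installed. If this is not the case, you can comment out the import statement and the call to the plot_model() function in the example.

Plot of Convolutional Neural Network Architecture With a VGG Block

Using VGG blocks in your own models should be common because they are so simple and effective.

We can expand the example and demonstrate a single model that has three VGG blocks: the first two blocks have two convolutional layers with 64 and 128 filters respectively, and the third block has four convolutional layers with 256 filters. This is a common usage of VGG blocks, where the number of filters is increased with the depth of the model.

The complete code listing is provided below.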

# Example of creating a CNN model with many VGG blocks
from keras.models import Model
from keras.layers import Input
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.utils import plot_model

# function for creating a vgg block
def vgg_block(layer_in, n_filters, n_conv):
	# add convolutional layers
	for _ in range(n_conv):
		layer_in = Conv2D(n_filters, (3,3), padding='same', activation='relu')(layer_in)
	# add max pooling layer
	layer_in = MaxPooling2D((2,2), strides=(2,2))(layer_in)
	return layer_in

# define model input
visible = Input(shape=(256, 256, 3))
# add vgg module
layer = vgg_block(visible, 64, 2)
# add vgg module
layer = vgg_block(layer, 128, 2)
# add vgg module
layer = vgg_block(layer, 256, 4)
# create model
model = Model(inputs=visible, outputs=layer)
# summarize model
model.summary()
# plot model architecture
plot_model(model, show_shapes=True, to_file='multiple_vgg_blocks.png')

Again, running the example summarizes the model architecture, and we can clearly see the pattern of VGG blocks.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 256, 256, 3)       0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 256, 256, 64)      1792
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 256, 256, 64)      36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 128, 128, 64)      0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 128, 128, 128)     73856
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 128, 128, 128)     147584
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 64, 64, 128)       0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 64, 64, 256)       295168
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 64, 64, 256)       590080
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 64, 64, 256)       590080
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 64, 64, 256)       590080
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 32, 32, 256)       0
=================================================================
Total params: 2,325,568
Trainable params: 2,325,568
Non-trainable params: 0
_________________________________________________________________
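
The parameter counts in the summary can be reproduced with a little arithmetic: each 3×3 filter has nine weights per input channel plus one bias, and pooling layers have no parameters. The sketch below checks the totals for the three-block model; the helper names `conv2d_params` and `vgg_block_params` are made up for this example.

```python
# reproduce the parameter counts of the model with three vgg blocks

# hypothetical helper: weights plus biases of one convolutional layer
def conv2d_params(in_channels, n_filters, kernel=3):
	return kernel * kernel * in_channels * n_filters + n_filters

# hypothetical helper: parameters in one vgg block and its output channels
def vgg_block_params(in_channels, n_filters, n_conv):
	total = 0
	for _ in range(n_conv):
		total += conv2d_params(in_channels, n_filters)
		# later layers in the block see the output of the previous layer
		in_channels = n_filters
	# the max pooling layer adds no parameters
	return total, n_filters

# (n_filters, n_conv) for each block, applied to a 3-channel color image
blocks = [(64, 2), (128, 2), (256, 4)]
channels, total = 3, 0
for n_filters, n_conv in blocks:
	params, channels = vgg_block_params(channels, n_filters, n_conv)
	print('block with %d filters: %d params' % (n_filters, params))
	total += params
print('total params: %d' % total)
```

The total matches the 2,325,568 parameters reported in the summary, with most of them in the deepest block where both the input and output channel counts are largest.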

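A plot of the model architecture is created, providing a different perspective on the same linear progression of layers.

Plot of Convolutional Neural Network Architecture With Multiple VGG Blocks

How to Implement the Inception Module

The inception module was described and used in the GoogLeNet model in the 2015 paper by Christian Szegedy, et al. titled "Going Deeper with Convolutions."

Like the VGG model, the GoogLeNet model achieved top results in the 2014 version of the ILSVRC challenge.

The key innovation in the inception model is called the inception module. This is a block of parallel convolutional layers with different-sized filters (e.g. 1×1, 3×3, 5×5) and a 3×3 max pooling layer, the results of which are then concatenated.

In order to avoid patch-alignment issues, current incarnations of the Inception architecture are restricted to filter sizes 1×1, 3×3 and 5×5; this decision was based more on convenience rather than necessity. […] Additionally, since pooling operations have been essential for the success of current convolutional networks, it suggests that adding an alternative parallel pooling path in each such stage should have additional beneficial effect, too

- Going Deeper with Convolutions, 2015.

This is a very simple and powerful architectural unit that allows the model to learn not only parallel filters of the same size, but also parallel filters of differing sizes, allowing learning at multiple scales.

We can implement an inception module directly using the Keras functional API. The function below will create a single inception module with a specified number of filters for each of the parallel convolutional layers. From the GoogLeNet architecture described in the paper, there does not appear to be a systematic sizing of filters for the parallel convolutional layers, as the model is highly optimized. As such, we can parameterize the module definition so that we can specify the number of filters to use in each of the 1×1, 3×3, and 5×5 convolutional layers.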

# function for creating a naive inception block
def naive_inception_module(layer_in, f1, f2, f3):
	# 1x1 conv
	conv1 = Conv2D(f1, (1,1), padding='same', activation='relu')(layer_in)
	# 3x3 conv
	conv3 = Conv2D(f2, (3,3), padding='same', activation='relu')(layer_in)
	# 5x5 conv
	conv5 = Conv2D(f3, (5,5), padding='same', activation='relu')(layer_in)
	# 3x3 max pooling
	pool = MaxPooling2D((3,3), strides=(1,1), padding='same')(layer_in)
	# concatenate filters, assumes filters/channels last
	layer_out = concatenate([conv1, conv3, conv5, pool], axis=-1)
	return layer_out

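To use the function, provide a reference to the prior layer as input along with the number of filters. The function returns a reference to the concatenated filters layer, which you can then connect to more inception modules or to a submodel for making a prediction.

We can demonstrate how to use this function by creating a model with a single inception module. In this case, the number of filters is based on "inception (3a)" from Table 1 in the paper.

The complete example is listed below.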

# example of creating a CNN with an inception module
from keras.models import Model
from keras.layers import Input
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import concatenate
from keras.utils import plot_model

# function for creating a naive inception block
def naive_inception_module(layer_in, f1, f2, f3):
	# 1x1 conv
	conv1 = Conv2D(f1, (1,1), padding='same', activation='relu')(layer_in)
	# 3x3 conv
	conv3 = Conv2D(f2, (3,3), padding='same', activation='relu')(layer_in)
	# 5x5 conv
	conv5 = Conv2D(f3, (5,5), padding='same', activation='relu')(layer_in)
	# 3x3 max pooling
	pool = MaxPooling2D((3,3), strides=(1,1), padding='same')(layer_in)
	# concatenate filters, assumes filters/channels last
	layer_out = concatenate([conv1, conv3, conv5, pool], axis=-1)
	return layer_out

# define model input
visible = Input(shape=(256, 256, 3))
# add inception module
layer = naive_inception_module(visible, 64, 128, 32)
# create model
model = Model(inputs=visible, outputs=layer)
# summarize model
model.summary()
# plot model architecture
plot_model(model, show_shapes=True, to_file='naive_inception_module.png')

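Running the example creates the model and summarizes the layers.

We know the convolutional and pooling layers are parallel, but this summary does not capture the structure easily.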

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 256, 256, 3)  0
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 256, 256, 64) 256         input_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 256, 256, 128 3584        input_1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 256, 256, 32) 2432        input_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 256, 256, 3)  0           input_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 256, 256, 227 0           conv2d_1[0][0]
                                                                 conv2d_2[0][0]
                                                                 conv2d_3[0][0]
                                                                 max_pooling2d_1[0][0]
==================================================================================================
Total params: 6,272
Trainable params: 6,272
Non-trainable params: 0
__________________________________________________________________________________________________
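The Param # column above can be reproduced by hand. The following is a minimal sketch, assuming the usual count for a 2D convolution with a bias term: (kernel height × kernel width × input channels + 1) × filters. The helper name conv2d_params is made up for this illustration and is not part of Keras.

```python
# Hypothetical helper (not a Keras function): trainable parameters of a Conv2D
# layer with a bias, i.e. one kernel of weights plus one bias value per filter.
def conv2d_params(kernel_size, in_channels, filters):
    kernel_h, kernel_w = kernel_size
    return (kernel_h * kernel_w * in_channels + 1) * filters

in_channels = 3  # RGB input, as in Input(shape=(256, 256, 3))
branch_params = {
    '1x1 conv, 64 filters': conv2d_params((1, 1), in_channels, 64),
    '3x3 conv, 128 filters': conv2d_params((3, 3), in_channels, 128),
    '5x5 conv, 32 filters': conv2d_params((5, 5), in_channels, 32),
}
for name, count in branch_params.items():
    print(name, count)

# max pooling has no weights, so it adds nothing to the total
total_params = sum(branch_params.values())
print('total', total_params)
```

The three branch counts match the 256, 3584 and 2432 reported in the summary, and their sum matches the reported total of 6,272.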

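A plot of the model architecture is also created. It makes the parallel structure of the module clear and shows that the outputs of the module's elements have matching shapes, which is what allows them to be concatenated directly along the third dimension (filters or channels).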

Plot of Convolutional Neural Network Architecture With a Naive Inception Module


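The version of the inception module that we have implemented is called the naive inception module.

A modification to the module was made in order to reduce the amount of computation required. Specifically, 1×1 convolutional layers were added to reduce the number of filters before the 3×3 and 5×5 convolutional layers, and to increase the number of filters after the pooling layer.

This leads to the second idea of the Inception architecture: judiciously reducing dimension wherever the computational requirements would increase too much otherwise. [...] That is, 1×1 convolutions are used to compute reductions before the expensive 3×3 and 5×5 convolutions. Besides being used as reductions, they also include the use of rectified linear activation making them dual-purpose

- Going Deeper with Convolutions, 2015.

If you intend to use many inception modules in your model, you may need this modification for computational performance reasons.

The function below implements this optimization with parameters, so that you can control how far the number of filters is reduced before the 3×3 and 5×5 convolutional layers and how many filters are added after max pooling.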

# function for creating a projected inception module
def inception_module(layer_in, f1, f2_in, f2_out, f3_in, f3_out, f4_out):
	# 1x1 conv
	conv1 = Conv2D(f1, (1,1), padding='same', activation='relu')(layer_in)
	# 3x3 conv
	conv3 = Conv2D(f2_in, (1,1), padding='same', activation='relu')(layer_in)
	conv3 = Conv2D(f2_out, (3,3), padding='same', activation='relu')(conv3)
	# 5x5 conv
	conv5 = Conv2D(f3_in, (1,1), padding='same', activation='relu')(layer_in)
	conv5 = Conv2D(f3_out, (5,5), padding='same', activation='relu')(conv5)
	# 3x3 max pooling
	pool = MaxPooling2D((3,3), strides=(1,1), padding='same')(layer_in)
	pool = Conv2D(f4_out, (1,1), padding='same', activation='relu')(pool)
	# concatenate filters, assumes filters/channels last
	layer_out = concatenate([conv1, conv3, conv5, pool], axis=-1)
	return layer_out
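To get a feel for why the 1×1 reduction helps, we can compare the number of weights in the 5×5 branch with and without it. This is only a rough sketch: the input depth of 192 channels is an assumed value chosen for illustration, the 16 and 32 filters are the f3_in and f3_out values used for the first module in this tutorial, and conv2d_params is a hypothetical helper, not a Keras function.

```python
# Hypothetical helper (not a Keras function): weights plus biases of a Conv2D layer.
def conv2d_params(kernel_size, in_channels, filters):
    kernel_h, kernel_w = kernel_size
    return (kernel_h * kernel_w * in_channels + 1) * filters

in_channels = 192  # assumed input depth, chosen only for illustration
f3_in, f3_out = 16, 32

# naive branch: 5x5 convolution applied directly to the input
direct = conv2d_params((5, 5), in_channels, f3_out)
# projected branch: 1x1 reduction to f3_in channels, then the 5x5 convolution
reduced = conv2d_params((1, 1), in_channels, f3_in) + conv2d_params((5, 5), f3_in, f3_out)

print('direct 5x5:', direct)
print('1x1 then 5x5:', reduced)
print('ratio: %.1f' % (direct / reduced))
```

With stride 1 and 'same' padding, every weight is applied once per output position, so the saving in weights carries over roughly to the number of multiplications as well. The saving disappears when the input is already shallow, as with the 3-channel image fed to the first module in the example.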

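We can create a model with two of these optimized inception modules to get a concrete idea of how the architecture looks in practice.

In this case, the filter configurations are based on "inception (3a)" and "inception (3b)" from Table 1 in the paper.

The complete example is listed below.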

# example of creating a CNN with an efficient inception module
from keras.models import Model
from keras.layers import Input
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import concatenate
from keras.utils import plot_model

# function for creating a projected inception module
def inception_module(layer_in, f1, f2_in, f2_out, f3_in, f3_out, f4_out):
	# 1x1 conv
	conv1 = Conv2D(f1, (1,1), padding='same', activation='relu')(layer_in)
	# 3x3 conv
	conv3 = Conv2D(f2_in, (1,1), padding='same', activation='relu')(layer_in)
	conv3 = Conv2D(f2_out, (3,3), padding='same', activation='relu')(conv3)
	# 5x5 conv
	conv5 = Conv2D(f3_in, (1,1), padding='same', activation='relu')(layer_in)
	conv5 = Conv2D(f3_out, (5,5), padding='same', activation='relu')(conv5)
	# 3x3 max pooling
	pool = MaxPooling2D((3,3), strides=(1,1), padding='same')(layer_in)
	pool = Conv2D(f4_out, (1,1), padding='same', activation='relu')(pool)
	# concatenate filters, assumes filters/channels last
	layer_out = concatenate([conv1, conv3, conv5, pool], axis=-1)
	return layer_out

# define model input
visible = Input(shape=(256, 256, 3))
# add inception block 1
layer = inception_module(visible, 64, 96, 128, 16, 32, 32)
# add inception block 2
layer = inception_module(layer, 128, 128, 192, 32, 96, 64)
# create model
model = Model(inputs=visible, outputs=layer)
# summarize model
model.summary()
# plot model architecture
plot_model(model, show_shapes=True, to_file='inception_module.png')

Running the example creates a linear summary of the layers, which does not really help in understanding what is going on.

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 256, 256, 3)  0
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 256, 256, 96) 384         input_1[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 256, 256, 16) 64          input_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 256, 256, 3)  0           input_1[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 256, 256, 64) 256         input_1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 256, 256, 128 110720      conv2d_2[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 256, 256, 32) 12832       conv2d_4[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 256, 256, 32) 128         max_pooling2d_1[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 256, 256, 256 0           conv2d_1[0][0]
                                                                 conv2d_3[0][0]
                                                                 conv2d_5[0][0]
                                                                 conv2d_6[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 256, 256, 128 32896       concatenate_1[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 256, 256, 32) 8224        concatenate_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 256, 256, 256 0           concatenate_1[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 256, 256, 128 32896       concatenate_1[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 256, 256, 192 221376      conv2d_8[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 256, 256, 96) 76896       conv2d_10[0][0]
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 256, 256, 64) 16448       max_pooling2d_2[0][0]
__________________________________________________________________________________________________
concatenate_2 (Concatenate)     (None, 256, 256, 480 0           conv2d_7[0][0]
                                                                 conv2d_9[0][0]
                                                                 conv2d_11[0][0]
                                                                 conv2d_12[0][0]
==================================================================================================
Total params: 513,120
Trainable params: 513,120
Non-trainable params: 0
__________________________________________________________________________________________________
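One detail that is easy to check in the summary is the depth of each concatenated output. A small sketch, using a hypothetical helper (not part of Keras) and the filter values from the two inception_module() calls in the example:

```python
# Output depth of the projected inception module: the four branches are
# concatenated on the channel axis, so their filter counts simply add up.
# The reduction sizes f2_in and f3_in do not affect the output depth.
def inception_output_channels(f1, f2_out, f3_out, f4_out):
    return f1 + f2_out + f3_out + f4_out

# filter settings copied from the two inception_module() calls in the example
block1_channels = inception_output_channels(f1=64, f2_out=128, f3_out=32, f4_out=32)
block2_channels = inception_output_channels(f1=128, f2_out=192, f3_out=96, f4_out=64)
print(block1_channels)
print(block2_channels)
```

These values agree with the 256 and 480 channels shown for concatenate_1 and concatenate_2 in the summary.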

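A plot of the model architecture is also created. It makes the layout of each module clear and shows how the first module feeds into the second.

Note that the first 1×1 convolution in each inception module is drawn on the far right for space reasons; other than that, the layers are organized from left to right within each module.

Plot of Convolutional Neural Network Architecture With an Efficient Inception Module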

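How to Implement the Residual Module

The Residual Network, or ResNet, architecture for convolutional neural networks was proposed by Kaiming He, et al. in their 2016 paper titled "Deep Residual Learning for Image Recognition," and achieved success on the 2015 version of the ILSVRC challenge.

A key innovation in ResNet was the residual module. The residual module, specifically the identity residual module, is a block of two convolutional layers with the same number of filters and a small filter size, where the output of the second layer is added to the input of the first convolutional layer. Drawn as a graph, the input to the module is added to the output of the module; this is called a shortcut connection.

We can implement this directly in Keras using the functional API and the add() merge function.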

# function for creating an identity residual module
def residual_module(layer_in, n_filters):
	# conv1
	conv1 = Conv2D(n_filters, (3,3), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
	# conv2
	conv2 = Conv2D(n_filters, (3,3), padding='same', activation='linear', kernel_initializer='he_normal')(conv1)
	# add filters, assumes filters/channels last
	layer_out = add([conv2, layer_in])
	# activation function
	layer_out = Activation('relu')(layer_out)
	return layer_out

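A limitation of this direct implementation is that if the number of filters in the input layer does not match the number of filters in the last convolutional layer of the module (defined by n_filters), then we will get an error.

One solution is to use a 1×1 convolutional layer, often referred to as a projection layer, either to increase the number of filters of the input layer or to reduce the number of filters of the last convolutional layer in the module. The former makes more sense and is the approach proposed in the paper, where it is referred to as a projection shortcut.

When the dimensions increase [...], we consider two options: (A) The shortcut still performs identity mapping, with extra zero entries padded for increasing dimensions. This option introduces no extra parameter; (B) The projection shortcut [...] is used to match dimensions (done by 1×1 convolutions).

- Deep Residual Learning for Image Recognition, 2015.

Below is an updated version of the function that uses the identity shortcut when possible, and otherwise applies a projection to the input when its number of filters does not match the n_filters argument.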

# function for creating an identity or projection residual module
def residual_module(layer_in, n_filters):
	merge_input = layer_in
	# check if the number of filters needs to be increased, assumes channels last format
	if layer_in.shape[-1] != n_filters:
		merge_input = Conv2D(n_filters, (1,1), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
	# conv1
	conv1 = Conv2D(n_filters, (3,3), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
	# conv2
	conv2 = Conv2D(n_filters, (3,3), padding='same', activation='linear', kernel_initializer='he_normal')(conv1)
	# add filters, assumes filters/channels last
	layer_out = add([conv2, merge_input])
	# activation function
	layer_out = Activation('relu')(layer_out)
	return layer_out
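The two shortcut options from the quote can be illustrated on the channel vector of a single pixel. This is a plain-Python sketch for intuition only, not how Keras computes it: the function names and the weight values are invented for the example, and the bias and ReLU activation used by the projection layer in residual_module() are left out.

```python
# Illustrative only: the two shortcut options applied to the channel vector of
# a single pixel, in plain Python. The weights are arbitrary example values.

def zero_pad_shortcut(pixel, n_filters):
    # option (A): keep the input channels and append zeros, no parameters
    return pixel + [0.0] * (n_filters - len(pixel))

def projection_shortcut(pixel, weights):
    # option (B): a 1x1 convolution is a linear map over the channels of each
    # pixel; weights holds one row of input-channel weights per output filter
    return [sum(w * p for w, p in zip(row, pixel)) for row in weights]

pixel = [1.0, 2.0, 3.0]   # 3 input channels
n_filters = 4             # depth produced by the convolutional path
weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5],
]

padded = zero_pad_shortcut(pixel, n_filters)
projected = projection_shortcut(pixel, weights)
print(padded)
print(projected)
```

Both results have n_filters entries, so either one could be added element-wise to the output of the convolutional path. Option (B) is the one the residual_module() function above follows, at the cost of extra learned weights.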

We can demonstrate the usage of this module with a simple model.

The complete example is listed below.

# example of a CNN model with an identity or projection residual module
from keras.models import Model
from keras.layers import Input
from keras.layers import Activation
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import add
from keras.utils import plot_model

# function for creating an identity or projection residual module
def residual_module(layer_in, n_filters):
	merge_input = layer_in
	# check if the number of filters needs to be increased, assumes channels last format
	if layer_in.shape[-1] != n_filters:
		merge_input = Conv2D(n_filters, (1,1), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
	# conv1
	conv1 = Conv2D(n_filters, (3,3), padding='same', activation='relu', kernel_initializer='he_normal')(layer_in)
	# conv2
	conv2 = Conv2D(n_filters, (3,3), padding='same', activation='linear', kernel_initializer='he_normal')(conv1)
	# add filters, assumes filters/channels last
	layer_out = add([conv2, merge_input])
	# activation function
	layer_out = Activation('relu')(layer_out)
	return layer_out

# define model input
visible = Input(shape=(256, 256, 3))
# add residual module
layer = residual_module(visible, 64)
# create model
model = Model(inputs=visible, outputs=layer)
# summarize model
model.summary()
# plot model architecture
plot_model(model, show_shapes=True, to_file='residual_module.png')

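Running the example first creates the model and then prints a summary of the layers.

Because the module is linear, this summary is helpful for seeing what is going on.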

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 256, 256, 3)  0
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 256, 256, 64) 1792        input_1[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 256, 256, 64) 36928       conv2d_2[0][0]
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 256, 256, 64) 256         input_1[0][0]
__________________________________________________________________________________________________
add_1 (Add)                     (None, 256, 256, 64) 0           conv2d_3[0][0]
                                                                 conv2d_1[0][0]
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 256, 256, 64) 0           add_1[0][0]
==================================================================================================
Total params: 38,976
Trainable params: 38,976
Non-trainable params: 0
__________________________________________________________________________________________________

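A plot of the model architecture is also created.

We can see the module with the expansion of the number of filters in the input, and the addition of the two elements at the end of the module.

Plot of Convolutional Neural Network Architecture With a Residual Module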

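The paper describes other types of residual connections, such as bottlenecks. These are left as an exercise for the reader and can be implemented by updating the residual_module() function.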

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Posts

How to Use the Keras Functional API for Deep Learning

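Summary

In this tutorial, you discovered how to implement key architecture elements from milestone convolutional neural network models, from scratch.

Specifically, you learned:

How to implement a VGG module used in the VGG-16 and VGG-19 convolutional neural network models.
How to implement the naive and optimized inception modules used in the GoogLeNet model.
How to implement the identity residual module used in the ResNet model.

Do you have any questions? Ask them in the comments below and I will do my best to answer.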