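Machine Learning Mastery Deep Learning Performance Tutorials (Part 5)

Original: Machine Learning Mastery

License: CC BY-NC-SA 4.0

How to Improve Performance With Transfer Learning for Deep Learning Neural Networks

Original: machinelearningmastery.com/how-to-impr…

Last updated on August 25, 2020

An interesting benefit of deep learning neural networks is that they can be reused on related problems.

Transfer learning refers to a technique for predictive modeling on a different but somehow similar problem that can then be reused, partly or wholly, to accelerate training and improve the performance of a model on the problem of interest.

In deep learning, this means reusing the weights in one or more layers from a pre-trained network model in a new model and either keeping the weights fixed, fine-tuning them, or adapting the weights entirely when training the model.

In this tutorial, you will discover how to use transfer learning to improve the performance of deep learning neural networks in Python with Keras.

After completing this tutorial, you will know:

Transfer learning is a method for reusing a model trained on a related predictive modeling problem.
Transfer learning can be used to accelerate the training of neural networks as either a weight initialization scheme or a feature extraction method.
How to use transfer learning to improve the performance of an MLP for a multiclass classification problem.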

Let's get started.

Update Oct/2019: Updated for Keras 2.3 and TensorFlow 2.0.
Update Jan/2020: Updated for changes in the scikit-learn v0.22 API.

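Tutorial Overview

This tutorial is divided into six parts; they are:

What Is Transfer Learning?
Blobs Multi-Class Classification Problem
Multilayer Perceptron Model for Problem 1
Standalone MLP Model for Problem 2
MLP With Transfer Learning for Problem 2
Comparison of Models on Problem 2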

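What Is Transfer Learning?

Transfer learning generally refers to a process where a model trained on one problem is used in some way on a second, related problem.

Transfer learning and domain adaptation refer to the situation where what has been learned in one setting (i.e., distribution P1) is exploited to improve generalization in another setting (say distribution P2).

Page 536, Deep Learning, 2016.

In deep learning, transfer learning is a technique whereby a neural network model is first trained on a problem similar to the problem that is being solved. One or more layers from the trained model are then used in a new model trained on the problem of interest.

This is typically understood in a supervised learning context, where the input is the same but the target may be of a different nature. For example, we may learn about one set of visual categories, such as cats and dogs, in the first setting, then learn about a different set of visual categories, such as ants and wasps, in the second setting.

Page 536, Deep Learning, 2016.

Transfer learning has the benefit of decreasing the training time for a neural network model and can result in lower generalization error.

There are two main approaches to implementing transfer learning; they are:

Weight Initialization.
Feature Extraction.

The weights in reused layers may be used as the starting point for the training process and adapted in response to the new problem. This usage treats transfer learning as a type of weight initialization scheme. This may be useful when the first related problem has a lot more labeled data than the problem of interest and the similarity in the structure of the problem may be useful in both contexts.

… the objective is to take advantage of data from the first setting to extract information that may be useful when learning or even when directly making predictions in the second setting.

Page 538, Deep Learning, 2016.

Alternately, the weights of the network may not be adapted in response to the new problem, and only new layers after the reused layers may be trained to interpret their output. This usage treats transfer learning as a type of feature extraction scheme. An example of this approach is the reuse of deep convolutional neural network models trained for photo classification as feature extractors when developing photo captioning models.

Variations on these usages may involve not training the weights of the model on the new problem initially, but later fine-tuning all weights of the learned model with a small learning rate.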

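Blobs Multi-Class Classification Problem

We will use a small multi-class classification problem as the basis to demonstrate transfer learning.

The scikit-learn library provides the make_blobs() function, which can be used to create a multi-class classification problem with a prescribed number of samples, input variables, classes, and variance of samples within a class.

We can configure the problem to have two input variables (representing the x and y coordinates of the points) and a standard deviation of 2.0 for points within each group. We will use the same random state (seed for the pseudorandom number generator) to ensure that we always get the same data points.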

# generate 2d classification dataset
X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=1)

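The results are the input and output elements of a dataset that we can model.

The "random_state" argument can be varied to give different versions of the problem (different cluster centers). We can use this to generate samples from two different problems: train a model on one problem and reuse the weights to better learn a model for a second problem.

Specifically, we will refer to random_state=1 as Problem 1 and random_state=2 as Problem 2.

Problem 1. Blobs problem with two input variables and three classes, with the random_state argument set to one.
Problem 2. Blobs problem with two input variables and three classes, with the random_state argument set to two.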
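
To make the difference between the two problems concrete, the following minimal sketch estimates each class centroid as the mean of the points that carry that label. The helper name class_centroids() is an assumption introduced here for illustration; it is not part of the original tutorial.

```python
# compare estimated class centroids for blobs problems 1 and 2
from sklearn.datasets import make_blobs
from numpy import array, allclose

# estimate the centroid of each class as the mean of its points
# (class_centroids is a hypothetical helper, not from the tutorial)
def class_centroids(seed, classes=3):
    X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=seed)
    return array([X[y == i].mean(axis=0) for i in range(classes)])

centroids_1 = class_centroids(1)
centroids_2 = class_centroids(2)
print('Problem 1 centroids:')
print(centroids_1)
print('Problem 2 centroids:')
print(centroids_2)
# same problem structure, different cluster locations
print('Same centroids:', allclose(centroids_1, centroids_2))
```

Both problems share the same number of inputs and classes; only the locations of the clusters change with the seed.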

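To get a feeling for the complexity of the problem, we can plot each point on a two-dimensional scatter plot and color each point by class value.

The complete example is listed below.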

# plot of blobs multiclass classification problems 1 and 2
from sklearn.datasets import make_blobs
from numpy import where
from matplotlib import pyplot

# generate samples for blobs problem with a given random seed
def samples_for_seed(seed):
	# generate samples
	X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=seed)
	return X, y

# create a scatter plot of points colored by class value
def plot_samples(X, y, classes=3):
	# plot points for each class
	for i in range(classes):
		# select indices of points with each class label
		samples_ix = where(y == i)
		# plot points for this class with a given color
		pyplot.scatter(X[samples_ix, 0], X[samples_ix, 1])

# generate multiple problems
n_problems = 2
for i in range(1, n_problems+1):
	# specify subplot
	pyplot.subplot(210 + i)
	# generate samples
	X, y = samples_for_seed(i)
	# scatter plot of samples
	plot_samples(X, y)
# plot figure
pyplot.show()

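Running the example generates a sample of 1,000 examples for Problem 1 and Problem 2 and creates a scatter plot for each sample, coloring the data points by their class value.

Scatter Plots of Blobs Dataset for Problems 1 and 2 With Three Classes and Points Colored by Class Value

This provides a good basis for transfer learning, as each version of the problem has similar input data with a similar scale, although with different target information (e.g. cluster centers).

We would expect aspects of a model fit on one version of the blobs problem (e.g. Problem 1) to be useful when fitting a model on a new version of the blobs problem (e.g. Problem 2).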

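Multilayer Perceptron Model for Problem 1

In this section, we will develop a Multilayer Perceptron model (MLP) for Problem 1 and save the model to file so that we can reuse the weights later.

First, we will develop a function to prepare the dataset for modeling. After the make_blobs() function is called with a given random seed (e.g. one in this case for Problem 1), the target variable must be one hot encoded so that we can develop a model that predicts the probability of a given sample belonging to each of the target classes.

The prepared samples can then be split in half, with 500 examples for both the train and test datasets. The samples_for_seed() function below implements this, preparing the dataset for a given random number seed and returning the train and test sets split into input and output components.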

# prepare a blobs examples with a given random seed
def samples_for_seed(seed):
	# generate samples
	X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=seed)
	# one hot encode output variable
	y = to_categorical(y)
	# split into train and test
	n_train = 500
	trainX, testX = X[:n_train, :], X[n_train:, :]
	trainy, testy = y[:n_train], y[n_train:]
	return trainX, trainy, testX, testy

We can call this function to prepare a dataset for Problem 1 as follows.

# prepare data
trainX, trainy, testX, testy = samples_for_seed(1)
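
The one hot encoding step inside samples_for_seed() can be illustrated without Keras. The sketch below is a plain NumPy stand-in for what to_categorical() produces on integer labels; the helper name one_hot() and the sample labels are assumptions made for illustration only.

```python
# one hot encode integer class labels with plain NumPy
from numpy import array, eye

# map each integer label to a row of the identity matrix
# (one_hot is a hypothetical helper used only for illustration)
def one_hot(labels, n_classes):
    return eye(n_classes)[labels]

labels = array([0, 2, 1, 2])
encoded = one_hot(labels, 3)
print(encoded)
```

Each row contains a single 1.0 in the column of its class, which is the target format that the softmax output layer and the categorical cross-entropy loss expect.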

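Next, we can define and fit a model on the training dataset.

The model will expect two inputs for the two variables in the data. The model will have two hidden layers with five nodes each and the rectified linear activation function. Two layers are probably not required for this problem, although we are interested in the model learning some deep structure that we can reuse across instances of this problem. The output layer has three nodes, one for each class in the target variable, and uses the softmax activation function.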

# define model
model = Sequential()
model.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(3, activation='softmax'))

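Given that the problem is a multi-class classification problem, the categorical cross-entropy loss function is minimized, and stochastic gradient descent with the default learning rate and no momentum is used to learn the problem.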

# compile model
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

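The model is fit for 100 epochs on the training dataset, and the test set is used as a validation dataset during training. Performance on both datasets is evaluated at the end of each epoch so that we can plot learning curves.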

history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)

The fit_model() function ties these elements together, taking the train and test datasets as arguments and returning the fit model and training history.

# define and fit model on a training dataset
def fit_model(trainX, trainy, testX, testy):
	# define model
	model = Sequential()
	model.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
	model.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))
	model.add(Dense(3, activation='softmax'))
	# compile model
	model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
	# fit model
	history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
	return model, history

We can call this function with the prepared dataset to obtain a fit model and the history collected during the training process.

# fit model on train dataset
model, history = fit_model(trainX, trainy, testX, testy)

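Finally, we can summarize the performance of the model.

The classification accuracy of the model on the train and test sets can be evaluated.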

# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))

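The history collected during training can be used to create line plots showing both the loss and classification accuracy for the model on the train and test sets over each training epoch, providing learning curves.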

# plot loss during training
pyplot.subplot(211)
pyplot.title('Loss')
pyplot.plot(history.history['loss'], label='train')
pyplot.plot(history.history['val_loss'], label='test')
pyplot.legend()
# plot accuracy during training
pyplot.subplot(212)
pyplot.title('Accuracy')
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

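The summarize_model() function below implements this, taking the fit model, training history, and dataset as arguments, printing the model performance, and creating a plot of the model learning curves.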

# summarize the performance of the fit model
def summarize_model(model, history, trainX, trainy, testX, testy):
	# evaluate the model
	_, train_acc = model.evaluate(trainX, trainy, verbose=0)
	_, test_acc = model.evaluate(testX, testy, verbose=0)
	print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
	# plot loss during training
	pyplot.subplot(211)
	pyplot.title('Loss')
	pyplot.plot(history.history['loss'], label='train')
	pyplot.plot(history.history['val_loss'], label='test')
	pyplot.legend()
	# plot accuracy during training
	pyplot.subplot(212)
	pyplot.title('Accuracy')
	pyplot.plot(history.history['accuracy'], label='train')
	pyplot.plot(history.history['val_accuracy'], label='test')
	pyplot.legend()
	pyplot.show()

We can call this function with the fit model and prepared data.

# evaluate model behavior
summarize_model(model, history, trainX, trainy, testX, testy)

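At the end of the run, we can save the model to file so that we may load it later and use it as the basis for some transfer learning experiments.

Note that saving the model to file requires that you have the h5py library installed. This library can be installed via pip as follows: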

sudo pip install h5py

The fit model can be saved by calling the save() function on the model.

# save model to file
model.save('model.h5')

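Tying these elements together, the complete example of fitting an MLP on Problem 1, summarizing the model's performance, and saving the model to file is listed below.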

# fit mlp model on problem 1 and save model to file
from sklearn.datasets import make_blobs
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from keras.utils import to_categorical
from matplotlib import pyplot

# prepare a blobs examples with a given random seed
def samples_for_seed(seed):
	# generate samples
	X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=seed)
	# one hot encode output variable
	y = to_categorical(y)
	# split into train and test
	n_train = 500
	trainX, testX = X[:n_train, :], X[n_train:, :]
	trainy, testy = y[:n_train], y[n_train:]
	return trainX, trainy, testX, testy

# define and fit model on a training dataset
def fit_model(trainX, trainy, testX, testy):
	# define model
	model = Sequential()
	model.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
	model.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))
	model.add(Dense(3, activation='softmax'))
	# compile model
	model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
	# fit model
	history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
	return model, history

# summarize the performance of the fit model
def summarize_model(model, history, trainX, trainy, testX, testy):
	# evaluate the model
	_, train_acc = model.evaluate(trainX, trainy, verbose=0)
	_, test_acc = model.evaluate(testX, testy, verbose=0)
	print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
	# plot loss during training
	pyplot.subplot(211)
	pyplot.title('Loss')
	pyplot.plot(history.history['loss'], label='train')
	pyplot.plot(history.history['val_loss'], label='test')
	pyplot.legend()
	# plot accuracy during training
	pyplot.subplot(212)
	pyplot.title('Accuracy')
	pyplot.plot(history.history['accuracy'], label='train')
	pyplot.plot(history.history['val_accuracy'], label='test')
	pyplot.legend()
	pyplot.show()

# prepare data
trainX, trainy, testX, testy = samples_for_seed(1)
# fit model on train dataset
model, history = fit_model(trainX, trainy, testX, testy)
# evaluate model behavior
summarize_model(model, history, trainX, trainy, testX, testy)
# save model to file
model.save('model.h5')

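Running the example fits and evaluates the performance of the model, printing the classification accuracy on the train and test sets.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model performed well on Problem 1, achieving a classification accuracy of about 92% on both the train and test datasets.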

Train: 0.916, Test: 0.920

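A figure is also created summarizing the learning curves of the model, showing both the loss (top) and accuracy (bottom) for the model on the train (blue) and test (orange) datasets at the end of each training epoch.

Your plot may not look identical but is expected to show the same general behavior. If not, try running the example a few times.

In this case, we can see that the model learned the problem reasonably quickly and well, perhaps converging in about 40 epochs and remaining reasonably stable on both datasets.

Loss and Accuracy Learning Curves on the Train and Test Sets for an MLP on Problem 1

Now that we have seen how to develop a standalone MLP for blobs Problem 1, we can look at doing the same for Problem 2, which can be used as a baseline.

Standalone MLP Model for Problem 2

The example in the previous section can be updated to fit an MLP model to Problem 2.

It is important to first get an idea of the performance and learning dynamics of a standalone model on Problem 2, as this provides a baseline that can be compared with a model fit on the same problem using transfer learning.

A single change is required: the call to samples_for_seed() must use a pseudorandom number generator seed of two instead of one.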

# prepare data
trainX, trainy, testX, testy = samples_for_seed(2)

For completeness, the full example with this change is listed below.

# fit mlp model on problem 2
from sklearn.datasets import make_blobs
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from keras.utils import to_categorical
from matplotlib import pyplot

# prepare a blobs examples with a given random seed
def samples_for_seed(seed):
	# generate samples
	X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=seed)
	# one hot encode output variable
	y = to_categorical(y)
	# split into train and test
	n_train = 500
	trainX, testX = X[:n_train, :], X[n_train:, :]
	trainy, testy = y[:n_train], y[n_train:]
	return trainX, trainy, testX, testy

# define and fit model on a training dataset
def fit_model(trainX, trainy, testX, testy):
	# define model
	model = Sequential()
	model.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
	model.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))
	model.add(Dense(3, activation='softmax'))
	# compile model
	model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
	# fit model
	history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
	return model, history

# summarize the performance of the fit model
def summarize_model(model, history, trainX, trainy, testX, testy):
	# evaluate the model
	_, train_acc = model.evaluate(trainX, trainy, verbose=0)
	_, test_acc = model.evaluate(testX, testy, verbose=0)
	print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
	# plot loss during training
	pyplot.subplot(211)
	pyplot.title('Loss')
	pyplot.plot(history.history['loss'], label='train')
	pyplot.plot(history.history['val_loss'], label='test')
	pyplot.legend()
	# plot accuracy during training
	pyplot.subplot(212)
	pyplot.title('Accuracy')
	pyplot.plot(history.history['accuracy'], label='train')
	pyplot.plot(history.history['val_accuracy'], label='test')
	pyplot.legend()
	pyplot.show()

# prepare data
trainX, trainy, testX, testy = samples_for_seed(2)
# fit model on train dataset
model, history = fit_model(trainX, trainy, testX, testy)
# evaluate model behavior
summarize_model(model, history, trainX, trainy, testX, testy)

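Running the example fits and evaluates the performance of the model, printing the classification accuracy on the train and test sets.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model performed reasonably well on Problem 2, but not as well as on Problem 1, achieving a classification accuracy of about 79% on both the train and test datasets.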

Train: 0.794, Test: 0.794

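A figure is also created summarizing the learning curves of the model. Your plot may not look identical but is expected to show the same general behavior. If not, try running the example a few times.

In this case, we can see that the model converged more slowly than we saw on Problem 1 in the previous section. This suggests that this version of the problem may be slightly more challenging, at least for the chosen model configuration.

Loss and Accuracy Learning Curves on the Train and Test Sets for an MLP on Problem 2

Now that we have a baseline of performance and learning dynamics for an MLP on Problem 2, we can see how the addition of transfer learning affects the MLP on this problem.

MLP With Transfer Learning for Problem 2

The model that was fit on Problem 1 can be loaded, and its weights can be used as the initial weights for a model fit on Problem 2.

This is a type of transfer learning where learning on a different but related problem is used as a type of weight initialization scheme.

This requires that the fit_model() function be updated to load the model and refit it on examples for Problem 2.

The model saved in 'model.h5' can be loaded using the load_model() Keras function.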

# load model
model = load_model('model.h5')

Once loaded, the model can be compiled and fit as per normal.

The updated fit_model() with this change is listed below.

# load and re-fit model on a training dataset
def fit_model(trainX, trainy, testX, testy):
	# load model
	model = load_model('model.h5')
	# compile model
	model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
	# re-fit model
	history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
	return model, history

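We would expect a model that starts from the weights of a model fit on a different but related problem to learn the problem faster, as seen in the learning curve, and perhaps to achieve a lower generalization error, although both effects depend on the choice of problems and model.

For completeness, the full example with this change is listed below.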

# transfer learning with mlp model on problem 2
from sklearn.datasets import make_blobs
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from keras.utils import to_categorical
from keras.models import load_model
from matplotlib import pyplot

# prepare a blobs examples with a given random seed
def samples_for_seed(seed):
	# generate samples
	X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=seed)
	# one hot encode output variable
	y = to_categorical(y)
	# split into train and test
	n_train = 500
	trainX, testX = X[:n_train, :], X[n_train:, :]
	trainy, testy = y[:n_train], y[n_train:]
	return trainX, trainy, testX, testy

# load and re-fit model on a training dataset
def fit_model(trainX, trainy, testX, testy):
	# load model
	model = load_model('model.h5')
	# compile model
	model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
	# re-fit model
	history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=100, verbose=0)
	return model, history

# summarize the performance of the fit model
def summarize_model(model, history, trainX, trainy, testX, testy):
	# evaluate the model
	_, train_acc = model.evaluate(trainX, trainy, verbose=0)
	_, test_acc = model.evaluate(testX, testy, verbose=0)
	print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
	# plot loss during training
	pyplot.subplot(211)
	pyplot.title('Loss')
	pyplot.plot(history.history['loss'], label='train')
	pyplot.plot(history.history['val_loss'], label='test')
	pyplot.legend()
	# plot accuracy during training
	pyplot.subplot(212)
	pyplot.title('Accuracy')
	pyplot.plot(history.history['accuracy'], label='train')
	pyplot.plot(history.history['val_accuracy'], label='test')
	pyplot.legend()
	pyplot.show()

# prepare data
trainX, trainy, testX, testy = samples_for_seed(2)
# fit model on train dataset
model, history = fit_model(trainX, trainy, testX, testy)
# evaluate model behavior
summarize_model(model, history, trainX, trainy, testX, testy)

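Running the example fits and evaluates the performance of the model, printing the classification accuracy on the train and test sets.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the model achieved a lower generalization error, with an accuracy of about 81% on the test dataset for Problem 2, compared to about 79% for the standalone model.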

Train: 0.786, Test: 0.810

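A figure is also created summarizing the learning curves of the model. Your plot may not look identical but is expected to show the same general behavior. If not, try running the example a few times.

In this case, we can see that the model does appear to have a similar learning curve, although we do see apparent improvements in the learning curve for the test set (orange line), both in terms of better performance earlier in training (before epoch 20) and in rising above the performance of the model on the training set.

Loss and Accuracy Learning Curves on the Train and Test Sets for an MLP With Transfer Learning on Problem 2

We have only looked at single runs of a standalone MLP model and an MLP with transfer learning.

Neural network algorithms are stochastic, so the average performance across multiple runs is required to see whether the observed behavior is real or a statistical fluke.

Comparison of Models on Problem 2

In order to determine whether using transfer learning for the blobs multi-class classification problem has a real effect, we must repeat each experiment multiple times and analyze the average performance across the repeats.

We will compare the performance of the standalone model trained on Problem 2 to a model using transfer learning, averaged over 30 repeats.

Further, we will investigate whether keeping the weights in some of the layers fixed improves model performance.

The model trained on Problem 1 has two hidden layers. By keeping the first or the first and second hidden layers fixed, the layers with unchangeable weights will act as a feature extractor and may provide features that make learning Problem 2 easier, affecting the speed of learning and/or the accuracy of the model on the test set.

As the first step, we will simplify the fit_model() function to fit the model and discard any training history so that we can focus on the final accuracy of the trained model.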

# define and fit model on a training dataset
def fit_model(trainX, trainy):
	# define model
	model = Sequential()
	model.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
	model.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))
	model.add(Dense(3, activation='softmax'))
	# compile model
	model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
	# fit model
	model.fit(trainX, trainy, epochs=100, verbose=0)
	return model

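Next, we can develop a function that will repeatedly fit a new standalone model on the training dataset for Problem 2 and evaluate its accuracy on the test set.

The eval_standalone_model() function below implements this, taking the train and test sets as arguments as well as the number of repeats, and returning a list of accuracy scores for the models on the test dataset.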

# repeated evaluation of a standalone model
def eval_standalone_model(trainX, trainy, testX, testy, n_repeats):
	scores = list()
	for _ in range(n_repeats):
		# define and fit a new model on the train dataset
		model = fit_model(trainX, trainy)
		# evaluate model on test dataset
		_, test_acc = model.evaluate(testX, testy, verbose=0)
		scores.append(test_acc)
	return scores

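Summarizing the distribution of accuracy scores returned from this function will give an idea of how well the chosen standalone model performs on Problem 2.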

# repeated evaluation of standalone model
standalone_scores = eval_standalone_model(trainX, trainy, testX, testy, n_repeats)
print('Standalone %.3f (%.3f)' % (mean(standalone_scores), std(standalone_scores)))

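Next, we need an equivalent function for evaluating a model using transfer learning.

In each loop, the model trained on Problem 1 must be loaded from file, fit on the training dataset for Problem 2, then evaluated on the test set for Problem 2.

In addition, we will configure 0, 1, or 2 of the hidden layers in the loaded model to remain fixed. Keeping 0 hidden layers fixed means that all of the weights in the model will be adapted when learning Problem 2, using transfer learning as a weight initialization scheme. Keeping both (2) of the hidden layers fixed means that only the output layer of the model will be adapted during training, using transfer learning as a feature extraction method.

The eval_transfer_model() function below implements this, taking the train and test datasets for Problem 2 as arguments, as well as the number of hidden layers in the loaded model to keep fixed and the number of times to repeat the experiment.

The function returns a list of test accuracy scores, and summarizing this distribution will give a reasonable idea of how well the model with the chosen type of transfer learning performs on Problem 2.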

# repeated evaluation of a model with transfer learning
def eval_transfer_model(trainX, trainy, testX, testy, n_fixed, n_repeats):
	scores = list()
	for _ in range(n_repeats):
		# load model
		model = load_model('model.h5')
		# mark layer weights as fixed or not trainable
		for i in range(n_fixed):
			model.layers[i].trainable = False
		# re-compile model
		model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
		# fit model on train dataset
		model.fit(trainX, trainy, epochs=100, verbose=0)
		# evaluate model on test dataset
		_, test_acc = model.evaluate(testX, testy, verbose=0)
		scores.append(test_acc)
	return scores
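
The effect of marking layers as not trainable can be checked by counting the weight tensors that Keras will still update. The following is a minimal sketch, assuming a freshly built model with the same architecture as the tutorial in place of the saved 'model.h5'; the helper names build_model() and count_trainable_tensors() are hypothetical and used only for illustration.

```python
# count trainable weight tensors as hidden layers are frozen
from keras.layers import Dense
from keras.models import Sequential

# build a fresh model with the tutorial's architecture
# (stands in for load_model('model.h5'); hypothetical helper)
def build_model():
    model = Sequential()
    model.add(Dense(5, input_dim=2, activation='relu'))
    model.add(Dense(5, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    return model

# freeze the first n_fixed layers and count what remains trainable
def count_trainable_tensors(n_fixed):
    model = build_model()
    for i in range(n_fixed):
        model.layers[i].trainable = False
    # each Dense layer contributes a kernel and a bias tensor
    return len(model.trainable_weights)

for n_fixed in range(3):
    print('fixed=%d trainable tensors=%d' % (n_fixed, count_trainable_tensors(n_fixed)))
```

With two hidden layers fixed, only the kernel and bias of the output layer remain trainable, which is the feature extraction setting described above.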

We can call this function repeatedly, setting n_fixed to 0, 1, and 2 in a loop and summarizing performance as we go; for example:

# repeated evaluation of transfer learning model, vary fixed layers
n_fixed = 3
for i in range(n_fixed):
	scores = eval_transfer_model(trainX, trainy, testX, testy, i, n_repeats)
	print('Transfer (fixed=%d) %.3f (%.3f)' % (i, mean(scores), std(scores)))

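In addition to reporting the mean and standard deviation of each model, we can collect all of the scores and create a box and whisker plot to summarize and compare the distributions of model scores.

Tying all of these elements together, the complete example is listed below.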

# compare standalone mlp model performance to transfer learning
from sklearn.datasets import make_blobs
from keras.layers import Dense
from keras.models import Sequential
from keras.optimizers import SGD
from keras.utils import to_categorical
from keras.models import load_model
from matplotlib import pyplot
from numpy import mean
from numpy import std

# prepare a blobs examples with a given random seed
def samples_for_seed(seed):
	# generate samples
	X, y = make_blobs(n_samples=1000, centers=3, n_features=2, cluster_std=2, random_state=seed)
	# one hot encode output variable
	y = to_categorical(y)
	# split into train and test
	n_train = 500
	trainX, testX = X[:n_train, :], X[n_train:, :]
	trainy, testy = y[:n_train], y[n_train:]
	return trainX, trainy, testX, testy

# define and fit model on a training dataset
def fit_model(trainX, trainy):
	# define model
	model = Sequential()
	model.add(Dense(5, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
	model.add(Dense(5, activation='relu', kernel_initializer='he_uniform'))
	model.add(Dense(3, activation='softmax'))
	# compile model
	model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
	# fit model
	model.fit(trainX, trainy, epochs=100, verbose=0)
	return model

# repeated evaluation of a standalone model
def eval_standalone_model(trainX, trainy, testX, testy, n_repeats):
	scores = list()
	for _ in range(n_repeats):
		# define and fit a new model on the train dataset
		model = fit_model(trainX, trainy)
		# evaluate model on test dataset
		_, test_acc = model.evaluate(testX, testy, verbose=0)
		scores.append(test_acc)
	return scores

# repeated evaluation of a model with transfer learning
def eval_transfer_model(trainX, trainy, testX, testy, n_fixed, n_repeats):
	scores = list()
	for _ in range(n_repeats):
		# load model
		model = load_model('model.h5')
		# mark layer weights as fixed or not trainable
		for i in range(n_fixed):
			model.layers[i].trainable = False
		# re-compile model
		model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
		# fit model on train dataset
		model.fit(trainX, trainy, epochs=100, verbose=0)
		# evaluate model on test dataset
		_, test_acc = model.evaluate(testX, testy, verbose=0)
		scores.append(test_acc)
	return scores

# prepare data for problem 2
trainX, trainy, testX, testy = samples_for_seed(2)
n_repeats = 30
dists, dist_labels = list(), list()

# repeated evaluation of standalone model
standalone_scores = eval_standalone_model(trainX, trainy, testX, testy, n_repeats)
print('Standalone %.3f (%.3f)' % (mean(standalone_scores), std(standalone_scores)))
dists.append(standalone_scores)
dist_labels.append('standalone')

# repeated evaluation of transfer learning model, vary fixed layers
n_fixed = 3
for i in range(n_fixed):
	scores = eval_transfer_model(trainX, trainy, testX, testy, i, n_repeats)
	print('Transfer (fixed=%d) %.3f (%.3f)' % (i, mean(scores), std(scores)))
	dists.append(scores)
	dist_labels.append('transfer f='+str(i))

# box and whisker plot of score distributions
pyplot.boxplot(dists, labels=dist_labels)
pyplot.show()

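Running the example first reports the mean and standard deviation of classification accuracy on the test dataset for each model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this case, we can see that the standalone model achieved an accuracy of about 79% on Problem 2 with a large standard deviation of about 10%. In contrast, all of the transfer learning models have a much smaller spread, with standard deviations ranging from about 0.4% to 1.4%.

This difference in the standard deviations of the test accuracy scores shows the stability that transfer learning can bring to the model, reducing the variance in the performance of the final model introduced by the stochastic learning algorithm.

Comparing the mean test accuracy of the models, we can see that transfer learning that used the model as a weight initialization scheme (fixed=0) resulted in better performance than the standalone model, with about 80% accuracy.

Keeping all hidden layers fixed (fixed=2) and using them as a feature extraction scheme resulted in worse performance on average than the standalone model. This suggests that the approach is too restrictive in this case.

Interestingly, we see the best performance when the first hidden layer is kept fixed (fixed=1) and the second hidden layer is adapted to the problem, with a test classification accuracy of about 82%. This suggests that in this case, the problem benefits from both the feature extraction and weight initialization properties of transfer learning.

It may be interesting to see how the results of this last approach compare to the same model where the weights of the second hidden layer (and perhaps the output layer) are re-initialized with random numbers. This comparison would show whether the feature extraction property of transfer learning alone is beneficial, or whether both the feature extraction and weight initialization properties are.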

Standalone 0.787 (0.101)
Transfer (fixed=0) 0.805 (0.004)
Transfer (fixed=1) 0.817 (0.005)
Transfer (fixed=2) 0.750 (0.014)

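A figure is created showing four box and whisker plots. The box shows the middle 50% of each data distribution, the orange line shows the median, and the dots show outliers.

The box plot for the standalone model shows a number of outliers, indicating that on average the model performs well, but there is a chance that it can perform very poorly.

Conversely, we see that the behavior of the models with transfer learning is more stable, showing a tighter distribution of performance.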

Box and Whisker Plot Comparing Standalone and Transfer Learning Models via Test Set Accuracy on the Blobs Multiclass Classification Problem
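
The quantities drawn in a box and whisker plot can also be computed directly. The sketch below uses a small list of hypothetical accuracy scores (not results from this tutorial) and the common 1.5 times interquartile range rule, which is also the matplotlib default for placing whiskers, to flag outliers.

```python
# compute box plot statistics for a list of accuracy scores
from numpy import percentile

# hypothetical test accuracies from repeated runs (illustrative only)
scores = [0.80, 0.81, 0.79, 0.82, 0.80, 0.45]

# the box spans the 25th to 75th percentiles, the line marks the median
q1, median, q3 = percentile(scores, [25, 50, 75])
iqr = q3 - q1
# points beyond 1.5 * IQR from the box are treated as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [s for s in scores if s < lower or s > upper]
print('Q1=%.4f, Median=%.4f, Q3=%.4f' % (q1, median, q3))
print('Outliers:', outliers)
```

A single poor run, such as the 0.45 score here, falls outside the whiskers and is drawn as a separate dot, which matches the pattern described for the standalone model.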

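Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

Reverse Experiment. Train and save a model for Problem 2 and see if it can help when using it for transfer learning on Problem 1.
Add Hidden Layer. Update the example to keep both hidden layers fixed, but add a new hidden layer with randomly initialized weights after the fixed layers and before the output layer, and compare performance.
Randomly Initialize Layers. Update the example to randomly initialize the weights of the second hidden layer and the output layer, and compare performance.

If you explore any of these extensions, I'd love to know.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.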

Books

Section 15.2 Transfer Learning and Domain Adaptation, Deep Learning, 2016.

Articles

Transfer learning, Wikipedia

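Summary

In this tutorial, you discovered how to use transfer learning to improve the performance of deep learning neural networks in Python with Keras.

Specifically, you learned:

Transfer learning is a method for reusing a model trained on a related predictive modeling problem.
Transfer learning can be used to accelerate the training of neural networks as either a weight initialization scheme or a feature extraction method.
How to use transfer learning to improve the performance of an MLP for a multiclass classification problem.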

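How to Reduce Generalization Error With Activity Regularization in Keras

Original: machinelearningmastery.com/how-to-redu…

Last updated on August 25, 2020

Activity regularization provides an approach to encourage a neural network to learn sparse features or internal representations of raw observations.

It is common to seek sparse learned representations in autoencoders (called sparse autoencoders) and in encoder-decoder models, although the approach can also be used generally to reduce overfitting and improve a model's ability to generalize to new observations.

In this tutorial, you will discover the Keras API for adding activity regularization to deep learning neural network models.

After completing this tutorial, you will know:

How to create vector norm regularizers using the Keras API.
How to add activity regularization to MLP, CNN, and RNN layers using the Keras API.
How to reduce overfitting by adding activity regularization to an existing model.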

Let's get started.

Update Oct/2019: Updated for Keras 2.3 and TensorFlow 2.0.

How to Reduce Generalization Error in Deep Neural Networks With Activity Regularization in Keras

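Tutorial Overview

This tutorial is divided into three parts; they are:

Activity Regularization in Keras
Activity Regularization on Layers
Activity Regularization Case Study

Activity Regularization in Keras

Keras supports activity regularization.

Three different regularization techniques are supported, each provided as a class in the keras.regularizers module:

l1: Activity is calculated as the sum of absolute values.
l2: Activity is calculated as the sum of the squared values.
l1_l2: Activity is calculated as the sum of absolute values plus the sum of the squared values.

Each of the l1 and l2 regularizers takes a single hyperparameter that controls the amount that each activity contributes to the sum. The l1_l2 regularizer takes two hyperparameters, one for each of the l1 and l2 methods.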
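
The three penalties can be written out in a few lines of NumPy. This is a sketch of the arithmetic described above rather than the Keras implementation: the function names and the example activations are assumptions, and any additional scaling Keras may apply (for example by batch size) is ignored.

```python
# arithmetic behind the l1, l2 and l1_l2 activity penalties
from numpy import array, absolute, square

# weighted sum of absolute activation values
def l1_penalty(activations, factor):
    return factor * absolute(activations).sum()

# weighted sum of squared activation values
def l2_penalty(activations, factor):
    return factor * square(activations).sum()

# sum of both penalties, each with its own hyperparameter
def l1_l2_penalty(activations, l1_factor, l2_factor):
    return l1_penalty(activations, l1_factor) + l2_penalty(activations, l2_factor)

# hypothetical layer outputs for a single sample
activations = array([0.5, -1.0, 0.0, 2.0])
l1_value = l1_penalty(activations, 0.01)
l2_value = l2_penalty(activations, 0.01)
l1_l2_value = l1_l2_penalty(activations, 0.01, 0.01)
print('l1=%.4f, l2=%.4f, l1_l2=%.4f' % (l1_value, l2_value, l1_l2_value))
```

The penalty is added to the training loss, so larger hyperparameter values push the layer toward smaller (and, with l1, sparser) activations.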

The regularizer class must be imported and then instantiated; for example:

# import regularizer
from keras.regularizers import l1
# instantiate regularizer
reg = l1(0.001)

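Activity Regularization on Layers

Activity regularization is specified on a layer in Keras.

This can be achieved by setting the activity_regularizer argument on the layer to an instantiated and configured regularizer class.

The regularizer is applied to the output of the layer, but you have control over what the "output" of the layer actually means. Specifically, you have flexibility as to whether the layer output means that the regularization is applied before or after the activation function.

For example, you can specify the activation function and the regularizer on the same layer, in which case activation regularization is applied to the output of the activation function, here the rectified linear activation function, or ReLU.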

...
model.add(Dense(32, activation='relu', activity_regularizer=l1(0.001)))
...

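Alternately, you can specify a linear activation function (the default, which does not perform any transform), which means that the activation regularization is applied to the raw outputs; the activation function can then be added as a subsequent layer.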

...
model.add(Dense(32, activation='linear', activity_regularizer=l1(0.001)))
model.add(Activation('relu'))
...

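The latter is probably the preferred usage of activation regularization, as described in "Deep Sparse Rectifier Neural Networks", in order to allow the model to learn to take activations to a true zero value in conjunction with the rectified linear activation function. Nevertheless, both possible uses of activation regularization may be explored in order to discover what works best for your specific model and dataset.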
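
The two placements penalize different quantities, which a small NumPy calculation makes visible. The values below are hypothetical raw layer outputs for one sample, and the sketch only reproduces the l1 arithmetic; it does not call Keras.

```python
# l1 penalty on raw outputs versus on relu outputs
from numpy import array, absolute, maximum

factor = 0.001
# hypothetical raw (linear) outputs of a layer for one sample
raw = array([-2.0, 0.5, 1.0, -0.5])
# rectified linear activation sets negative values to zero
rectified = maximum(raw, 0.0)

# penalty when the regularizer sees the raw outputs
penalty_raw = factor * absolute(raw).sum()
# penalty when the regularizer sees the relu outputs
penalty_rectified = factor * absolute(rectified).sum()
print('raw: %.4f, after relu: %.4f' % (penalty_raw, penalty_rectified))
```

When the penalty is applied to the raw outputs, negative values still contribute to the loss; after ReLU they have already been zeroed and contribute nothing.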

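Let's take a look at how activity regularization can be used with some common layer types.

MLP Activity Regularization

The example below sets l1 norm activity regularization on a Dense fully connected layer.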

# example of l1 norm on activity from a dense layer
from keras.layers import Dense
from keras.regularizers import l1
...
model.add(Dense(32, activity_regularizer=l1(0.001)))
...

CNN Activity Regularization

The example below sets l1 norm activity regularization on a Conv2D convolutional layer.

# example of l1 norm on activity from a cnn layer
from keras.layers import Conv2D
from keras.regularizers import l1
...
model.add(Conv2D(32, (3,3), activity_regularizer=l1(0.001)))
...

RNN Activity Regularization

The example below sets l1 norm activity regularization on an LSTM recurrent layer.

# example of l1 norm on activity from an lstm layer
from keras.layers import LSTM
from keras.regularizers import l1
...
model.add(LSTM(32, activity_regularizer=l1(0.001)))
...

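Now that we know how to use the activity regularization API, let's look at a worked example.

Activity Regularization Case Study

In this section, we will demonstrate how to use activity regularization to reduce overfitting of an MLP on a simple binary classification problem.

Although activity regularization is most often used to encourage sparse learned representations in autoencoder and encoder-decoder models, it can also be used directly within normal neural networks to achieve the same effect and improve the generalization of the model.

This example provides a template for applying activity regularization to your own neural network for classification and regression problems.

Binary Classification Problem

We will use a standard binary classification problem that defines two two-dimensional concentric circles of observations, one circle for each class.

Each observation has two input variables with the same scale and a class output value of either 0 or 1. This dataset is called the "circles" dataset because of the shape of the observations in each class when plotted.

We can use the make_circles() function to generate observations from this problem. We will add noise to the data and seed the random number generator so that the same samples are generated each time the code is run.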

# generate 2d classification dataset
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)

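We can plot the dataset, where the two variables are taken as x and y coordinates on a graph and the class value is taken as the color of the observation.

The complete example of generating the dataset and plotting it is listed below.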

# generate two circles dataset
from sklearn.datasets import make_circles
from matplotlib import pyplot
from pandas import DataFrame
# generate 2d classification dataset
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)
# scatter plot, dots colored by class value
df = DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
colors = {0:'red', 1:'blue'}
fig, ax = pyplot.subplots()
grouped = df.groupby('label')
for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='x', y='y', label=key, color=colors[key])
pyplot.show()

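Running the example creates a scatter plot showing the concentric circles shape of the observations in each class.

We can see that the noise in the dispersal of the points makes the circles less obvious.

Scatter Plot of Circles Dataset with Color Showing the Class Value of Each Sample

This is a good test problem because the classes cannot be separated by a line, i.e. they are not linearly separable, and a nonlinear method such as a neural network is required.

We have only generated 100 samples, which is small for a neural network, providing the opportunity to overfit the training dataset and have higher error on the test dataset: a good case for using regularization.

Further, the samples have noise, giving the model an opportunity to learn aspects of the samples that do not generalize.

Overfit Multilayer Perceptron

We can develop an MLP model to address this binary classification problem.

The model will have one hidden layer with more nodes than may be required to solve this problem, providing an opportunity to overfit. We will also train the model for longer than is required to ensure the model overfits.

Before we define the model, we will split the dataset into train and test sets, using 30 examples to train the model and 70 to evaluate the fit model's performance.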

# generate 2d classification dataset
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]

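Next, we can define the model.

The hidden layer uses 500 nodes and the rectified linear activation function. The output layer uses a sigmoid activation function in order to predict class values of 0 or 1.

The model is optimized using the binary cross-entropy loss function, suitable for binary classification problems, and the efficient Adam version of gradient descent.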

# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

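The defined model is then fit on the training data for 4,000 epochs with the default batch size of 32.

We will also use the test dataset as a validation dataset.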

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)

We can evaluate the performance of the model on the train and test datasets and report the result.

# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))

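Finally, we will plot the performance of the model on both the train and test sets for each epoch.

If the model does indeed overfit the training dataset, we would expect the line plot of accuracy on the training set to continue to increase, and the accuracy on the test set to rise and then fall again, as the model learns statistical noise in the training dataset.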

# plot history
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()
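
The "rise then fall" pattern can also be checked numerically from the recorded history. The sketch below uses a short list of hypothetical validation accuracies in place of history.history['val_accuracy']; the helper name peak_epoch() is an assumption introduced for illustration.

```python
# locate the peak of a validation accuracy curve
from numpy import array, argmax

# index of the epoch with the highest validation accuracy
# (peak_epoch is a hypothetical helper, not part of Keras)
def peak_epoch(curve):
    return int(argmax(curve))

# hypothetical per-epoch test accuracies (illustrative only)
val_acc = array([0.60, 0.70, 0.78, 0.81, 0.80, 0.79, 0.78])
best = peak_epoch(val_acc)
# how far accuracy fell between the peak and the final epoch
drop = val_acc[best] - val_acc[-1]
print('Peak at epoch index %d, drop to final epoch: %.2f' % (best, drop))
```

A peak well before the final epoch followed by a sustained drop is the numerical signature of the overfitting described above.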

We can tie all of these pieces together; the complete example is listed below.

# mlp overfit on the two circles dataset
from sklearn.datasets import make_circles
from keras.layers import Dense
from keras.models import Sequential
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot history
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

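Running the example reports the model performance on the train and test datasets.

We can see that the model has better performance on the training dataset than the test dataset, one possible sign of overfitting.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Because the model is severely overfit, we generally would not expect much, if any, variance in the accuracy across repeated runs of the model on the same dataset.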

Train: 1.000, Test: 0.786

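A figure is created showing line plots of the model accuracy on the train and test sets.

We can see the expected shape of an overfit model, where test accuracy increases to a point and then begins to decrease again.

Line Plots of Accuracy on Train and Test Datasets While Training Showing an Overfit

Overfit MLP With Activation Regularization

We can update the example to use activation regularization.

There are a few different regularization methods to choose from, but it is probably a good idea to use the most common, which is the L1 vector norm.

This regularization has the effect of encouraging a sparse representation (lots of zeros), which is supported by the rectified linear activation function that permits true zero values.

We can do this by using the keras.regularizers.l1 class in Keras.

We will configure the layer to use the linear activation function so that we can regularize the raw outputs, then add a relu activation layer after the regularized outputs of the layer. We will set the regularization hyperparameter to 1E-4, or 0.0001, found with a little trial and error.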

model.add(Dense(500, input_dim=2, activation='linear', activity_regularizer=l1(0.0001)))
model.add(Activation('relu'))

The complete updated example with the L1 norm constraint is listed below:

# mlp overfit on the two circles dataset with activation regularization
from sklearn.datasets import make_circles
from keras.layers import Dense
from keras.models import Sequential
from keras.regularizers import l1
from keras.layers import Activation
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='linear', activity_regularizer=l1(0.0001)))
model.add(Activation('relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot history
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

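Running the example reports the model performance on the train and test datasets.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

We can see that activity regularization resulted in a drop in accuracy on the training dataset from 100% to about 97% and a lift in accuracy on the test set from about 79% to about 83%.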

Train: 0.967, Test: 0.829

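Reviewing the line plot of train and test accuracy, we can see that the model no longer appears to overfit the training dataset.

Model accuracy on both the train and test sets continues to increase to a plateau.

Line Plots of Accuracy on Train and Test Datasets While Training With Activity Regularization

For completeness, we can compare the results to a version of the model where activity regularization is applied after the relu activation function.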

model.add(Dense(500, input_dim=2, activation='relu', activity_regularizer=l1(0.0001)))

The complete example is listed below.

# mlp overfit on the two circles dataset with activation regularization
from sklearn.datasets import make_circles
from keras.layers import Dense
from keras.models import Sequential
from keras.regularizers import l1
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu', activity_regularizer=l1(0.0001)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot history
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

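Running the example reports the model performance on the train and test datasets.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

We can see that, at least on this problem and with this model, activity regularization applied after the activation function did not improve generalization error; in fact, it made it worse.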

Train: 1.000, Test: 0.743

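Reviewing the line plot of train and test accuracy, we can see that the model does indeed still show signs of overfitting the training dataset.

Line Plots of Accuracy on Train and Test Datasets While Training With Activity Regularization, Still Overfit

This suggests that it may be worth experimenting with both approaches for implementing activity regularization on your own dataset, to confirm that you are getting the most out of the method.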

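Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

Report Activation Mean. Update the example to calculate the mean activation of the regularized layer and confirm that the activations do indeed become more sparse.
Grid Search. Update the example to grid search different values for the regularization hyperparameter.
Alternate Norm. Update the example to evaluate the L2 or L1_L2 vector norm for regularizing the hidden layer outputs.
Repeated Evaluation. Update the example to fit and evaluate the model multiple times and report the mean and standard deviation of model performance.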
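
As a starting point for the repeated evaluation idea, here is a minimal sketch of summarizing scores from several runs. The list of test accuracies is hypothetical placeholder data, not results from this tutorial; in practice each value would come from fitting and evaluating a fresh model.

```python
# Sketch only: summarize accuracy over repeated runs with mean and standard deviation.
from statistics import mean, stdev

def summarize(scores):
    """Return the mean and sample standard deviation of a list of scores."""
    return mean(scores), stdev(scores)

# hypothetical test accuracies from five independent fit/evaluate runs
test_scores = [0.80, 0.83, 0.81, 0.84, 0.82]

score_mean, score_std = summarize(test_scores)
print('Test accuracy: %.3f (+/- %.3f) over %d runs' % (score_mean, score_std, len(test_scores)))
```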


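Summary

In this tutorial, you discovered the Keras API for adding activity regularization to deep learning neural network models.

Specifically, you learned:

How to create vector norm regularizers using the Keras API.
How to add activity regularization to MLP, CNN, and RNN layers using the Keras API.
How to reduce overfitting by adding activity regularization to an existing model.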


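How to Use Weight Decay to Reduce Overfitting of Neural Networks in Keras

Original: machinelearningmastery.com/how-to-redu…

Last updated on August 25, 2020

Weight regularization provides an approach to reduce the overfitting of a deep learning neural network model on the training data and improve the performance of the model on new data, such as the holdout test set.

There are multiple types of weight regularization, such as L1 and L2 vector norms, and each requires a hyperparameter that must be configured.

In this tutorial, you will discover how to apply weight regularization to improve the performance of an overfit deep learning neural network in Python with Keras.

After completing this tutorial, you will know:

How to use the Keras API to add weight regularization to an MLP, CNN, or LSTM neural network.
Examples of weight regularization configurations used in books and recent research papers.
How to work through a case study for identifying an overfit model and improving test performance using weight regularization.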


Let's get started.

Updated Oct/2019: Updated for Keras 2.3 and TensorFlow 2.0.


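Tutorial Overview

This tutorial is divided into three parts; they are:

Weight Regularization API in Keras
Examples of Weight Regularization
Weight Regularization Case Study

Weight Regularization API in Keras

Keras provides a weight regularization API that allows you to add a penalty for weight size to the loss function.

Three different regularizer instances are provided; they are:

L1: Sum of the absolute weights.
L2: Sum of the squared weights.
L1L2: Sum of the absolute and the squared weights.

The regularizers are provided under keras.regularizers and have the names l1, l2, and l1_l2. Each takes the regularizer hyperparameter as an argument. For example: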

keras.regularizers.l1(0.01)
keras.regularizers.l2(0.01)
keras.regularizers.l1_l2(l1=0.01, l2=0.01)
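
To illustrate the three definitions, the sketch below computes each penalty by hand for a small, hypothetical weight vector. The function names are stand-ins written for this illustration and are not the Keras classes themselves.

```python
# Sketch only: the three weight penalties written out in plain Python.
# Function names and the example weights are illustrative assumptions.

def l1_penalty(weights, factor):
    """Factor times the sum of the absolute weights."""
    return factor * sum(abs(w) for w in weights)

def l2_penalty(weights, factor):
    """Factor times the sum of the squared weights."""
    return factor * sum(w * w for w in weights)

def l1_l2_penalty(weights, l1, l2):
    """Sum of the L1 and L2 penalties, each with its own factor."""
    return l1_penalty(weights, l1) + l2_penalty(weights, l2)

weights = [0.5, -1.0, 2.0]  # hypothetical weights of one node

penalty_l1 = l1_penalty(weights, 0.01)
penalty_l2 = l2_penalty(weights, 0.01)
penalty_both = l1_l2_penalty(weights, l1=0.01, l2=0.01)
print(penalty_l1, penalty_l2, penalty_both)
```

The penalty is added to the loss during training, so larger weights make the loss larger and are discouraged by the optimizer.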

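By default, no regularizer is used in any layers.

A weight regularizer can be added to each layer when the layer is defined in a Keras model.

This is achieved by setting the kernel_regularizer argument on each layer. A separate regularizer can also be used for the bias via the bias_regularizer argument, although this is less often used.

Let's look at some examples.

Weight Regularization for Dense Layers

The example below sets an l2 regularizer on a Dense fully connected layer: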

# example of l2 on a dense layer
from keras.layers import Dense
from keras.regularizers import l2
...
model.add(Dense(32, kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
...

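Weight Regularization for Convolutional Layers

Like the Dense layer, the convolutional layers (e.g. Conv1D and Conv2D) also use the kernel_regularizer and bias_regularizer arguments to define a regularizer.

The example below sets an l2 regularizer on a Conv2D convolutional layer: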

# example of l2 on a convolutional layer
from keras.layers import Conv2D
from keras.regularizers import l2
...
model.add(Conv2D(32, (3,3), kernel_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
...

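Weight Regularization for Recurrent Layers

Recurrent layers like the LSTM offer more flexibility in regularizing the weights.

The input, recurrent, and bias weights can all be regularized separately via the kernel_regularizer, recurrent_regularizer, and bias_regularizer arguments.

The example below sets an l2 regularizer on an LSTM recurrent layer: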

# example of l2 on an lstm layer
from keras.layers import LSTM
from keras.regularizers import l2
...
model.add(LSTM(32, kernel_regularizer=l2(0.01), recurrent_regularizer=l2(0.01), bias_regularizer=l2(0.01)))
...

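Examples of Weight Regularization

It can be helpful to look at some examples of weight regularization configurations reported in the literature.

It is important to select and tune a regularization technique specific to your network and dataset, although real examples can also give an idea of common configurations that may be a useful starting point.

Recall that 0.1 can be written in scientific notation as 1E-1 or 1e-1 or as an exponent 10^-1, 0.01 as 1e-2 or 10^-2, and so on.

Examples of MLP Weight Regularization

Weight regularization was borrowed from penalized regression models in statistics.

The most common type of regularization is L2, also called simply "weight decay," with values often on a logarithmic scale between 0 and 0.1, such as 0.1, 0.001, 0.0001, etc.

Reasonable values of lambda [regularization hyperparameter] range between 0 and 0.1.

- Page 144, Applied Predictive Modeling, 2013.

The classic text on multilayer perceptrons, "Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks," provides a worked example demonstrating the impact of weight decay by first training a model without any regularization, then steadily increasing the penalty. The authors demonstrate graphically that weight decay has the effect of improving the resulting decision function.

… net was trained […] with weight decay increasing from 0 to 1E-5 at 1200 epochs, to 1E-4 at 2500 epochs, and to 1E-3 at 400 epochs. […] The surface is smoother and transitions are more gradual

- Page 270, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, 1999.

This is an interesting procedure that may be worth investigating. The authors also comment on the difficulty of predicting the effect of weight decay on a problem.

… it is difficult to predict ahead of time what value is needed to achieve desired results. The value of 0.001 was chosen arbitrarily because it is a typically cited round number

- Page 270, Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks, 1999.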

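Examples of CNN Weight Regularization

Weight regularization does not seem to be widely used in CNN models, or if it is used, its use is not widely reported.

L2 weight regularization with very small regularization hyperparameters, such as 0.0005 or 5×10^−4, may be a good starting point.

Alex Krizhevsky, et al. from the University of Toronto, in their 2012 paper titled "ImageNet Classification with Deep Convolutional Neural Networks," developed a deep CNN model for the ImageNet dataset, achieving then state-of-the-art results, and reported:

… weight decay of 0.0005. We found that this small amount of weight decay was important for the model to learn. In other words, weight decay here is not merely a regularizer: it reduces the model's training error.

Karen Simonyan and Andrew Zisserman from Oxford, in their 2015 paper titled "Very Deep Convolutional Networks for Large-Scale Image Recognition," developed a CNN for the ImageNet dataset and reported:

The training was regularised by weight decay (the L2 penalty multiplier set to 5×10^−4)

Francois Chollet from Google (and author of Keras), in his 2016 paper titled "Xception: Deep Learning with Depthwise Separable Convolutions," reported the weight decay for both the Inception V3 CNN model from Google (not clear from the Inception V3 paper) and the weight decay used in his improved Xception for the ImageNet dataset:

The Inception V3 model uses a weight decay (L2 regularization) rate of 4e-5, which has been carefully tuned for performance on ImageNet. We found this rate to be quite suboptimal for Xception and instead settled for 1e-5.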

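Examples of LSTM Weight Regularization

It is common to use weight regularization with LSTM models.

An often used configuration is L2 (weight decay) with a very small hyperparameter (e.g. 10^−6). It is often not reported which weights are regularized (input, recurrent, and/or bias), although one would assume that both input and recurrent weights are regularized.

Gabriel Pereyra, et al. from Google Brain, in the 2017 paper titled "Regularizing Neural Networks by Penalizing Confident Output Distributions," applied a seq2seq LSTM model to predicting characters from the Wall Street Journal and reported:

All models used weight decay of 10^−6

Barret Zoph and Quoc Le from Google Brain, in the 2017 paper titled "Neural Architecture Search with Reinforcement Learning," used LSTMs and reinforcement learning to learn network architectures that best address the CIFAR-10 dataset and reported:

weight decay of 1e-4

Ron Weiss, et al. from Google Brain and Nvidia, in their 2017 paper titled "Sequence-to-Sequence Models Can Directly Translate Foreign Speech," developed a sequence-to-sequence LSTM for speech translation and reported:

L2 weight decay is used with a weight of 10^−6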

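Weight Regularization Case Study

In this section, we will demonstrate how to use weight regularization to reduce overfitting of an MLP on a simple binary classification problem.

This example provides a template for applying weight regularization to your own neural network for classification and regression problems.

Binary Classification Problem

We will use a standard binary classification problem that defines two semicircles of observations: one semicircle for each class.

Each observation has two input variables with the same scale and a class output value of either 0 or 1. This dataset is called the "moons" dataset because of the shape of the observations in each class when plotted.

We can use the make_moons() function to generate observations from this problem. We will add noise to the data and seed the random number generator so that the same samples are generated each time the code is run.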

# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)

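We can plot the dataset where the two variables are taken as x and y coordinates on a graph and the class value is taken as the color of the observation.

The complete example of generating the dataset and plotting it is listed below.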

# generate two moons dataset
from sklearn.datasets import make_moons
from matplotlib import pyplot
from pandas import DataFrame
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# scatter plot, dots colored by class value
df = DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
colors = {0:'red', 1:'blue'}
fig, ax = pyplot.subplots()
grouped = df.groupby('label')
for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='x', y='y', label=key, color=colors[key])
pyplot.show()

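Running the example creates a scatter plot showing the semicircle or moon shape of the observations in each class. We can see that the noise in the dispersal of the points makes the moons less obvious.

Scatter Plot of Moons Dataset With Color Showing the Class Value of Each Sample

This is a good test problem because the classes cannot be separated by a line, i.e. they are not linearly separable, requiring a nonlinear method such as a neural network to address.

We have only generated 100 samples, which is small for a neural network, providing the opportunity to overfit the training dataset and have higher error on the test dataset: a good case for using regularization. Further, the samples have noise, giving the model an opportunity to learn aspects of the samples that do not generalize.

Overfit Multilayer Perceptron Model

We can develop an MLP model to address this binary classification problem.

Before we define the model, we will split the dataset into train and test sets, using 30 examples to train the model and 70 to evaluate the fit model's performance.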

# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]

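Next, we can define the model.

The model uses 500 nodes in the hidden layer and the rectified linear activation function.

A sigmoid activation function is used in the output layer in order to predict class values of 0 or 1.

The model is optimized using the binary cross entropy loss function, suitable for binary classification problems, and the efficient Adam version of gradient descent.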

# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

The defined model is then fit on the training data for 4,000 epochs with the default batch size of 32.

# fit model
model.fit(trainX, trainy, epochs=4000, verbose=0)

Finally, we can evaluate the performance of the model on the train and test datasets and report the result.

# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))

We can tie all of these pieces together; the complete example is listed below.

# overfit mlp for the moons dataset
from sklearn.datasets import make_moons
from keras.layers import Dense
from keras.models import Sequential
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
model.fit(trainX, trainy, epochs=4000, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))

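Running the example reports the model performance on the train and test datasets.

We can see that the model performs better on the training dataset than on the test dataset, one possible sign of overfitting.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Because the model is severely overfit, we generally would not expect much, if any, variance in the accuracy across repeated runs of the model on the same dataset.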

Train: 1.000, Test: 0.914

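Another sign of overfitting is a plot of the learning curves of the model for both the train and test datasets.

An overfit model should show accuracy increasing on both train and test and, at some point, accuracy dropping on the test dataset while continuing to rise on the training dataset.

We can update the example to plot these curves. The complete example is listed below.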

# overfit mlp for the moons dataset plotting history
from sklearn.datasets import make_moons
from keras.layers import Dense
from keras.models import Sequential
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)
# plot history
# summarize history for accuracy
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

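Running the example creates line plots of the model accuracy on the train and test sets.

We can see the expected shape of an overfit model, where test accuracy increases to a point and then begins to decrease again.

Line Plots of Accuracy on Train and Test Datasets While Training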
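
The peak-then-decline shape can also be checked numerically rather than by eye. The sketch below finds the epoch with the best test accuracy and how far accuracy has fallen by the end of training. The accuracy values are hypothetical; in practice the list would be history.history['val_accuracy'] from the example above.

```python
# Sketch only: locate the peak of a validation accuracy curve.
# The accuracy values below are made up for illustration.

def best_epoch(val_accuracy):
    """Return (index, value) of the first epoch with the highest accuracy."""
    best_index = max(range(len(val_accuracy)), key=lambda i: val_accuracy[i])
    return best_index, val_accuracy[best_index]

def drop_from_peak(val_accuracy):
    """Difference between the peak accuracy and the final accuracy."""
    return max(val_accuracy) - val_accuracy[-1]

# hypothetical per-epoch test accuracy for an overfit model
val_accuracy = [0.60, 0.80, 0.93, 0.94, 0.92, 0.91]

peak_index, peak_value = best_epoch(val_accuracy)
decline = drop_from_peak(val_accuracy)
print('Best epoch: %d (%.2f), decline by end of training: %.2f' % (peak_index, peak_value, decline))
```

A positive decline while training accuracy keeps rising is consistent with the overfitting pattern described above.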

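MLP Model With Weight Regularization

We can add weight regularization to the hidden layer to reduce the overfitting of the model to the training dataset and improve the performance on the holdout set.

We will use the L2 vector norm, also called weight decay, with a regularization parameter (called alpha or lambda) of 0.001, chosen arbitrarily.

This can be done by adding the kernel_regularizer argument to the layer and setting it to an instance of l2.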

model.add(Dense(500, input_dim=2, activation='relu', kernel_regularizer=l2(0.001)))

The updated example of fitting and evaluating the model on the moons dataset with weight regularization is listed below.

# mlp with weight regularization for the moons dataset
from sklearn.datasets import make_moons
from keras.layers import Dense
from keras.models import Sequential
from keras.regularizers import l2
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu', kernel_regularizer=l2(0.001)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
model.fit(trainX, trainy, epochs=4000, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))

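Running the example reports the performance of the model on the train and test datasets.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

We can see no change in the accuracy on the training dataset and an improvement on the test dataset.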

Train: 1.000, Test: 0.943

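We would expect that the telltale learning curve for overfitting would also have been changed through the use of weight regularization.

Instead of the accuracy of the model on the test set increasing and then decreasing again, we should see it continually rise during training.

The complete example of fitting the model and plotting the train and test learning curves is listed below.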

# mlp with weight regularization for the moons dataset plotting history
from sklearn.datasets import make_moons
from keras.layers import Dense
from keras.models import Sequential
from keras.regularizers import l2
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu', kernel_regularizer=l2(0.001)))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)
# plot history
# summarize history for accuracy
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

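Running the example creates line plots of the train and test accuracy of the model for each epoch during training.

As expected, we see the learning curve on the test dataset rise and then plateau, indicating that the model may not have overfit the training dataset.

Line Plots of Accuracy on Train and Test Datasets While Training Without Overfitting

Grid Search Regularization Hyperparameter

Once you can confirm that weight regularization may improve your overfit model, you can test different values of the regularization parameter.

A good practice is to first grid search through some orders of magnitude between 0.0 and 0.1, then once a level is found, to grid search on that level.

We can grid search through the orders of magnitude by defining the values to test, looping through each, and recording the train and test performance.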

...
# grid search values
values = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]
all_train, all_test = list(), list()
for param in values:
	...
	model.add(Dense(500, input_dim=2, activation='relu', kernel_regularizer=l2(param)))
	...
	all_train.append(train_acc)
	all_test.append(test_acc)

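Once we have all of the values, we can plot the results as a line plot to help spot any patterns in the configurations with respect to train and test accuracy.

Because parameters jump orders of magnitude (powers of 10), we can create a line plot of the results using a logarithmic scale. The Matplotlib library allows this via the semilogx() function. For example:

pyplot.semilogx(values, all_train, label='train', marker='o')
pyplot.semilogx(values, all_test, label='test', marker='o')

The complete example for grid searching weight regularization values on the moons dataset is listed below.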

# grid search regularization values for moons dataset
from sklearn.datasets import make_moons
from keras.layers import Dense
from keras.models import Sequential
from keras.regularizers import l2
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# grid search values
values = [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]
all_train, all_test = list(), list()
for param in values:
	# define model
	model = Sequential()
	model.add(Dense(500, input_dim=2, activation='relu', kernel_regularizer=l2(param)))
	model.add(Dense(1, activation='sigmoid'))
	model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
	# fit model
	model.fit(trainX, trainy, epochs=4000, verbose=0)
	# evaluate the model
	_, train_acc = model.evaluate(trainX, trainy, verbose=0)
	_, test_acc = model.evaluate(testX, testy, verbose=0)
	print('Param: %f, Train: %.3f, Test: %.3f' % (param, train_acc, test_acc))
	all_train.append(train_acc)
	all_test.append(test_acc)
# plot train and test means
pyplot.semilogx(values, all_train, label='train', marker='o')
pyplot.semilogx(values, all_test, label='test', marker='o')
pyplot.legend()
pyplot.show()

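Running the example prints the parameter value and the accuracy on the train and test sets for each evaluated model.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

The results suggest that 0.01 or 0.001 may be sufficient and may provide good bounds for further grid searching.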

Param: 0.100000, Train: 0.967, Test: 0.829
Param: 0.010000, Train: 1.000, Test: 0.943
Param: 0.001000, Train: 1.000, Test: 0.943
Param: 0.000100, Train: 1.000, Test: 0.929
Param: 0.000010, Train: 1.000, Test: 0.929
Param: 0.000001, Train: 1.000, Test: 0.914

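A line plot of the results is also created, showing the increase in test accuracy with larger weight regularization parameter values, at least up to a point.

We can see that using the largest value of 0.1 results in a large drop in both train and test accuracy.

Line Plot of Model Accuracy on Train and Test Datasets With Different Weight Regularization Parameters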
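
The second stage of the search, within the best order of magnitude, could be set up along the following lines. This is a sketch under stated assumptions: the helper names are hypothetical, and the bounds 0.001 to 0.01 are taken from the coarse results above purely as an example.

```python
# Sketch only: build a coarse grid over orders of magnitude,
# then a finer, evenly spaced grid inside one promising range.

def coarse_grid(min_exponent, max_exponent):
    """Values 10^-min_exponent down to 10^-max_exponent, one per order of magnitude."""
    return [10.0 ** -e for e in range(min_exponent, max_exponent + 1)]

def fine_grid(low, high, steps):
    """Evenly spaced values from low to high inclusive."""
    step = (high - low) / (steps - 1)
    return [low + i * step for i in range(steps)]

coarse = coarse_grid(1, 6)          # 1e-1 down to 1e-6, as in the example above
fine = fine_grid(0.001, 0.01, 10)   # assumed bounds based on the coarse results

print(coarse)
print(['%.3f' % v for v in fine])
```

Each value in the fine grid would then be passed to l2() in the same loop used for the coarse search.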

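Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

Try Alternates. Update the example to use L1 or the combined L1L2 methods instead of L2 regularization.
Report Weight Norm. Update the example to calculate the magnitude of the network weights and demonstrate that regularization indeed made the magnitude smaller.
Regularize Output Layer. Update the example to regularize the output layer of the model and compare the results.
Regularize Bias. Update the example to regularize the bias weights and compare the results.
Repeated Model Evaluation. Update the example to fit and evaluate the model multiple times and report the mean and standard deviation of model performance.
Grid Search Along Order of Magnitude. Update the grid search example to grid search within the order of magnitude of the best-performing parameter value.
Repeated Regularization of Model. Create a new example to continue training a fit model with increasing levels of regularization (e.g. 1E-6, 1E-5, etc.) and see if it results in a better performing model on the test set.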


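Summary

In this tutorial, you discovered how to apply weight regularization to improve the performance of an overfit deep learning neural network in Python with Keras.

Specifically, you learned:

How to use the Keras API to add weight regularization to an MLP, CNN, or LSTM neural network.
Examples of weight regularization configurations used in books and recent research papers.
How to work through a case study for identifying an overfit model and improving test performance using weight regularization.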


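How to Reduce Overfitting Using Weight Constraints in Keras

Original: machinelearningmastery.com/how-to-redu…

Last updated on August 25, 2020

Weight constraints provide an approach to reduce the overfitting of a deep learning neural network model on the training data and improve the performance of the model on new data, such as the holdout test set.

There are multiple types of weight constraints, such as maximum and unit vector norms, and some require a hyperparameter that must be configured.

In this tutorial, you will discover the Keras API for adding weight constraints to deep learning neural network models to reduce overfitting.

After completing this tutorial, you will know:

How to create vector norm constraints using the Keras API.
How to add weight constraints to MLP, CNN, and RNN layers using the Keras API.
How to reduce overfitting by adding a weight constraint to an existing model.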


Let's get started.

Updated Mar/2019: Fixed typo using equality instead of assignment in some usage examples.
Updated Oct/2019: Updated for Keras 2.3 and TensorFlow 2.0.


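Tutorial Overview

This tutorial is divided into three parts; they are:

Weight Constraints in Keras
Weight Constraints on Layers
Weight Constraint Case Study

Weight Constraints in Keras

The Keras API supports weight constraints.

The constraints are specified per layer, but applied and enforced per node within the layer.

Using a constraint generally involves setting the kernel_constraint argument on the layer for the input weights and the bias_constraint argument for the bias weights.

Generally, weight constraints are not used on the bias weights.

A set of different vector norms can be used as constraints, provided as classes in the keras.constraints module. They are:

Maximum norm (max_norm), to force weights to have a magnitude at or below a given limit.
Non-negative norm (non_neg), to force weights to be non-negative.
Unit norm (unit_norm), to force weights to have a magnitude of 1.0.
Min-Max norm (min_max_norm), to force weights to have a magnitude within a range.

For example, a constraint can be imported and instantiated: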

# import norm
from keras.constraints import max_norm
# instantiate norm
norm = max_norm(3.0)
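
To give a feel for what a maximum norm constraint does to the incoming weights of a single node, here is a minimal plain-Python sketch. It is an illustration of the idea only, not the Keras implementation; the helper names and example weights are hypothetical.

```python
# Sketch only: rescale a weight vector so its magnitude does not exceed a limit.
import math

def vector_norm(weights):
    """Euclidean (L2) magnitude of a weight vector."""
    return math.sqrt(sum(w * w for w in weights))

def apply_max_norm(weights, max_value):
    """Leave small vectors alone; shrink large ones to have norm max_value."""
    norm = vector_norm(weights)
    if norm <= max_value:
        return list(weights)
    scale = max_value / norm
    return [w * scale for w in weights]

big = apply_max_norm([3.0, 4.0], 3.0)    # norm 5.0 is above the limit, so it is shrunk
small = apply_max_norm([0.3, 0.4], 3.0)  # norm 0.5 is below the limit, so it is unchanged

print(vector_norm(big), vector_norm(small))
```

The direction of the weight vector is preserved; only its length is capped after each update.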

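Weight Constraints on Layers

The weight norms can be used with most layers in Keras.

In this section, we will look at some common examples.

MLP Weight Constraint

The example below sets a maximum norm weight constraint on a Dense fully connected layer.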

# example of max norm on a dense layer
from keras.layers import Dense
from keras.constraints import max_norm
...
model.add(Dense(32, kernel_constraint=max_norm(3), bias_constraint=max_norm(3)))
...

CNN Weight Constraint

The example below sets a maximum norm weight constraint on a convolutional layer.

# example of max norm on a cnn layer
from keras.layers import Conv2D
from keras.constraints import max_norm
...
model.add(Conv2D(32, (3,3), kernel_constraint=max_norm(3), bias_constraint=max_norm(3)))
...

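RNN Weight Constraint

Unlike other layer types, recurrent neural networks allow you to set a weight constraint on both the input weights and bias, as well as the recurrent input weights.

The constraint for the recurrent weights is set via the recurrent_constraint argument to the layer.

The example below sets a maximum norm weight constraint on an LSTM layer.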

# example of max norm on an lstm layer
from keras.layers import LSTM
from keras.constraints import max_norm
...
model.add(LSTM(32, kernel_constraint=max_norm(3), recurrent_constraint=max_norm(3), bias_constraint=max_norm(3)))
...

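Now that we know how to use the weight constraint API, let's look at a worked example.

Weight Constraint Case Study

In this section, we will demonstrate how to use weight constraints to reduce overfitting of an MLP on a simple binary classification problem.

This example provides a template for applying weight constraints to your own neural network for classification and regression problems.

Binary Classification Problem

We will use a standard binary classification problem that defines two semicircles of observations, one semicircle for each class.

Each observation has two input variables with the same scale and a class output value of either 0 or 1. This dataset is called the "moons" dataset because of the shape of the observations in each class when plotted.

We can use the make_moons() function to generate observations from this problem. We will add noise to the data and seed the random number generator so that the same samples are generated each time the code is run.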

# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)

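We can plot the dataset where the two variables are taken as x and y coordinates on a graph and the class value is taken as the color of the observation.

The complete example of generating the dataset and plotting it is listed below.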

# generate two moons dataset
from sklearn.datasets import make_moons
from matplotlib import pyplot
from pandas import DataFrame
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# scatter plot, dots colored by class value
df = DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
colors = {0:'red', 1:'blue'}
fig, ax = pyplot.subplots()
grouped = df.groupby('label')
for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='x', y='y', label=key, color=colors[key])
pyplot.show()

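Running the example creates a scatter plot showing the semicircle or moon shape of the observations in each class. We can see that the noise in the dispersal of the points makes the moons less obvious.

Scatter Plot of Moons Dataset With Color Showing the Class Value of Each Sample

This is a good test problem because the classes cannot be separated by a line, i.e. they are not linearly separable, requiring a nonlinear method such as a neural network to address.

Overfit Multilayer Perceptron

We can develop an MLP model to address this binary classification problem.

The model will have one hidden layer with more nodes than may be required to solve this problem, providing an opportunity to overfit. We will also train the model for longer than is required to ensure the model overfits.

Before we define the model, we will split the dataset into train and test sets, using 30 examples to train the model and 70 to evaluate the fit model's performance.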

# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]

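Next, we can define the model.

The hidden layer uses 500 nodes and the rectified linear activation function. A sigmoid activation function is used in the output layer in order to predict class values of 0 or 1.

The model is optimized using the binary cross entropy loss function, suitable for binary classification problems, and the efficient Adam version of gradient descent.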

# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

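The defined model is then fit on the training data for 4,000 epochs with the default batch size of 32.

We will also use the test dataset as a validation dataset.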

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)

We can evaluate the performance of the model on the train and test datasets and report the result.

# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))

Finally, we will plot the performance of the model on both the train and test sets for each epoch.

# plot history
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

We can tie all of these pieces together; the complete example is listed below.

# mlp overfit on the moons dataset
from sklearn.datasets import make_moons
from keras.layers import Dense
from keras.models import Sequential
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot history
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

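Running the example reports the model performance on the train and test datasets.

We can see that the model performs better on the training dataset than on the test dataset, one possible sign of overfitting.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Because the model is overfit, we generally would not expect much, if any, variance in the accuracy across repeated runs of the model on the same dataset.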

Train: 1.000, Test: 0.914

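A figure is created showing line plots of the model accuracy on the train and test sets.

We can see the expected shape of an overfit model, where test accuracy increases to a point and then begins to decrease again.

Line Plots of Accuracy on Train and Test Datasets While Training Showing an Overfit

Overfit MLP With Weight Constraint

We can update the example to use a weight constraint.

There are a few different weight constraints to choose from. A good simple constraint for this model is to simply normalize the weights so that the norm is equal to 1.0.

This constraint has the effect of forcing all incoming weights to be small.

We can do this by using unit_norm in Keras. This constraint can be added to the first hidden layer as follows: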

model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=unit_norm()))

We can also achieve the same result by using min_max_norm and setting the minimum and maximum to 1.0, for example:

model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=min_max_norm(min_value=1.0, max_value=1.0)))

We cannot achieve the same result with the maximum norm constraint, as it allows norms at or below the specified limit; for example:

model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=max_norm(1.0)))
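
The difference between the three constraints is easiest to see on a weight vector whose norm is below 1.0. The following plain-Python sketch uses hypothetical stand-in functions (not the Keras classes) and assumes the min-max constraint is enforced strictly, as with a rate of 1.0.

```python
# Sketch only: compare unit norm, min-max norm (1.0 to 1.0), and max norm (1.0)
# on a short weight vector. Function names are illustrative stand-ins.
import math

def vector_norm(weights):
    return math.sqrt(sum(w * w for w in weights))

def rescale_unit_norm(weights):
    """Force the magnitude to exactly 1.0."""
    norm = vector_norm(weights)
    return [w / norm for w in weights]

def rescale_min_max_norm(weights, min_value, max_value):
    """Force the magnitude into [min_value, max_value]."""
    norm = vector_norm(weights)
    target = min(max(norm, min_value), max_value)
    return [w * target / norm for w in weights]

def rescale_max_norm(weights, max_value):
    """Only shrink vectors whose magnitude exceeds max_value."""
    norm = vector_norm(weights)
    target = min(norm, max_value)
    return [w * target / norm for w in weights]

short = [0.3, 0.4]  # hypothetical incoming weights with norm 0.5

norm_unit = vector_norm(rescale_unit_norm(short))
norm_min_max = vector_norm(rescale_min_max_norm(short, 1.0, 1.0))
norm_max = vector_norm(rescale_max_norm(short, 1.0))
print(norm_unit, norm_min_max, norm_max)
```

The first two results have a norm of 1.0, while the max norm version keeps the original norm of 0.5 because it is already under the limit.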

The complete updated example with the unit norm constraint is listed below:

# mlp overfit on the moons dataset with a unit norm constraint
from sklearn.datasets import make_moons
from keras.layers import Dense
from keras.models import Sequential
from keras.constraints import unit_norm
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu', kernel_constraint=unit_norm()))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot history
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

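Running the example reports the model performance on the train and test datasets.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

We can see that the strict constraint on the size of the weights has indeed improved the performance of the model on the holdout set without impacting performance on the training set.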

Train: 1.000, Test: 0.943

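Reviewing the line plot of train and test accuracy, we can see that the model no longer appears to overfit the training dataset.

Model accuracy on both the train and test sets continues to increase to a plateau.

Line Plots of Accuracy on Train and Test Datasets While Training With Weight Constraints

Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

Report Weight Norm. Update the example to calculate the magnitude of the network weights and demonstrate that the constraint indeed made the magnitude smaller.
Constrain Output Layer. Update the example to add a constraint to the output layer of the model and compare the results.
Constrain Bias. Update the example to add a constraint to the bias weights and compare the results.
Repeated Evaluation. Update the example to fit and evaluate the model multiple times and report the mean and standard deviation of model performance.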


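Summary

In this tutorial, you discovered the Keras API for adding weight constraints to deep learning neural network models.

Specifically, you learned:

How to create vector norm constraints using the Keras API.
How to add weight constraints to MLP, CNN, and RNN layers using the Keras API.
How to reduce overfitting by adding a weight constraint to an existing model.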


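How to Reduce Overfitting With Dropout Regularization in Keras

Original: machinelearningmastery.com/how-to-redu…

Last updated on August 25, 2020

Dropout regularization is a computationally cheap way to regularize a deep neural network.

Dropout works by probabilistically removing, or "dropping out," inputs to a layer, which may be input variables in the data sample or activations from a previous layer. It has the effect of simulating a large number of networks with very different network structures and, in turn, making nodes in the network generally more robust to the inputs.

In this tutorial, you will discover the Keras API for adding dropout regularization to deep learning neural network models.

After completing this tutorial, you will know:

How to create a dropout layer using the Keras API.
How to add dropout regularization to MLP, CNN, and RNN layers using the Keras API.
How to reduce overfitting by adding dropout regularization to an existing model.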


Let's get started.

Updated Oct/2019: Updated for Keras 2.3 and TensorFlow 2.0.


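Tutorial Overview

This tutorial is divided into three parts; they are:

Dropout Regularization in Keras
Dropout Regularization on Layers
Dropout Regularization Case Study

Dropout Regularization in Keras

Keras supports dropout regularization.

The simplest form of dropout in Keras is provided by the Dropout core layer.

When created, the dropout rate can be specified to the layer as the probability of setting each input to the layer to zero. This is different from the definition of dropout rate in the papers, where the rate refers to the probability of retaining an input.

Therefore, when a dropout rate of 0.8 is suggested in a paper (retain 80%), this will, in fact, be a dropout rate of 0.2 (set 20% of inputs to zero).

Below is an example of creating a dropout layer with a 50% chance of setting inputs to zero.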

layer = Dropout(0.5)
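
The two conventions, and the effect of dropout on a layer's inputs, can be illustrated with a short plain-Python sketch. This is an illustration under assumptions rather than the Keras implementation: the helper names are hypothetical, and the surviving inputs are scaled up by 1/(1 - rate) so that the expected sum is unchanged.

```python
# Sketch only: convert a paper-style retain probability to a Keras-style rate,
# then apply dropout to a list of inputs during training.
import random

def keep_probability_to_rate(keep_probability):
    """Paper convention (probability of retaining) to rate (probability of zeroing)."""
    return 1.0 - keep_probability

def dropout(inputs, rate, rng):
    """Zero each input with probability `rate`; scale the rest by 1 / (1 - rate)."""
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else value * scale for value in inputs]

rate = keep_probability_to_rate(0.8)  # a paper's "0.8" corresponds to a rate of 0.2
rng = random.Random(1)                # seeded so the sketch is repeatable
outputs = dropout([1.0] * 10, 0.5, rng)

print('rate: %.1f' % rate)
print(outputs)
```

With a rate of 0.5, each output is either 0.0 (dropped) or 2.0 (kept and rescaled). Dropout is only active during training; at prediction time the inputs pass through unchanged.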

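Dropout Regularization on Layers

The Dropout layer is added to a model between existing layers and applies to the outputs of the prior layer that are fed to the subsequent layer.

For example, given two dense layers:

...
model.add(Dense(32))
model.add(Dense(32))
...

We can insert a dropout layer between them, in which case the outputs or activations of the first layer have dropout applied to them before they are taken as input to the next layer.

It is this second layer that now has dropout applied to its inputs.

...
model.add(Dense(32))
model.add(Dropout(0.5))
model.add(Dense(32))
...

Dropout can also be applied to the visible layer, e.g. the inputs to the network.

This requires that you define the network with the Dropout layer as the first layer and add the input_shape argument to the layer to specify the expected shape of the input samples.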

...
model.add(Dropout(0.5, input_shape=(2,)))
...

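Let's take a look at how dropout regularization can be used with some common network types.

MLP Dropout Regularization

The example below adds dropout between two dense fully connected layers.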

# example of dropout between fully connected layers
from keras.layers import Dense
from keras.layers import Dropout
...
model.add(Dense(32))
model.add(Dropout(0.5))
model.add(Dense(1))
...

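CNN Dropout Regularization

Dropout can be used after convolutional layers (e.g. Conv2D) and after pooling layers (e.g. MaxPooling2D).

Often, dropout is only used after the pooling layers, but this is just a rough heuristic.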

# example of dropout for a CNN
from keras.layers import Dense
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Dropout
...
model.add(Conv2D(32, (3,3)))
model.add(Conv2D(32, (3,3)))
model.add(MaxPooling2D())
model.add(Dropout(0.5))
model.add(Dense(1))
...

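In this case, dropout is applied to each element or cell within the feature maps.

An alternative way to use dropout with convolutional neural networks is to drop out entire feature maps from the convolutional layer, which are then not used during pooling. This is called spatial dropout (or "SpatialDropout").

Instead we formulate a new dropout method which we call SpatialDropout. For a given convolution feature tensor […] [we] extend the dropout value across the entire feature map.

- Efficient Object Localization Using Convolutional Networks, 2015.

Spatial dropout is provided in Keras via the SpatialDropout2D layer (as well as 1D and 3D versions).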

# example of spatial dropout for a CNN
from keras.layers import Dense
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import SpatialDropout2D
...
model.add(Conv2D(32, (3,3)))
model.add(Conv2D(32, (3,3)))
model.add(SpatialDropout2D(0.5))
model.add(MaxPooling2D())
model.add(Dense(1))
...
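
To contrast this with element-wise dropout, the following plain-Python sketch drops whole feature maps at once. It is a simplified illustration with hypothetical names and tiny 2x2 maps, not the SpatialDropout2D implementation.

```python
# Sketch only: spatial dropout zeroes entire feature maps rather than single cells.
import random

def spatial_dropout(feature_maps, rate, rng):
    """Each 2D feature map is either zeroed entirely or kept and scaled by 1 / (1 - rate)."""
    scale = 1.0 / (1.0 - rate)
    result = []
    for feature_map in feature_maps:
        if rng.random() < rate:
            # drop the whole map: every cell becomes zero
            result.append([[0.0 for _ in row] for row in feature_map])
        else:
            # keep the whole map and rescale it
            result.append([[value * scale for value in row] for row in feature_map])
    return result

# four hypothetical 2x2 feature maps filled with ones
maps = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(4)]
dropped = spatial_dropout(maps, 0.5, random.Random(1))

for feature_map in dropped:
    print(feature_map)
```

Every cell within a given map shares the same fate, which is the property that distinguishes spatial dropout from standard dropout on feature maps.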

RNN Dropout Regularization

The example below adds dropout between two layers: an LSTM recurrent layer and a dense fully connected layer.

# example of dropout between LSTM and fully connected layers
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
...
model.add(LSTM(32))
model.add(Dropout(0.5))
model.add(Dense(1))
...

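This example applies dropout to the 32 outputs of the LSTM layer, which are provided as input to the Dense layer.

Alternately, the inputs to the LSTM may be subjected to dropout. In this case, a different dropout mask is applied to each time step within each sample presented to the LSTM.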

# example of dropout before LSTM layer
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
...
model.add(Dropout(0.5, input_shape=(...)))
model.add(LSTM(32))
model.add(Dense(1))
...

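There is an alternative way to use dropout with recurrent layers like the LSTM. The same dropout mask may be used by the LSTM for all inputs within a sample. The same approach may be used for the recurrent input connections across the time steps of the sample. This approach to dropout with recurrent models is called a Variational RNN.

The proposed technique (Variational RNN […]) uses the same dropout mask at each time step, including the recurrent layers. […] Implementing our approximate inference is identical to implementing dropout in RNNs with the same network units dropped at each time step, randomly dropping inputs, outputs, and recurrent connections. This is in contrast to existing techniques, where different network units would be dropped at different time steps, and no dropout would be applied to the recurrent connections

- A Theoretically Grounded Application of Dropout in Recurrent Neural Networks, 2016.

Keras supports Variational RNNs (i.e. consistent dropout across the time steps of a sample for inputs and recurrent inputs) via two arguments on the recurrent layers, namely "dropout" for inputs and "recurrent_dropout" for recurrent inputs.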

# example of variational LSTM dropout
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
...
model.add(LSTM(32, dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(1))
...
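
The key idea, one mask reused at every time step of a sample, can be sketched in plain Python as follows. The helper names and the tiny sequence are hypothetical, and this only illustrates the masking of inputs, not how the LSTM layer applies it internally.

```python
# Sketch only: variational-style dropout reuses one mask for all time steps.
import random

def make_mask(n_features, rate, rng):
    """One mask entry per feature: 0.0 if dropped, 1 / (1 - rate) if kept."""
    scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else scale for _ in range(n_features)]

def apply_same_mask(sequence, mask):
    """Multiply every time step of the sequence by the same mask."""
    return [[value * m for value, m in zip(step, mask)] for step in sequence]

# a hypothetical sample with 4 time steps and 3 features, all ones
sequence = [[1.0, 1.0, 1.0] for _ in range(4)]
mask = make_mask(3, 0.5, random.Random(1))
masked = apply_same_mask(sequence, mask)

for step in masked:
    print(step)
```

Because the mask is sampled once per sample, the same features are dropped at every time step, in contrast to standard dropout where a new mask is drawn at each step.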

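Dropout Regularization Case Study

In this section, we will demonstrate how to use dropout regularization to reduce overfitting of an MLP on a simple binary classification problem.

This example provides a template for applying dropout regularization to your own neural network for classification and regression problems.

Binary Classification Problem

We will use a standard binary classification problem that defines two two-dimensional concentric circles of observations, one circle for each class.

Each observation has two input variables with the same scale and a class output value of either 0 or 1. This dataset is called the "circles" dataset because of the shape of the observations in each class when plotted.

We can use the make_circles() function to generate observations from this problem, adding noise to the data and seeding the random number generator so that the same samples are generated each time the code is run.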

# generate 2d classification dataset
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)

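We can plot the dataset where the two variables are taken as x and y coordinates on a graph and the class value is taken as the color of the observation.

The complete example of generating the dataset and plotting it is listed below.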

# generate two circles dataset
from sklearn.datasets import make_circles
from matplotlib import pyplot
from pandas import DataFrame
# generate 2d classification dataset
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)
# scatter plot, dots colored by class value
df = DataFrame(dict(x=X[:,0], y=X[:,1], label=y))
colors = {0:'red', 1:'blue'}
fig, ax = pyplot.subplots()
grouped = df.groupby('label')
for key, group in grouped:
    group.plot(ax=ax, kind='scatter', x='x', y='y', label=key, color=colors[key])
pyplot.show()

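Running the example creates a scatter plot showing the concentric circles shape of the observations in each class. We can see that the noise in the dispersal of the points makes the circles less obvious.

Scatter Plot of Circles Dataset with Color Showing the Class Value of Each Sample

This is a good test problem because the classes cannot be separated by a line, i.e. they are not linearly separable, requiring a nonlinear method such as a neural network to address.

Overfit Multilayer Perceptron

We can develop an MLP model to address this binary classification problem.

Before we define the model, we will split the dataset into train and test sets, using 30 examples to train the model and 70 to evaluate the fit model's performance.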

# generate 2d classification dataset
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]

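Next, we can define the model.

The hidden layer uses 500 nodes and the rectified linear activation function. A sigmoid activation function is used in the output layer in order to predict class values of 0 or 1.

The model is optimized using the binary cross entropy loss function, suitable for binary classification problems, and the efficient Adam version of gradient descent.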

# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

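The defined model is then fit on the training data for 4,000 epochs with the default batch size of 32.

We will also use the test dataset as a validation dataset.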

# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)

We can evaluate the performance of the model on the train and test datasets and report the result.

# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))

Finally, we will plot the performance of the model on both the train and test sets for each epoch.

# plot history
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

We can tie all of these pieces together; the complete example is listed below.

# mlp overfit on the two circles dataset
from sklearn.datasets import make_circles
from keras.layers import Dense
from keras.models import Sequential
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot history
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

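Running the example reports the model performance on the train and test datasets.

We can see that the model performs better on the training dataset than on the test dataset, a possible sign of overfitting.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

Because the model is severely overfit, we generally would not expect much, if any, variance in accuracy across repeated runs of the model on the same dataset.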

Train: 1.000, Test: 0.757

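A figure is created showing line plots of the model accuracy on the train and test sets.

We can see the expected shape of an overfit model, where test accuracy increases to a point and then begins to decrease again.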

Line Plots of Accuracy on Train and Test Datasets While Training Showing an Overfit
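The turning point in the test curve can also be located numerically rather than read off the plot. The helper below is a minimal sketch; the name best_epoch() is hypothetical and the accuracy values are made up purely to show the call.

```python
# locate the epoch at which validation accuracy peaks (illustrative helper)
def best_epoch(val_accuracy):
    """Return (epoch, accuracy) for the first peak; epochs are counted from 1."""
    best_index = max(range(len(val_accuracy)), key=lambda i: val_accuracy[i])
    return best_index + 1, val_accuracy[best_index]

# made-up curve that rises and then falls, like an overfit model
example_curve = [0.50, 0.62, 0.71, 0.80, 0.78, 0.76, 0.75]
epoch, acc = best_epoch(example_curve)
print('Best epoch: %d, accuracy: %.3f' % (epoch, acc))
```

With the fitted model from the example above, the same helper could be called as best_epoch(history.history['val_accuracy']).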


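Overfit MLP With Dropout Regularization

We can update the example to use dropout regularization.

We can do this by simply inserting a new Dropout layer between the hidden layer and the output layer. In this case, we will specify a dropout rate (the probability of setting outputs from the hidden layer to zero) of 40% or 0.4.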

# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
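To see what the Dropout layer does to the hidden layer's outputs during training, the NumPy sketch below zeroes each value with probability 0.4 and rescales the survivors by 1 / (1 - rate), so the expected sum is unchanged. The function name dropout_forward() is hypothetical and this is a simplified stand-in, not the Keras implementation; at inference time Keras passes values through the layer unchanged.

```python
# simplified sketch of training-time (inverted) dropout with NumPy
import numpy as np

def dropout_forward(activations, rate, rng):
    """Zero each value with probability `rate` and rescale the remaining values."""
    keep_prob = 1.0 - rate
    # True where the value is kept, False where it is dropped
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(1)
# stand-in for the outputs of the 500-node hidden layer
hidden_output = np.ones((1, 500))
dropped = dropout_forward(hidden_output, 0.4, rng)
print('Fraction zeroed: %.3f' % np.mean(dropped == 0.0))
print('Mean activation: %.3f' % dropped.mean())
```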

The complete updated example with dropout added after the hidden layer is listed below:

# mlp with dropout on the two circles dataset
from sklearn.datasets import make_circles
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from matplotlib import pyplot
# generate 2d classification dataset
X, y = make_circles(n_samples=100, noise=0.1, random_state=1)
# split into train and test
n_train = 30
trainX, testX = X[:n_train, :], X[n_train:, :]
trainy, testy = y[:n_train], y[n_train:]
# define model
model = Sequential()
model.add(Dense(500, input_dim=2, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit model
history = model.fit(trainX, trainy, validation_data=(testX, testy), epochs=4000, verbose=0)
# evaluate the model
_, train_acc = model.evaluate(trainX, trainy, verbose=0)
_, test_acc = model.evaluate(testX, testy, verbose=0)
print('Train: %.3f, Test: %.3f' % (train_acc, test_acc))
# plot history
pyplot.plot(history.history['accuracy'], label='train')
pyplot.plot(history.history['val_accuracy'], label='test')
pyplot.legend()
pyplot.show()

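Running the example reports the model performance on the train and test datasets.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision. Consider running the example a few times and comparing the average outcome.

In this specific case, we can see that dropout resulted in a slight drop in accuracy on the training dataset, down from 100% to 96%, and a lift in accuracy on the test set, up from 75% to 81%.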

Train: 0.967, Test: 0.814

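Reviewing the line plot of train and test accuracy during training, we can see that the model no longer appears to overfit the training dataset.

Model accuracy on both the train and test sets continues to increase to a plateau, albeit with a lot of noise given the use of dropout during training.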

Line Plots of Accuracy on Train and Test Datasets While Training With Dropout Regularization


Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

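Input Dropout. Update the example to use dropout on the input variables and compare the results.
Weight Constraint. Update the example to add a max-norm weight constraint to the hidden layer and compare the results.
Repeated Evaluation. Update the example to repeat the evaluation of the overfit and dropout models, and summarize and compare the average results.
Grid Search Rate. Develop a grid search of dropout probabilities and report the relationship between dropout rate and test set accuracy.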
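As a starting point for the first two ideas, the sketch below adds dropout on the input variables and a max-norm constraint on the hidden layer weights. The input dropout rate of 0.2 and the max-norm value of 3 are assumptions chosen for illustration, not values from the tutorial, and may need tuning.

```python
# sketch of the input dropout and weight constraint extensions (values are assumptions)
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.constraints import max_norm

model = Sequential()
# dropout applied to the two input variables; 0.2 is an arbitrary starting rate
model.add(Dropout(0.2, input_shape=(2,)))
# max-norm constraint on the hidden layer weights; 3 is an arbitrary starting value
model.add(Dense(500, activation='relu', kernel_constraint=max_norm(3)))
# dropout between the hidden and output layers, as in the tutorial
model.add(Dropout(0.4))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```

The model can then be fit and evaluated exactly as in the complete examples above, which makes the results directly comparable.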

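If you explore any of these extensions, I'd love to know.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

Efficient Object Localization Using Convolutional Networks, 2015.
A Theoretically Grounded Application of Dropout in Recurrent Neural Networks, 2016.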


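Summary

In this tutorial, you discovered the Keras API for adding dropout regularization to deep learning neural network models.

Specifically, you learned:

How to create a dropout layer using the Keras API.
How to add dropout regularization to MLP, CNN, and RNN layers using the Keras API.
How to reduce overfitting by adding dropout regularization to an existing model.

Do you have any questions? Ask them in the comments below and I will do my best to answer.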