Machine-Learning-Mastery-优化教程-七-Machine Learning Mastery 优化教

Machine Learning Mastery 优化教程（七）

原文：Machine Learning Mastery

协议：CC BY-NC-SA 4.0

如何使用优化算法手动拟合回归模型

原文：machinelearningmastery.com/optimize-re…

最后更新于 2021 年 10 月 12 日

使用线性回归和局部搜索优化算法对训练数据拟合回归模型。

像线性回归和逻辑回归这样的模型是通过最小二乘优化来训练的，这是找到使这些模型的误差最小化的系数的最有效的方法。

然而，可以使用替代的优化算法将回归模型拟合到训练数据集。这可能是一个有用的练习，以了解更多关于回归函数和优化在应用机器学习中的核心性质。对于数据不符合最小二乘优化程序要求的回归，也可能需要它。

在本教程中，您将发现如何手动优化回归模型的系数。

完成本教程后，您将知道:

如何从零开始开发回归的推理模型？
如何优化预测数值的线性回归模型的系数？
如何用随机爬山法优化逻辑回归模型的系数？

用我的新书机器学习优化启动你的项目，包括分步教程和所有示例的 Python 源代码文件。

Let’s get started. How to Use Optimization Algorithms to Manually Fit Regression Models

教程概述

本教程分为三个部分；它们是:

优化回归模型
优化线性回归模型
优化逻辑回归模型

优化回归模型

回归模型，像线性回归和逻辑回归，是统计学领域中众所周知的算法。

两种算法都是线性的，这意味着模型的输出是输入的加权和。线性回归是针对需要预测一个数的“回归”问题设计的，逻辑回归是针对需要预测一个类标签的“分类”问题设计的。

这些回归模型包括使用优化算法来为模型的每个输入找到一组系数，从而最小化预测误差。因为模型是线性的并且被很好地理解，所以可以使用有效的优化算法。

在线性回归的情况下，系数可以通过最小二乘优化找到，可以使用线性代数求解。在逻辑回归的情况下，通常使用局部搜索优化算法。

可以使用任意优化算法来训练线性和逻辑回归模型。

也就是说，我们可以定义一个回归模型，并使用给定的优化算法为该模型找到一组系数，从而使预测误差最小或分类准确率最大。

平均而言，使用替代优化算法的效率低于使用推荐的优化算法。尽管如此，在某些特定情况下，例如如果输入数据不符合模型的期望(如高斯分布)并且与外部输入不相关，这可能更有效。

演示优化在训练机器学习算法，特别是回归模型中的核心性质也是一个有趣的练习。

接下来，让我们探索如何使用随机爬山训练线性回归模型。

优化线性回归模型

线性回归模型可能是从数据中学习的最简单的预测模型。

该模型对每个输入都有一个系数，预测输出只是一些输入和系数的权重。

在本节中，我们将优化线性回归模型的系数。

首先，让我们定义一个合成回归问题，我们可以将其作为优化模型的重点。

我们可以使用make _ revolution()函数定义一个有 1000 行和 10 个输入变量的回归问题。

下面的示例创建数据集并总结数据的形状。

# define a regression dataset
from sklearn.datasets import make_regression
# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=2, noise=0.2, random_state=1)
# summarize the shape of the dataset
print(X.shape, y.shape)

运行该示例会打印出创建的数据集的形状，这证实了我们的预期。

(1000, 10) (1000,)

接下来，我们需要定义一个线性回归模型。

在我们优化模型系数之前，我们必须发展模型和我们对它如何工作的信心。

让我们从开发一个函数开始，该函数为来自数据集的给定输入数据行计算模型的激活。

该函数将获取模型的数据行和系数，并计算输入的加权和，加上一个额外的 y 截距(也称为偏移或偏差)系数。下面的 predict_row() 函数实现了这一点。

我们正在使用简单的 Python 列表和命令式编程风格，而不是故意使用 NumPy 数组或列表压缩，以使代码对 Python 初学者来说更易读。请随意优化它，并在下面的评论中发布您的代码。

# linear regression
def predict_row(row, coefficients):
	# add the bias, the last coefficient
	result = coefficients[-1]
	# add the weighted input
	for i in range(len(row)):
		result += coefficients[i] * row[i]
	return result

接下来，我们可以为给定数据集中的每一行调用 predict_row() 函数。下面的*预测 _ 数据集()*函数实现了这一点。

同样，为了可读性，我们有意使用简单的命令式编码风格，而不是列表压缩。

# use model coefficients to generate predictions for a dataset of rows
def predict_dataset(X, coefficients):
	yhats = list()
	for row in X:
		# make a prediction
		yhat = predict_row(row, coefficients)
		# store the prediction
		yhats.append(yhat)
	return yhats

最后，我们可以使用该模型对我们的合成数据集进行预测，以确认它都工作正常。

我们可以使用 rand()函数生成一组随机的模型系数。

回想一下，我们每个输入需要一个系数(本数据集中有 10 个输入)加上 y 截距系数的额外权重。

...
# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=2, noise=0.2, random_state=1)
# determine the number of coefficients
n_coeff = X.shape[1] + 1
# generate random coefficients
coefficients = rand(n_coeff)

然后，我们可以将这些系数与数据集一起使用来进行预测。

...
# generate predictions for dataset
yhat = predict_dataset(X, coefficients)

我们可以评估这些预测的均方误差。

...
# calculate model prediction error
score = mean_squared_error(y, yhat)
print('MSE: %f' % score)

就这样。

我们可以将所有这些联系在一起，并演示我们的线性回归模型用于回归预测建模。下面列出了完整的示例。

# linear regression model
from numpy.random import rand
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error

# linear regression
def predict_row(row, coefficients):
	# add the bias, the last coefficient
	result = coefficients[-1]
	# add the weighted input
	for i in range(len(row)):
		result += coefficients[i] * row[i]
	return result

# use model coefficients to generate predictions for a dataset of rows
def predict_dataset(X, coefficients):
	yhats = list()
	for row in X:
		# make a prediction
		yhat = predict_row(row, coefficients)
		# store the prediction
		yhats.append(yhat)
	return yhats

# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=2, noise=0.2, random_state=1)
# determine the number of coefficients
n_coeff = X.shape[1] + 1
# generate random coefficients
coefficients = rand(n_coeff)
# generate predictions for dataset
yhat = predict_dataset(X, coefficients)
# calculate model prediction error
score = mean_squared_error(y, yhat)
print('MSE: %f' % score)

运行示例会为训练数据集中的每个示例生成预测，然后打印预测的均方误差。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在给定一组随机权重的情况下，我们预计会有很大的误差，这就是我们在这种情况下看到的，误差值约为 7，307 个单位。

MSE: 7307.756740

我们现在可以优化数据集的系数，以实现该数据集的低误差。

首先，我们需要将数据集分成训练集和测试集。重要的是保留一些未用于优化模型的数据，以便我们可以在用于对新数据进行预测时对模型的表现进行合理的估计。

我们将使用 67%的数据进行训练，剩下的 33%作为测试集来评估模型的表现。

...
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

接下来，我们可以开发一个随机爬山算法。

优化算法需要一个目标函数来优化。它必须采用一组系数，并返回一个对应于更好模型的最小化或最大化分数。

在这种情况下，我们将使用给定的一组系数来评估模型的均方误差，并返回误差分数，该分数必须最小化。

给定数据集和一组系数，下面的 objective() 函数实现了这一点，并返回模型的误差。

# objective function
def objective(X, y, coefficients):
	# generate predictions for dataset
	yhat = predict_dataset(X, coefficients)
	# calculate accuracy
	score = mean_squared_error(y, yhat)
	return score

接下来，我们可以定义随机爬山算法。

该算法将需要一个初始解(例如，随机系数)，并将反复不断地对解进行小的改变，并检查它是否产生一个表现更好的模型。对当前解决方案的更改量由步长超参数控制。该过程将持续固定次数的迭代，也作为超参数提供。

下面的*爬山()*函数实现了这一点，将数据集、目标函数、初始解和超参数作为参数，并返回找到的最佳系数集和估计的表现。

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
	# evaluate the initial point
	solution_eval = objective(X, y, solution)
	# run the hill climb
	for i in range(n_iter):
		# take a step
		candidate = solution + randn(len(solution)) * step_size
		# evaluate candidate point
		candidte_eval = objective(X, y, candidate)
		# check if we should keep the new point
		if candidte_eval <= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidte_eval
			# report progress
			print('>%d %.5f' % (i, solution_eval))
	return [solution, solution_eval]

然后我们可以调用这个函数，传入一组初始系数作为初始解，并将训练数据集作为数据集来优化模型。

...
# define the total iterations
n_iter = 2000
# define the maximum step size
step_size = 0.15
# determine the number of coefficients
n_coef = X.shape[1] + 1
# define the initial solution
solution = rand(n_coef)
# perform the hill climbing search
coefficients, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)
print('Done!')
print('Coefficients: %s' % coefficients)
print('Train MSE: %f' % (score))

最后，我们可以在测试数据集上评估最佳模型并报告表现。

...
# generate predictions for the test dataset
yhat = predict_dataset(X_test, coefficients)
# calculate accuracy
score = mean_squared_error(y_test, yhat)
print('Test MSE: %f' % (score))

将这些联系在一起，下面列出了在综合回归数据集上优化线性回归模型系数的完整示例。

# optimize linear regression coefficients for regression dataset
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# linear regression
def predict_row(row, coefficients):
	# add the bias, the last coefficient
	result = coefficients[-1]
	# add the weighted input
	for i in range(len(row)):
		result += coefficients[i] * row[i]
	return result

# use model coefficients to generate predictions for a dataset of rows
def predict_dataset(X, coefficients):
	yhats = list()
	for row in X:
		# make a prediction
		yhat = predict_row(row, coefficients)
		# store the prediction
		yhats.append(yhat)
	return yhats

# objective function
def objective(X, y, coefficients):
	# generate predictions for dataset
	yhat = predict_dataset(X, coefficients)
	# calculate accuracy
	score = mean_squared_error(y, yhat)
	return score

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
	# evaluate the initial point
	solution_eval = objective(X, y, solution)
	# run the hill climb
	for i in range(n_iter):
		# take a step
		candidate = solution + randn(len(solution)) * step_size
		# evaluate candidate point
		candidte_eval = objective(X, y, candidate)
		# check if we should keep the new point
		if candidte_eval <= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidte_eval
			# report progress
			print('>%d %.5f' % (i, solution_eval))
	return [solution, solution_eval]

# define dataset
X, y = make_regression(n_samples=1000, n_features=10, n_informative=2, noise=0.2, random_state=1)
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# define the total iterations
n_iter = 2000
# define the maximum step size
step_size = 0.15
# determine the number of coefficients
n_coef = X.shape[1] + 1
# define the initial solution
solution = rand(n_coef)
# perform the hill climbing search
coefficients, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)
print('Done!')
print('Coefficients: %s' % coefficients)
print('Train MSE: %f' % (score))
# generate predictions for the test dataset
yhat = predict_dataset(X_test, coefficients)
# calculate accuracy
score = mean_squared_error(y_test, yhat)
print('Test MSE: %f' % (score))

每次对模型进行改进时，运行该示例都会报告迭代次数和均方误差。

在搜索结束时，报告最佳系数集在训练数据集上的表现，并计算和报告相同模型在测试数据集上的表现。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，我们可以看到优化算法在训练和测试数据集上都找到了一组误差约为 0.08 的系数。

该算法在训练数据集和测试数据集上找到了表现非常相似的模型，这是一个好的迹象，表明该模型没有过拟合(过度优化)训练数据集。这意味着该模型可以很好地推广到新数据。

...
>1546 0.35426
>1567 0.32863
>1572 0.32322
>1619 0.24890
>1665 0.24800
>1691 0.24162
>1715 0.15893
>1809 0.15337
>1892 0.14656
>1956 0.08042
Done!
Coefficients: [ 1.30559829e-02 -2.58299382e-04  3.33118191e+00  3.20418534e-02
  1.36497902e-01  8.65445367e+01  2.78356715e-02 -8.50901499e-02
  8.90078243e-02  6.15779867e-02 -3.85657793e-02]
Train MSE: 0.080415
Test MSE: 0.080779

现在我们已经熟悉了如何手动优化线性回归模型的系数，让我们看看如何扩展示例来优化用于分类的逻辑回归模型的系数。

优化逻辑回归模型

逻辑回归模型是线性回归的扩展，用于分类预测建模。

Logistic 回归是针对二分类任务的，意思是数据集有两个类标签，class=0 和 class=1。

输出首先包括计算输入的加权和，然后将这个加权和传递给一个逻辑函数，也称为 sigmoid 函数。对于属于类=1 的例子，结果是 0 和 1 之间的二项式概率。

在本节中，我们将在上一节所学的基础上优化回归模型的系数以进行分类。我们将开发模型并用随机系数进行测试，然后使用随机爬山来优化模型系数。

首先，让我们定义一个合成的二分类问题，我们可以将其作为优化模型的重点。

我们可以使用 make_classification()函数定义一个包含 1000 行和 5 个输入变量的二分类问题。

下面的示例创建数据集并总结数据的形状。

# define a binary classification dataset
from sklearn.datasets import make_classification
# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# summarize the shape of the dataset
print(X.shape, y.shape)

运行该示例会打印出创建的数据集的形状，这证实了我们的预期。

(1000, 5) (1000,)

接下来，我们需要定义一个逻辑回归模型。

让我们从更新 predict_row() 函数开始，通过逻辑函数传递输入和系数的加权和。

逻辑函数定义为:

logistic = 1.0/(1.0+exp(-结果))

其中，结果是输入和系数的加权和，exp()是 e ( 欧拉数)乘以提供值的幂，通过 exp()函数实现。

更新后的 predict_row() 功能如下。

# logistic regression
def predict_row(row, coefficients):
	# add the bias, the last coefficient
	result = coefficients[-1]
	# add the weighted input
	for i in range(len(row)):
		result += coefficients[i] * row[i]
	# logistic function
	logistic = 1.0 / (1.0 + exp(-result))
	return logistic

这就是线性回归到逻辑回归的变化。

与线性回归一样，我们可以用一组随机模型系数来测试模型。

...
# determine the number of coefficients
n_coeff = X.shape[1] + 1
# generate random coefficients
coefficients = rand(n_coeff)
# generate predictions for dataset
yhat = predict_dataset(X, coefficients)

该模型做出的预测是属于类=1 的例子的概率。

对于预期的类标签，我们可以将预测舍入为整数值 0 和 1。

...
# round predictions to labels
yhat = [round(y) for y in yhat]

我们可以评估这些预测的分类准确性。

...
# calculate accuracy
score = accuracy_score(y, yhat)
print('Accuracy: %f' % score)

就这样。

我们可以将所有这些联系在一起，并演示我们用于二分类的简单逻辑回归模型。下面列出了完整的示例。

# logistic regression function for binary classification
from math import exp
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# logistic regression
def predict_row(row, coefficients):
	# add the bias, the last coefficient
	result = coefficients[-1]
	# add the weighted input
	for i in range(len(row)):
		result += coefficients[i] * row[i]
	# logistic function
	logistic = 1.0 / (1.0 + exp(-result))
	return logistic

# use model coefficients to generate predictions for a dataset of rows
def predict_dataset(X, coefficients):
	yhats = list()
	for row in X:
		# make a prediction
		yhat = predict_row(row, coefficients)
		# store the prediction
		yhats.append(yhat)
	return yhats

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# determine the number of coefficients
n_coeff = X.shape[1] + 1
# generate random coefficients
coefficients = rand(n_coeff)
# generate predictions for dataset
yhat = predict_dataset(X, coefficients)
# round predictions to labels
yhat = [round(y) for y in yhat]
# calculate accuracy
score = accuracy_score(y, yhat)
print('Accuracy: %f' % score)

运行示例会为训练数据集中的每个示例生成预测，然后打印预测的分类准确率。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在给定一组随机权重和每个类中具有相同数量示例的数据集的情况下，我们期望大约 50%的准确性，这大约是我们在本例中看到的。

Accuracy: 0.540000

我们现在可以优化数据集的权重，以在该数据集上获得良好的准确率。

用于线性回归的随机爬山算法可以再次用于逻辑回归。

重要的区别是更新了*目标()*函数，使用分类准确率而不是均方误差来舍入预测和评估模型。

# objective function
def objective(X, y, coefficients):
	# generate predictions for dataset
	yhat = predict_dataset(X, coefficients)
	# round predictions to labels
	yhat = [round(y) for y in yhat]
	# calculate accuracy
	score = accuracy_score(y, yhat)
	return score

*爬山()*功能也必须更新，以最大化解的得分，而不是线性回归情况下的最小化。

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
	# evaluate the initial point
	solution_eval = objective(X, y, solution)
	# run the hill climb
	for i in range(n_iter):
		# take a step
		candidate = solution + randn(len(solution)) * step_size
		# evaluate candidate point
		candidte_eval = objective(X, y, candidate)
		# check if we should keep the new point
		if candidte_eval >= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidte_eval
			# report progress
			print('>%d %.5f' % (i, solution_eval))
	return [solution, solution_eval]

最后，通过搜索找到的系数可以在运行结束时使用分类准确率进行评估。

...
# generate predictions for the test dataset
yhat = predict_dataset(X_test, coefficients)
# round predictions to labels
yhat = [round(y) for y in yhat]
# calculate accuracy
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %f' % (score))

将所有这些联系在一起，下面列出了使用随机爬山来最大化逻辑回归模型的分类准确率的完整示例。

# optimize logistic regression model with a stochastic hill climber
from math import exp
from numpy.random import randn
from numpy.random import rand
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# logistic regression
def predict_row(row, coefficients):
	# add the bias, the last coefficient
	result = coefficients[-1]
	# add the weighted input
	for i in range(len(row)):
		result += coefficients[i] * row[i]
	# logistic function
	logistic = 1.0 / (1.0 + exp(-result))
	return logistic

# use model coefficients to generate predictions for a dataset of rows
def predict_dataset(X, coefficients):
	yhats = list()
	for row in X:
		# make a prediction
		yhat = predict_row(row, coefficients)
		# store the prediction
		yhats.append(yhat)
	return yhats

# objective function
def objective(X, y, coefficients):
	# generate predictions for dataset
	yhat = predict_dataset(X, coefficients)
	# round predictions to labels
	yhat = [round(y) for y in yhat]
	# calculate accuracy
	score = accuracy_score(y, yhat)
	return score

# hill climbing local search algorithm
def hillclimbing(X, y, objective, solution, n_iter, step_size):
	# evaluate the initial point
	solution_eval = objective(X, y, solution)
	# run the hill climb
	for i in range(n_iter):
		# take a step
		candidate = solution + randn(len(solution)) * step_size
		# evaluate candidate point
		candidte_eval = objective(X, y, candidate)
		# check if we should keep the new point
		if candidte_eval >= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidte_eval
			# report progress
			print('>%d %.5f' % (i, solution_eval))
	return [solution, solution_eval]

# define dataset
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2, n_redundant=1, random_state=1)
# split into train test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# define the total iterations
n_iter = 2000
# define the maximum step size
step_size = 0.1
# determine the number of coefficients
n_coef = X.shape[1] + 1
# define the initial solution
solution = rand(n_coef)
# perform the hill climbing search
coefficients, score = hillclimbing(X_train, y_train, objective, solution, n_iter, step_size)
print('Done!')
print('Coefficients: %s' % coefficients)
print('Train Accuracy: %f' % (score))
# generate predictions for the test dataset
yhat = predict_dataset(X_test, coefficients)
# round predictions to labels
yhat = [round(y) for y in yhat]
# calculate accuracy
score = accuracy_score(y_test, yhat)
print('Test Accuracy: %f' % (score))

每次对模型进行改进时，运行该示例都会报告迭代次数和分类准确率。

在搜索结束时，报告最佳系数集在训练数据集上的表现，并计算和报告相同模型在测试数据集上的表现。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，我们可以看到优化算法找到了一组权重，在训练数据集上获得了大约 87.3%的准确率，在测试数据集上获得了大约 83.9%的准确率。

...
>200 0.85672
>225 0.85672
>230 0.85672
>245 0.86418
>281 0.86418
>285 0.86716
>294 0.86716
>306 0.86716
>316 0.86716
>317 0.86716
>320 0.86866
>348 0.86866
>362 0.87313
>784 0.87313
>1649 0.87313
Done!
Coefficients: [-0.04652756  0.23243427  2.58587637 -0.45528253 -0.4954355  -0.42658053]
Train Accuracy: 0.873134
Test Accuracy: 0.839394

进一步阅读

如果您想更深入地了解这个主题，本节将提供更多资源。

蜜蜂

sklearn . dataset . make _ revolution APIS。
sklearn . datasets . make _ classification APIS。
sklearn . metrics . mean _ squared _ error API。
num py . random . rand API。

文章

摘要

在本教程中，您发现了如何手动优化回归模型的系数。

具体来说，您了解到:

如何从零开始开发回归的推理模型？
如何优化预测数值的线性回归模型的系数？
如何用随机爬山法优化逻辑回归模型的系数？

你有什么问题吗？ 在下面的评论中提问，我会尽力回答。

过早收敛的温和介绍

原文：machinelearningmastery.com/premature-c…

最后更新于 2021 年 10 月 12 日

收敛指的是过程的极限，在评估优化算法的预期表现时，它可能是一个有用的分析工具。

当探索优化算法的学习动态以及使用优化算法训练的机器学习算法时，例如深度学习神经网络，它也可以是有用的经验工具。这激发了对学习曲线和技巧的研究，比如提前停止。

如果优化是一个产生候选解的过程，那么收敛就代表了过程结束时的一个稳定点，此时没有进一步的变化或改进。过早收敛指的是优化算法的失败模式，其中过程停止在不代表全局最优解的稳定点。

在本教程中，您将发现机器学习中过早收敛的温和介绍。

完成本教程后，您将知道:

收敛是指通过迭代优化算法在一系列解的末尾找到的稳定点。
过早收敛是指过早地发现一个稳定点，也许接近搜索的起点，并且带有比预期更差的评估。
优化算法的贪婪性提供了对算法收敛速度的控制。

用我的新书机器学习优化启动你的项目，包括分步教程和所有示例的 Python 源代码文件。

Let’s get started. A Gentle Introduction to Premature Convergence

教程概述

本教程分为三个部分；它们是:

机器学习中的收敛性
过早收敛
解决早熟收敛问题

机器学习中的收敛性

收敛一般是指一个过程的值随着时间的推移而有行为趋势。

当使用优化算法时，这是一个有用的想法。

优化指的是一类需要从目标函数中找到一组导致最大值或最小值的输入的问题。优化是一个迭代过程，产生一系列候选解，直到在过程结束时最终得到最终解。

达到稳定点最终解的优化算法的这种行为或动态被称为收敛，例如优化算法的收敛。这样，收敛定义了优化算法的终止。

局部下降涉及迭代选择下降方向，然后在该方向上迈出一步，并重复该过程，直到满足收敛或某些终止条件。

—第 13 页，优化算法，2019。

收敛:稳定点所在的优化算法的停止条件，算法的进一步迭代不太可能导致进一步的改进。

我们可以根据经验测量和探索优化算法的收敛性，例如使用学习曲线。此外，我们还可以分析地探索优化算法的收敛性，例如收敛性证明或平均情况下的计算复杂度。

强大的选择压力导致快速但可能过早的收敛。减弱选择压力会减慢搜索过程…

—第 78 页，进化计算:一种统一的方法，2002 年。

对于那些通过迭代优化算法在训练数据集上拟合(学习)的算法，例如逻辑回归和人工神经网络，优化和优化算法的收敛是机器学习中的一个重要概念。

因此，我们可以选择比其他算法导致更好的收敛行为的优化算法，或者花费大量时间通过优化的超参数(例如，学习率)来调整优化算法的收敛动态(学习动态)。

收敛行为可以与在收敛时找到的稳定点的目标函数评估以及这些关注点的组合进行比较，通常是根据收敛之前所需的算法迭代次数。

过早收敛

过早收敛指的是过早发生的过程的收敛。

在优化中，它指的是算法收敛到一个表现比预期差的稳定点。

过早收敛通常会影响复杂的优化任务，其中目标函数是非凸的，这意味着响应面包含许多不同的好解(稳定点)，可能有一个(或几个)最佳解。

如果我们将优化下的目标函数的响应面视为几何景观，并且我们正在寻求函数的最小值，那么过早优化指的是找到接近搜索起点的谷，该谷的深度小于问题域中最深的谷。

对于呈现高度多模态(崎岖)适应性景观或随时间变化的景观的问题，过多的开发通常会导致过早收敛到空间中的次优峰值。

—第 60 页，进化计算:一种统一的方法，2002 年。

以这种方式，过早收敛被描述为寻找局部最优解而不是优化算法的全局最优解。这是一个优化算法的具体失败案例。

过早收敛:优化算法收敛到一个比最优稳定点更差的点，该点可能接近起始点。

换句话说，收敛意味着搜索过程的结束，例如，找到了一个稳定点，并且算法的进一步迭代不可能改进解。过早收敛是指在不太理想的稳定点达到优化算法的停止条件。

解决早熟收敛问题

在任何具有合理挑战性的优化任务中，过早收敛都可能是一个相关的问题。

例如，进化计算和遗传算法领域的大多数研究涉及识别和克服算法在优化任务上的过早收敛。

如果选择集中在最适合的个体上，由于新种群的多样性降低，选择压力可能导致早熟收敛。

—第 139 页，计算智能:导论，2007 年第 2 版。

基于种群的优化算法，像进化算法和群体智能，经常根据选择压力和收敛之间的相互作用来描述它们的动态。例如，强大的选择压力会导致更快的收敛和可能的过早收敛。较弱的选择压力可能会导致较慢的收敛(较大的计算成本)，尽管可能会找到更好甚至全局的最优解。

具有高选择压力的算子比具有低选择压力的算子更快地减少种群的多样性，这可能导致过早收敛到次优解。高选择压力限制了人们的探索能力。

—第 135 页，计算智能:导论，2007 年第 2 版。

这种选择性压力的想法更有助于理解优化算法的学习动态。例如，被配置为过于贪婪的优化(例如，通过诸如步长或学习率的超参数)可能由于过早收敛而失败，而被配置为不太贪婪的相同算法可能克服过早收敛并发现更好的或全局最优解。

当使用随机梯度下降来训练神经网络模型时，可能会遇到过早收敛，表现为学习曲线快速指数下降，然后停止改善。

达到收敛所需的更新数量通常随着训练集的大小而增加。然而，随着 m 趋近于无穷大，在 SGD 对训练集中的每个示例进行采样之前，模型最终将收敛到其最佳可能的测试误差。

—第 153 页，深度学习，2016。

拟合神经网络容易过早收敛的事实促使人们使用学习曲线等方法来监控和诊断训练数据集中模型的收敛问题，并使用正则化方法，如提前停止，在找到稳定点之前停止优化算法，但代价是保持数据集的表现较差。

因此，对深度学习神经网络的许多研究最终都是为了克服过早收敛。

根据经验，我们经常发现“tanh”激活函数比逻辑函数能更快地收敛训练算法。

—第 127 页，用于模式识别的神经网络，1995。

这包括权重初始化等技术，这一点至关重要，因为神经网络的初始权重定义了优化过程的起点，而不良的初始化会导致过早收敛。

初始点可以决定算法是否收敛，一些初始点非常不稳定，以至于算法遇到数值困难并完全失败。

—第 301 页，深度学习，2016。

这也包括随机梯度下降优化算法的大量变化和扩展，例如增加动量以使算法不会超过最优值(稳定点)，以及 Adam 为每个正在优化的参数增加一个自动调整的步长超参数(学习率)，显著加快收敛速度。

进一步阅读

如果您想更深入地了解这个主题，本节将提供更多资源。

教程

如何利用学习曲线诊断机器学习模型表现

书

优化算法，2019。
遗传算法导论，1998。
计算智能:导论，第二版，2007。
深度学习，2016 年。
进化计算:统一方法，2002。
用于模式识别的神经网络，1995。
概率机器学习:导论 2020。

文章

摘要

在本教程中，您发现了机器学习中过早收敛的温和介绍。

具体来说，您了解到:

收敛是指通过迭代优化算法在一系列解的末尾找到的稳定点。
过早收敛是指过早地发现一个稳定点，也许接近搜索的起点，并且带有比预期更差的评估。
优化算法的贪婪性提供了对算法收敛速度的控制。

你有什么问题吗？ 在下面的评论中提问，我会尽力回答。

函数优化的随机搜索和网格搜索

原文：machinelearningmastery.com/random-sear…

最后更新于 2021 年 10 月 12 日

函数优化需要选择一种算法来有效地对搜索空间进行采样，并找到一个好的或最佳的解。

有许多算法可供选择，尽管为某个问题建立可行或可能的解决方案类型的基线很重要。这可以使用简单的优化算法来实现，例如随机搜索或网格搜索。

一个简单的优化算法所获得的结果在计算上是高效的，可以为更复杂的优化算法提供一个比较点。有时，发现简单算法可以获得最佳表现，特别是在那些有噪声或不平滑的问题上，以及那些领域专业知识通常会偏向于选择优化算法的问题上。

在本教程中，您将发现函数优化的简单算法。

完成本教程后，您将知道:

朴素算法在函数优化项目中的作用。
如何生成和评估函数优化的随机搜索？
如何为函数优化生成和评估网格搜索？

用我的新书机器学习优化启动你的项目，包括分步教程和所有示例的 Python 源代码文件。

Let’s get started. Random Search and Grid Search for Function Optimization

函数优化的随机搜索和网格搜索图片由 Kosala Bandara 提供，保留部分权利。

教程概述

本教程分为三个部分；它们是:

朴素函数优化算法
函数优化的随机搜索
函数优化的网格搜索

朴素函数优化算法

有许多不同的算法可以用来优化，但是你怎么知道你得到的结果是否好呢？

解决这个问题的一种方法是使用简单的优化算法建立表现基线。

朴素优化算法是一种对正在优化的目标函数不做任何假设的算法。

它可以用很少的努力来应用，并且该算法获得的最佳结果可以用作比较更复杂算法的参考点。如果一个更复杂的算法平均来说不能达到比一个幼稚的算法更好的结果，那么它在你的问题上没有技巧，应该被放弃。

有两种朴素的算法可以用于函数优化；它们是:

随机搜索
Grid Search

这些算法被称为“T0”搜索“T1”算法，因为在基础上，优化可以被视为一个搜索问题。例如，找到最小化或最大化目标函数输出的输入。

还有一种可以使用的算法叫做“穷举搜索”，它列举了所有可能的输入。这在实践中很少使用，因为枚举所有可能的输入是不可行的，例如需要太多时间来运行。

然而，如果你发现自己正在处理一个优化问题，所有的输入都可以在合理的时间内被枚举和评估，这应该是你应该使用的默认策略。

让我们依次仔细看看每一个。

函数优化的随机搜索

随机搜索也称为随机优化或随机采样。

随机搜索包括生成和评估目标函数的随机输入。它是有效的，因为它没有假设任何关于目标函数的结构。对于存在大量可能影响或影响优化策略的领域专业知识的问题，这可能是有益的，允许发现非直观的解决方案。

…随机采样，使用伪随机数发生器在设计空间上简单地抽取 m 个随机样本。为了生成随机样本 x，我们可以从分布中独立地对每个变量进行采样。

—第 236 页，优化算法，2019。

对于搜索空间中有噪声或不平滑(不连续)区域的高度复杂问题，随机搜索也可能是最佳策略，这些问题可能导致依赖可靠梯度的算法。

我们可以使用伪随机数发生器从一个域中生成随机样本。每个变量都需要一个定义明确的界限或范围，并且可以从该范围中采样一个统一的随机值，然后进行评估。

生成随机样本在计算上是微不足道的，并且不占用太多内存，因此，生成大量输入样本，然后对它们进行评估可能是有效的。每个样本都是独立的，因此如果需要加速过程，可以并行评估样本。

下面的例子给出了一个简单的一维最小化目标函数的例子，并生成然后评估 100 个输入的随机样本。然后报告具有最佳表现的输入。

# example of random search for function optimization
from numpy.random import rand

# objective function
def objective(x):
	return x**2.0

# define range for input
r_min, r_max = -5.0, 5.0
# generate a random sample from the domain
sample = r_min + rand(100) * (r_max - r_min)
# evaluate the sample
sample_eval = objective(sample)
# locate the best solution
best_ix = 0
for i in range(len(sample)):
	if sample_eval[i] < sample_eval[best_ix]:
		best_ix = i
# summarize best solution
print('Best: f(%.5f) = %.5f' % (sample[best_ix], sample_eval[best_ix]))

运行该示例会生成输入值的随机样本，然后对其进行评估。然后确定并报告最佳表现点。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，我们可以看到结果非常接近 0.0 的最佳输入。

Best: f(-0.01762) = 0.00031

我们可以更新示例来绘制目标函数，并显示样本和最佳结果。下面列出了完整的示例。

# example of random search for function optimization with plot
from numpy import arange
from numpy.random import rand
from matplotlib import pyplot

# objective function
def objective(x):
	return x**2.0

# define range for input
r_min, r_max = -5.0, 5.0
# generate a random sample from the domain
sample = r_min + rand(100) * (r_max - r_min)
# evaluate the sample
sample_eval = objective(sample)
# locate the best solution
best_ix = 0
for i in range(len(sample)):
	if sample_eval[i] < sample_eval[best_ix]:
		best_ix = i
# summarize best solution
print('Best: f(%.5f) = %.5f' % (sample[best_ix], sample_eval[best_ix]))
# sample input range uniformly at 0.1 increments
inputs = arange(r_min, r_max, 0.1)
# compute targets
results = objective(inputs)
# create a line plot of input vs result
pyplot.plot(inputs, results)
# plot the sample
pyplot.scatter(sample, sample_eval)
# draw a vertical line at the best input
pyplot.axvline(x=sample[best_ix], ls='--', color='red')
# show the plot
pyplot.show()

再次运行该示例会生成随机样本并报告最佳结果。

Best: f(0.01934) = 0.00037

然后创建一个线图，显示目标函数、随机样本的形状，以及从样本中找到最佳结果的红线。

Line Plot of One-Dimensional Objective Function With Random Sample

随机样本下一维目标函数的线图

函数优化的网格搜索

网格搜索也称为网格采样或全因子采样。

网格搜索包括为目标函数生成统一的网格输入。在一维中，这将是沿一条线均匀间隔的输入。在二维空间中，这将是一个由表面上均匀间隔的点组成的网格，对于更高的维度来说也是如此。

全因子采样计划在搜索空间上放置一个均匀间隔的点网格。这种方法易于实现，不依赖随机性，并且覆盖了空间，但是使用了大量的点。

—第 235 页，优化算法，2019。

像随机搜索一样，网格搜索在通常使用领域专业知识来影响特定优化算法的选择的问题上特别有效。网格可以帮助快速识别搜索空间中值得更多关注的区域。

样本网格通常是均匀的，尽管并非必须如此。例如，可以使用具有统一间距的对数-10 标度，从而允许跨数量级进行采样。

缺点是网格的粗糙程度可能会跨越搜索空间中存在好的解决方案的整个区域，随着问题的输入数量(搜索空间的维度)的增加，这个问题会变得更糟。

通过选择均匀的点间距，然后依次枚举每个变量，并按所选间距递增每个变量，可以生成样本网格。

下面的示例给出了一个简单的二维最小化目标函数的示例，然后生成并评估两个输入变量间距为 0.1 的网格样本。然后报告具有最佳表现的输入。

# example of grid search for function optimization
from numpy import arange
from numpy.random import rand

# objective function
def objective(x, y):
	return x**2.0 + y**2.0

# define range for input
r_min, r_max = -5.0, 5.0
# generate a grid sample from the domain
sample = list()
step = 0.1
for x in arange(r_min, r_max+step, step):
	for y in arange(r_min, r_max+step, step):
		sample.append([x,y])
# evaluate the sample
sample_eval = [objective(x,y) for x,y in sample]
# locate the best solution
best_ix = 0
for i in range(len(sample)):
	if sample_eval[i] < sample_eval[best_ix]:
		best_ix = i
# summarize best solution
print('Best: f(%.5f,%.5f) = %.5f' % (sample[best_ix][0], sample[best_ix][1], sample_eval[best_ix]))

运行该示例会生成一个输入值网格，然后对其进行评估。然后确定并报告最佳表现点。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，我们可以看到结果准确地找到了最优解。

Best: f(-0.00000,-0.00000) = 0.00000

我们可以更新示例来绘制目标函数，并显示样本和最佳结果。下面列出了完整的示例。

# example of grid search for function optimization with plot
from numpy import arange
from numpy import meshgrid
from numpy.random import rand
from matplotlib import pyplot

# objective function
def objective(x, y):
	return x**2.0 + y**2.0

# define range for input
r_min, r_max = -5.0, 5.0
# generate a grid sample from the domain
sample = list()
step = 0.5
for x in arange(r_min, r_max+step, step):
	for y in arange(r_min, r_max+step, step):
		sample.append([x,y])
# evaluate the sample
sample_eval = [objective(x,y) for x,y in sample]
# locate the best solution
best_ix = 0
for i in range(len(sample)):
	if sample_eval[i] < sample_eval[best_ix]:
		best_ix = i
# summarize best solution
print('Best: f(%.5f,%.5f) = %.5f' % (sample[best_ix][0], sample[best_ix][1], sample_eval[best_ix]))
# sample input range uniformly at 0.1 increments
xaxis = arange(r_min, r_max, 0.1)
yaxis = arange(r_min, r_max, 0.1)
# create a mesh from the axis
x, y = meshgrid(xaxis, yaxis)
# compute targets
results = objective(x, y)
# create a filled contour plot
pyplot.contourf(x, y, results, levels=50, cmap='jet')
# plot the sample as black circles
pyplot.plot([x for x,_ in sample], [y for _,y in sample], '.', color='black')
# draw the best result as a white star
pyplot.plot(sample[best_ix][0], sample[best_ix][1], '*', color='white')
# show the plot
pyplot.show()

再次运行该示例会生成网格示例并报告最佳结果。

Best: f(0.00000,0.00000) = 0.00000

然后创建一个等高线图，显示目标函数的形状，网格样本显示为黑点，白星显示样本的最佳结果。

请注意，域边缘的一些黑点似乎不在图中；这只是我们选择如何绘制点的一个假象(例如，不以样本为中心)。

Contour Plot of One-Dimensional Objective Function With Grid Sample

网格样本下一维目标函数的等高线图

进一步阅读

如果您想更深入地了解这个主题，本节将提供更多资源。

书

优化算法，2019。

文章

摘要

在本教程中，您发现了函数优化的简单算法。

具体来说，您了解到:

朴素算法在函数优化项目中的作用。
如何生成和评估函数优化的随机搜索？
如何为函数优化生成和评估网格搜索？

你有什么问题吗？ 在下面的评论中提问，我会尽力回答。

Python 中从零开始的简单遗传算法

原文：machinelearningmastery.com/simple-gene…

最后更新于 2021 年 10 月 12 日

遗传算法是一种随机全局优化算法。

它可能是最受欢迎和最广为人知的生物启发算法之一，与人工神经网络一起。

该算法是一种进化算法，通过具有二元表示的自然选择和基于遗传重组和遗传突变的简单算子，执行受生物进化理论启发的优化过程。

在本教程中，您将发现遗传算法优化算法。

完成本教程后，您将知道:

遗传算法是一种受进化启发的随机优化算法。
如何在 Python 中从头实现遗传算法？
如何将遗传算法应用于连续目标函数？

用我的新书机器学习优化启动你的项目，包括分步教程和所有示例的 Python 源代码文件。

Let’s get started. Simple Genetic Algorithm From Scratch in Python

Python 中从零开始的简单遗传算法图片由 Magharebia 提供，保留部分权利。

教程概述

本教程分为四个部分；它们是:

遗传算法
从零开始的遗传算法
OneMax 的遗传算法
连续函数优化的遗传算法

遗传算法

遗传算法是一种随机全局搜索优化算法。

它的灵感来自于通过自然选择进化的生物学理论。具体来说，就是将遗传学的理解和理论结合起来的新合成。

遗传算法(算法 9.4)从生物进化中借用灵感，在生物进化中，更健康的个体更有可能将他们的基因传递给下一代。

—第 148 页，优化算法，2019。

该算法使用遗传表示(位串)、适应度(功能评估)、遗传重组(位串交叉)和变异(翻转位)的类似物。

该算法首先创建一个固定大小的随机位串群体。算法的主循环重复固定的迭代次数，或者直到在给定的迭代次数上最佳解没有进一步的改进。

算法的一次迭代就像是进化的一代。

首先，使用目标函数评估位串(候选解)的总体。将每个候选解的目标函数评估作为解的适合度，该适合度可以最小化或最大化。

然后，根据他们的健康状况选择父母。给定的候选解可以被零次或多次用作父解。一种简单有效的选择方法包括从人群中随机抽取 k 个候选人，并从具有最佳适应度的组中选择成员。这被称为锦标赛选择，其中 k 是一个超参数，并被设置为一个值，如 3。这个简单的方法模拟了一个更昂贵的适合度比例选择方案。

在锦标赛选择中，每个父母是群体中随机选择的 k 条染色体中最合适的

—第 151 页，优化算法，2019。

父母被用作生成下一代候选点的基础，并且人口中的每个职位需要一个父母。

然后父母成对出现，用来生两个孩子。使用交叉算子执行重组。这包括在位串上选择一个随机分割点，然后创建一个子串，其位从第一个父串到分割点，从第二个父串到串的末尾。然后，对于第二个孩子，这个过程是相反的。

比如父母两个:

parent1 = 00000
parent2 = 11111

可能会导致两个交叉的孩子:

child1 = 00011
child2 = 11100

这被称为单点交叉，还有许多其他的操作符变体。

对每对父代应用概率交叉，这意味着在某些情况下，父代的副本被当作子代，而不是重组操作符。交叉由设置为较大值(如 80%或 90%)的超参数控制。

交叉是遗传算法的显著特点。它包括混合和匹配父母的部分来形成孩子。如何混合和匹配取决于个人的表现。

—第 36 页，元试探法精要，2011。

变异包括翻转已创建的子候选解中的位。通常，突变率设置为 1/L ，其中 L 是位串的长度。

二进制值染色体中的每一位通常都有很小的翻转概率。对于 m 位的染色体，这个突变率通常设置为 1/m，平均每个子染色体产生一个突变。

—第 155 页，优化算法，2019。

例如，如果一个问题使用 20 位的位串，那么良好的默认突变率将是(1/20) = 0.05 或概率为 5%。

这定义了简单的遗传算法程序。这是一个很大的研究领域，算法有很多扩展。

现在我们已经熟悉了简单的遗传算法过程，让我们看看如何从零开始实现它。

从零开始的遗传算法

在这一部分，我们将开发一个遗传算法的实现。

第一步是创建随机位串群体。我们可以使用布尔值真和假，字符串值“0”和“1”，或者整数值 0 和 1。在这种情况下，我们将使用整数值。

我们可以使用 randint()函数在一个范围内生成整数值数组，我们可以将该范围指定为从 0 开始小于 2 的值，例如 0 或 1。我们还将把一个候选解决方案表示为一个列表，而不是一个 NumPy 数组，以保持简单。

随机位串的初始群体可以如下创建，其中“ n_pop ”是控制群体大小的超参数，“ n_bits ”是定义单个候选解决方案中的位数的超参数:

...
# initial population of random bitstring
pop = [randint(0, 2, n_bits).tolist() for _ in range(n_pop)]

接下来，我们可以枚举固定次数的算法迭代，在这种情况下，由名为“ n_iter ”的超参数控制。

...
# enumerate generations
	for gen in range(n_iter):
		...

算法迭代的第一步是评估所有候选解。

我们将使用一个名为 objective() 的函数作为通用目标函数，并调用它来获得一个适应度得分，我们将最小化该得分。

...
# evaluate all candidates in the population
scores = [objective(c) for c in pop]

然后，我们可以选择将用于创建孩子的父母。

锦标赛选择过程可以实现为一个函数，该函数接受人口并返回一个选定的父代。 k 值用默认参数固定为 3，但是如果您愿意，可以尝试不同的值。

# tournament selection
def selection(pop, scores, k=3):
	# first random selection
	selection_ix = randint(len(pop))
	for ix in randint(0, len(pop), k-1):
		# check if better (e.g. perform a tournament)
		if scores[ix] < scores[selection_ix]:
			selection_ix = ix
	return pop[selection_ix]

然后，我们可以为群体中的每个位置调用这个函数一次，以创建一个父代列表。

...
# select parents
selected = [selection(pop, scores) for _ in range(n_pop)]

然后我们可以创造下一代。

这首先需要一个函数来执行交叉。这个函数将采用两个父节点和交叉率。交叉率是一个超参数，它决定是否执行交叉，如果不执行，父代将被复制到下一代。这是一个概率，通常具有接近 1.0 的大值。

下面的*交叉()*函数使用在范围[0，1]内抽取一个随机数来确定是否执行交叉，然后选择一个有效的分割点来执行交叉。

# crossover two parents to create two children
def crossover(p1, p2, r_cross):
	# children are copies of parents by default
	c1, c2 = p1.copy(), p2.copy()
	# check for recombination
	if rand() < r_cross:
		# select crossover point that is not on the end of the string
		pt = randint(1, len(p1)-2)
		# perform crossover
		c1 = p1[:pt] + p2[pt:]
		c2 = p2[:pt] + p1[pt:]
	return [c1, c2]

我们还需要一个功能来执行突变。

该过程只是以由“ r_mut ”超参数控制的低概率翻转位。

# mutation operator
def mutation(bitstring, r_mut):
	for i in range(len(bitstring)):
		# check for a mutation
		if rand() < r_mut:
			# flip the bit
			bitstring[i] = 1 - bitstring[i]

然后，我们可以遍历父代列表，创建一个子代列表作为下一代，根据需要调用交叉和变异函数。

...
# create the next generation
children = list()
for i in range(0, n_pop, 2):
	# get selected parents in pairs
	p1, p2 = selected[i], selected[i+1]
	# crossover and mutation
	for c in crossover(p1, p2, r_cross):
		# mutation
		mutation(c, r_mut)
		# store for next generation
		children.append(c)

我们可以将所有这些联系到一个名为 genetic_algorithm() 的函数中，该函数采用目标函数的名称和搜索的超参数，并返回搜索过程中找到的最佳解。

# genetic algorithm
def genetic_algorithm(objective, n_bits, n_iter, n_pop, r_cross, r_mut):
	# initial population of random bitstring
	pop = [randint(0, 2, n_bits).tolist() for _ in range(n_pop)]
	# keep track of best solution
	best, best_eval = 0, objective(pop[0])
	# enumerate generations
	for gen in range(n_iter):
		# evaluate all candidates in the population
		scores = [objective(c) for c in pop]
		# check for new best solution
		for i in range(n_pop):
			if scores[i] < best_eval:
				best, best_eval = pop[i], scores[i]
				print(">%d, new best f(%s) = %.3f" % (gen,  pop[i], scores[i]))
		# select parents
		selected = [selection(pop, scores) for _ in range(n_pop)]
		# create the next generation
		children = list()
		for i in range(0, n_pop, 2):
			# get selected parents in pairs
			p1, p2 = selected[i], selected[i+1]
			# crossover and mutation
			for c in crossover(p1, p2, r_cross):
				# mutation
				mutation(c, r_mut)
				# store for next generation
				children.append(c)
		# replace population
		pop = children
	return [best, best_eval]

现在我们已经开发了遗传算法的实现，让我们来探索如何将其应用于目标函数。

OneMax 的遗传算法

在这一节中，我们将把遗传算法应用于一个基于二进制字符串的优化问题。

这个问题叫做 OneMax，它根据字符串中 1 的数量来计算二进制字符串。例如，长度为 20 位的位串对于全 1 的串的得分为 20。

假设我们已经实现了最小化目标函数的遗传算法，我们可以给这个评估添加一个负号，使得大的正值变成大的负值。

下面的 onemax() 函数实现了这一点，它将整数值的位串作为输入，并返回这些值的负和。

# objective function
def onemax(x):
	return -sum(x)

接下来，我们可以配置搜索。

搜索将运行 100 次迭代，我们将在候选解决方案中使用 20 位，这意味着最佳适应度将为-20.0。

种群规模将为 100，我们将使用 90%的交叉率和 5%的变异率。这种配置是经过反复试验后选择的。

...
# define the total iterations
n_iter = 100
# bits
n_bits = 20
# define the population size
n_pop = 100
# crossover rate
r_cross = 0.9
# mutation rate
r_mut = 1.0 / float(n_bits)

然后可以调用搜索并报告最佳结果。

...
# perform the genetic algorithm search
best, score = genetic_algorithm(onemax, n_bits, n_iter, n_pop, r_cross, r_mut)
print('Done!')
print('f(%s) = %f' % (best, score))

将这些联系在一起，下面列出了将遗传算法应用于 OneMax 目标函数的完整示例。

# genetic algorithm search of the one max optimization problem
from numpy.random import randint
from numpy.random import rand

# objective function
def onemax(x):
	return -sum(x)

# tournament selection
def selection(pop, scores, k=3):
	# first random selection
	selection_ix = randint(len(pop))
	for ix in randint(0, len(pop), k-1):
		# check if better (e.g. perform a tournament)
		if scores[ix] < scores[selection_ix]:
			selection_ix = ix
	return pop[selection_ix]

# crossover two parents to create two children
def crossover(p1, p2, r_cross):
	# children are copies of parents by default
	c1, c2 = p1.copy(), p2.copy()
	# check for recombination
	if rand() < r_cross:
		# select crossover point that is not on the end of the string
		pt = randint(1, len(p1)-2)
		# perform crossover
		c1 = p1[:pt] + p2[pt:]
		c2 = p2[:pt] + p1[pt:]
	return [c1, c2]

# mutation operator
def mutation(bitstring, r_mut):
	for i in range(len(bitstring)):
		# check for a mutation
		if rand() < r_mut:
			# flip the bit
			bitstring[i] = 1 - bitstring[i]

# genetic algorithm
def genetic_algorithm(objective, n_bits, n_iter, n_pop, r_cross, r_mut):
	# initial population of random bitstring
	pop = [randint(0, 2, n_bits).tolist() for _ in range(n_pop)]
	# keep track of best solution
	best, best_eval = 0, objective(pop[0])
	# enumerate generations
	for gen in range(n_iter):
		# evaluate all candidates in the population
		scores = [objective(c) for c in pop]
		# check for new best solution
		for i in range(n_pop):
			if scores[i] < best_eval:
				best, best_eval = pop[i], scores[i]
				print(">%d, new best f(%s) = %.3f" % (gen,  pop[i], scores[i]))
		# select parents
		selected = [selection(pop, scores) for _ in range(n_pop)]
		# create the next generation
		children = list()
		for i in range(0, n_pop, 2):
			# get selected parents in pairs
			p1, p2 = selected[i], selected[i+1]
			# crossover and mutation
			for c in crossover(p1, p2, r_cross):
				# mutation
				mutation(c, r_mut)
				# store for next generation
				children.append(c)
		# replace population
		pop = children
	return [best, best_eval]

# define the total iterations
n_iter = 100
# bits
n_bits = 20
# define the population size
n_pop = 100
# crossover rate
r_cross = 0.9
# mutation rate
r_mut = 1.0 / float(n_bits)
# perform the genetic algorithm search
best, score = genetic_algorithm(onemax, n_bits, n_iter, n_pop, r_cross, r_mut)
print('Done!')
print('f(%s) = %f' % (best, score))

运行该示例将报告一路上找到的最佳结果，然后是搜索结束时的最终最佳解决方案，我们希望这是最佳解决方案。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，我们可以看到搜索在大约八代之后找到了最优解。

>0, new best f([1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1]) = -14.000
>0, new best f([1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0]) = -15.000
>1, new best f([1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1]) = -16.000
>2, new best f([0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]) = -17.000
>2, new best f([1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]) = -19.000
>8, new best f([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]) = -20.000
Done!
f([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]) = -20.000000

连续函数优化的遗传算法

优化 OneMax 函数不是很有趣；我们更可能想要优化连续函数。

例如，我们可以定义 x² 最小化函数，该函数接受输入变量，并且在 f(0，0) = 0.0 时具有最优值。

# objective function
def objective(x):
	return x[0]**2.0 + x[1]**2.0

我们可以用遗传算法最小化这个函数。

首先，我们必须定义每个输入变量的界限。

...
# define range for input
bounds = [[-5.0, 5.0], [-5.0, 5.0]]

我们将把“ n_bits ”超参数作为目标函数的每个输入变量的位数，并将其设置为 16 位。

...
# bits per variable
n_bits = 16

这意味着给定两个输入变量，我们的实际位串将有(16 * 2) = 32 位。

我们必须相应地更新我们的突变率。

...
# mutation rate
r_mut = 1.0 / (float(n_bits) * len(bounds))

接下来，我们需要确保初始群体创建足够大的随机位串。

...
# initial population of random bitstring
pop = [randint(0, 2, n_bits*len(bounds)).tolist() for _ in range(n_pop)]

最后，我们需要在用目标函数评估每个位串之前，将这些位串解码成数字。

我们可以通过首先将每个子串解码成整数，然后将整数缩放到所需的范围来实现这一点。这将给出一个范围内的值向量，然后可以提供给目标函数进行评估。

下面的 decode() 函数实现了这一点，将函数的边界、每个变量的位数和一个位串作为输入，并返回一个解码实值列表。

# decode bitstring to numbers
def decode(bounds, n_bits, bitstring):
	decoded = list()
	largest = 2**n_bits
	for i in range(len(bounds)):
		# extract the substring
		start, end = i * n_bits, (i * n_bits)+n_bits
		substring = bitstring[start:end]
		# convert bitstring to a string of chars
		chars = ''.join([str(s) for s in substring])
		# convert string to integer
		integer = int(chars, 2)
		# scale integer to desired range
		value = bounds[i][0] + (integer/largest) * (bounds[i][1] - bounds[i][0])
		# store
		decoded.append(value)
	return decoded

然后，我们可以在算法循环的开始调用它来解码种群，然后评估种群的解码版本。

...
# decode population
decoded = [decode(bounds, n_bits, p) for p in pop]
# evaluate all candidates in the population
scores = [objective(d) for d in decoded]

将这些联系在一起，下面列出了用于连续函数优化的遗传算法的完整示例。

# genetic algorithm search for continuous function optimization
from numpy.random import randint
from numpy.random import rand

# objective function
def objective(x):
	return x[0]**2.0 + x[1]**2.0

# decode bitstring to numbers
def decode(bounds, n_bits, bitstring):
	decoded = list()
	largest = 2**n_bits
	for i in range(len(bounds)):
		# extract the substring
		start, end = i * n_bits, (i * n_bits)+n_bits
		substring = bitstring[start:end]
		# convert bitstring to a string of chars
		chars = ''.join([str(s) for s in substring])
		# convert string to integer
		integer = int(chars, 2)
		# scale integer to desired range
		value = bounds[i][0] + (integer/largest) * (bounds[i][1] - bounds[i][0])
		# store
		decoded.append(value)
	return decoded

# tournament selection
def selection(pop, scores, k=3):
	# first random selection
	selection_ix = randint(len(pop))
	for ix in randint(0, len(pop), k-1):
		# check if better (e.g. perform a tournament)
		if scores[ix] < scores[selection_ix]:
			selection_ix = ix
	return pop[selection_ix]

# crossover two parents to create two children
def crossover(p1, p2, r_cross):
	# children are copies of parents by default
	c1, c2 = p1.copy(), p2.copy()
	# check for recombination
	if rand() < r_cross:
		# select crossover point that is not on the end of the string
		pt = randint(1, len(p1)-2)
		# perform crossover
		c1 = p1[:pt] + p2[pt:]
		c2 = p2[:pt] + p1[pt:]
	return [c1, c2]

# mutation operator
def mutation(bitstring, r_mut):
	for i in range(len(bitstring)):
		# check for a mutation
		if rand() < r_mut:
			# flip the bit
			bitstring[i] = 1 - bitstring[i]

# genetic algorithm
def genetic_algorithm(objective, bounds, n_bits, n_iter, n_pop, r_cross, r_mut):
	# initial population of random bitstring
	pop = [randint(0, 2, n_bits*len(bounds)).tolist() for _ in range(n_pop)]
	# keep track of best solution
	best, best_eval = 0, objective(decode(bounds, n_bits, pop[0]))
	# enumerate generations
	for gen in range(n_iter):
		# decode population
		decoded = [decode(bounds, n_bits, p) for p in pop]
		# evaluate all candidates in the population
		scores = [objective(d) for d in decoded]
		# check for new best solution
		for i in range(n_pop):
			if scores[i] < best_eval:
				best, best_eval = pop[i], scores[i]
				print(">%d, new best f(%s) = %f" % (gen,  decoded[i], scores[i]))
		# select parents
		selected = [selection(pop, scores) for _ in range(n_pop)]
		# create the next generation
		children = list()
		for i in range(0, n_pop, 2):
			# get selected parents in pairs
			p1, p2 = selected[i], selected[i+1]
			# crossover and mutation
			for c in crossover(p1, p2, r_cross):
				# mutation
				mutation(c, r_mut)
				# store for next generation
				children.append(c)
		# replace population
		pop = children
	return [best, best_eval]

# define range for input
bounds = [[-5.0, 5.0], [-5.0, 5.0]]
# define the total iterations
n_iter = 100
# bits per variable
n_bits = 16
# define the population size
n_pop = 100
# crossover rate
r_cross = 0.9
# mutation rate
r_mut = 1.0 / (float(n_bits) * len(bounds))
# perform the genetic algorithm search
best, score = genetic_algorithm(objective, bounds, n_bits, n_iter, n_pop, r_cross, r_mut)
print('Done!')
decoded = decode(bounds, n_bits, best)
print('f(%s) = %f' % (decoded, score))

运行该示例会报告沿途的最佳解码结果以及运行结束时的最佳解码解决方案。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，我们可以看到算法发现一个非常接近 f(0.0，0.0) = 0.0 的输入。

>0, new best f([-0.785064697265625, -0.807647705078125]) = 1.268621
>0, new best f([0.385894775390625, 0.342864990234375]) = 0.266471
>1, new best f([-0.342559814453125, -0.1068115234375]) = 0.128756
>2, new best f([-0.038909912109375, 0.30242919921875]) = 0.092977
>2, new best f([0.145721435546875, 0.1849365234375]) = 0.055436
>3, new best f([0.14404296875, -0.029754638671875]) = 0.021634
>5, new best f([0.066680908203125, 0.096435546875]) = 0.013746
>5, new best f([-0.036468505859375, -0.10711669921875]) = 0.012804
>6, new best f([-0.038909912109375, -0.099639892578125]) = 0.011442
>7, new best f([-0.033111572265625, 0.09674072265625]) = 0.010455
>7, new best f([-0.036468505859375, 0.05584716796875]) = 0.004449
>10, new best f([0.058746337890625, 0.008087158203125]) = 0.003517
>10, new best f([-0.031585693359375, 0.008087158203125]) = 0.001063
>12, new best f([0.022125244140625, 0.008087158203125]) = 0.000555
>13, new best f([0.022125244140625, 0.00701904296875]) = 0.000539
>13, new best f([-0.013885498046875, 0.008087158203125]) = 0.000258
>16, new best f([-0.011444091796875, 0.00518798828125]) = 0.000158
>17, new best f([-0.0115966796875, 0.00091552734375]) = 0.000135
>17, new best f([-0.004730224609375, 0.00335693359375]) = 0.000034
>20, new best f([-0.004425048828125, 0.00274658203125]) = 0.000027
>21, new best f([-0.002288818359375, 0.00091552734375]) = 0.000006
>22, new best f([-0.001983642578125, 0.00091552734375]) = 0.000005
>22, new best f([-0.001983642578125, 0.0006103515625]) = 0.000004
>24, new best f([-0.001373291015625, 0.001068115234375]) = 0.000003
>25, new best f([-0.001373291015625, 0.00091552734375]) = 0.000003
>26, new best f([-0.001373291015625, 0.0006103515625]) = 0.000002
>27, new best f([-0.001068115234375, 0.0006103515625]) = 0.000002
>29, new best f([-0.000152587890625, 0.00091552734375]) = 0.000001
>33, new best f([-0.0006103515625, 0.0]) = 0.000000
>34, new best f([-0.000152587890625, 0.00030517578125]) = 0.000000
>43, new best f([-0.00030517578125, 0.0]) = 0.000000
>60, new best f([-0.000152587890625, 0.000152587890625]) = 0.000000
>65, new best f([-0.000152587890625, 0.0]) = 0.000000
Done!
f([-0.000152587890625, 0.0]) = 0.000000

进一步阅读

如果您想更深入地了解这个主题，本节将提供更多资源。

书

应用程序接口

num py . random . ranint API。

文章

摘要

在本教程中，您发现了遗传算法优化。

具体来说，您了解到:

遗传算法是一种受进化启发的随机优化算法。
如何在 Python 中从头实现遗传算法？
如何将遗传算法应用于连续目标函数？

你有什么问题吗？ 在下面的评论中提问，我会尽力回答。

Python 中从零开始的模拟退火

原文：machinelearningmastery.com/simulated-a…

最后更新于 2021 年 10 月 12 日

模拟退火是一种随机全局搜索优化算法。

这意味着它利用随机性作为搜索过程的一部分。这使得该算法适用于其他局部搜索算法运行不佳的非线性目标函数。

像随机爬山局部搜索算法一样，它修改单个解并搜索搜索空间的相对局部区域，直到找到局部最优解。与爬山算法不同，它可能接受更差的解作为当前的工作解。

接受更差解决方案的可能性在搜索开始时很高，并随着搜索的进行而降低，这使得算法有机会首先定位全局最优区域，避开局部最优区域，然后爬山到最优区域本身。

在本教程中，您将发现用于函数优化的模拟退火优化算法。

完成本教程后，您将知道:

模拟退火是一种用于函数优化的随机全局搜索算法。
如何在 Python 中从头实现模拟退火算法？
如何使用模拟退火算法并检验算法的结果？

我们开始吧。

Simulated Annealing From Scratch in Python

Python 中的模拟退火从零开始图片由 Susanne Nilsson 提供，保留部分权利。

教程概述

本教程分为三个部分；它们是:

模拟退火
实现模拟退火
模拟退火工作示例

模拟退火

模拟退火是一种随机全局搜索优化算法。

该算法的灵感来自冶金中的退火，金属快速加热到高温，然后缓慢冷却，这增加了其强度，使其更容易处理。

退火过程的工作原理是首先在高温下激发材料中的原子，让原子四处移动，然后慢慢降低它们的兴奋度，让原子落入一个新的、更稳定的构型。

当热的时候，材料中的原子更自由地移动，并且通过随机运动，趋向于稳定在更好的位置。缓慢冷却使材料进入有序的结晶状态。

—第 128 页，优化算法，2019。

模拟退火优化算法可以看作是随机爬山的一种改进形式。

随机爬山保持单个候选解，并在搜索空间中从候选中采取随机但受限大小的步骤。如果新点优于当前点，则当前点将被新点替换。这个过程持续固定的迭代次数。

模拟退火以同样的方式执行搜索。主要区别在于，有时会接受不如当前点(更差点)的新点。

从概率上来说，接受比当前解更差的解的可能性是搜索温度和该解比当前解差多少的函数。

该算法不同于爬山算法，它决定什么时候用新调整的子解 R 来替换原来的候选解 S。具体来说:如果 R 比 S 好，我们会一如既往地用 R 代替 S。但是如果 R 比 S 差，我们还是有一定概率用 R 代替 S

—第 23 页，元试探法精要，2011。

搜索的初始温度作为超参数提供，并随着搜索的进行而降低。在从初始值到非常低的值的搜索期间，可以使用许多不同的方案(退火计划)来降低温度，尽管通常将温度作为迭代次数的函数来计算。

计算温度的一个流行例子是所谓的“快速模拟退火”，计算如下

温度=初始温度/(迭代次数+ 1)

在迭代次数从零开始的情况下，我们给迭代次数加一，以避免被零除的错误。

较差解的接受使用温度以及较差解和当前解的目标函数评估之间的差异。使用此信息计算 0 到 1 之间的值，表示接受更差解决方案的可能性。然后使用随机数对该分布进行采样，如果小于该值，则意味着接受更差的解。

正是这种被称为 Metropolis 准则的接受概率，使得算法在温度较高时能够避开局部极小值。

—第 128 页，优化算法，2019。

这被称为大都市接受标准，最小化的计算如下:

标准= exp( -(目标(新)-目标(当前))/温度)

其中 exp() 为 e (数学常数)升至所提供参数的幂，而目标(新)、*目标(当前)*为新的(更差的)和当前候选解的目标函数评估。

结果是，糟糕的解决方案在搜索的早期被接受的可能性更大，而在搜索的后期被接受的可能性较小。其目的是，搜索开始时的高温将有助于搜索定位全局最优的盆地，而搜索后期的低温将有助于算法磨练全局最优。

温度开始很高，允许过程在搜索空间内自由移动，希望在这个阶段过程会找到一个具有最佳局部最小值的好区域。然后温度慢慢降低，减少随机性，迫使搜索收敛到最小值

—第 128 页，优化算法，2019。

现在我们已经熟悉了模拟退火算法，让我们看看如何从零开始实现它。

实现模拟退火

在本节中，我们将探索如何从零开始实现模拟退火优化算法。

首先，我们必须定义我们的目标函数和目标函数的每个输入变量的界限。目标函数只是一个 Python 函数，我们将其命名为目标()。边界将是一个 2D 数组，每个输入变量都有一个维度，定义了变量的最小值和最大值。

例如，一维目标函数和边界可以定义如下:

# objective function
def objective(x):
	return 0

# define range for input
bounds = asarray([[-5.0, 5.0]])

接下来，我们可以将初始点生成为问题边界内的随机点，然后使用目标函数对其进行评估。

...
# generate an initial point
best = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
# evaluate the initial point
best_eval = objective(best)

我们需要保持“”当前的”解决方案，这是搜索的重点，可能会被更好的解决方案所取代。

...
# current working solution
curr, curr_eval = best, best_eval

现在，我们可以循环定义为“ n_iterations ”的算法的预定义迭代次数，例如 100 或 1000 次。

...
# run the algorithm
for i in range(n_iterations):
	...

算法迭代的第一步是从当前工作解生成新的候选解，例如，采取一个步骤。

这需要一个预定义的“步长”参数，该参数与搜索空间的边界相关。我们将采用高斯分布的随机步长，其中平均值是当前点，标准偏差由“步长定义。这意味着大约 99%的步骤将在当前点的 *3 步长之内。

...
# take a step
candidate = solution + randn(len(bounds)) * step_size

我们没有必要以这种方式采取措施。您可能希望在 0 和步长之间使用均匀分布。例如:

...
# take a step
candidate = solution + rand(len(bounds)) * step_size

接下来，我们需要对其进行评估。

...
# evaluate candidate point
candidate_eval = objective(candidate)

然后，我们需要检查这个新点的评估是否与当前最佳点一样好或更好，如果是，用这个新点替换我们当前的最佳点。

这与当前的工作解决方案是分开的，当前的工作解决方案是搜索的焦点。

...
# check for new best solution
if candidate_eval < best_eval:
	# store new best point
	best, best_eval = candidate, candidate_eval
	# report progress
	print('>%d f(%s) = %.5f' % (i, best, best_eval))

接下来，我们需要准备替换当前的工作解决方案。

第一步是计算当前解与当前工作解的目标函数评估之差。

...
# difference between candidate and current point evaluation
diff = candidate_eval - curr_eval

接下来，我们需要使用快速退火时间表计算当前温度，其中“ temp ”是作为参数提供的初始温度。

...
# calculate temperature for current epoch
t = temp / float(i + 1)

然后，我们可以计算接受表现比当前工作解决方案更差的解决方案的可能性。

...
# calculate metropolis acceptance criterion
metropolis = exp(-diff / t)

最后，如果新点具有更好的目标函数评估(差值为负)或者如果目标函数更差，我们可以接受新点作为当前工作解，但是我们可能决定接受它。

...
# check if we should keep the new point
if diff < 0 or rand() < metropolis:
	# store the new current point
	curr, curr_eval = candidate, candidate_eval

就这样。

我们可以将这个模拟退火算法实现为一个可重用的函数，该函数以目标函数的名称、每个输入变量的边界、总迭代次数、步长和初始温度为参数，并返回找到的最佳解及其评估。

# simulated annealing algorithm
def simulated_annealing(objective, bounds, n_iterations, step_size, temp):
	# generate an initial point
	best = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# evaluate the initial point
	best_eval = objective(best)
	# current working solution
	curr, curr_eval = best, best_eval
	# run the algorithm
	for i in range(n_iterations):
		# take a step
		candidate = curr + randn(len(bounds)) * step_size
		# evaluate candidate point
		candidate_eval = objective(candidate)
		# check for new best solution
		if candidate_eval < best_eval:
			# store new best point
			best, best_eval = candidate, candidate_eval
			# report progress
			print('>%d f(%s) = %.5f' % (i, best, best_eval))
		# difference between candidate and current point evaluation
		diff = candidate_eval - curr_eval
		# calculate temperature for current epoch
		t = temp / float(i + 1)
		# calculate metropolis acceptance criterion
		metropolis = exp(-diff / t)
		# check if we should keep the new point
		if diff < 0 or rand() < metropolis:
			# store the new current point
			curr, curr_eval = candidate, candidate_eval
	return [best, best_eval]

现在我们知道了如何在 Python 中实现模拟退火算法，让我们看看如何使用它来优化目标函数。

模拟退火工作示例

在本节中，我们将把模拟退火优化算法应用于目标函数。

首先，让我们定义我们的目标函数。

我们将使用一个简单的一维 x² 目标函数，其边界为[-5，5]。

下面的示例定义了函数，然后为输入值网格创建了函数响应面的线图，并用红线标记 f(0.0) = 0.0 处的 optima

# convex unimodal optimization function
from numpy import arange
from matplotlib import pyplot

# objective function
def objective(x):
	return x[0]**2.0

# define range for input
r_min, r_max = -5.0, 5.0
# sample input range uniformly at 0.1 increments
inputs = arange(r_min, r_max, 0.1)
# compute targets
results = [objective([x]) for x in inputs]
# create a line plot of input vs result
pyplot.plot(inputs, results)
# define optimal input value
x_optima = 0.0
# draw a vertical line at the optimal input
pyplot.axvline(x=x_optima, ls='--', color='red')
# show the plot
pyplot.show()

运行该示例会创建目标函数的线图，并清楚地标记函数最优值。

Line Plot of Objective Function With Optima Marked With a Dashed Red Line

用红色虚线标记最优值的目标函数线图

在我们将优化算法应用到问题之前，让我们花点时间更好地理解接受标准。

首先，快速退火计划是迭代次数的指数函数。我们可以通过为每次算法迭代创建一个温度图来明确这一点。

我们将使用 10 和 100 次算法迭代的初始温度，两者都是任意选择的。

下面列出了完整的示例。

# explore temperature vs algorithm iteration for simulated annealing
from matplotlib import pyplot
# total iterations of algorithm
iterations = 100
# initial temperature
initial_temp = 10
# array of iterations from 0 to iterations - 1
iterations = [i for i in range(iterations)]
# temperatures for each iterations
temperatures = [initial_temp/float(i + 1) for i in iterations]
# plot iterations vs temperatures
pyplot.plot(iterations, temperatures)
pyplot.xlabel('Iteration')
pyplot.ylabel('Temperature')
pyplot.show()

运行该示例计算每次算法迭代的温度，并创建算法迭代(x 轴)与温度(y 轴)的关系图。

我们可以看到，温度快速下降，呈指数下降，而不是线性下降，因此在 20 次迭代后，温度低于 1，并在剩余的搜索中保持较低。

Line Plot of Temperature vs. Algorithm Iteration for Fast Annealing

快速退火温度对算法迭代的线图

接下来，我们可以更好地了解大都市的接受标准是如何随着温度的变化而变化的。

回想一下，该标准是温度的函数，但也是新点的客观评估与当前工作解决方案的差异的函数。因此，我们将绘制目标函数值中几个不同的差异的标准，以查看其对接受概率的影响。

下面列出了完整的示例。

# explore metropolis acceptance criterion for simulated annealing
from math import exp
from matplotlib import pyplot
# total iterations of algorithm
iterations = 100
# initial temperature
initial_temp = 10
# array of iterations from 0 to iterations - 1
iterations = [i for i in range(iterations)]
# temperatures for each iterations
temperatures = [initial_temp/float(i + 1) for i in iterations]
# metropolis acceptance criterion
differences = [0.01, 0.1, 1.0]
for d in differences:
	metropolis = [exp(-d/t) for t in temperatures]
	# plot iterations vs metropolis
	label = 'diff=%.2f' % d
	pyplot.plot(iterations, metropolis, label=label)
# inalize plot
pyplot.xlabel('Iteration')
pyplot.ylabel('Metropolis Criterion')
pyplot.legend()
pyplot.show()

运行该示例使用每次迭代显示的温度(如前一节所示)计算每次算法迭代的大都会验收标准。

该图有三条线，表示新的更差解决方案和当前工作解决方案之间的三个差异。

我们可以看到，解越差(差异越大)，无论算法迭代如何，模型接受更差解的可能性就越小，正如我们可能预期的那样。我们还可以看到，在所有情况下，接受更差解的可能性随着算法迭代而降低。

Line Plot of Metropolis Acceptance Criterion vs. Algorithm Iteration for Simulated Annealing

大都市验收标准线图与模拟退火算法迭代

现在我们更熟悉温度和大都会验收标准随时间的变化，让我们将模拟退火应用于我们的测试问题。

首先，我们将播种伪随机数发生器。

一般来说，这不是必需的，但是在这种情况下，我希望确保每次运行算法时都得到相同的结果(相同的随机数序列)，这样我们就可以在以后绘制结果。

...
# seed the pseudorandom number generator
seed(1)

接下来，我们可以定义搜索的配置。

在这种情况下，我们将搜索算法的 1000 次迭代，并使用 0.1 的步长。假设我们使用高斯函数来生成步长，这意味着大约 99%的所有步长将在给定点的(0.1 * 3)距离内，例如三个标准偏差。

我们还将使用 10.0 的初始温度。搜索过程对退火程序比对初始温度更敏感，因此，初始温度值几乎是任意的。

...
n_iterations = 1000
# define the maximum step size
step_size = 0.1
# initial temperature
temp = 10

接下来，我们可以执行搜索并报告结果。

...
# perform the simulated annealing search
best, score = simulated_annealing(objective, bounds, n_iterations, step_size, temp)
print('Done!')
print('f(%s) = %f' % (best, score))

将这些结合在一起，完整的示例如下所示。

# simulated annealing search of a one-dimensional objective function
from numpy import asarray
from numpy import exp
from numpy.random import randn
from numpy.random import rand
from numpy.random import seed

# objective function
def objective(x):
	return x[0]**2.0

# simulated annealing algorithm
def simulated_annealing(objective, bounds, n_iterations, step_size, temp):
	# generate an initial point
	best = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# evaluate the initial point
	best_eval = objective(best)
	# current working solution
	curr, curr_eval = best, best_eval
	# run the algorithm
	for i in range(n_iterations):
		# take a step
		candidate = curr + randn(len(bounds)) * step_size
		# evaluate candidate point
		candidate_eval = objective(candidate)
		# check for new best solution
		if candidate_eval < best_eval:
			# store new best point
			best, best_eval = candidate, candidate_eval
			# report progress
			print('>%d f(%s) = %.5f' % (i, best, best_eval))
		# difference between candidate and current point evaluation
		diff = candidate_eval - curr_eval
		# calculate temperature for current epoch
		t = temp / float(i + 1)
		# calculate metropolis acceptance criterion
		metropolis = exp(-diff / t)
		# check if we should keep the new point
		if diff < 0 or rand() < metropolis:
			# store the new current point
			curr, curr_eval = candidate, candidate_eval
	return [best, best_eval]

# seed the pseudorandom number generator
seed(1)
# define range for input
bounds = asarray([[-5.0, 5.0]])
# define the total iterations
n_iterations = 1000
# define the maximum step size
step_size = 0.1
# initial temperature
temp = 10
# perform the simulated annealing search
best, score = simulated_annealing(objective, bounds, n_iterations, step_size, temp)
print('Done!')
print('f(%s) = %f' % (best, score))

运行该示例会报告搜索进度，包括迭代次数、函数输入以及每次检测到改进时目标函数的响应。

在搜索结束时，找到最佳解决方案并报告其评估。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，我们可以看到算法的 1000 次迭代中有大约 20 次改进，并且有一个非常接近最佳输入 0.0 的解，其计算结果为 f(0.0) = 0.0。

>34 f([-0.78753544]) = 0.62021
>35 f([-0.76914239]) = 0.59158
>37 f([-0.68574854]) = 0.47025
>39 f([-0.64797564]) = 0.41987
>40 f([-0.58914623]) = 0.34709
>41 f([-0.55446029]) = 0.30743
>42 f([-0.41775702]) = 0.17452
>43 f([-0.35038542]) = 0.12277
>50 f([-0.15799045]) = 0.02496
>66 f([-0.11089772]) = 0.01230
>67 f([-0.09238208]) = 0.00853
>72 f([-0.09145261]) = 0.00836
>75 f([-0.05129162]) = 0.00263
>93 f([-0.02854417]) = 0.00081
>144 f([0.00864136]) = 0.00007
>149 f([0.00753953]) = 0.00006
>167 f([-0.00640394]) = 0.00004
>225 f([-0.00044965]) = 0.00000
>503 f([-0.00036261]) = 0.00000
>512 f([0.00013605]) = 0.00000
Done!
f([0.00013605]) = 0.000000

将搜索的进度作为一个线形图来回顾可能会很有趣，它显示了每次出现改进时最佳解决方案评估的变化。

我们可以更新*模拟退火()*来跟踪每次有改进的目标函数评估，并返回这个分数列表。

# simulated annealing algorithm
def simulated_annealing(objective, bounds, n_iterations, step_size, temp):
	# generate an initial point
	best = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# evaluate the initial point
	best_eval = objective(best)
	# current working solution
	curr, curr_eval = best, best_eval
	# run the algorithm
	for i in range(n_iterations):
		# take a step
		candidate = curr + randn(len(bounds)) * step_size
		# evaluate candidate point
		candidate_eval = objective(candidate)
		# check for new best solution
		if candidate_eval < best_eval:
			# store new best point
			best, best_eval = candidate, candidate_eval
			# keep track of scores
			scores.append(best_eval)
			# report progress
			print('>%d f(%s) = %.5f' % (i, best, best_eval))
		# difference between candidate and current point evaluation
		diff = candidate_eval - curr_eval
		# calculate temperature for current epoch
		t = temp / float(i + 1)
		# calculate metropolis acceptance criterion
		metropolis = exp(-diff / t)
		# check if we should keep the new point
		if diff < 0 or rand() < metropolis:
			# store the new current point
			curr, curr_eval = candidate, candidate_eval
	return [best, best_eval, scores]

然后，我们可以创建这些分数的折线图，以查看搜索过程中发现的每个改进的目标函数的相对变化。

...
# line plot of best scores
pyplot.plot(scores, '.-')
pyplot.xlabel('Improvement Number')
pyplot.ylabel('Evaluation f(x)')
pyplot.show()

将这些联系在一起，下面列出了在搜索过程中执行搜索并绘制改进解决方案的目标函数分数的完整示例。

# simulated annealing search of a one-dimensional objective function
from numpy import asarray
from numpy import exp
from numpy.random import randn
from numpy.random import rand
from numpy.random import seed
from matplotlib import pyplot

# objective function
def objective(x):
	return x[0]**2.0

# simulated annealing algorithm
def simulated_annealing(objective, bounds, n_iterations, step_size, temp):
	# generate an initial point
	best = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# evaluate the initial point
	best_eval = objective(best)
	# current working solution
	curr, curr_eval = best, best_eval
	scores = list()
	# run the algorithm
	for i in range(n_iterations):
		# take a step
		candidate = curr + randn(len(bounds)) * step_size
		# evaluate candidate point
		candidate_eval = objective(candidate)
		# check for new best solution
		if candidate_eval < best_eval:
			# store new best point
			best, best_eval = candidate, candidate_eval
			# keep track of scores
			scores.append(best_eval)
			# report progress
			print('>%d f(%s) = %.5f' % (i, best, best_eval))
		# difference between candidate and current point evaluation
		diff = candidate_eval - curr_eval
		# calculate temperature for current epoch
		t = temp / float(i + 1)
		# calculate metropolis acceptance criterion
		metropolis = exp(-diff / t)
		# check if we should keep the new point
		if diff < 0 or rand() < metropolis:
			# store the new current point
			curr, curr_eval = candidate, candidate_eval
	return [best, best_eval, scores]

# seed the pseudorandom number generator
seed(1)
# define range for input
bounds = asarray([[-5.0, 5.0]])
# define the total iterations
n_iterations = 1000
# define the maximum step size
step_size = 0.1
# initial temperature
temp = 10
# perform the simulated annealing search
best, score, scores = simulated_annealing(objective, bounds, n_iterations, step_size, temp)
print('Done!')
print('f(%s) = %f' % (best, score))
# line plot of best scores
pyplot.plot(scores, '.-')
pyplot.xlabel('Improvement Number')
pyplot.ylabel('Evaluation f(x)')
pyplot.show()

运行该示例执行搜索并像以前一样报告结果。

创建一个线形图，显示爬山搜索过程中每项改进的目标函数评估。在搜索过程中，我们可以看到目标函数评估的大约 20 个变化，随着算法收敛到最优值，这些变化在开始时很大，在搜索结束时很小，甚至察觉不到。

Line Plot of Objective Function Evaluation for Each Improvement During the Simulated Annealing Search

模拟退火搜索过程中每次改进的目标函数评估线图

进一步阅读

如果您想更深入地了解这个主题，本节将提供更多资源。

报纸

模拟退火优化，1983。

书

优化算法，2019。
元试探法精要，2011。

文章

摘要

在本教程中，您发现了用于函数优化的模拟退火优化算法。

具体来说，您了解到:

模拟退火是一种用于函数优化的随机全局搜索算法。
如何在 Python 中从头实现模拟退火算法？
如何使用模拟退火算法并检验算法的结果？

你有什么问题吗？ 在下面的评论中提问，我会尽力回答。