Machine-Learning-Mastery-优化教程-五-Machine Learning Mastery 优化教

Machine Learning Mastery 优化教程（五）

原文：Machine Learning Mastery

协议：CC BY-NC-SA 4.0

从零开始的 RMSProp 梯度下降

原文：machinelearningmastery.com/gradient-de…

最后更新于 2021 年 10 月 12 日

梯度下降是一种优化算法，它遵循目标函数的负梯度来定位函数的最小值。

梯度下降的一个限制是，它对每个输入变量使用相同的步长(学习率)。简而言之，AdaGrad 是梯度下降优化算法的扩展，它允许优化算法使用的每个维度中的步长根据搜索过程中变量(偏导数)的梯度自动调整。

AdaGrad 的一个限制是，在搜索结束时，它会导致每个参数的步长非常小，这可能会大大减慢搜索进度，并可能意味着找不到最优解。

均方根传播或均方根传播是梯度下降的扩展，是梯度下降的 AdaGrad 版本，它使用部分梯度的衰减平均值来适应每个参数的步长。衰减移动平均线的使用允许算法忘记早期的梯度，并专注于搜索过程中最近观察到的部分梯度，克服了 AdaGrad 的限制。

在本教程中，您将发现如何使用 RMSProp 优化算法从零开始开发梯度下降。

完成本教程后，您将知道:

梯度下降是一种优化算法，它使用目标函数的梯度来导航搜索空间。
梯度下降可以更新为使用一个自动自适应的步长为每个输入变量使用一个衰减的移动平均偏导数，称为 RMSProp。
如何从零开始实现 RMSProp 优化算法，并将其应用于目标函数并评估结果。

用我的新书机器学习优化启动你的项目，包括分步教程和所有示例的 Python 源代码文件。

Let’s get started. Gradient Descent With RMSProp from Scratch

带 RMSProp 的梯度下降从零开始照片由帕维尔·艾哈迈德拍摄，保留部分权利。

教程概述

本教程分为三个部分；它们是:

梯度下降
均方根传播
带 RMSProp 的梯度下降
1. 二维测试问题
2. 基于 RMSProp 的梯度下降优化
3. RMSProp 的可视化

梯度下降

梯度下降是一种优化算法。

它在技术上被称为一阶优化算法，因为它明确地利用了目标函数的一阶导数。

一阶方法依赖于梯度信息来帮助指导搜索最小值…

—第 69 页，优化算法，2019。

一阶导数，或简称为“导数”，是目标函数在特定点(例如特定输入)的变化率或斜率。

如果目标函数有多个输入变量，它被称为多元函数，输入变量可以被认为是一个向量。反过来，多元目标函数的导数也可以作为向量，并且通常被称为梯度。

梯度:多元目标函数的一阶导数。

对于特定输入，导数或梯度指向目标函数最陡上升的方向。

梯度下降是指一种最小化优化算法，它遵循目标函数梯度下降的负值来定位函数的最小值。

梯度下降算法需要一个正在优化的目标函数和目标函数的导数函数。目标函数 f() 返回给定输入集的得分，导数函数 f'() 给出给定输入集的目标函数的导数。

梯度下降算法要求问题中有一个起点( x )，比如输入空间中随机选择的一个点。

然后计算导数，并在输入空间中采取一个步骤，该步骤预计会导致目标函数的下坡运动，假设我们正在最小化目标函数。

下坡移动是通过首先计算在输入空间中移动多远来实现的，计算方法是步长(称为 alpha 或学习率)乘以梯度。然后从当前点减去这一点，确保我们逆着梯度或目标函数向下移动。

x = x–步长* f'(x)

给定点处的目标函数越陡，梯度的幅度就越大，反过来，搜索空间中的步长就越大。使用步长超参数来缩放所采取的步长。

步长(α):超参数，控制算法每次迭代在搜索空间中逆着梯度移动多远。

如果步长太小，搜索空间中的移动将会很小，并且搜索将会花费很长时间。如果步长过大，搜索可能会绕过搜索空间并跳过 optima。

现在我们已经熟悉了梯度下降优化算法，让我们来看看 RMSProp。

均方根传播

均方根传播，简称 RMSProp，是梯度下降优化算法的扩展。

这是一个未发表的扩展，首先在杰弗里·辛顿的神经网络课程讲义中描述，特别是第 6e 讲，题目是“ rmsprop:将梯度除以其最近幅度的运行平均值”

RMSProp 旨在加速优化过程，例如减少达到最优所需的函数求值次数，或者提高优化算法的能力，例如产生更好的最终结果。

它与梯度下降的另一个扩展有关，称为自适应梯度，或 AdaGrad。

AdaGrad 旨在专门探索为搜索空间中的每个参数自动定制步长(学习率)的想法。这是通过首先计算给定维度的步长，然后使用计算出的步长在该维度上使用偏导数进行移动来实现的。然后对搜索空间中的每个维度重复这个过程。

Adagrad 计算每个参数的步长，首先对搜索过程中所见参数的偏导数求和，然后将初始步长超参数除以偏导数平方和的平方根。

一个参数的自定义步长计算如下:

cust _ step _ size = step _ size/(1e-8+sqrt)

其中 cust_step_size 是搜索过程中给定点输入变量的计算步长， step_size 是初始步长， sqrt() 是平方根运算， s 是迄今为止搜索过程中看到的输入变量的平方偏导数之和。

这具有平滑搜索空间中曲率很大的优化问题的振荡的效果。

AdaGrad 根据平方梯度的整个历史来缩小学习率，在达到这样的凸结构之前，可能已经使学习率变得太小了。

—第 307-308 页，深度学习，2016。

AdaGrad 的一个问题是，它会使搜索速度减慢太多，导致运行结束时搜索的每个参数或维度的学习率非常小。这具有在找到最小值之前过早停止搜索的效果。

RMSProp 扩展了 Adagrad，以避免单调递减的学习率的影响。

—第 78 页，优化算法，2019。

RMSProp 可以被认为是 AdaGrad 的扩展，因为它在计算每个参数的学习率时使用偏导数的衰减平均值或移动平均值，而不是总和。

这是通过增加一个新的超参数来实现的，我们称之为ρ，它的作用类似于偏导数的动量。

RMSProp 保持平方梯度的衰减平均值。

—第 78 页，优化算法，2019。

使用偏导数的衰减移动平均允许搜索忘记早期的偏导数值，并且集中于搜索空间的最近看到的形状。

RMSProp 使用指数衰减的平均值来丢弃极端过去的历史，这样它就可以在找到一个凸碗后快速收敛，就好像它是在那个碗内初始化的 AdaGrad 算法的一个实例。

—第 308 页，深度学习，2016。

一个参数的均方偏导数计算如下:

s(t+1)=(s(t)*ρ)+(f'(x(t))² *(1.0-ρ))

其中 s(t+1)是算法当前迭代的一个参数的平方偏导数的衰减移动平均，s(t)是前一次迭代的衰减移动平均平方偏导数，f'(x(t))² 是当前参数的平方偏导数，ρ是超参数，通常具有类似动量的 0.9 的值。

假设我们使用的是偏导数的衰减平均值，并计算该平均值的平方根，这种技术就有了它的名字，例如，偏导数的均方根或均方根(RMS)。例如，参数的自定义步长可以写成:

cust _ step _ size(t+1)= step _ size/(1e-8+RMS(s(t+1)))

一旦我们有了参数的自定义步长，我们就可以使用自定义步长和偏导数*f’(x(t))*来更新参数。

x(t+1)= x(t)–cust _ step _ size(t+1)* f '(x(t))

然后对每个输入变量重复这个过程，直到在搜索空间中创建一个新的点并可以对其进行评估。

RMSProp 是梯度下降的非常有效的扩展，并且是通常用于拟合深度学习神经网络的优选方法之一。

经验证明，RMSProp 是一种有效而实用的深度神经网络优化算法。它是深度学习实践者经常使用的优化方法之一。

—第 308 页，深度学习，2016。

现在我们已经熟悉了 RMSprop 算法，让我们探索如何实现它并评估它的表现。

带 RMSProp 的梯度下降

在本节中，我们将探索如何使用 RMSProp 算法实现具有自适应梯度的梯度下降优化算法。

二维测试问题

首先，让我们定义一个优化函数。

我们将使用一个简单的二维函数，它对每个维度的输入进行平方，并定义从-1.0 到 1.0 的有效输入范围。

下面的*目标()*函数实现该功能

# objective function
def objective(x, y):
	return x**2.0 + y**2.0

我们可以创建数据集的三维图，以获得对响应表面曲率的感觉。

下面列出了绘制目标函数的完整示例。

# 3d plot of the test function
from numpy import arange
from numpy import meshgrid
from matplotlib import pyplot

# objective function
def objective(x, y):
	return x**2.0 + y**2.0

# define range for input
r_min, r_max = -1.0, 1.0
# sample input range uniformly at 0.1 increments
xaxis = arange(r_min, r_max, 0.1)
yaxis = arange(r_min, r_max, 0.1)
# create a mesh from the axis
x, y = meshgrid(xaxis, yaxis)
# compute targets
results = objective(x, y)
# create a surface plot with the jet color scheme
figure = pyplot.figure()
axis = figure.gca(projection='3d')
axis.plot_surface(x, y, results, cmap='jet')
# show the plot
pyplot.show()

运行该示例会创建目标函数的三维表面图。

我们可以看到熟悉的碗形，全局最小值在 f(0，0) = 0。

Three-Dimensional Plot of the Test Objective Function

测试目标函数的三维图

我们还可以创建函数的二维图。这将有助于我们以后绘制搜索进度。

以下示例创建了目标函数的等高线图。

# contour plot of the test function
from numpy import asarray
from numpy import arange
from numpy import meshgrid
from matplotlib import pyplot

# objective function
def objective(x, y):
	return x**2.0 + y**2.0

# define range for input
bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])
# sample input range uniformly at 0.1 increments
xaxis = arange(bounds[0,0], bounds[0,1], 0.1)
yaxis = arange(bounds[1,0], bounds[1,1], 0.1)
# create a mesh from the axis
x, y = meshgrid(xaxis, yaxis)
# compute targets
results = objective(x, y)
# create a filled contour plot with 50 levels and jet color scheme
pyplot.contourf(x, y, results, levels=50, cmap='jet')
# show the plot
pyplot.show()

运行该示例会创建目标函数的二维等高线图。

我们可以看到碗的形状被压缩成带有颜色梯度的轮廓。我们将使用此图来绘制搜索过程中探索的具体点。

Two-Dimensional Contour Plot of the Test Objective Function

测试目标函数的二维等高线图

现在我们有了一个测试目标函数，让我们看看如何实现 RMSProp 优化算法。

基于 RMSProp 的梯度下降优化

我们可以将带有 RMSProp 的梯度下降应用于测试问题。

首先，我们需要一个函数来计算这个函数的导数。

f(x) = x²
f'(x) = x * 2

x² 的导数在每个维度上都是 x * 2。*导数()*函数实现如下。

# derivative of objective function
def derivative(x, y):
	return asarray([x * 2.0, y * 2.0])

接下来，我们可以实现梯度下降优化。

首先，我们可以在问题的边界中选择一个随机点作为搜索的起点。

这假设我们有一个定义搜索范围的数组，每个维度有一行，第一列定义维度的最小值，第二列定义维度的最大值。

...
# generate an initial point
solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])

接下来，我们需要将每个维度的平方偏导数的衰减平均值初始化为 0.0。

...
# list of the average square gradients for each variable
sq_grad_avg = [0.0 for _ in range(bounds.shape[0])]

然后，我们可以枚举由“ n_iter ”超参数定义的搜索优化算法的固定迭代次数。

...
# run the gradient descent
for it in range(n_iter):
	...

第一步是使用*导数()*函数计算当前解的梯度。

...
# calculate gradient
gradient = derivative(solution[0], solution[1])

然后我们需要计算偏导数的平方，并用“ρ”超参数更新平方偏导数的衰减平均值。

...
# update the average of the squared partial derivatives
for i in range(gradient.shape[0]):
	# calculate the squared gradient
	sg = gradient[i]**2.0
	# update the moving average of the squared gradient
	sq_grad_avg[i] = (sq_grad_avg[i] * rho) + (sg * (1.0-rho))

然后，我们可以使用平方偏导数和梯度的移动平均值来计算下一个点的步长。

我们将一次处理一个变量，首先计算变量的步长，然后计算变量的新值。这些值在一个数组中建立，直到我们有一个全新的解决方案，该解决方案使用自定义步长从当前点开始以最陡的下降方向下降。

...
# build a solution one variable at a time
new_solution = list()
for i in range(solution.shape[0]):
	# calculate the step size for this variable
	alpha = step_size / (1e-8 + sqrt(sq_grad_avg[i]))
	# calculate the new position in this variable
	value = solution[i] - alpha * gradient[i]
	# store this variable
	new_solution.append(value)

然后，可以使用*客观()*函数来评估这个新的解决方案，并且可以报告搜索的表现。

...
# evaluate candidate point
solution = asarray(new_solution)
solution_eval = objective(solution[0], solution[1])
# report progress
print('>%d f(%s) = %.5f' % (it, solution, solution_eval))

就这样。

我们可以将所有这些联系到一个名为 rmsprop() 的函数中，该函数采用目标函数和导数函数的名称，一个数组，该数组具有算法迭代总数和初始学习率的域和超参数值的边界，并返回最终解及其评估。

下面列出了完整的功能。

# gradient descent algorithm with rmsprop
def rmsprop(objective, derivative, bounds, n_iter, step_size, rho):
	# generate an initial point
	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# list of the average square gradients for each variable
	sq_grad_avg = [0.0 for _ in range(bounds.shape[0])]
	# run the gradient descent
	for it in range(n_iter):
		# calculate gradient
		gradient = derivative(solution[0], solution[1])
		# update the average of the squared partial derivatives
		for i in range(gradient.shape[0]):
			# calculate the squared gradient
			sg = gradient[i]**2.0
			# update the moving average of the squared gradient
			sq_grad_avg[i] = (sq_grad_avg[i] * rho) + (sg * (1.0-rho))
		# build a solution one variable at a time
		new_solution = list()
		for i in range(solution.shape[0]):
			# calculate the step size for this variable
			alpha = step_size / (1e-8 + sqrt(sq_grad_avg[i]))
			# calculate the new position in this variable
			value = solution[i] - alpha * gradient[i]
			# store this variable
			new_solution.append(value)
		# evaluate candidate point
		solution = asarray(new_solution)
		solution_eval = objective(solution[0], solution[1])
		# report progress
		print('>%d f(%s) = %.5f' % (it, solution, solution_eval))
	return [solution, solution_eval]

注:为了可读性，我们特意使用了列表和命令式编码风格，而不是矢量化操作。请随意将该实现调整为使用 NumPy 数组的矢量化实现，以获得更好的表现。

然后，我们可以定义我们的超参数，并调用 rmsprop() 函数来优化我们的测试目标函数。

在这种情况下，我们将使用算法的 50 次迭代，初始学习率为 0.01，rho 超参数的值为 0.99，所有这些都是在一点点尝试和错误之后选择的。

...
# seed the pseudo random number generator
seed(1)
# define range for input
bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])
# define the total iterations
n_iter = 50
# define the step size
step_size = 0.01
# momentum for rmsprop
rho = 0.99
# perform the gradient descent search with rmsprop
best, score = rmsprop(objective, derivative, bounds, n_iter, step_size, rho)
print('Done!')
print('f(%s) = %f' % (best, score))

将所有这些结合在一起，下面列出了使用 RMSProp 进行梯度下降优化的完整示例。

# gradient descent optimization with rmsprop for a two-dimensional test function
from math import sqrt
from numpy import asarray
from numpy.random import rand
from numpy.random import seed

# objective function
def objective(x, y):
	return x**2.0 + y**2.0

# derivative of objective function
def derivative(x, y):
	return asarray([x * 2.0, y * 2.0])

# gradient descent algorithm with rmsprop
def rmsprop(objective, derivative, bounds, n_iter, step_size, rho):
	# generate an initial point
	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# list of the average square gradients for each variable
	sq_grad_avg = [0.0 for _ in range(bounds.shape[0])]
	# run the gradient descent
	for it in range(n_iter):
		# calculate gradient
		gradient = derivative(solution[0], solution[1])
		# update the average of the squared partial derivatives
		for i in range(gradient.shape[0]):
			# calculate the squared gradient
			sg = gradient[i]**2.0
			# update the moving average of the squared gradient
			sq_grad_avg[i] = (sq_grad_avg[i] * rho) + (sg * (1.0-rho))
		# build a solution one variable at a time
		new_solution = list()
		for i in range(solution.shape[0]):
			# calculate the step size for this variable
			alpha = step_size / (1e-8 + sqrt(sq_grad_avg[i]))
			# calculate the new position in this variable
			value = solution[i] - alpha * gradient[i]
			# store this variable
			new_solution.append(value)
		# evaluate candidate point
		solution = asarray(new_solution)
		solution_eval = objective(solution[0], solution[1])
		# report progress
		print('>%d f(%s) = %.5f' % (it, solution, solution_eval))
	return [solution, solution_eval]

# seed the pseudo random number generator
seed(1)
# define range for input
bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])
# define the total iterations
n_iter = 50
# define the step size
step_size = 0.01
# momentum for rmsprop
rho = 0.99
# perform the gradient descent search with rmsprop
best, score = rmsprop(objective, derivative, bounds, n_iter, step_size, rho)
print('Done!')
print('f(%s) = %f' % (best, score))

运行该示例将 RMSProp 优化算法应用于我们的测试问题，并报告算法每次迭代的搜索表现。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，我们可以看到，在大约 33 次搜索迭代后，找到了接近最优的解，输入值接近 0.0 和 0.0，评估为 0.0。

...
>30 f([-9.61030898e-14 3.19352553e-03]) = 0.00001
>31 f([-3.42767893e-14 2.71513758e-03]) = 0.00001
>32 f([-1.21143047e-14 2.30636623e-03]) = 0.00001
>33 f([-4.24204875e-15 1.95738936e-03]) = 0.00000
>34 f([-1.47154482e-15 1.65972553e-03]) = 0.00000
>35 f([-5.05629595e-16 1.40605727e-03]) = 0.00000
>36 f([-1.72064649e-16 1.19007691e-03]) = 0.00000
>37 f([-5.79813754e-17 1.00635204e-03]) = 0.00000
>38 f([-1.93445677e-17 8.50208253e-04]) = 0.00000
>39 f([-6.38906842e-18 7.17626999e-04]) = 0.00000
>40 f([-2.08860690e-18 6.05156738e-04]) = 0.00000
>41 f([-6.75689941e-19 5.09835645e-04]) = 0.00000
>42 f([-2.16291217e-19 4.29124484e-04]) = 0.00000
>43 f([-6.84948980e-20 3.60848338e-04]) = 0.00000
>44 f([-2.14551097e-20 3.03146089e-04]) = 0.00000
>45 f([-6.64629576e-21 2.54426642e-04]) = 0.00000
>46 f([-2.03575780e-21 2.13331041e-04]) = 0.00000
>47 f([-6.16437387e-22 1.78699710e-04]) = 0.00000
>48 f([-1.84495110e-22 1.49544152e-04]) = 0.00000
>49 f([-5.45667355e-23 1.25022522e-04]) = 0.00000
Done!
f([-5.45667355e-23 1.25022522e-04]) = 0.000000

RMSProp 的可视化

我们可以在域的等高线图上绘制搜索进度。

这可以为算法迭代过程中的搜索进度提供直觉。

我们必须更新 rmsprop()函数，以维护搜索过程中找到的所有解决方案的列表，然后在搜索结束时返回该列表。

下面列出了带有这些更改的功能的更新版本。

# gradient descent algorithm with rmsprop
def rmsprop(objective, derivative, bounds, n_iter, step_size, rho):
	# track all solutions
	solutions = list()
	# generate an initial point
	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# list of the average square gradients for each variable
	sq_grad_avg = [0.0 for _ in range(bounds.shape[0])]
	# run the gradient descent
	for it in range(n_iter):
		# calculate gradient
		gradient = derivative(solution[0], solution[1])
		# update the average of the squared partial derivatives
		for i in range(gradient.shape[0]):
			# calculate the squared gradient
			sg = gradient[i]**2.0
			# update the moving average of the squared gradient
			sq_grad_avg[i] = (sq_grad_avg[i] * rho) + (sg * (1.0-rho))
		# build solution
		new_solution = list()
		for i in range(solution.shape[0]):
			# calculate the learning rate for this variable
			alpha = step_size / (1e-8 + sqrt(sq_grad_avg[i]))
			# calculate the new position in this variable
			value = solution[i] - alpha * gradient[i]
			new_solution.append(value)
		# store the new solution
		solution = asarray(new_solution)
		solutions.append(solution)
		# evaluate candidate point
		solution_eval = objective(solution[0], solution[1])
		# report progress
		print('>%d f(%s) = %.5f' % (it, solution, solution_eval))
	return solutions

然后，我们可以像以前一样执行搜索，这次检索解决方案列表，而不是最佳最终解决方案。

...
# seed the pseudo random number generator
seed(1)
# define range for input
bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])
# define the total iterations
n_iter = 50
# define the step size
step_size = 0.01
# momentum for rmsprop
rho = 0.99
# perform the gradient descent search with rmsprop
solutions = rmsprop(objective, derivative, bounds, n_iter, step_size, rho)

然后，我们可以像以前一样创建目标函数的等高线图。

...
# sample input range uniformly at 0.1 increments
xaxis = arange(bounds[0,0], bounds[0,1], 0.1)
yaxis = arange(bounds[1,0], bounds[1,1], 0.1)
# create a mesh from the axis
x, y = meshgrid(xaxis, yaxis)
# compute targets
results = objective(x, y)
# create a filled contour plot with 50 levels and jet color scheme
pyplot.contourf(x, y, results, levels=50, cmap='jet')

最后，我们可以将搜索过程中找到的每个解决方案绘制成由一条线连接的白点。

...
# plot the sample as black circles
solutions = asarray(solutions)
pyplot.plot(solutions[:, 0], solutions[:, 1], '.-', color='w')

将所有这些结合起来，下面列出了对测试问题执行 RMSProp 优化并将结果绘制在等高线图上的完整示例。

# example of plotting the rmsprop search on a contour plot of the test function
from math import sqrt
from numpy import asarray
from numpy import arange
from numpy.random import rand
from numpy.random import seed
from numpy import meshgrid
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D

# objective function
def objective(x, y):
	return x**2.0 + y**2.0

# derivative of objective function
def derivative(x, y):
	return asarray([x * 2.0, y * 2.0])

# gradient descent algorithm with rmsprop
def rmsprop(objective, derivative, bounds, n_iter, step_size, rho):
	# track all solutions
	solutions = list()
	# generate an initial point
	solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# list of the average square gradients for each variable
	sq_grad_avg = [0.0 for _ in range(bounds.shape[0])]
	# run the gradient descent
	for it in range(n_iter):
		# calculate gradient
		gradient = derivative(solution[0], solution[1])
		# update the average of the squared partial derivatives
		for i in range(gradient.shape[0]):
			# calculate the squared gradient
			sg = gradient[i]**2.0
			# update the moving average of the squared gradient
			sq_grad_avg[i] = (sq_grad_avg[i] * rho) + (sg * (1.0-rho))
		# build solution
		new_solution = list()
		for i in range(solution.shape[0]):
			# calculate the learning rate for this variable
			alpha = step_size / (1e-8 + sqrt(sq_grad_avg[i]))
			# calculate the new position in this variable
			value = solution[i] - alpha * gradient[i]
			new_solution.append(value)
		# store the new solution
		solution = asarray(new_solution)
		solutions.append(solution)
		# evaluate candidate point
		solution_eval = objective(solution[0], solution[1])
		# report progress
		print('>%d f(%s) = %.5f' % (it, solution, solution_eval))
	return solutions

# seed the pseudo random number generator
seed(1)
# define range for input
bounds = asarray([[-1.0, 1.0], [-1.0, 1.0]])
# define the total iterations
n_iter = 50
# define the step size
step_size = 0.01
# momentum for rmsprop
rho = 0.99
# perform the gradient descent search with rmsprop
solutions = rmsprop(objective, derivative, bounds, n_iter, step_size, rho)
# sample input range uniformly at 0.1 increments
xaxis = arange(bounds[0,0], bounds[0,1], 0.1)
yaxis = arange(bounds[1,0], bounds[1,1], 0.1)
# create a mesh from the axis
x, y = meshgrid(xaxis, yaxis)
# compute targets
results = objective(x, y)
# create a filled contour plot with 50 levels and jet color scheme
pyplot.contourf(x, y, results, levels=50, cmap='jet')
# plot the sample as black circles
solutions = asarray(solutions)
pyplot.plot(solutions[:, 0], solutions[:, 1], '.-', color='w')
# show the plot
pyplot.show()

运行该示例会像以前一样执行搜索，只是在这种情况下，会创建目标函数的等高线图。

在这种情况下，我们可以看到，搜索过程中找到的每个解决方案都显示一个白点，从 optima 上方开始，逐渐靠近图中心的 optima。

Contour Plot of the Test Objective Function With RMSProp Search Results Shown

显示 RMSProp 搜索结果的测试目标函数等高线图

进一步阅读

如果您想更深入地了解这个主题，本节将提供更多资源。

报纸

第 6e 讲， rmsprop:用梯度最近大小的运行平均值来划分梯度，机器学习的神经网络，杰弗里·辛顿。

书

优化算法，2019。
深度学习，2016 年。

蜜蜂

num py . random . rand API。
num py . asar ray API。
Matplotlib API 。

文章

摘要

在本教程中，您发现了如何使用 RMSProp 优化算法从零开始开发梯度下降。

具体来说，您了解到:

梯度下降是一种优化算法，它使用目标函数的梯度来导航搜索空间。
梯度下降可以更新，以使用一个衰减的偏导数平均值(称为 RMSProp)为每个输入变量使用自动自适应步长。
如何从零开始实现 RMSProp 优化算法，并将其应用于目标函数并评估结果。

你有什么问题吗？ 在下面的评论中提问，我会尽力回答。

什么是机器学习中的梯度？

原文：machinelearningmastery.com/gradient-in…

最后更新于 2021 年 10 月 12 日

梯度是优化和机器学习中常用的术语。

例如，深度学习神经网络使用随机梯度下降进行拟合，许多用于拟合机器学习算法的标准优化算法使用梯度信息。

为了理解什么是梯度，你需要从微积分领域理解什么是导数。这包括如何计算导数和解释值。对导数的理解直接适用于理解如何计算和解释优化和机器学习中使用的梯度。

在本教程中，您将发现机器学习中导数和梯度的温和介绍。

完成本教程后，您将知道:

函数的导数是给定输入下函数的变化。
梯度只是多元函数的导数向量。
如何计算和解释简单函数的导数？

用我的新书机器学习优化启动你的项目，包括分步教程和所有示例的 Python 源代码文件。

Let’s get started. What Is a Gradient in Machine Learning?

什么是机器学习中的梯度？图片由roanesh提供，保留部分权利。

教程概述

本教程分为五个部分；它们是:

什么是导数？
什么是梯度？
计算导数的实例
如何解读导数
如何计算函数的导数

什么是导数？

在微积分中，导数是实值函数中给定点的变化率。

例如，函数 f() 对于变量 x 的导数 f'(x) 是函数 f() 在点 x 处变化的速率。

它可能变化很大，例如非常弯曲，或者可能变化很小，例如轻微弯曲，或者它可能根本不变，例如平坦或静止。

如果我们可以计算函数变量在所有输入点的导数，那么一个函数是可微的。不是所有的函数都是可微的。

一旦我们计算了导数，我们可以用很多方法来使用它。

例如，给定一个输入值 x 和在该点f’(x)的导数，我们可以使用导数估计函数 f(x) 在附近点δ_ x(在 x 中的变化)的值，如下所示:

f(x+δ_ x)= f(x)+f '(x)*δ_ x

在这里，我们可以看到 f'(x) 是一条线，我们通过δx沿着这条线移动来估计附近点的函数值。

我们可以在优化问题中使用导数，因为它们告诉我们如何以增加或减少函数输出的方式改变目标函数的输入，因此我们可以更接近函数的最小值或最大值。

导数在优化中很有用，因为它们提供了如何改变给定点以改进目标函数的信息。

—第 32 页，优化算法，2019。

找到可以用来逼近附近值的线是分化最初发展的主要原因。这条线被称为切线或函数在给定点的斜率。

寻找曲线的切线的问题涉及到寻找相同类型的极限………。这种特殊类型的极限被称为导数，我们将看到它可以被解释为任何科学或工程中的变化率。

—第 104 页，微积分，2015 年第 8 版。

下面提供了一个函数点切线的例子，摘自“优化算法”第 19 页

Tangent Line of a Function at a Given Point

函数在给定点的切线取自优化算法。

从技术上讲，到目前为止描述的导数被称为一阶导数或一阶导数。

二阶导数(或二阶导数)是导数函数的导数。即变化率的变化率或函数的变化量。

一阶导数:目标函数的变化率。
二阶导数:一阶导数函数的变化率。

二阶导数的一个自然用法是在一个邻近点近似一阶导数，就像我们可以用一阶导数来估计目标函数在一个邻近点的值一样。

现在我们知道导数是什么了，让我们来看看梯度。

什么是梯度？

一个梯度是一个有多个输入变量的函数的导数。

从线性代数领域的角度来看，这是一个用来指函数导数的术语。特别是线性代数遇到微积分的时候，叫做向量微积分。

梯度是多元函数导数的推广。它捕捉了函数的局部斜率，使我们能够预测从一个点向任何方向迈出一小步的效果。

—第 21 页，优化算法，2019。

多个输入变量一起定义了一个值向量，例如输入空间中可以提供给目标函数的一个点。

目标函数与输入变量向量的导数同样是向量。每个输入变量的导数向量就是梯度。

梯度(向量演算):取输入变量向量的函数的导数向量。

你可能还记得高中代数或微积分前，梯度也通常指二维图上一条线的斜率。

它的计算方法是函数的上升(y 轴上的变化)除以函数的运行(x 轴上的变化)，简化为以下规则:上升超过运行:

**斜率(代数):**直线的斜率，计算为上升超过运行。

我们可以看到，这是一个简单而粗略的一元函数导数的近似。微积分中的导数函数更精确，因为它使用极限来找到函数在某一点的精确斜率。这种来自代数的梯度思想与最优化和机器学习中使用的梯度思想相关，但并不直接有用。

采用多个输入变量的函数，例如输入变量的向量，可以被称为多元函数。

一个函数相对于一个变量的偏导数是假设所有其他输入变量保持不变的导数。

—第 21 页，优化算法，2019。

梯度(导数向量)中的每个分量称为目标函数的偏导数。

偏导数假设函数的所有其他变量保持不变。

偏导数:多元函数的一个变量的导数。

线性代数中处理方阵是很有用的，二阶导数的方阵称为黑森矩阵。

多元函数的 Hessian 是包含关于输入的所有二阶导数的矩阵

—第 21 页，优化算法，2019。

我们可以互换使用梯度和导数，尽管在优化和机器学习领域，我们通常使用“梯度”，因为我们通常关注多元函数。

导数的直觉直接转化为梯度，只是维数更多。

现在我们已经熟悉了导数和梯度的概念，让我们来看一个计算导数的工作示例。

计算导数的实例

让我们用一个工作实例来具体说明这个导数。

首先，让我们定义一个简单的一维函数，它对输入进行平方，并定义从-1.0 到 1.0 的有效输入范围。

f(x) = x²

以下示例以 0.1 的增量对此函数的输入进行采样，计算每个输入的函数值，并绘制结果图。

# plot of simple function
from numpy import arange
from matplotlib import pyplot

# objective function
def objective(x):
	return x**2.0

# define range for input
r_min, r_max = -1.0, 1.0
# sample input range uniformly at 0.1 increments
inputs = arange(r_min, r_max+0.1, 0.1)
# compute targets
results = objective(inputs)
# create a line plot of input vs result
pyplot.plot(inputs, results)
# show the plot
pyplot.show()

运行该示例会创建函数输入(x 轴)和函数计算输出(y 轴)的线图。

我们可以看到熟悉的 U 型叫抛物线。

Line Plot of Simple One Dimensional Function

简单一维函数的线图

我们可以在形状的边上看到一个很大的变化或陡峭的曲线，在函数的中间看到一个平坦的区域，在那里我们可以看到一个很小的导数。

让我们通过计算-0.5 和 0.5(陡峭)和 0.0(平坦)的导数来确认这些预期。

函数的导数计算如下:

f'(x) = x * 2

下面的例子为我们的目标函数计算了特定输入点的导数。

# calculate the derivative of the objective function

# derivative of objective function
def derivative(x):
	return x * 2.0

# calculate derivatives
d1 = derivative(-0.5)
print('f\'(-0.5) = %.3f' % d1)
d2 = derivative(0.5)
print('f\'(0.5) = %.3f' % d2)
d3 = derivative(0.0)
print('f\'(0.0) = %.3f' % d3)

运行该示例将打印特定输入值的导数值。

我们可以看到，函数陡点处的导数为-1 和 1，函数平坦部分的导数为 0.0。

f'(-0.5) = -1.000
f'(0.5) = 1.000
f'(0.0) = 0.000

既然我们知道了如何计算一个函数的导数，我们就来看看如何解释导数的值。

如何解读导数

导数的值可以解释为变化率(大小)和方向(符号)。

导数的大小:变化有多大。
导数符号:变化方向。

0.0 的导数表示目标函数没有变化，称为静止点。

函数可以有一个或多个静止点，函数的局部或全局最小值(谷底)或最大值(山峰)就是静止点的例子。

梯度指向切线超平面的最陡上升方向…

—第 21 页，优化算法，2019。

导数的符号告诉你目标函数在那个点是增加还是减少。

正导数:在那个点上函数在增加。
负导数:函数在该点递减

这可能会令人困惑，因为从上一节的图来看，函数 f(x)的值在 y 轴上增加了-0.5 和 0.5。

这里的技巧是始终从左到右读取函数的曲线，例如，从左到右跟随 y 轴上的值输入 x 值。

实际上，如果从左向右读，x=-0.5 附近的值会减少，因此会有负导数，x=0.5 附近的值会增加，因此会有正导数。

我们可以想象，如果我们想仅使用梯度信息来找到上一节中函数的最小值，那么如果梯度为负则增加 x 输入值以走下坡路，或者如果梯度为正则减少 x 输入值以走下坡路。

这是可以访问函数梯度信息的梯度下降(和梯度上升)类优化算法的基础。

现在我们知道如何解释导数值，让我们看看如何找到函数的导数。

如何计算函数的导数

求输出目标函数 f() 变化率的导数函数 f'() 称为微分。

有许多计算函数导数的方法(算法)。

在某些情况下，我们可以使用微积分工具来计算函数的导数，手动或使用自动求解器。

计算函数导数的一般技术包括:

SymPy Python 库可用于符号微分。

计算库如和张量流可用于自动微分。

*如果您的功能易于以纯文本形式指定，也可以使用在线服务。

一个例子是 Wolfram Alpha 网站，它将为您计算函数的导数；例如:

计算 x² 导数

并不是所有的函数都是可微的，一些可微的函数可能会使某些方法很难求导数。

计算函数的导数超出了本教程的范围。查阅一本好的微积分教科书，比如在进一步阅读部分的那些。

进一步阅读

如果您想更深入地了解这个主题，本节将提供更多资源。

书

优化算法，2019。
微积分，2017 年第 3 版。(吉尔伯特·斯特朗)
微积分，第 8 版，2015。(詹姆斯·斯图尔特)

文章

摘要

在本教程中，您发现了机器学习中导数和梯度的温和介绍。

具体来说，您了解到:

函数的导数是给定输入下函数的变化。
梯度只是多元函数的导数向量。
如何计算和解释简单函数的导数？

你有什么问题吗？ 在下面的评论中提问，我会尽力回答。*

如何在 Python 中使用 NelderMead 优化

原文：machinelearningmastery.com/how-to-use-…

最后更新于 2021 年 10 月 12 日

NelderMead 优化算法是一种广泛用于不可微目标函数的方法。

因此，它通常被称为模式搜索算法，并被用作局部或全局搜索过程，挑战非线性和潜在的噪声和多峰函数优化问题。

在本教程中，您将发现 NelderMead 优化算法。

完成本教程后，您将知道:

奈尔德-米德优化算法是一种不使用函数梯度的模式搜索。
如何在 Python 中应用 NelderMead 算法进行函数优化。
如何解释 NelderMead 算法对噪声和多模态目标函数的结果。

用我的新书机器学习优化启动你的项目，包括分步教程和所有示例的 Python 源代码文件。

Let’s get started. How to Use the Nelder-Mead Optimization in Python

如何使用 Python 中的 NelderMead 优化图片由唐·格雷厄姆提供，保留部分权利。

教程概述

本教程分为三个部分；它们是:

NelderMead 算法
Python 中的 NelderMead 示例
NelderMead 挑战函数
1. 噪声优化问题
2. 多模态优化问题

NelderMead 算法

NelderMead是一种优化算法，以该技术的开发者约翰内尔德和罗杰米德的名字命名。

该算法在他们 1965 年发表的题为“函数最小化的单纯形法”的论文中进行了描述，并已成为函数优化的标准和广泛使用的技术。

它适用于具有数字输入的一维或多维函数。

NelderMead 是一种模式搜索优化算法，这意味着它不需要或使用函数梯度信息，适用于函数梯度未知或无法合理计算的优化问题。

它经常被用于多维非线性函数优化问题，尽管它会陷入局部最优。

NelderMead 算法的实际表现通常是合理的，尽管已经观察到停滞出现在非最佳点。当检测到停滞时，可以使用重启。

—第 239 页，数值优化，2006。

必须为算法提供一个起点，该起点可以是另一个全局优化算法的终点，也可以是从域中提取的随机点。

考虑到算法可能会卡住，它可能会受益于具有不同起始点的多次重启。

奈尔德-米德单纯形法使用一个单纯形遍历空间以寻找最小值。

—第 105 页，优化算法，2019。

该算法通过使用由 n + 1 个点(顶点)组成的形状结构(称为单纯形)工作，其中 n 是函数的输入维数。

例如，在一个可以绘制为曲面的二维问题上，形状结构将由三个表示为三角形的点组成。

奈尔德-米德方法使用一系列规则，这些规则规定了如何根据目标函数在其顶点的评估来更新单纯形。

—第 106 页，优化算法，2019。

评估形状结构的点，并使用简单的规则来决定如何根据它们的相对评估来移动形状的点。这包括诸如目标函数表面上的单纯形形状的“反射”、“扩展、“收缩”和“收缩”的操作。

在 NelderMead 算法的单次迭代中，我们试图移除函数值最差的顶点，并用另一个具有更好值的点来替换它。通过沿连接最差顶点和其余顶点质心的直线反射、扩展或收缩单形来获得新点。如果我们不能以这种方式找到更好的点，我们只保留具有最佳函数值的顶点，并通过将所有其他顶点移向该值来收缩单纯形。

—第 238 页，数值优化，2006。

当点收敛到最优值时，当观察到评估之间的最小差异时，或者当执行最大数量的功能评估时，搜索停止。

现在，我们已经对算法的工作原理有了一个高层次的了解，让我们看看如何在实践中使用它。

Python 中的 NelderMead 示例

NelderMead 优化算法可以通过最小化()函数在 Python 中使用。

该函数要求将“方法”参数设置为“NelderMead”以使用 NelderMead 算法。它将目标函数最小化，并作为搜索的起始点。

...
# perform the search
result = minimize(objective, pt, method='nelder-mead')

结果是一个optimizer result对象，该对象包含可通过键访问的优化结果信息。

例如，“成功”布尔表示搜索是否成功完成，“消息提供关于搜索成功或失败的人类可读消息，“ nfev 键表示执行的功能评估的数量。

重要的是，“ x ”键指定输入值，如果搜索成功，该输入值指示通过搜索找到的最优值。

...
# summarize the result
print('Status : %s' % result['message'])
print('Total Evaluations: %d' % result['nfev'])
print('Solution: %s' % result['x'])

我们可以在一个表现良好的函数上演示 NelderMead 优化算法，以表明它可以快速有效地找到最优解，而无需使用函数的任何导数信息。

在这种情况下，我们将使用二维的 x² 函数，在-5.0 到 5.0 的范围内定义，已知的 optima 为[0.0，0.0]。

我们可以在下面定义*目标()*函数。

# objective function
def objective(x):
	return x[0]**2.0 + x[1]**2.0

我们将使用定义的域中的随机点作为搜索的起点。

...
# define range for input
r_min, r_max = -5.0, 5.0
# define the starting point as a random sample from the domain
pt = r_min + rand(2) * (r_max - r_min)

然后可以执行搜索。我们使用通过“ maxiter 设置的默认最大函数求值次数，并设置为 N*200，其中 N 是输入变量的数量，在这种情况下为 2，例如 400 次求值。

...
# perform the search
result = minimize(objective, pt, method='nelder-mead')

搜索完成后，我们将报告用于查找 optima 的总功能评估和搜索成功消息，在这种情况下，我们希望是肯定的。

...
# summarize the result
print('Status : %s' % result['message'])
print('Total Evaluations: %d' % result['nfev'])

最后，我们将检索已定位 optima 的输入值，使用目标函数对其进行评估，并以人类可读的方式报告两者。

...
# evaluate solution
solution = result['x']
evaluation = objective(solution)
print('Solution: f(%s) = %.5f' % (solution, evaluation))

将这些联系在一起，下面列出了在简单凸目标函数上使用 NelderMead 优化算法的完整示例。

# nelder-mead optimization of a convex function
from scipy.optimize import minimize
from numpy.random import rand

# objective function
def objective(x):
	return x[0]**2.0 + x[1]**2.0

# define range for input
r_min, r_max = -5.0, 5.0
# define the starting point as a random sample from the domain
pt = r_min + rand(2) * (r_max - r_min)
# perform the search
result = minimize(objective, pt, method='nelder-mead')
# summarize the result
print('Status : %s' % result['message'])
print('Total Evaluations: %d' % result['nfev'])
# evaluate solution
solution = result['x']
evaluation = objective(solution)
print('Solution: f(%s) = %.5f' % (solution, evaluation))

运行该示例会执行优化，然后报告结果。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，我们可以看到搜索是成功的，正如我们所料，并且在 88 次功能评估后完成。

我们可以看到 optima 位于输入非常接近[0，0]的位置，其评估为最小目标值 0.0。

Status: Optimization terminated successfully.
Total Evaluations: 88
Solution: f([ 2.25680716e-05 -3.87021351e-05]) = 0.00000

既然我们已经看到了如何成功地使用 NelderMead 优化算法，让我们看看一些它表现不太好的例子。

NelderMead 挑战函数

NelderMead 优化算法适用于一系列具有挑战性的非线性和不可微的目标函数。

然而，它会陷入多模态优化问题和噪声问题。

为了使这一点具体化，让我们看一个例子。

噪声优化问题

噪声目标函数是每次评估相同输入时给出不同答案的函数。

我们可以通过在评估之前向输入中添加小的高斯随机数来人为地制造一个有噪声的目标函数。

例如，我们可以定义一维版本的 x² 函数，并使用 randn()函数将均值为 0.0、标准差为 0.3 的小高斯随机数添加到输入中。

# objective function
def objective(x):
	return (x + randn(len(x))*0.3)**2.0

噪声将使函数难以优化算法，并且很可能无法在 x=0.0 时找到最优值。

下面列出了使用 NelderMead 优化噪声目标函数的完整示例。

# nelder-mead optimization of noisy one-dimensional convex function
from scipy.optimize import minimize
from numpy.random import rand
from numpy.random import randn

# objective function
def objective(x):
	return (x + randn(len(x))*0.3)**2.0

# define range for input
r_min, r_max = -5.0, 5.0
# define the starting point as a random sample from the domain
pt = r_min + rand(1) * (r_max - r_min)
# perform the search
result = minimize(objective, pt, method='nelder-mead')
# summarize the result
print('Status : %s' % result['message'])
print('Total Evaluations: %d' % result['nfev'])
# evaluate solution
solution = result['x']
evaluation = objective(solution)
print('Solution: f(%s) = %.5f' % (solution, evaluation))

运行该示例会执行优化，然后报告结果。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，算法不收敛，而是使用最大函数求值次数，即 200。

Status: Maximum number of function evaluations has been exceeded.
Total Evaluations: 200
Solution: f([-0.6918238]) = 0.79431

该算法可能会在代码的某些运行中收敛，但会到达远离最优值的某个点。

多模态优化问题

许多非线性目标函数可能有多个最优解，称为多峰问题。

问题的结构可能是，它有多个具有等价函数求值的全局最优解，或者一个全局最优解和多个局部最优解，其中像 NelderMead 这样的算法可能会陷入局部最优解的搜索。

阿克利函数就是后者的一个例子。它是一个二维目标函数，在[0，0]有一个全局最优值，但有许多局部最优值。

下面的示例实现了阿克利，并创建了一个显示全局最优值和多个局部最优值的三维图。

# ackley multimodal function
from numpy import arange
from numpy import exp
from numpy import sqrt
from numpy import cos
from numpy import e
from numpy import pi
from numpy import meshgrid
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D

# objective function
def objective(x, y):
	return -20.0 * exp(-0.2 * sqrt(0.5 * (x**2 + y**2))) - exp(0.5 * (cos(2 * pi * x) + cos(2 * pi * y))) + e + 20

# define range for input
r_min, r_max = -5.0, 5.0
# sample input range uniformly at 0.1 increments
xaxis = arange(r_min, r_max, 0.1)
yaxis = arange(r_min, r_max, 0.1)
# create a mesh from the axis
x, y = meshgrid(xaxis, yaxis)
# compute targets
results = objective(x, y)
# create a surface plot with the jet color scheme
figure = pyplot.figure()
axis = figure.gca(projection='3d')
axis.plot_surface(x, y, results, cmap='jet')
# show the plot
pyplot.show()

运行该示例会创建阿克利函数的曲面图，显示大量的局部最优值。

3D Surface Plot of the Ackley Multimodal Function

阿克利多峰函数的三维表面图

我们希望奈尔德-米德函数在寻找全局最优解时，会陷入局部最优解。

最初，当单纯形很大时，算法可能会跳过许多局部最优解，但当它收缩时，就会卡住。

我们可以通过下面的例子来探讨这一点，这个例子演示了阿克利函数的 NelderMead 算法。

# nelder-mead for multimodal function optimization
from scipy.optimize import minimize
from numpy.random import rand
from numpy import exp
from numpy import sqrt
from numpy import cos
from numpy import e
from numpy import pi

# objective function
def objective(v):
	x, y = v
	return -20.0 * exp(-0.2 * sqrt(0.5 * (x**2 + y**2))) - exp(0.5 * (cos(2 * pi * x) + cos(2 * pi * y))) + e + 20

# define range for input
r_min, r_max = -5.0, 5.0
# define the starting point as a random sample from the domain
pt = r_min + rand(2) * (r_max - r_min)
# perform the search
result = minimize(objective, pt, method='nelder-mead')
# summarize the result
print('Status : %s' % result['message'])
print('Total Evaluations: %d' % result['nfev'])
# evaluate solution
solution = result['x']
evaluation = objective(solution)
print('Solution: f(%s) = %.5f' % (solution, evaluation))

运行该示例会执行优化，然后报告结果。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，我们可以看到搜索成功完成，但没有找到全局最优解。它卡住了，找到了一个本地的 optima。

每次我们运行这个例子，给定不同的搜索随机起点，我们会找到不同的局部最优解。

Status: Optimization terminated successfully.
Total Evaluations: 62
Solution: f([-4.9831427 -3.98656015]) = 11.90126

进一步阅读

如果您想更深入地了解这个主题，本节将提供更多资源。

文章

摘要

在本教程中，您发现了 NelderMead 优化算法。

具体来说，您了解到:

奈尔德-米德优化算法是一种不使用函数梯度的模式搜索。
如何在 Python 中应用 NelderMead 算法进行函数优化。
如何解释 NelderMead 算法对噪声和多模态目标函数的结果。

你有什么问题吗？ 在下面的评论中提问，我会尽力回答。

函数优化的温和介绍

原文：machinelearningmastery.com/introductio…

最后更新于 2021 年 10 月 12 日

函数优化是一个基础研究领域，该技术几乎用于每个定量领域。

重要的是，函数优化是几乎所有机器学习算法和预测建模项目的核心。因此，理解什么是函数优化、该领域中使用的术语以及构成函数优化问题的要素是至关重要的。

在本教程中，您将发现函数优化的温和介绍。

完成本教程后，您将知道:

函数优化的三个要素，即候选解、目标函数和成本。
将函数优化概念化为导航搜索空间和响应面。
求解函数优化问题时全局最优与局部最优的区别。

用我的新书机器学习优化启动你的项目，包括分步教程和所有示例的 Python 源代码文件。

Let’s get started. A Gentle Introduction to Function Optimization

教程概述

本教程分为四个部分；它们是:

函数优化
候选解决方案
目标函数
评估成本

函数优化

函数优化是数学的一个分支，在现代是用数值计算方法解决的。

连续函数优化(“函数优化”这里简称)属于一个更广泛的研究领域叫做数学优化。

它不同于其他类型的优化，因为它涉及寻找由数字输入变量组成的最佳候选解，而不是由序列或组合组成的候选解(例如组合优化)。

函数优化是一个广泛使用的工具包，包含了几乎所有科学和工程学科采用的技术。

人优化。投资者寻求创建避免过度风险同时实现高回报率的投资组合。[……]优化是决策科学和物理系统分析中的重要工具。

—第 2 页，数值优化，2006。

它在机器学习中起着核心作用，因为几乎所有的机器学习算法都使用函数优化来使模型适合训练数据集。

例如，将一条线拟合到一组点需要解决一个优化问题。在训练数据集中拟合线性回归或神经网络模型也是如此。

通过这种方式，优化提供了一种工具来使一般模型适应特定的情况。学习被视为一个优化或搜索问题。

实际上，函数优化描述了一类寻找给定函数的输入的问题，该问题导致函数的最小或最大输出。

目标取决于系统的某些特征，称为变量或未知数。我们的目标是找到优化目标的变量值。

—第 2 页，数值优化，2006。

函数优化涉及三个要素:函数的输入(如 x )、目标函数本身(如 f() )和函数的输出(如成本)。

输入(x) :待评估函数的输入，例如候选解。
函数(f()) :评估输入的目标函数或目标函数。
成本:以最小化或最大化为目标函数对候选解进行评估的结果。

让我们依次仔细看看每个元素。

候选解决方案

候选解是目标函数的单一输入。

候选解的形式取决于目标函数的细节。它可以是单个浮点数、数字向量、数字矩阵，也可以是特定问题域所需的复杂数字。

最常见的是数字向量。对于一个测试问题，向量表示函数的每个输入变量的具体值( x = x1，x2，x3，…，xn )。对于机器学习模型，向量可以表示模型系数或权重。

从数学上讲，最优化是一个函数在变量约束下的最小化或最大化。

—第 2 页，数值优化，2006。

问题域或目标函数可能会对候选解施加约束。这可能包括以下方面:

变量的数量(1，20，1，000，000 等。)
变量的数据类型(整数、二进制、实值等)。)
可接受值的范围(在 0 和 1 之间，等等。)

重要的是，候选解决方案是离散的，并且有很多。

候选解决方案的范围可能很广，大到无法列举。相反，我们能做的最好的事情是在搜索空间中对候选解决方案进行采样。作为一名实践者，我们寻求一种优化算法，该算法最大限度地利用关于问题的可用信息，有效地对搜索空间进行采样，并找到一个好的或最佳的候选解决方案。

搜索空间:由目标函数的可接受输入的数量、类型和范围定义的候选解的宇宙。

最后，候选解可以根据目标函数对它们的评估进行排序，这意味着有些解比其他解更好。

目标函数

目标函数特定于问题域。

它可以是一个测试函数，例如一个众所周知的具有特定数量的输入变量的方程，其计算返回输入的成本。测试函数的最优值是已知的，允许算法基于它们有效导航搜索空间的能力进行比较。

在机器学习中，目标函数可能涉及将候选解插入模型中，并根据训练数据集的一部分对其进行评估，成本可能是错误分数，通常称为模型损失。

目标函数很容易定义，尽管评估起来很昂贵。函数优化的效率是指最小化函数求值的总数。

虽然目标函数很容易定义，但优化可能很有挑战性。目标函数的难度可能从能够直接使用微积分或线性代数解析求解该函数(简单)到使用局部搜索算法(中等)再到使用全局搜索算法(困难)。

目标函数的难度取决于对该函数的了解程度。这通常不能通过简单地回顾评估候选解的等式或代码来确定。相反，它指的是响应面的结构。

响应面(或搜索范围)是成本相对于候选解决方案搜索空间的几何结构。例如，平滑的响应面表明输入(候选解)的微小变化会导致目标函数的输出(成本)的微小变化。

响应面:响应候选解变化的目标函数的成本几何属性。

响应面可以低维可视化，例如对于具有一个或两个输入变量的候选解。一维输入可以绘制为 2D 散点图，输入值在 x 轴，成本在 y 轴。二维输入可以绘制为 3D 表面图，x 轴和 y 轴上有输入变量，表面高度代表成本。

在最小化问题中，差的解将在响应面中表示为小山，好的解将由山谷表示。为了最大化问题，这将是颠倒的。

这个响应面的结构和形状决定了算法在搜索空间中找到解决方案的难度。

真实目标函数的复杂性意味着我们不能分析表面，并且函数评估的输入和计算成本的高维数使得映射和绘制真实目标函数变得困难。

评估成本

候选解的代价几乎总是一个实数。

成本值的比例将根据目标函数的具体情况而变化。一般来说，成本值的唯一有意义的比较是与由同一目标函数计算的其他成本值进行比较。

函数的最小或最大输出称为函数的最优值，通常简化为最小值。我们希望最大化的任何函数都可以通过在函数返回的成本前面加上负号来转换为最小化。

在全局优化中，找到优化问题的真正全局解；折衷是效率。全局优化方法的最坏情况复杂性随着问题规模呈指数增长…

—第 10 页，凸优化，2004。

目标函数可能有一个最佳解，称为目标函数的全局最优解。或者，目标函数可能有许多全局最优解，在这种情况下，我们可能有兴趣定位它们中的一个或全部。

许多数值优化方法寻求局部极小值。局部极小值是局部最优的，但是我们通常不知道局部极小值是否是全局极小值。

—第 8 页，优化算法，2019。

除了全局最优解之外，一个函数可能还有局部最优解，这是很好的候选解，可能相对容易定位，但不如全局最优解。对于搜索算法来说，局部最优可能看起来是全局最优，例如可能在响应面的谷中，在这种情况下，我们可能称它们是欺骗性的，因为算法会很容易定位它们并卡住，无法定位全局最优。

全局最优:目标函数中成本最优的候选解。
本地 Optima 。候选解决方案很好，但不如全局最优。

成本值的相对性质意味着可以使用简单的搜索算法(例如随机)建立具有挑战性的问题的表现基线，并且可以相对于基线比较由更复杂的搜索算法找到的最优解的“优度”。

候选解决方案通常很容易描述，也很容易构建。函数优化的挑战部分是评估候选解。

求解函数优化问题或目标函数就是寻找最优解。项目的整个目标是找到一个有好的或最好的成本的特定候选解决方案，给出可用的时间和资源。在简单和适度的问题中，我们可能能够准确地定位最佳候选解，并且对我们这样做有一定的信心。

许多非线性优化问题的算法只寻求局部解，即目标函数小于附近所有其他可行点的点。它们并不总是找到全局解，全局解是所有可行点中函数值最低的点。在一些应用中需要全局解决方案，但是对于许多问题，它们很难识别，甚至更难定位。

—第 6 页，数值优化，2006。

对于更具挑战性的问题，考虑到项目可用的时间，我们可能会对相对较好的候选解决方案感到满意(例如足够好)。

进一步阅读

如果您想更深入地了解这个主题，本节将提供更多资源。

书

优化算法，2019。
凸优化，2004。
数值优化，2006。

文章

数学优化，维基百科。

摘要

在本教程中，您发现了函数优化的温和介绍。

具体来说，您了解到:

函数优化的三个要素为候选解、目标函数和成本。
将函数优化概念化为导航搜索空间和响应面。
求解函数优化问题时全局最优与局部最优的区别。

你有什么问题吗？ 在下面的评论中提问，我会尽力回答。

Python 中从零开始的迭代式局部搜索

原文：machinelearningmastery.com/iterated-lo…

最后更新于 2021 年 10 月 12 日

迭代局部搜索是一种随机全局优化算法。

它包括将本地搜索算法重复应用于先前找到的好解决方案的修改版本。这样，它就像一个随机爬山的聪明版本，随机重启算法。

该算法背后的直觉是，随机重启可以帮助定位问题中的许多局部最优解，并且更好的局部最优解通常接近其他局部最优解。因此，对现有局部最优值的适度扰动可以找到优化问题的更好甚至最好的解。

在本教程中，您将发现如何从零开始实现迭代局部搜索算法。

完成本教程后，您将知道:

迭代局部搜索是一种随机全局搜索优化算法，它是随机重新启动的随机爬山的更智能版本。
如何实现从零开始随机重启的随机爬山？
如何实现迭代局部搜索算法并将其应用于非线性目标函数？

用我的新书机器学习优化启动你的项目，包括分步教程和所有示例的 Python 源代码文件。

Let’s get started. Iterated Local Search From Scratch in Python

在 Python 中从零开始迭代本地搜索图片由 Susanne Nilsson 提供，保留部分权利。

教程概述

本教程分为五个部分；它们是:

什么是迭代局部搜索
阿克利目标函数
随机爬山算法
随机重新开始的随机爬山
迭代局部搜索算法

什么是迭代局部搜索

迭代局部搜索，简称 ILS，是一种随机全局搜索优化算法。

它与随机爬坡和随机开始的随机爬坡有关，或者是随机爬坡的延伸。

本质上，这是一个更聪明的随机重启爬山版本。

—第 26 页，元试探法精要，2011。

随机爬山是一种局部搜索算法，它涉及对现有解进行随机修改，并且只有当修改产生比当前工作解更好的结果时才接受修改。

局部搜索算法通常会陷入局部最优。解决这个问题的一种方法是从一个新的随机选择的起点重新开始搜索。重启过程可以执行多次，并且可以在固定数量的功能评估之后或者如果对于给定数量的算法迭代没有看到进一步的改进，则可以触发重启过程。这种算法被称为随机重新开始的随机爬山。

改进本地搜索发现的成本的最简单的可能性是从另一个起点重复搜索。

—第 132 页，元启发式手册，2019 年第 3 版。

迭代局部搜索类似于随机重新开始的随机爬山，除了不是为每次重新开始选择随机起点，而是基于在更广泛的搜索期间迄今为止找到的最佳点的修改版本来选择点。

到目前为止，最佳解的扰动就像是搜索空间中一个新区域的大跳跃，而随机爬山算法产生的扰动要小得多，局限于搜索空间的特定区域。

这里的启发是，你经常可以在你目前所处的位置附近找到更好的局部最优解，并且以这种方式从局部最优走到局部最优通常比完全随机地尝试新的位置要好。

—第 26 页，元试探法精要，2011。

这允许在两个级别执行搜索。爬山算法是局部搜索，以最大限度地利用搜索空间的特定候选解或区域，重启方法允许探索搜索空间的不同区域。

通过这种方式，迭代局部搜索算法探索搜索空间中的多个局部最优解，增加了定位全局最优解的可能性。

迭代局部搜索被提出用于组合优化问题，例如旅行商问题，尽管它可以通过在搜索空间中使用不同的步长应用于连续函数优化:用于爬山的较小步长和用于随机重启的较大步长。

现在我们已经熟悉了迭代局部搜索算法，让我们来探索如何从零开始实现该算法。

阿克利目标函数

首先，让我们定义一个通道优化问题，作为实现迭代局部搜索算法的基础。

阿克利函数是多模态目标函数的一个例子，它有一个全局最优解和多个局部最优解，局部搜索可能会陷入其中。

因此，需要一种全局优化技术。它是一个二维目标函数，其全局最优值为[0，0]，计算结果为 0.0。

下面的示例实现了 Ackley，并创建了一个显示全局最优值和多个局部最优值的三维曲面图。

# ackley multimodal function
from numpy import arange
from numpy import exp
from numpy import sqrt
from numpy import cos
from numpy import e
from numpy import pi
from numpy import meshgrid
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D

# objective function
def objective(x, y):
	return -20.0 * exp(-0.2 * sqrt(0.5 * (x**2 + y**2))) - exp(0.5 * (cos(2 * pi * x) + cos(2 * pi * y))) + e + 20

# define range for input
r_min, r_max = -5.0, 5.0
# sample input range uniformly at 0.1 increments
xaxis = arange(r_min, r_max, 0.1)
yaxis = arange(r_min, r_max, 0.1)
# create a mesh from the axis
x, y = meshgrid(xaxis, yaxis)
# compute targets
results = objective(x, y)
# create a surface plot with the jet color scheme
figure = pyplot.figure()
axis = figure.gca(projection='3d')
axis.plot_surface(x, y, results, cmap='jet')
# show the plot
pyplot.show()

运行该示例会创建阿克利函数的曲面图，显示大量的局部最优值。

3D Surface Plot of the Ackley Multimodal Function

阿克利多峰函数的三维表面图

我们将以此为基础实现和比较一个简单的随机爬山算法，随机爬山与随机重启，最后迭代局部搜索。

我们预计随机爬山算法很容易陷入局部极小值。我们期望重新开始的随机爬山能够找到许多局部极小值，如果配置得当，我们期望迭代局部搜索在这个问题上比任何一种方法都表现得更好。

随机爬山算法

迭代局部搜索算法的核心是局部搜索，在本教程中，我们将为此目的使用随机爬山算法。

随机爬山算法包括首先生成随机起点和当前工作解，然后生成当前工作解的扰动版本，如果它们优于当前工作解，则接受它们。

假设我们正在处理一个连续优化问题，解是由目标函数评估的值的向量，在这种情况下，是二维空间中由-5 和 5 界定的点。

我们可以通过对具有均匀概率分布的搜索空间进行采样来生成随机点。例如:

...
# generate a random point in the search space
solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])

我们可以使用高斯概率分布生成当前工作解的扰动版本，该概率分布具有解中当前值的平均值和由超参数控制的标准偏差，该超参数控制允许搜索从当前工作解探索多远。

我们将这个超参数称为“步长”，例如:

...
# generate a perturbed version of a current working solution
candidate = solution + randn(len(bounds)) * step_size

重要的是，我们必须检查生成的解决方案是否在搜索空间内。

这可以通过名为 in_bounds() 的自定义函数来实现，该函数获取候选解和搜索空间的边界，如果该点在搜索空间中，则返回真，否则返回假*。*

# check if a point is within the bounds of the search
def in_bounds(point, bounds):
	# enumerate all dimensions of the point
	for d in range(len(bounds)):
		# check if out of bounds for this dimension
		if point[d] < bounds[d, 0] or point[d] > bounds[d, 1]:
			return False
	return True

然后可以在爬山过程中调用该函数，以确认新的点在搜索空间的边界内，如果不在，则可以生成新的点。

将这些联系在一起，下面的函数*爬山()*实现了随机爬山局部搜索算法。它将目标函数的名称、问题的边界、迭代次数和步长作为参数，并返回最佳解及其评估。

# hill climbing local search algorithm
def hillclimbing(objective, bounds, n_iterations, step_size):
	# generate an initial point
	solution = None
	while solution is None or not in_bounds(solution, bounds):
		solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# evaluate the initial point
	solution_eval = objective(solution)
	# run the hill climb
	for i in range(n_iterations):
		# take a step
		candidate = None
		while candidate is None or not in_bounds(candidate, bounds):
			candidate = solution + randn(len(bounds)) * step_size
		# evaluate candidate point
		candidte_eval = objective(candidate)
		# check if we should keep the new point
		if candidte_eval <= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidte_eval
			# report progress
			print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
	return [solution, solution_eval]

我们可以在阿克利函数上测试这个算法。

我们将固定伪随机数发生器的种子，以确保每次运行代码时得到相同的结果。

该算法将运行 1000 次迭代，将使用 0.05 个单位的步长；这两个超参数都是经过一点点反复试验后选择的。

运行结束时，我们将报告找到的最佳解决方案。

...
# seed the pseudorandom number generator
seed(1)
# define range for input
bounds = asarray([[-5.0, 5.0], [-5.0, 5.0]])
# define the total iterations
n_iterations = 1000
# define the maximum step size
step_size = 0.05
# perform the hill climbing search
best, score = hillclimbing(objective, bounds, n_iterations, step_size)
print('Done!')
print('f(%s) = %f' % (best, score))

下面列出了将随机爬山算法应用于阿克利目标函数的完整示例。

# hill climbing search of the ackley objective function
from numpy import asarray
from numpy import exp
from numpy import sqrt
from numpy import cos
from numpy import e
from numpy import pi
from numpy.random import randn
from numpy.random import rand
from numpy.random import seed

# objective function
def objective(v):
	x, y = v
	return -20.0 * exp(-0.2 * sqrt(0.5 * (x**2 + y**2))) - exp(0.5 * (cos(2 * pi * x) + cos(2 * pi * y))) + e + 20

# check if a point is within the bounds of the search
def in_bounds(point, bounds):
	# enumerate all dimensions of the point
	for d in range(len(bounds)):
		# check if out of bounds for this dimension
		if point[d] < bounds[d, 0] or point[d] > bounds[d, 1]:
			return False
	return True

# hill climbing local search algorithm
def hillclimbing(objective, bounds, n_iterations, step_size):
	# generate an initial point
	solution = None
	while solution is None or not in_bounds(solution, bounds):
		solution = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# evaluate the initial point
	solution_eval = objective(solution)
	# run the hill climb
	for i in range(n_iterations):
		# take a step
		candidate = None
		while candidate is None or not in_bounds(candidate, bounds):
			candidate = solution + randn(len(bounds)) * step_size
		# evaluate candidate point
		candidte_eval = objective(candidate)
		# check if we should keep the new point
		if candidte_eval <= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidte_eval
			# report progress
			print('>%d f(%s) = %.5f' % (i, solution, solution_eval))
	return [solution, solution_eval]

# seed the pseudorandom number generator
seed(1)
# define range for input
bounds = asarray([[-5.0, 5.0], [-5.0, 5.0]])
# define the total iterations
n_iterations = 1000
# define the maximum step size
step_size = 0.05
# perform the hill climbing search
best, score = hillclimbing(objective, bounds, n_iterations, step_size)
print('Done!')
print('f(%s) = %f' % (best, score))

运行示例在目标函数上执行随机爬山搜索。搜索过程中发现的每个改进都会被报告，然后在搜索结束时报告最佳解决方案。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，我们可以在搜索过程中看到大约 13 个改进，最终的解约为 f(-0.981，1.965)，得到的评估值约为 5.381，与 f(0.0，0.0) = 0 相差甚远。

>0 f([-0.85618854 2.1495965 ]) = 6.46986
>1 f([-0.81291816 2.03451957]) = 6.07149
>5 f([-0.82903902 2.01531685]) = 5.93526
>7 f([-0.83766043 1.97142393]) = 5.82047
>9 f([-0.89269139 2.02866012]) = 5.68283
>12 f([-0.8988359 1.98187164]) = 5.55899
>13 f([-0.9122303 2.00838942]) = 5.55566
>14 f([-0.94681334 1.98855174]) = 5.43024
>15 f([-0.98117198 1.94629146]) = 5.39010
>23 f([-0.97516403 1.97715161]) = 5.38735
>39 f([-0.98628044 1.96711371]) = 5.38241
>362 f([-0.9808789 1.96858459]) = 5.38233
>629 f([-0.98102417 1.96555308]) = 5.38194
Done!
f([-0.98102417 1.96555308]) = 5.381939

接下来，我们将修改算法以执行随机重启，并看看我们是否能获得更好的结果。

随机重新开始的随机爬山

具有随机重启的随机爬山算法包括重复运行随机爬山算法和跟踪找到的最佳解。

首先，我们修改*爬山()*函数，取搜索的起点，而不是随机生成。这将有助于我们稍后实现迭代局部搜索算法。

# hill climbing local search algorithm
def hillclimbing(objective, bounds, n_iterations, step_size, start_pt):
	# store the initial point
	solution = start_pt
	# evaluate the initial point
	solution_eval = objective(solution)
	# run the hill climb
	for i in range(n_iterations):
		# take a step
		candidate = None
		while candidate is None or not in_bounds(candidate, bounds):
			candidate = solution + randn(len(bounds)) * step_size
		# evaluate candidate point
		candidte_eval = objective(candidate)
		# check if we should keep the new point
		if candidte_eval <= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidte_eval
	return [solution, solution_eval]

接下来，我们可以通过重复调用*爬山()*函数固定次数来实现随机重启算法。

每次通话，我们都会为爬山搜索生成一个新的随机选择的起点。

...
# generate a random initial point for the search
start_pt = None
while start_pt is None or not in_bounds(start_pt, bounds):
	start_pt = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
# perform a stochastic hill climbing search
solution, solution_eval = hillclimbing(objective, bounds, n_iter, step_size, start_pt)

然后，我们可以检查结果，如果它比我们迄今为止看到的任何搜索结果都好，就保留它。

...
# check for new best
if solution_eval < best_eval:
	best, best_eval = solution, solution_eval
print('Restart %d, best: f(%s) = %.5f' % (n, best, best_eval))

将这些联系在一起，*random _ restaults()*函数实现了随机重启的随机爬山算法。

# hill climbing with random restarts algorithm
def random_restarts(objective, bounds, n_iter, step_size, n_restarts):
	best, best_eval = None, 1e+10
	# enumerate restarts
	for n in range(n_restarts):
		# generate a random initial point for the search
		start_pt = None
		while start_pt is None or not in_bounds(start_pt, bounds):
			start_pt = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
		# perform a stochastic hill climbing search
		solution, solution_eval = hillclimbing(objective, bounds, n_iter, step_size, start_pt)
		# check for new best
		if solution_eval < best_eval:
			best, best_eval = solution, solution_eval
			print('Restart %d, best: f(%s) = %.5f' % (n, best, best_eval))
	return [best, best_eval]

然后，我们可以将该算法应用于阿克利目标函数。在这种情况下，我们将随机重启的次数限制为 30 次，可以任意选择。

下面列出了完整的示例。

# hill climbing search with random restarts of the ackley objective function
from numpy import asarray
from numpy import exp
from numpy import sqrt
from numpy import cos
from numpy import e
from numpy import pi
from numpy.random import randn
from numpy.random import rand
from numpy.random import seed

# objective function
def objective(v):
	x, y = v
	return -20.0 * exp(-0.2 * sqrt(0.5 * (x**2 + y**2))) - exp(0.5 * (cos(2 * pi * x) + cos(2 * pi * y))) + e + 20

# check if a point is within the bounds of the search
def in_bounds(point, bounds):
	# enumerate all dimensions of the point
	for d in range(len(bounds)):
		# check if out of bounds for this dimension
		if point[d] < bounds[d, 0] or point[d] > bounds[d, 1]:
			return False
	return True

# hill climbing local search algorithm
def hillclimbing(objective, bounds, n_iterations, step_size, start_pt):
	# store the initial point
	solution = start_pt
	# evaluate the initial point
	solution_eval = objective(solution)
	# run the hill climb
	for i in range(n_iterations):
		# take a step
		candidate = None
		while candidate is None or not in_bounds(candidate, bounds):
			candidate = solution + randn(len(bounds)) * step_size
		# evaluate candidate point
		candidte_eval = objective(candidate)
		# check if we should keep the new point
		if candidte_eval <= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidte_eval
	return [solution, solution_eval]

# hill climbing with random restarts algorithm
def random_restarts(objective, bounds, n_iter, step_size, n_restarts):
	best, best_eval = None, 1e+10
	# enumerate restarts
	for n in range(n_restarts):
		# generate a random initial point for the search
		start_pt = None
		while start_pt is None or not in_bounds(start_pt, bounds):
			start_pt = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
		# perform a stochastic hill climbing search
		solution, solution_eval = hillclimbing(objective, bounds, n_iter, step_size, start_pt)
		# check for new best
		if solution_eval < best_eval:
			best, best_eval = solution, solution_eval
			print('Restart %d, best: f(%s) = %.5f' % (n, best, best_eval))
	return [best, best_eval]

# seed the pseudorandom number generator
seed(1)
# define range for input
bounds = asarray([[-5.0, 5.0], [-5.0, 5.0]])
# define the total iterations
n_iter = 1000
# define the maximum step size
step_size = 0.05
# total number of random restarts
n_restarts = 30
# perform the hill climbing search
best, score = random_restarts(objective, bounds, n_iter, step_size, n_restarts)
print('Done!')
print('f(%s) = %f' % (best, score))

运行该示例将执行随机爬山，随机重新开始搜索阿克利目标函数。每次发现改进的整体解决方案时，都会进行报告，并总结通过搜索找到的最终最佳解决方案。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，我们可以在搜索过程中看到三个改进，并且找到的最佳解决方案大约是 f(0.002，0.002)，其评估为大约 0.009，这比爬山算法的单次运行好得多。

Restart 0, best: f([-0.98102417 1.96555308]) = 5.38194
Restart 2, best: f([1.96522236 0.98120013]) = 5.38191
Restart 4, best: f([0.00223194 0.00258853]) = 0.00998
Done!
f([0.00223194 0.00258853]) = 0.009978

接下来，让我们看看如何实现迭代局部搜索算法。

迭代局部搜索算法

迭代局部搜索算法是随机爬山和随机重启算法的改进版本。

重要的区别在于，随机爬山算法的每次应用的起点是迄今为止找到的最佳点的扰动版本。

我们可以使用*random _ Reuters()*函数作为起点来实现这个算法。每次重启迭代，我们可以生成一个到目前为止找到的最佳解决方案的修改版本，而不是一个随机的起点。

这可以通过使用步长超参数来实现，就像随机爬山器中使用的一样。在这种情况下，考虑到在搜索空间中需要更大的扰动，将使用更大的步长值。

...
# generate an initial point as a perturbed version of the last best
start_pt = None
while start_pt is None or not in_bounds(start_pt, bounds):
	start_pt = best + randn(len(bounds)) * p_size

将这些联系在一起，下面定义了*迭代 _ 局部 _ 搜索()*函数。

# iterated local search algorithm
def iterated_local_search(objective, bounds, n_iter, step_size, n_restarts, p_size):
	# define starting point
	best = None
	while best is None or not in_bounds(best, bounds):
		best = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# evaluate current best point
	best_eval = objective(best)
	# enumerate restarts
	for n in range(n_restarts):
		# generate an initial point as a perturbed version of the last best
		start_pt = None
		while start_pt is None or not in_bounds(start_pt, bounds):
			start_pt = best + randn(len(bounds)) * p_size
		# perform a stochastic hill climbing search
		solution, solution_eval = hillclimbing(objective, bounds, n_iter, step_size, start_pt)
		# check for new best
		if solution_eval < best_eval:
			best, best_eval = solution, solution_eval
			print('Restart %d, best: f(%s) = %.5f' % (n, best, best_eval))
	return [best, best_eval]

然后，我们可以将该算法应用于阿克利目标函数。在这种情况下，我们将使用一个更大的步长值 1.0 来进行随机重启，这是在一点点反复试验后选择的。

下面列出了完整的示例。

# iterated local search of the ackley objective function
from numpy import asarray
from numpy import exp
from numpy import sqrt
from numpy import cos
from numpy import e
from numpy import pi
from numpy.random import randn
from numpy.random import rand
from numpy.random import seed

# objective function
def objective(v):
	x, y = v
	return -20.0 * exp(-0.2 * sqrt(0.5 * (x**2 + y**2))) - exp(0.5 * (cos(2 * pi * x) + cos(2 * pi * y))) + e + 20

# check if a point is within the bounds of the search
def in_bounds(point, bounds):
	# enumerate all dimensions of the point
	for d in range(len(bounds)):
		# check if out of bounds for this dimension
		if point[d] < bounds[d, 0] or point[d] > bounds[d, 1]:
			return False
	return True

# hill climbing local search algorithm
def hillclimbing(objective, bounds, n_iterations, step_size, start_pt):
	# store the initial point
	solution = start_pt
	# evaluate the initial point
	solution_eval = objective(solution)
	# run the hill climb
	for i in range(n_iterations):
		# take a step
		candidate = None
		while candidate is None or not in_bounds(candidate, bounds):
			candidate = solution + randn(len(bounds)) * step_size
		# evaluate candidate point
		candidte_eval = objective(candidate)
		# check if we should keep the new point
		if candidte_eval <= solution_eval:
			# store the new point
			solution, solution_eval = candidate, candidte_eval
	return [solution, solution_eval]

# iterated local search algorithm
def iterated_local_search(objective, bounds, n_iter, step_size, n_restarts, p_size):
	# define starting point
	best = None
	while best is None or not in_bounds(best, bounds):
		best = bounds[:, 0] + rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
	# evaluate current best point
	best_eval = objective(best)
	# enumerate restarts
	for n in range(n_restarts):
		# generate an initial point as a perturbed version of the last best
		start_pt = None
		while start_pt is None or not in_bounds(start_pt, bounds):
			start_pt = best + randn(len(bounds)) * p_size
		# perform a stochastic hill climbing search
		solution, solution_eval = hillclimbing(objective, bounds, n_iter, step_size, start_pt)
		# check for new best
		if solution_eval < best_eval:
			best, best_eval = solution, solution_eval
			print('Restart %d, best: f(%s) = %.5f' % (n, best, best_eval))
	return [best, best_eval]

# seed the pseudorandom number generator
seed(1)
# define range for input
bounds = asarray([[-5.0, 5.0], [-5.0, 5.0]])
# define the total iterations
n_iter = 1000
# define the maximum step size
s_size = 0.05
# total number of random restarts
n_restarts = 30
# perturbation step size
p_size = 1.0
# perform the hill climbing search
best, score = iterated_local_search(objective, bounds, n_iter, s_size, n_restarts, p_size)
print('Done!')
print('f(%s) = %f' % (best, score))

运行该示例将执行阿克利目标函数的迭代局部搜索。

每次发现改进的整体解决方案时，都会进行报告，并在运行结束时总结通过搜索找到的最终最佳解决方案。

注:考虑到算法或评估程序的随机性，或数值准确率的差异，您的结果可能会有所不同。考虑运行该示例几次，并比较平均结果。

在这种情况下，我们可以在搜索过程中看到四个改进，找到的最佳解决方案是两个非常小的输入，接近于零，估计约为 0.0003，这比爬山者的单次运行或重新启动爬山者要好。

Restart 0, best: f([-0.96775653 0.96853129]) = 3.57447
Restart 3, best: f([-4.50618519e-04 9.51020713e-01]) = 2.57996
Restart 5, best: f([ 0.00137423 -0.00047059]) = 0.00416
Restart 22, best: f([ 1.16431936e-04 -3.31358206e-06]) = 0.00033
Done!
f([ 1.16431936e-04 -3.31358206e-06]) = 0.000330

进一步阅读

如果您想更深入地了解这个主题，本节将提供更多资源。

书

元试探法精要，2011。
元启发式手册，2019 年第 3 版。

文章

摘要

在本教程中，您发现了如何从零开始实现迭代局部搜索算法。

具体来说，您了解到:

迭代局部搜索是一种随机全局搜索优化算法，它是随机重新启动的随机爬山的更智能版本。
如何实现从零开始随机重启的随机爬山？
如何实现迭代局部搜索算法并将其应用于非线性目标函数？

你有什么问题吗？ 在下面的评论中提问，我会尽力回答。*

Python 线性搜索优化

原文：machinelearningmastery.com/line-search…

最后更新于 2021 年 10 月 12 日

线性搜索是一种优化算法，可用于具有一个或多个变量的目标函数。

它提供了一种使用单变量优化算法的方法，就像对多变量目标函数的二等分搜索一样，通过使用搜索来定位从已知点到最优值的每个维度中的最佳步长。

在本教程中，您将发现如何在 Python 中执行行搜索优化。

完成本教程后，您将知道:

线性搜索是一种单变量和多变量优化问题的优化算法。
SciPy 库提供了一个执行线性搜索的应用编程接口，要求您知道如何计算目标函数的一阶导数。
如何对目标函数执行行搜索并使用结果？

用我的新书机器学习优化启动你的项目，包括分步教程和所有示例的 Python 源代码文件。

Let’s get started. Line Search Optimization With Python

使用 Python 进行线性搜索优化图片由 Nathalie 提供，保留部分权利。

教程概述

本教程分为三个部分；它们是:

什么是线性搜索
Python 中的行搜索
如何执行行搜索
1. 定义目标函数
2. 执行行搜索
3. 处理线路搜索失败案例

什么是线性搜索

线性搜索是一种单变量或多变量优化的优化算法。

该算法需要搜索空间中的初始位置和搜索方向。然后，它将从初始位置选择搜索空间中的下一个位置，这将导致更好或最佳的目标函数评估。

方向是一个量值，指示沿线的符号(正或负)和搜索的最大范围。因此，该方向被更好地认为是候选搜索区域，并且必须足够大以包含最佳点，或者比起点更好的点。

线性搜索将自动从当前位置为步长(方向)选择称为 alpha 的比例因子，以最小化目标函数。这包括使用另一个单变量优化算法来找到所选方向上的最佳点，以便选择合适的α。

一种方法是使用线性搜索，它选择最小化一维函数的步长因子[……]我们可以应用我们选择的单变量优化方法。

—第 54 页，优化算法，2019。

Alpha 是方向的比例因子，因此在搜索中只考虑 0.0 到 1.0 之间的值。线性搜索的单个步骤解决了最小化问题，该最小化问题最小化了当前位置加上缩放方向的目标函数:

最小化目标(位置+α*方向)

因此，线性搜索一次在一个维度上运行，并返回在选定方向上移动的距离。

线性搜索方法的每次迭代计算一个搜索方向 pk，然后决定沿着该方向移动多远。

—第 30 页，数值优化，2006。

线性搜索可以被重复调用以在搜索空间中导航到解，并且如果所选择的方向不包含具有较低目标函数值的点，例如如果算法被引导向上搜索，则线性搜索可能失败。

该解是近似的或不精确的，并且根据搜索空间的形状可能不是全局解。该算法适用的条件称为狼条件。

现在我们已经熟悉了行搜索，让我们探索一下如何在 Python 中执行行搜索。

Python 中的行搜索

我们可以使用 line_search()函数在 Python 中手动执行行搜索。

它支持单变量优化，以及多变量优化问题。

该函数采用目标函数的名称和目标函数的梯度名称，以及搜索空间中的当前位置和移动方向。

因此，你必须知道目标函数的一阶导数。您还必须知道从哪里开始搜索以及执行搜索的范围。回想一下，您可以使用不同的方向(符号和幅度)多次执行搜索。

...
result = line_search(objective, gradient, point, direction)

该函数返回一个由六个元素组成的元组，其中包括称为 alpha 的方向的比例因子、已执行的函数求值次数以及其他值。

结果元组中的第一个元素包含 alpha。如果搜索未能收敛，阿尔法将具有值无。

...
# retrieve the alpha value found as part of the line search
alpha = result[0]

α、起点和方向可用于构建单线性搜索的终点。

...
# construct the end point of a line search
end = point + alpha * direction

对于具有多个输入变量的优化问题，例如多元优化， line_search()函数将为所有维度返回一个 alpha 值。

这意味着函数假设 optima 在所有维度上与起点等距，这是一个显著的限制。

现在我们已经熟悉了如何在 Python 中执行行搜索，让我们来探索一个成功的例子。

如何执行行搜索

我们可以用一个简单的单变量目标函数及其导数来演示如何使用线性搜索。

本部分分为多个部分，包括定义测试功能、执行线路搜索以及处理 optima 不在的故障情况。

定义目标函数

首先，我们可以定义目标函数。

在这种情况下，我们将使用一维目标函数，具体来说，x² 远离零移动了少量。这是一个凸函数，选择它是因为它很容易理解和计算一阶导数。

目标(x) = (-5 + x)²

注意，线性搜索不限于一维函数或凸函数。

下面列出了该功能的实现。

# objective function
def objective(x):
	return (-5.0 + x)**2.0

该函数的一阶导数可以解析计算，如下所示:

梯度(x)= 2 *(5+x)

每个输入值的梯度仅指示每个点处朝向最优值的斜率。下面列出了该功能的实现。

# gradient for the objective function
def gradient(x):
	return 2.0 * (-5.0 + x)

我们可以为 x 定义一个从-10 到 20 的输入范围，并计算每个输入的目标值。

...
# define range
r_min, r_max = -10.0, 20.0
# prepare inputs
inputs = arange(r_min, r_max, 0.1)
# compute targets
targets = [objective(x) for x in inputs]

然后，我们可以绘制输入值与目标值的关系图，以了解函数的形状。

...
# plot inputs vs objective
pyplot.plot(inputs, targets, '-', label='objective')
pyplot.legend()
pyplot.show()

将这些联系在一起，完整的示例如下所示。

# plot a convex objective function
from numpy import arange
from matplotlib import pyplot

# objective function
def objective(x):
	return (-5.0 + x)**2.0

# gradient for the objective function
def gradient(x):
	return 2.0 * (-5.0 + x)

# define range
r_min, r_max = -10.0, 20.0
# prepare inputs
inputs = arange(r_min, r_max, 0.1)
# compute targets
targets = [objective(x) for x in inputs]
# plot inputs vs objective
pyplot.plot(inputs, targets, '-', label='objective')
pyplot.legend()
pyplot.show()

运行该示例评估-10 到 20 范围内的输入值(x)，并创建一个显示熟悉的抛物线 U 形的图。

函数的最佳值似乎在 x=5.0，目标值为 0.0。

Line Plot of Convex Objective Function

凸目标函数的线图

执行行搜索

接下来，我们可以对函数执行行搜索。

首先，我们必须确定搜索的起点和方向。

在这种情况下，我们将使用 x=-5 的起点，距离 optima 大约 10 个单位。我们将向右迈出一大步，例如正向，在这种情况下为 100 个单位，这将大大超过最佳值。

回想方向就像步长，搜索将缩放步长以找到最优值。

...
# define the starting point
point = -5.0
# define the direction to move
direction = 100.0
# print the initial conditions
print('start=%.1f, direction=%.1f' % (point, direction))
# perform the line search
result = line_search(objective, gradient, point, direction)

然后，搜索将找出最优值，并返回α或距离来修改方向。

我们可以从结果中检索 alpha，以及执行的函数评估的数量。

...
# summarize the result
alpha = result[0]
print('Alpha: %.3f' % alpha)
print('Function evaluations: %d' % result[1])

我们可以使用α，以及我们的起点和步长来计算最优值的位置，并计算该点的目标函数(我们希望该值等于 0.0)。

...
# define objective function minima 
end = point + alpha * direction
# evaluate objective function minima
print('f(end) = %.3f' % objective(end))

然后，为了好玩，我们可以再次绘制函数，并将起点显示为绿色正方形，终点显示为红色正方形。

...
# define range
r_min, r_max = -10.0, 20.0
# prepare inputs
inputs = arange(r_min, r_max, 0.1)
# compute targets
targets = [objective(x) for x in inputs]
# plot inputs vs objective
pyplot.plot(inputs, targets, '--', label='objective')
# plot start and end of the search
pyplot.plot([point], [objective(point)], 's', color='g')
pyplot.plot([end], [objective(end)], 's', color='r')
pyplot.legend()
pyplot.show()

将这些联系在一起，下面列出了对凸目标函数执行线性搜索的完整示例。

# perform a line search on a convex objective function
from numpy import arange
from scipy.optimize import line_search
from matplotlib import pyplot

# objective function
def objective(x):
	return (-5.0 + x)**2.0

# gradient for the objective function
def gradient(x):
	return 2.0 * (-5.0 + x)

# define the starting point
point = -5.0
# define the direction to move
direction = 100.0
# print the initial conditions
print('start=%.1f, direction=%.1f' % (point, direction))
# perform the line search
result = line_search(objective, gradient, point, direction)
# summarize the result
alpha = result[0]
print('Alpha: %.3f' % alpha)
print('Function evaluations: %d' % result[1])
# define objective function minima
end = point + alpha * direction
# evaluate objective function minima
print('f(end) = f(%.3f) = %.3f' % (end, objective(end)))
# define range
r_min, r_max = -10.0, 20.0
# prepare inputs
inputs = arange(r_min, r_max, 0.1)
# compute targets
targets = [objective(x) for x in inputs]
# plot inputs vs objective
pyplot.plot(inputs, targets, '--', label='objective')
# plot start and end of the search
pyplot.plot([point], [objective(point)], 's', color='g')
pyplot.plot([end], [objective(end)], 's', color='r')
pyplot.legend()
pyplot.show()

运行示例首先报告起点和方向。

执行搜索并定位α，α修改方向以定位最优值，在本例中为 0.1，它是在三次功能评估后找到的。

optima 的点位于 5.0，如预期的那样，评估为 0.0。

start=-5.0, direction=100.0
Alpha: 0.100
Function evaluations: 3
f(end) = f(5.000) = 0.000

最后，创建一个函数图，显示起点(绿色)和目标(红色)。

Line Plot of Objective Function With Search Starting Point and Optima

具有搜索起点和最优值的目标函数线图

处理线路搜索失败案例

搜索不能保证找到函数的最优解。

如果指定的方向不够大，无法包含最佳方向，就会发生这种情况。

例如，如果我们使用三个方向，那么搜索将无法找到最优值。我们可以用下面列出的一个完整的例子来演示这一点。

# perform a line search on a convex objective function with a direction that is too small
from numpy import arange
from scipy.optimize import line_search
from matplotlib import pyplot

# objective function
def objective(x):
	return (-5.0 + x)**2.0

# gradient for the objective function
def gradient(x):
	return 2.0 * (-5.0 + x)

# define the starting point
point = -5.0
# define the direction to move
direction = 3.0
# print the initial conditions
print('start=%.1f, direction=%.1f' % (point, direction))
# perform the line search
result = line_search(objective, gradient, point, direction)
# summarize the result
alpha = result[0]
print('Alpha: %.3f' % alpha)
# define objective function minima
end = point + alpha * direction
# evaluate objective function minima
print('f(end) = f(%.3f) = %.3f' % (end, objective(end)))

运行该示例时，搜索达到了 1.0 的α极限，这使得-2 的端点评估为 49。f(5) = 0.0 时，距离最优值很远。

start=-5.0, direction=3.0
Alpha: 1.000
f(end) = f(-2.000) = 49.000

此外，我们可以选择错误的方向，这只会导致比起点更糟糕的评估。

在这种情况下，错误的方向将是远离最佳方向的负方向，例如从起点开始的所有上坡。

...
# define the starting point
point = -5.0
# define the direction to move
direction = -3.0

预期搜索不会收敛，因为它无法找到比起点更好的点。

下面列出了未能收敛的搜索的完整示例。

# perform a line search on a convex objective function that does not converge
from numpy import arange
from scipy.optimize import line_search
from matplotlib import pyplot

# objective function
def objective(x):
	return (-5.0 + x)**2.0

# gradient for the objective function
def gradient(x):
	return 2.0 * (-5.0 + x)

# define the starting point
point = -5.0
# define the direction to move
direction = -3.0
# print the initial conditions
print('start=%.1f, direction=%.1f' % (point, direction))
# perform the line search
result = line_search(objective, gradient, point, direction)
# summarize the result
print('Alpha: %s' % result[0])

运行该示例会产生一个 LineSearchWarning，表明搜索无法像预期的那样收敛。

搜索返回的 alpha 值为“无”。

start=-5.0, direction=-3.0
LineSearchWarning: The line search algorithm did not converge
warn('The line search algorithm did not converge', LineSearchWarning)
Alpha: None

进一步阅读

如果您想更深入地了解这个主题，本节将提供更多资源。

书

优化算法，2019。
数值优化，2006。

蜜蜂

文章

行搜索，维基百科。
沃尔夫条件，维基百科。

摘要

在本教程中，您发现了如何在 Python 中执行行搜索优化。

具体来说，您了解到:

线性搜索是一种单变量和多变量优化问题的优化算法。
SciPy 库提供了一个执行线性搜索的应用编程接口，要求您知道如何计算目标函数的一阶导数。
如何对目标函数执行行搜索并使用结果？

你有什么问题吗？ 在下面的评论中提问，我会尽力回答。