3. Gradient Descent
3.1 Analysis
Following from the previous section: in practice, exhaustive search is infeasible.
Instead we use the gradient descent algorithm. It is a greedy algorithm: it does not necessarily reach the global optimum, but it can find a local optimum.
We compute the gradient of the Loss defined in the previous section.
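Concretely, using the MSE cost from the previous section, the gradient with respect to w is (this is exactly what the gradient() function in the code below computes):

\mathrm{cost}(w) = \frac{1}{N}\sum_{n=1}^{N} (x_n w - y_n)^2,
\qquad
\frac{\partial\,\mathrm{cost}}{\partial w} = \frac{1}{N}\sum_{n=1}^{N} 2\,x_n (x_n w - y_n)

Each step then updates w \leftarrow w - \alpha \cdot \partial\mathrm{cost}/\partial w, where \alpha is the learning rate (0.01 in the code).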
3.2 Gradient Descent Code
The code is as follows:
import matplotlib
# Set the plotting backend
matplotlib.use('TkAgg')
import numpy as np
import matplotlib.pyplot as plt

# 1. Define the training dataset
x_data = np.array([1.0, 2.0, 3.0])
y_data = np.array([2.0, 4.0, 6.0])

# Initialize the parameter with an initial guess of 1
w = 1.0

# 2. Define the model
def forward(x):
    return x * w

# 3. Define the loss function (mean squared error over the dataset)
def loss(x, y):
    l_sum = 0
    for x_val, y_val in zip(x, y):
        y_pred = forward(x_val)
        l = (y_pred - y_val) * (y_pred - y_val)
        l_sum += l
    return l_sum / len(x)

# 4. Define the gradient
def gradient(x, y):
    grad = 0
    for x_val, y_val in zip(x, y):
        grad += 2 * x_val * (x_val * w - y_val)
    return grad / len(x)

# 5. Train for 100 epochs
Epoch = []
Cost = []
for epoch in range(100):
    l = loss(x_data, y_data)
    grad = gradient(x_data, y_data)
    # Learning rate set to 0.01
    w = w - 0.01 * grad
    Epoch.append(epoch)
    Cost.append(l)
    print("Epoch {:02d}: w = {:.2f}, loss = {:.4f}".format(epoch, w, l))

# Plot the loss curve
plt.figure(figsize=(10, 5))
plt.plot(Epoch, Cost)
plt.title('train loss')
plt.ylabel('Cost')
plt.xlabel('Epoch')
plt.show()
- Define the training dataset
- Define the model
- Define the Loss
- Define the Gradient
- Train
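Since both the loss and the gradient are averages over the whole dataset, the Python loops above can also be written in vectorized NumPy form. A minimal sketch, using the same names and data as the code above (this rewrite is an addition, not part of the original):

import numpy as np

x_data = np.array([1.0, 2.0, 3.0])
y_data = np.array([2.0, 4.0, 6.0])
w = 1.0

# Vectorized equivalent of the loop-based loss(): MSE over all samples at once
def loss(x, y):
    return np.mean((x * w - y) ** 2)

# Vectorized equivalent of the loop-based gradient(): mean of 2 * x_n * (x_n * w - y_n)
def gradient(x, y):
    return np.mean(2 * x * (x * w - y))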
Note: if training does not converge, adjust the learning rate accordingly.
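As a quick illustration of why the learning rate matters, the sketch below (with a hypothetical helper train() that is not in the original code, on the same dataset) compares a small and a large step size:

# Hypothetical helper (not in the original code): run gradient descent
# with a given learning rate and return the final w.
def train(lr, epochs=20):
    w = 1.0
    xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
    for _ in range(epochs):
        grad = sum(2 * x * (x * w - y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

print(train(0.01))  # moves steadily toward 2.0
print(train(0.5))   # each step overshoots the minimum and w diverges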
3.3 Results
Observe the results:
Epoch 00: w = 1.09, loss = 4.6667
Epoch 01: w = 1.18, loss = 3.8362
Epoch 02: w = 1.25, loss = 3.1535
Epoch 03: w = 1.32, loss = 2.5923
Epoch 04: w = 1.39, loss = 2.1310
Epoch 05: w = 1.44, loss = 1.7518
Epoch 06: w = 1.50, loss = 1.4401
……
Epoch 98: w = 2.00, loss = 0.0000
Epoch 99: w = 2.00, loss = 0.0000
We can see that w converges to 2.00.
[Figure: training loss curve, Cost vs. Epoch]
3.4 Stochastic Gradient Descent (SGD)
Randomly pick a single sample and optimize on that sample's loss, rather than on the loss averaged over all samples.
The random noise this introduces can help us get past saddle points.
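Concretely, for a single sample (x_n, y_n) the update becomes:

\mathrm{loss}_n(w) = (x_n w - y_n)^2,
\qquad
w \leftarrow w - \alpha \cdot 2\,x_n (x_n w - y_n)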
The code is as follows:
import matplotlib
# Set the plotting backend
matplotlib.use('TkAgg')
import numpy as np
import matplotlib.pyplot as plt

# 1. Define the training dataset
x_data = np.array([1.0, 2.0, 3.0])
y_data = np.array([2.0, 4.0, 6.0])

# Initialize the parameter with an initial guess of 1
w = 1.0

# 2. Define the model
def forward(x):
    return x * w

# 3. Define the loss function (for a single sample)
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)

# 4. Define the gradient (for a single sample)
def gradient(x, y):
    y_pred = forward(x)
    return 2 * x * (y_pred - y)

# 5. Train for 100 epochs
for epoch in range(100):
    # Update w once per sample
    for x, y in zip(x_data, y_data):
        l = loss(x, y)
        grad = gradient(x, y)
        # Learning rate set to 0.01
        w = w - 0.01 * grad
        print("Epoch {:02d}: w = {:.2f}, loss = {:.4f}".format(epoch, w, l))
The results are as follows:
Epoch 00: w = 1.02, loss = 1.0000
Epoch 00: w = 1.10, loss = 3.8416
Epoch 00: w = 1.26, loss = 7.3159
Epoch 01: w = 1.28, loss = 0.5466
Epoch 01: w = 1.33, loss = 2.0998
Epoch 01: w = 1.45, loss = 3.9988
……
Epoch 97: w = 2.00, loss = 0.0000
Epoch 97: w = 2.00, loss = 0.0000
Epoch 97: w = 2.00, loss = 0.0000
Epoch 98: w = 2.00, loss = 0.0000
Epoch 98: w = 2.00, loss = 0.0000
Epoch 98: w = 2.00, loss = 0.0000
Epoch 99: w = 2.00, loss = 0.0000
Epoch 99: w = 2.00, loss = 0.0000
Epoch 99: w = 2.00, loss = 0.0000
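One caveat: the inner loop above visits the samples in a fixed order every epoch, while SGD in the strict sense picks samples at random. A minimal variant, reusing gradient(), x_data, y_data, and w from the code above (the shuffling itself is an addition, not part of the original):

import numpy as np

# Visit the samples in a freshly shuffled order each epoch
for epoch in range(100):
    for i in np.random.permutation(len(x_data)):
        grad = gradient(x_data[i], y_data[i])
        w = w - 0.01 * grad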