刘二大人 Lecture 3: Gradient Descent (with Code)


3. Gradient Descent

3.1 Analysis

Continuing from the previous section: in practice, exhaustive search over w is infeasible.

Instead we use the gradient descent algorithm. It is a greedy algorithm: it is not guaranteed to reach the global optimum, but it can find a local optimum.

Compute the gradient:

$$Gradient = \frac{\partial Loss}{\partial w}$$

Update:

$$w = w - \alpha \cdot Gradient$$

where $\alpha$ is the learning rate, a hyperparameter.

We derive the gradient from the Loss defined in the previous section:

$$cost = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2 = \frac{1}{N}\sum_{i=1}^{N}(w x_i - y_i)^2$$

$$Gradient = \frac{\partial cost}{\partial w} = \frac{1}{N}\sum_{i=1}^{N} 2\,(w x_i - y_i)\,x_i$$

$$w = w - \alpha \cdot Gradient = w - \alpha \cdot \frac{1}{N}\sum_{i=1}^{N} 2\,(w x_i - y_i)\,x_i$$
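As a concrete sanity check (using the dataset and settings from the code in 3.2: x = [1, 2, 3], y = [2, 4, 6], starting from w = 1.0 with α = 0.01), the first update works out to:

$$Gradient = \frac{1}{3}\left[2(1 \cdot 1 - 2) \cdot 1 + 2(1 \cdot 2 - 4) \cdot 2 + 2(1 \cdot 3 - 6) \cdot 3\right] = \frac{-28}{3} \approx -9.33$$

$$w = 1.0 - 0.01 \times (-9.33) \approx 1.09$$

which matches the first line of the output in 3.3.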

3.2 Gradient Descent Code

The code is as follows:

import matplotlib

# Set the plotting backend
matplotlib.use('TkAgg')
import numpy as np
import matplotlib.pyplot as plt

# 1. Define the training dataset
x_data = np.array([1.0, 2.0, 3.0])
y_data = np.array([2.0, 4.0, 6.0])

# Initialize the parameter with an initial guess of 1.0
w = 1.0


# 2. Define the model
def forward(x):
    return x * w


# 3. Define the loss function (mean squared error)
def loss(x, y):
    l_sum = 0
    for x_val, y_val in zip(x, y):
        y_pred = forward(x_val)
        l = (y_pred - y_val) * (y_pred - y_val)
        l_sum += l
    return l_sum / len(x)


# 4. Define the gradient of the cost with respect to w
def gradient(x, y):
    grad = 0
    for x_val, y_val in zip(x, y):
        grad += 2 * x_val * (x_val * w - y_val)
    return grad / len(x)


# 5. Train for 100 epochs
Epoch = []
Cost = []
for epoch in range(100):
    l = loss(x_data, y_data)
    grad = gradient(x_data, y_data)
# learning rate set to 0.01
    w = w - 0.01 * grad
    Epoch.append(epoch)
    Cost.append(l)
    print("Epoch {:02d}: w = {:.2f}, loss = {:.4f}".format(epoch, w, l))

# Plot the training loss curve
plt.figure(figsize=(10, 5))
plt.plot(Epoch, Cost)
plt.title('train loss')
plt.ylabel('Cost')
plt.xlabel('Epoch')
plt.show()
The code follows five steps:

1. Define the training dataset
2. Define the model
3. Define the Loss
4. Define the Gradient
5. Train

Note: if training fails to converge, try adjusting the learning rate appropriately.
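To see why the learning rate matters: for this dataset the cost gradient works out to $\frac{28}{3}(w - 2)$, so each update scales the error $(w - 2)$ by $1 - \frac{28}{3}\alpha$, and the iteration diverges once $\alpha > \frac{3}{14} \approx 0.214$. Below is a minimal sketch comparing a converging and a diverging learning rate (this threshold derivation is my own, based on the cost defined in 3.1):

```python
import numpy as np

x_data = np.array([1.0, 2.0, 3.0])
y_data = np.array([2.0, 4.0, 6.0])

def run(alpha, steps=20):
    """Run `steps` gradient descent updates with learning rate `alpha`."""
    w = 1.0
    for _ in range(steps):
        grad = np.mean(2 * x_data * (w * x_data - y_data))  # same gradient as in 3.1
        w -= alpha * grad
    return w

print(run(0.01))  # ~1.86: converging toward 2.0
print(run(0.25))  # ~ -313: 0.25 > 3/14, each step scales the error by -4/3
```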

3.3 Results

Observe the output:

Epoch 00: w = 1.09, loss = 4.6667
Epoch 01: w = 1.18, loss = 3.8362
Epoch 02: w = 1.25, loss = 3.1535
Epoch 03: w = 1.32, loss = 2.5923
Epoch 04: w = 1.39, loss = 2.1310
Epoch 05: w = 1.44, loss = 1.7518
Epoch 06: w = 1.50, loss = 1.4401
……
Epoch 98: w = 2.00, loss = 0.0000
Epoch 99: w = 2.00, loss = 0.0000
We can see that w converges to 2.00.

The loss curve is shown below:

[Figure: gradient descent training loss curve]
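Since the data follow y = 2x, the converged model should predict roughly twice any new input; a quick check, reusing forward and the trained w from the script above:

```python
print(forward(4.0))  # ~8.0 once w has converged to 2.0
```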

3.4 Stochastic Gradient Descent (SGD)

Randomly pick a single sample and optimize using that sample's Loss, rather than the Loss averaged over all samples.

$$loss = (\hat{y} - y)^2 = (w x - y)^2$$

$$Gradient = \frac{\partial loss}{\partial w} = 2\,(w x - y)\,x$$

$$w = w - \alpha \cdot Gradient = w - \alpha \cdot 2\,(w x - y)\,x$$
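For example, the first SGD update starting from $w = 1.0$ on the sample $(x, y) = (1, 2)$ gives $Gradient = 2\,(1 \cdot 1 - 2) \cdot 1 = -2$, so $w = 1.0 - 0.01 \times (-2) = 1.02$, matching the first line of the output below.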

The random noise from individual samples can help us escape saddle points.

The code is as follows:

import numpy as np

# 1. Define the training dataset
x_data = np.array([1.0, 2.0, 3.0])
y_data = np.array([2.0, 4.0, 6.0])

# Initialize the parameter with an initial guess of 1.0
w = 1.0


# 2. Define the model
def forward(x):
    return x * w


# 3. Define the per-sample loss function
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y) * (y_pred - y)


# 4. Define the per-sample gradient
def gradient(x, y):
    y_pred = forward(x)
    return 2 * x * (y_pred - y)


# 5. Train for 100 epochs
for epoch in range(100):
# Update w once per sample
    for x, y in zip(x_data, y_data):
        l = loss(x, y)
        grad = gradient(x, y)
# learning rate set to 0.01
        w = w - 0.01 * grad
        print("Epoch {:02d}: w = {:.2f}, loss = {:.4f}".format(epoch, w, l))

The output is as follows:

Epoch 00: w = 1.02, loss = 1.0000
Epoch 00: w = 1.10, loss = 3.8416
Epoch 00: w = 1.26, loss = 7.3159
Epoch 01: w = 1.28, loss = 0.5466
Epoch 01: w = 1.33, loss = 2.0998
Epoch 01: w = 1.45, loss = 3.9988
……
Epoch 97: w = 2.00, loss = 0.0000
Epoch 97: w = 2.00, loss = 0.0000
Epoch 97: w = 2.00, loss = 0.0000
Epoch 98: w = 2.00, loss = 0.0000
Epoch 98: w = 2.00, loss = 0.0000
Epoch 98: w = 2.00, loss = 0.0000
Epoch 99: w = 2.00, loss = 0.0000
Epoch 99: w = 2.00, loss = 0.0000
Epoch 99: w = 2.00, loss = 0.0000
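One caveat: the loop above visits the samples in the same fixed order every epoch, whereas "stochastic" gradient descent normally visits them in random order. A minimal sketch of the shuffled variant (the shuffling is my addition, not part of the original code):

```python
import numpy as np

x_data = np.array([1.0, 2.0, 3.0])
y_data = np.array([2.0, 4.0, 6.0])
w = 1.0

for epoch in range(100):
    # Visit the samples in a fresh random order each epoch
    for i in np.random.permutation(len(x_data)):
        grad = 2 * x_data[i] * (w * x_data[i] - y_data[i])  # per-sample gradient from 3.4
        w -= 0.01 * grad

print(w)  # still converges to ~2.0
```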

Summary

[Figure: summary diagram]