Deep Learning - 03 - Linear Neural Networks - Linear Regression

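3.1 Linear Regression

3.1.1 Basic Elements of Linear Regression

Only a few terms are listed here:

Training set
Sample
Label
Feature: the independent variables a prediction is based on (e.g., floor area and house age when predicting house prices)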

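3.1.1.1 Linear Model

The linear model is ŷ = Xw + b, where w is called the weight and b is called the bias.

Given a dataset, our goal is to find the weight w and the bias b such that the model's predictions roughly match the true prices in the data.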
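As a concrete illustration of the linear model, here is a minimal sketch in PyTorch. The feature values (floor area and house age), the weights, and the bias are made-up numbers chosen only to keep the arithmetic easy to follow; they do not come from any dataset.

```python
import torch

# Hypothetical data: 2 houses, 2 features each (floor area, house age)
X = torch.tensor([[100.0, 5.0],
                  [80.0, 10.0]])
w = torch.tensor([2.0, -1.0])  # hypothetical weights, one per feature
b = 10.0                       # hypothetical bias

# Linear model: y_hat = Xw + b
y_hat = torch.matmul(X, w) + b
print(y_hat)  # tensor([205., 160.])
```

Each prediction is the weighted sum of one row of X plus the bias, for example 100 * 2 + 5 * (-1) + 10 = 205 for the first house.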

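Before we start searching for the best model parameters w and b, we need two more things:

A way to measure the quality of the model (how do we judge whether the model is good?)
A way to improve the quality of the model's predictions (how do we raise its prediction accuracy?)

3.1.1.2 Loss Function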

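3.1.1.4 Stochastic Gradient Descent

Gradient descent can optimize almost every deep learning model. It reduces the error by repeatedly updating the parameters in the direction in which the loss function decreases.

In its simplest form, gradient descent computes the derivative (here also called the gradient) of the loss function, i.e. the mean loss over all samples in the dataset, with respect to the model parameters. In practice this can be very slow, because we must pass over the entire dataset before every single parameter update. Instead, we usually draw a small random batch of samples each time an update is computed. This variant is called minibatch stochastic gradient descent.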
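The minibatch update described above can be written as a single formula. The notation is an assumption of this note rather than something defined earlier in the text: B is the sampled minibatch, |B| its size, η the learning rate, and l^(i) the loss on sample i.

```latex
(\mathbf{w}, b) \leftarrow (\mathbf{w}, b) - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \partial_{(\mathbf{w}, b)}\, l^{(i)}(\mathbf{w}, b)
```

The factor 1/|B| is the division by batch_size in the sgd function of section 3.2.6, and η is its lr argument.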

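3.1.1.5 Making Predictions with the Model

3.1.2 Vectorization for Speed

for loops in Python are expensive, so we should use a linear algebra library and vectorize the computation.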

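import torch
from utils import Timer  # local helper module of these notes, not part of PyTorch

n = 10000
a = torch.ones(n)
b = torch.ones(n)
c = torch.zeros(n)

# Add the two vectors one element at a time in a Python for loop
timer = Timer.Timer()
for i in range(n):
    c[i] = a[i] + b[i]
print(f'{timer.stop():.5f} sec')

# Add the two vectors with a single vectorized operation
timer.start()
d = a + b
print(f'{timer.stop():.5f} sec')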


0.12973 sec
0.00000 sec
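Because the snippet above depends on a local Timer helper, the same comparison can be sketched with only the standard library. This version uses time.perf_counter instead of the Timer class, which is a substitution made here for self-containment. The measured times depend on the machine, so no expected timings are shown.

```python
import time
import torch

n = 10000
a = torch.ones(n)
b = torch.ones(n)

# Element-by-element addition in a Python for loop
c = torch.zeros(n)
start = time.perf_counter()
for i in range(n):
    c[i] = a[i] + b[i]
loop_time = time.perf_counter() - start

# The same addition as a single vectorized operation
start = time.perf_counter()
d = a + b
vec_time = time.perf_counter() - start

print(f'loop: {loop_time:.5f} sec, vectorized: {vec_time:.5f} sec')
```

Both approaches produce the same tensor; only the time needed to compute it differs.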

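3.1.5 Summary

The key ingredients of a machine learning model are the training data, the loss function, the optimization algorithm, and the model itself.
Vectorization makes the mathematics more concise and the code run faster.
Under the assumption of additive Gaussian noise, minimizing the squared-error objective is equivalent to maximum likelihood estimation.
A linear regression model is also a simple neural network.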
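The equivalence with maximum likelihood estimation can be made explicit with a short derivation. It assumes, as an illustration, that the labels are generated as y = wᵀx + b + ε with Gaussian noise ε ~ N(0, σ²). The symbols σ and n and the per-sample superscript (i) are notation introduced here.

```latex
P(y \mid \mathbf{x}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{1}{2\sigma^2}\,\bigl(y - \mathbf{w}^\top \mathbf{x} - b\bigr)^2\right)

-\log P(\mathbf{y} \mid \mathbf{X}) = \sum_{i=1}^{n} \left[ \frac{1}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\,\bigl(y^{(i)} - \mathbf{w}^\top \mathbf{x}^{(i)} - b\bigr)^2 \right]
```

The first term does not depend on w or b, and 1/σ² is a positive constant, so minimizing the negative log-likelihood over w and b is the same as minimizing the sum of squared errors.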

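3.2 Implementing Linear Regression from Scratch

3.2.1 Generating the Dataset

import torch

# Generate a dataset: y = Xw + b + noise
def synthetic_data(w, b, num_examples):
    X = torch.normal(0, 1, (num_examples, len(w)))  # features drawn from N(0, 1)
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)  # additive Gaussian noise with std 0.01
    return X, y.reshape((-1, 1))  # labels as a column vector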
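A short usage sketch may help to check what synthetic_data returns. The values of true_w and true_b and the sample count of 1000 are the ones used in the complete code later in these notes; the function is repeated so that the block runs on its own.

```python
import torch

def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2, -3.4])
true_b = 4.2
features, labels = synthetic_data(true_w, true_b, 1000)

# One row per sample; features has one column per weight, labels has one column
print(features.shape)  # torch.Size([1000, 2])
print(labels.shape)    # torch.Size([1000, 1])
```

Each row of features is one sample with two features, and the matching row of labels is its noisy target value.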

3.2.2 Reading the Dataset

import random

# Read the dataset in minibatches
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    # The samples are read in random order
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        # The last batch may be smaller than batch_size
        batch_indices = torch.tensor(
            indices[i: min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]
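To see how data_iter splits the data, here is a small sketch on toy tensors. The toy features and labels and the batch size of 4 are assumptions made for this illustration; the function is repeated so that the block runs on its own.

```python
import random
import torch

def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    random.shuffle(indices)  # visit the samples in random order
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(
            indices[i: min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]

# Toy data: 10 samples with 2 features each; row i is [2i, 2i + 1], label is i
features = torch.arange(20, dtype=torch.float32).reshape(10, 2)
labels = torch.arange(10, dtype=torch.float32).reshape(10, 1)

batch_sizes = []
for X, y in data_iter(4, features, labels):
    batch_sizes.append(len(X))
    # Features and labels stay aligned: X[:, 0] / 2 recovers the label
    assert torch.equal(X[:, 0] / 2, y[:, 0])

print(batch_sizes)  # [4, 4, 2]
```

With 10 samples and a batch size of 4, the last minibatch holds only the 2 remaining samples.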

3.2.3 Initializing Model Parameters
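This section has no code of its own. The complete code later in these notes initializes the parameters as sketched below: the weights are drawn from a normal distribution with mean 0 and standard deviation 0.01, and the bias starts at 0. The shape (2, 1) assumes two input features, as in the synthetic dataset.

```python
import torch

# Weights: small random values, one row per input feature
w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
# Bias: initialized to zero
b = torch.zeros(1, requires_grad=True)

print(w.shape, b.shape)  # torch.Size([2, 1]) torch.Size([1])
```

Setting requires_grad=True tells autograd to record operations on w and b so that their gradients can be computed during training.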

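3.2.4 Defining the Model

# Linear regression model
# Returns y_hat, the vector of predictions
def linreg(X, w, b):
    return torch.matmul(X, w) + b

3.2.5 Defining the Loss Function

# Squared loss, computed per sample
def squared_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2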
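A tiny worked example may clarify what squared_loss returns. The prediction and label values are made up for illustration; the function is repeated so that the block runs on its own.

```python
import torch

def squared_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2

y_hat = torch.tensor([[2.0], [3.0]])  # predictions, shape (2, 1)
y = torch.tensor([2.5, 1.0])          # labels, shape (2,)

l = squared_loss(y_hat, y)
print(l.shape)      # torch.Size([2, 1])
print(l.flatten())  # tensor([0.1250, 2.0000])
```

The result holds one loss per sample: (2.0 - 2.5)² / 2 = 0.125 and (3.0 - 1.0)² / 2 = 2.0. The reshape matters here, because subtracting a tensor of shape (2,) from one of shape (2, 1) without it would broadcast to shape (2, 2).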

3.2.6 Defining the Optimization Algorithm

# Minibatch stochastic gradient descent
def sgd(params, lr, batch_size):
    with torch.no_grad():  # the in-place update must not be tracked by autograd
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()  # reset the gradient so it does not accumulate
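A single update step can be traced by hand. In this sketch the parameter values, the toy loss sum(w²), the learning rate of 0.1, and the batch size of 1 are all assumptions chosen to keep the numbers simple; the function is repeated so that the block runs on its own.

```python
import torch

def sgd(params, lr, batch_size):
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

w = torch.tensor([1.0, 2.0], requires_grad=True)
l = (w ** 2).sum()  # the gradient of sum(w^2) with respect to w is 2w = [2, 4]
l.backward()

sgd([w], lr=0.1, batch_size=1)
print(w)       # tensor([0.8000, 1.6000], requires_grad=True)
print(w.grad)  # tensor([0., 0.])
```

Each parameter moves against its gradient: 1.0 - 0.1 * 2 = 0.8 and 2.0 - 0.1 * 4 = 1.6. The gradient is then reset to zero, so the next backward() call starts fresh.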

3.2.7 Training

lr = 0.03
num_epochs = 3
net = linreg
loss = squared_loss

for epoch in range(num_epochs):
    for X, y in data_iter(batch_size, features, labels):
        l = loss(net(X, w, b), y)  # minibatch loss for X and y
        # l has shape (batch_size, 1), not a scalar, so its elements are summed
        # and the gradient with respect to [w, b] is computed from that sum
        l.sum().backward()
        sgd([w, b], lr, batch_size)  # update the parameters using their gradients
    with torch.no_grad():
        train_l = loss(net(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

Complete code

import random
import torch


# Generate the dataset
def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))


# Read the dataset in minibatches
def data_iter(batch_size, features, labels):
    num_examples = len(features)
    indices = list(range(num_examples))
    # The samples are read in random order
    random.shuffle(indices)
    for i in range(0, num_examples, batch_size):
        batch_indices = torch.tensor(
            indices[i: min(i + batch_size, num_examples)])
        yield features[batch_indices], labels[batch_indices]


# Linear regression model
# Returns y_hat, the vector of predictions
def linreg(X, w, b):
    return torch.matmul(X, w) + b


# Squared loss, computed per sample
def squared_loss(y_hat, y):
    return (y_hat - y.reshape(y_hat.shape)) ** 2 / 2


# Minibatch stochastic gradient descent
def sgd(params, lr, batch_size):
    with torch.no_grad():  # the in-place update must not be tracked by autograd
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

if __name__ == '__main__':
    # Generate the dataset from a w and b that we choose ourselves
    true_w = torch.tensor([2, -3.4])
    true_b = 4.2
    features, labels = synthetic_data(true_w, true_b, 1000)

    # Initialize w, b, and batch_size
    batch_size = 10
    w = torch.normal(0, 0.01, size=(2, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)

    lr = 0.03  # learning rate
    num_epochs = 3  # number of epochs
    net = linreg  # net is an alias for the linreg model
    loss = squared_loss

    for epoch in range(num_epochs):
        for X, y in data_iter(batch_size, features, labels):
            l = loss(net(X, w, b), y)  # minibatch loss for X and y
            # l has shape (batch_size, 1), not a scalar, so its elements are summed
            # and the gradient with respect to [w, b] is computed from that sum
            l.sum().backward()
            sgd([w, b], lr, batch_size)  # update the parameters using their gradients
        with torch.no_grad():
            train_l = loss(net(features, w, b), labels)
            print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')

3.3 Concise Implementation of Linear Regression

import torch
from torch.utils import data
from torch import nn


# Generate the dataset
def synthetic_data(w, b, num_examples):
    """Generate y = Xw + b + noise."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)
    return X, y.reshape((-1, 1))


def load_array(data_arrays, batch_size, is_train=True):
    """Construct a PyTorch data iterator."""
    dataset = data.TensorDataset(*data_arrays)
    return data.DataLoader(dataset, batch_size, shuffle=is_train)


if __name__ == '__main__':
    true_w = torch.tensor([2, -3.4])
    true_b = 4.2
    features, labels = synthetic_data(true_w, true_b, 1000)

    batch_size = 10
    data_iter = load_array((features, labels), batch_size)
    # nn is short for neural networks
    # The fully connected layer is defined in the Linear class.
    # The first argument is the number of input features (2); the second is the
    # number of output features, a single scalar here, hence 1.
    net = nn.Sequential(nn.Linear(2, 1))

    # Initialize the model parameters
    net[0].weight.data.normal_(0, 0.01)
    net[0].bias.data.fill_(0)

    # Define the loss function
    loss = nn.MSELoss()

    # Define the optimization algorithm
    trainer = torch.optim.SGD(net.parameters(), lr=0.03)

    num_epochs = 3
    for epoch in range(num_epochs):
        for X, y in data_iter:
            l = loss(net(X), y)  # the weight and bias are held inside net
            trainer.zero_grad()  # reset the gradients
            l.backward()
            trainer.step()  # update the model once
        l = loss(net(features), labels)
        print(f'epoch {epoch + 1}, loss {l:f}')
    # How does trainer.step() know the batch_size when it updates the parameters?
    # By default nn.MSELoss() averages the squared error over the samples in the
    # minibatch, so the gradient that trainer.step() uses is already averaged.
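The closing remark about batch_size can be checked numerically. The sketch below compares the gradient produced by nn.MSELoss with reduction='mean' (the default) against reduction='sum' on the same toy minibatch. The random data, the constant initial weight of 0.5, and the helper name weight_grad are assumptions made for this illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(10, 2)  # toy minibatch: 10 samples, 2 features
y = torch.randn(10, 1)

def weight_grad(reduction):
    """Gradient of the loss w.r.t. the weight for the given reduction mode."""
    net = nn.Linear(2, 1)
    nn.init.constant_(net.weight, 0.5)  # identical starting point for both runs
    nn.init.zeros_(net.bias)
    loss = nn.MSELoss(reduction=reduction)
    loss(net(X), y).backward()
    return net.weight.grad.clone()

grad_mean = weight_grad('mean')
grad_sum = weight_grad('sum')

# The averaged gradient equals the summed gradient divided by the batch size
print(torch.allclose(grad_mean, grad_sum / len(X)))  # True
```

This is the same scaling that the from-scratch version applies by hand when it calls l.sum().backward() and then divides by batch_size inside sgd.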