hw0 Implementation Guide: Softmax Regression (hw0 of 10-414/714: Deep Learning Systems)

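The goal of hw0 is to give you a quick tour of concepts and ideas you should already be familiar with before taking this course. The assignment asks you to build a basic softmax regression algorithm plus a simple two-layer neural network. You will create these implementations both in native Python (using the numpy library) and, for softmax regression, in native C/C++. The assignment also walks you through submitting your work to the course's autograding system. (P.S. I don't have an account, so I only test with pytest.)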

Link to all the code

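First, install two libraries in the conda environment: pybind11, which binds C++ to Python, and numdifftools, a numerical differentiation library used to verify the computed gradients.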

!pip3 install pybind11
!pip3 install numdifftools

Question 1: A basic `add` function, and testing/autograding basics

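Task 1 asks you to implement a basic add function in order to learn the basic workflow of the assignment. The hw0 directory layout is as follows: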

.
├── data
│   ├── t10k-images-idx3-ubyte.gz
│   ├── t10k-labels-idx1-ubyte.gz
│   ├── train-images-idx3-ubyte.gz
│   └── train-labels-idx1-ubyte.gz
├── hw0.ipynb
├── README.md
├── src
│   ├── simple_ml_ext.cpp
│   └── simple_ml.py
└── tests
    └── test_simple_ml.py

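The data/ directory contains the data needed for this assignment (a copy of the MNIST dataset); the src/ directory contains the source files where you write your implementations; the tests/ directory contains the tests that evaluate your solution locally and submit it for autograding. The Makefile compiles the code (relevant for the C++ part of the assignment).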

The first task is to implement simple_ml.add():

def add(x, y):
    """ A trivial 'add' function you should implement to get used to the
    autograder and submission system.  The solution to this problem is in the
    homework notebook.

    Args:
        x (Python number or numpy array)
        y (Python number or numpy array)

    Return:
        Sum of x + y
    """
    ### BEGIN YOUR CODE
    pass
    ### END YOUR CODE

Just replace the pass between ### BEGIN YOUR CODE and ### END YOUR CODE with your own code.

def add(x, y):
    ### BEGIN YOUR CODE
    return x + y
    ### END YOUR CODE

Run the test locally:

!python3 -m pytest -k "add"

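Open question: how does pytest actually work? How does "add" get passed in, and how does pytest know which function to run? I'll look into it when I have time.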
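As a partial answer: pytest collects functions whose names start with test_ from files named test_*.py, and the -k option keeps only the tests whose names match the given keyword expression (a substring match that also accepts and, or, and not). The sketch below imitates that selection in plain Python. The names test_add and test_other are assumptions made for illustration, and the filter is a rough imitation rather than pytest's real implementation.

```python
# Hypothetical, stripped-down version of what a test file might contain.

def add(x, y):
    return x + y

def test_add():
    assert add(5, 6) == 11
    assert add(1.5, 2.5) == 4.0

def test_other():
    assert True

# Rough imitation of `pytest -k "add"`: keep the tests whose name
# contains the keyword, then run each selected test.
all_tests = [test_add, test_other]
selected = [t for t in all_tests if "add" in t.__name__]
for t in selected:
    t()  # a failing assert would raise AssertionError here
```

This is also why an expression such as "softmax_regression_epoch and not cpp" can select the Python tests while skipping the C++ ones.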

The autograding system, mugrade, is not tested here.

Question 2: Loading MNIST data

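Task 2 is to load the MNIST data.
Look at the MNIST file format first. The label file needs an offset of 8 bytes: the first four bytes are the magic number and the next four are the number of items. The image file needs an offset of 16 bytes: the header holds four 4-byte values, namely the magic number, the number of images, and the height and width of each image.

Read the image data, reshape it to (number of images) × 784, where 784 = 28 × 28, and divide by 255 to normalize the pixel values to [0, 1].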

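import gzip
import numpy as np

def parse_mnist(image_filename, label_filename):
    # images: skip the 16-byte header, one 784-pixel row per image, scale to [0, 1]
    with gzip.open(image_filename, "rb") as f:
        X = np.frombuffer(f.read(), np.uint8, offset=16).reshape(-1, 784).astype('float32') / 255
    # labels: skip the 8-byte header, one uint8 label per example
    with gzip.open(label_filename, "rb") as f:
        y = np.frombuffer(f.read(), np.uint8, offset=8)
    return X, y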
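To see where the 8-byte and 16-byte offsets come from, the header can be decoded directly. The sketch below uses only the standard library and a synthetic, in-memory file rather than the real dataset; read_idx_header is a hypothetical helper, not part of the homework code. It relies on the IDX convention that header fields are big-endian 32-bit integers and that the low byte of the magic number gives the number of dimensions.

```python
import gzip
import struct

def read_idx_header(raw: bytes):
    """Return (magic, dims) decoded from the start of an IDX file."""
    (magic,) = struct.unpack(">i", raw[:4])
    ndim = magic & 0xFF  # low byte of the magic number = number of dimensions
    dims = struct.unpack(">" + "i" * ndim, raw[4:4 + 4 * ndim])
    return magic, dims

# Synthetic stand-in for an image file: 2 blank images of 28 x 28 pixels.
fake_file = gzip.compress(struct.pack(">iiii", 2051, 2, 28, 28) + bytes(2 * 28 * 28))

raw = gzip.decompress(fake_file)
magic, dims = read_idx_header(raw)
header_size = 4 + 4 * len(dims)  # 16 for images (3 dims), 8 for labels (1 dim)
```

For a label file the magic number is 2049, which encodes a single dimension and therefore an 8-byte header.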

Local test:

python3 -m pytest -k "parse_mnist"

Question 3: Softmax loss

Task 3 is to implement the softmax_loss function.

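Softmax performs normalization.
It takes a sequence of numbers and normalizes them so that the results sum to 1; the larger the original number, the larger its normalized value, which makes the output suitable for representing probabilities.
To keep the exponentials from overflowing, you can subtract the same number from every entry (usually the maximum of the sequence) without changing the result, because the common factor exp(-c) appears in both the numerator and the denominator and cancels.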
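The shift invariance described above can be checked with a small pure-Python sketch (standard library only, written for a single row of logits; the homework itself uses numpy):

```python
import math

def softmax(row):
    """Numerically stable softmax of one list of numbers."""
    m = max(row)                          # shift so the largest exponent is 0
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

logits = [1.0, 2.0, 3.0]
p = softmax(logits)

# Adding a constant to every logit leaves the probabilities unchanged.
# Without the max shift, math.exp(1003.0) would raise OverflowError.
p_shifted = softmax([v + 1000.0 for v in logits])
```

Both calls return the same probabilities, and the ordering of the inputs is preserved in the outputs.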

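The softmax loss measures classification error using the softmax probabilities: it is the negative log of the probability assigned to the true class. Its derivation underlies the two implementations below.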
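One standard way to write the derivation is sketched here. The notation is an assumption for illustration: $z \in \mathbb{R}^k$ is one row of logits, $y$ is the index of its true class, and $m$ is the number of examples.

```latex
\begin{aligned}
\ell(z, y) &= -\log \operatorname{softmax}(z)_y \\
           &= -\log \frac{\exp(z_y)}{\sum_{j=1}^{k} \exp(z_j)} \\
           &= -z_y + \log \sum_{j=1}^{k} \exp(z_j),
\qquad
L = \frac{1}{m} \sum_{i=1}^{m} \ell\bigl(z^{(i)}, y^{(i)}\bigr)
\end{aligned}
```

The first two lines correspond to computing softmax and then taking the negative log of the true-class probability; the last line is the simplified log-sum-exp form.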

Implementation:

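# Implementation 1
def softmax_loss(Z, y):
    Z_y = Z[np.arange(Z.shape[0]), y]  # logit of the true class for each row
    Z_sum = np.log(np.exp(Z).sum(axis=1))  # log-sum-exp over the classes
    return np.mean(Z_sum - Z_y)  # simplified form: log-sum-exp minus true-class logit

# Implementation 2
def softmax(x):
    # subtract the row-wise max for numerical stability; the result is unchanged
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
def softmax_loss(Z, y):
    # direct form: negative log of the softmax probability of the true class
    return np.mean(-np.log(softmax(Z)[np.indices(y.shape)[0], y]))
# [np.arange(Z.shape[0]), y] is equivalent to [np.indices(y.shape)[0], y];
# both pick one entry per row from a 2-D array.
# Implementation 1 indexes the values before softmax, implementation 2 the values after softmax.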

Local test:

python3 -m pytest -k "softmax_loss"

Question 4: Stochastic gradient descent for softmax regression

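Task 4 is to implement stochastic gradient descent (SGD) for softmax regression.

First, handle splitting the data into batches.

Then handle backpropagation, i.e. computing the gradient.
The formulas mainly come from the Lecture 2 slides, combined with my own understanding.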
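As a sketch of the formula the code relies on (the symbols are assumptions chosen to match the code: $X \in \mathbb{R}^{B \times n}$ is one minibatch, $\Theta \in \mathbb{R}^{n \times k}$ the parameters, $I_y \in \mathbb{R}^{B \times k}$ the one-hot encoding of the labels, and $\alpha$ the learning rate):

```latex
\begin{aligned}
S &= \operatorname{softmax}(X\Theta) \quad \text{(applied row-wise)} \\
\nabla_\Theta L &= \frac{1}{B}\, X^{\top} (S - I_y) \\
\Theta &\leftarrow \Theta - \alpha\, \nabla_\Theta L
\end{aligned}
```

Subtracting 1 at each row's label position is exactly the $S - I_y$ term, and the division by $B$ comes from the loss being a mean over the batch.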

def softmax_regression_epoch(X, y, theta, lr = 0.1, batch=100):
    n = X.shape[0]
    step = n // batch
    for i in range(step + 1):
        start = i * batch
        end = min(start + batch, n)
        if start == end:
            break
        x1 = X[start: end]
        y1 = y[start: end]
        index = np.arange(x1.shape[0])  # the last batch may be smaller than `batch`
        z = softmax(np.matmul(x1, theta))  # apply softmax to the logits
        z[index, y1] -= 1  # subtract 1 at the true-label position of each row
        # equivalently:
        # I = np.zeros_like(z)
        # I[np.arange(x1.shape[0]), y1] = 1
        # z = z - I
        grad = np.matmul(x1.transpose(), z) / x1.shape[0]  # X^T z, averaged over the batch (the loss is a mean)
        theta -= lr * grad  # update in place
        # I used to think the update was theta *= (1 - lr * grad); grad is simply
        # the amount to subtract, so there is no multiplication by theta itself.

Local test:
python3 -m pytest -k "softmax_regression_epoch and not cpp"
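The batching step on its own can be sketched without numpy. batch_bounds is a hypothetical helper, not part of the homework code; it yields the same (start, end) pairs as the loop in softmax_regression_epoch, using range with a step instead of an explicit break.

```python
def batch_bounds(n, batch):
    """Yield (start, end) index pairs covering range(n) in chunks of size `batch`."""
    for start in range(0, n, batch):
        yield start, min(start + batch, n)

# 250 examples with batch size 100: two full batches and one partial batch.
bounds = list(batch_bounds(250, 100))
sizes = [end - start for start, end in bounds]
```

When n is not a multiple of the batch size, the final chunk is shorter, so any per-batch index array or averaging divisor should be based on end - start rather than on batch.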

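Building on softmax_regression_epoch, hw0 provides a train_softmax function (it simply initializes the parameters, calls softmax_regression_epoch once per epoch, and prints the results).
Call train_softmax to train a handwritten-digit recognition model.
Create a new file train_softmax.py in the project root: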

import sys
sys.path.append("src/")
from simple_ml import train_softmax, parse_mnist

X_tr, y_tr = parse_mnist("data/train-images-idx3-ubyte.gz", 
                         "data/train-labels-idx1-ubyte.gz")
X_te, y_te = parse_mnist("data/t10k-images-idx3-ubyte.gz",
                         "data/t10k-labels-idx1-ubyte.gz")

train_softmax(X_tr, y_tr, X_te, y_te, epochs=10, lr=0.2, batch=100)

Run the script with python train_softmax.py to see the results for each epoch.

Question 5: SGD for a two-layer neural network

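Task 5 is to implement stochastic gradient descent for a two-layer neural network, which requires deriving the backpropagation gradients.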
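A sketch of the backpropagation equations, written with the same names as the code that follows ($X$ is one minibatch of $B$ rows, $I_y$ the one-hot labels, $\circ$ the elementwise product; the indicator is written with the usual convention that ReLU has zero gradient for non-positive inputs, which can differ from the code only at inputs that are exactly zero):

```latex
\begin{aligned}
Z_1 &= \operatorname{ReLU}(X W_1) \\
G_2 &= \operatorname{softmax}(Z_1 W_2) - I_y \\
\nabla_{W_2} L &= \frac{1}{B}\, Z_1^{\top} G_2 \\
G_1 &= \mathbf{1}\{X W_1 > 0\} \circ \bigl(G_2 W_2^{\top}\bigr) \\
\nabla_{W_1} L &= \frac{1}{B}\, X^{\top} G_1
\end{aligned}
```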

def nn_epoch(X, y, W1, W2, lr = 0.1, batch=100):
    n = X.shape[0]
    step = n // batch
    for i in range(step + 1):
        start = i * batch
        end = min(start + batch, n)
        if start == end:
           break
        x1 = X[start: end]
        y1 = y[start: end]
        index = np.arange(x1.shape[0])
        Z1 = x1@W1
        flag = Z1<0
        Z1[flag] = 0 # relu
        G2 = softmax(Z1@W2)
        G2[index, y1] -= 1
        W2_grad = Z1.T@G2/(x1.shape[0]) # divide by the actual batch size, which is not necessarily `batch`
        G1 = G2@W2.T
        G1[flag] = 0 # Z1 no longer has negative entries at this point, so G1[Z1<0] would not work
        W1_grad = x1.T@G1/(x1.shape[0])
        W1 -= lr * W1_grad
        W2 -= lr * W2_grad

Local test:

python3 -m pytest -k "nn_epoch"

Question 6: Softmax regression in C++

Rewrite Task 4 in C++:

void softmax_regression_epoch_cpp(const float *X, 
                    const unsigned char *y,  float *theta, size_t m, 
                    size_t n, size_t k, float lr, size_t batch)
{
    // Ignores the case where m is not divisible by batch (leftover examples are skipped)
    for(size_t num=0; num<m/batch; num++){
        size_t base = num*batch*n;
        float *Z = new float[batch*k]; // intermediate buffer
        // exp(np.matmul(x1, theta))
        for(size_t i=0; i<batch; i++){
            for(size_t j=0; j<k; j++){
            float sum = 0;
            // Z[i][j] = sum(X[i][x]*theta[x][j])
            for(size_t x=0; x<n; x++){
                sum+= X[base+i*n+x]*theta[x*k+j];
            }
            Z[i*k+j] = exp(sum);
            }
        }
        //softmax
        float *Z_sum = new float[batch];
        for (size_t i=0; i<batch; i++){
            float sum = 0;
            for(size_t j=0; j<k; j++){
            sum += Z[i*k+j];
            }
            Z_sum[i] = sum;
        }
        for(size_t i=0; i<batch; i++){
            for(size_t j = 0; j<k; j++){
            Z[i*k+j]/=Z_sum[i];
            }
        }
        // Z-I
        for(size_t i=0; i<batch; i++){
            Z[i*k+y[num*batch+i]] -= 1.0;
        }
        // X.T@Z
        for(size_t i=0; i<n; i++){
            for(size_t j=0; j<k; j++){
                float sum = 0;
                // dtheta[i][j] = sum(X.T[i][x]*Z[x][j])
                // X.T[i][x] = X[x][i]
                for(size_t x=0; x<batch; x++){
                sum += X[base+x*n+i]*Z[x*k+j];
                }
                theta[i*k+j] -= lr*sum/batch; 
            }
        }
        // free the per-batch buffers allocated with new[] above
        delete[] Z;
        delete[] Z_sum;
    }
}

make
python3 -m pytest -k "softmax_regression_epoch_cpp"
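The C++ code receives X and theta as flat arrays, so element (i, j) of a matrix with c columns lives at index i*c + j (row-major order). The pure-Python sketch below mirrors the triple loop used for the matrix products; matmul_flat is a hypothetical helper written for illustration, not part of the homework.

```python
def matmul_flat(A, B, m, n, k):
    """Multiply row-major flat A (m x n) by flat B (n x k); return flat C (m x k)."""
    C = [0.0] * (m * k)
    for i in range(m):
        for j in range(k):
            s = 0.0
            for x in range(n):
                # A[i][x] is A[i*n + x], B[x][j] is B[x*k + j]
                s += A[i * n + x] * B[x * k + j]
            C[i * k + j] = s
    return C

A = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0]   # 2 x 3
B = [1.0, 0.0,
     0.0, 1.0,
     1.0, 1.0]        # 3 x 2
C = matmul_flat(A, B, 2, 3, 2)  # 2 x 2, stored flat
```

The transposed product X.T @ Z in the C++ code uses the same idea with the roles of the indices swapped: X.T[i][x] is read as X[x*n + i].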

Summary

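This assignment only involves basic functions rather than a complete framework, but it is still a substantial exercise: it covers the content of the first three lectures and builds familiarity with compiling and local testing. All the test commands are collected below: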

python3 -m pytest -k "add"
python3 -m pytest -k "parse_mnist"
python3 -m pytest -k "softmax_loss"
python3 -m pytest -k "softmax_regression_epoch and not cpp"
python train_softmax.py
python3 -m pytest -k "nn_epoch"
make
python3 -m pytest -k "softmax_regression_epoch_cpp"
