Writing a Simple Neural Network Framework by Hand (2)

Keep creating, grow faster! This is day 28 of my participation in the Juejin "Daily New Plan · June Update Challenge".

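Initialize the parameters. The only learnable parameters in the whole network are the weights w of the hidden layer l1 and the output layer l2; the bias is set to False here.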

l1 = np.zeros((784,128),dtype=np.float32)
l2 = np.zeros((128,10),dtype=np.float32)

Note that by default NumPy creates arrays of dtype np.float64, so the dtype has to be set to np.float32 explicitly.
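As a quick check of the dtype remark, the sketch below compares the default dtype with an explicit np.float32. The names w_default and w_float32 are illustrative and do not appear in the original code.

```python
import numpy as np

# Without a dtype argument, NumPy allocates float64
w_default = np.zeros((784, 128))

# Explicit float32, which is also PyTorch's default tensor dtype
w_float32 = np.zeros((784, 128), dtype=np.float32)

print(w_default.dtype)   # float64
print(w_float32.dtype)   # float32

# float32 needs half the memory of float64 for the same shape
print(w_default.nbytes, w_float32.nbytes)
```

Using float32 keeps the NumPy arrays in the same precision as the trained PyTorch weights that are copied into them later.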

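For now, we copy the weights of the previously trained model into these arrays, so that the NumPy forward pass uses the same parameters.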

l1[:] = model.l1.weight.detach().numpy().transpose()
l2[:] = model.l2.weight.detach().numpy().transpose()
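The transpose is needed because PyTorch's nn.Linear stores its weight with shape (out_features, in_features) and computes x @ W.T, while l1 and l2 are laid out as (in_features, out_features). The following minimal sketch illustrates this with a random array; torch_style_weight is a hypothetical stand-in for model.l1.weight, and l1_demo is an illustrative name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for model.l1.weight: shape (out_features, in_features)
torch_style_weight = rng.standard_normal((128, 784)).astype(np.float32)
x = rng.standard_normal((5, 784)).astype(np.float32)

# Same copy pattern as in the article: transpose into an (in, out) array
l1_demo = np.zeros((784, 128), dtype=np.float32)
l1_demo[:] = torch_style_weight.transpose()

# What a bias-free linear layer computes vs. the plain NumPy dot product
out_linear = x @ torch_style_weight.T
out_numpy = x.dot(l1_demo)

print(out_numpy.shape)
print(np.allclose(out_linear, out_numpy, atol=1e-3))
```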

Implementing the forward pass with NumPy

def forward(x):
  x = x.dot(l1)
  x = np.maximum(x,0)
  x = x.dot(l2)
  return x

y_test_pred = np.argmax(forward(X_test.reshape((-1,28*28))),axis=1)
(y_test_pred==Y_test).mean()
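To make the shapes in the forward pass explicit, here is a self-contained sketch that runs the same two-layer computation on random data. The weights w1 and w2, the function forward_demo, and the random images are placeholders chosen for illustration, not the trained model or the MNIST test set.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weights with the same shapes as l1 and l2
w1 = rng.standard_normal((784, 128)).astype(np.float32)
w2 = rng.standard_normal((128, 10)).astype(np.float32)

def forward_demo(x, w1, w2):
    """Two-layer forward pass: linear -> ReLU -> linear, no bias."""
    h = np.maximum(x.dot(w1), 0)   # (N, 784) -> (N, 128)
    return h.dot(w2)               # (N, 128) -> (N, 10)

# Four fake 28x28 "images", flattened to (4, 784) before the first layer
images = rng.random((4, 28, 28)).astype(np.float32)
logits = forward_demo(images.reshape((-1, 28 * 28)), w1, w2)

print(logits.shape)                # (4, 10)
print(np.argmax(logits, axis=1))   # one predicted class per image
```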

The forward pass here simply reproduces the PyTorch ANet code in NumPy.
The ReLU activation is implemented with the maximum function. ReLU is very simple: every element of the matrix that is less than 0 is set to 0.
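As a small illustration of the ReLU description above, the following sketch wraps np.maximum in a helper. The function name relu is an assumption made for this example; the article's own forward function calls np.maximum inline.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: negative entries become 0, the rest are unchanged."""
    return np.maximum(x, 0)

x = np.array([[-1.5, 0.0, 2.0],
              [3.0, -0.5, -2.0]], dtype=np.float32)

print(relu(x))
# [[0. 0. 2.]
#  [3. 0. 0.]]
```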

The difference between max and maximum

a = np.random.randn(2,3)
a

Calling max directly finds the largest of all the numbers in the matrix and returns it.

print(a)
np.max(a)

[[ 1.6422496  -2.18358678  0.1270143 ]
 [-0.29133367  1.03585029 -0.2040185 ]]

The largest value is returned:

1.642249596546198

np.max(a,0)


If an axis is specified, for example 0, the maximum is taken along that axis, which here means the maximum of each column is returned.

array([1.6422496 , 1.03585029, 0.1270143 ])

np.maximum(a,0)

array([[1.6422496 , 0.        , 0.1270143 ],
       [0.        , 1.03585029, 0.        ]])
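The comparison can be repeated on a fixed array so that the results are reproducible. This is a minimal sketch; the array values and variable names are made up for illustration.

```python
import numpy as np

a = np.array([[1.0, -2.0, 0.5],
              [-0.3, 1.2, -0.2]])

# np.max reduces: it collapses the whole array or one axis of it
total_max = np.max(a)           # scalar: 1.2
col_max = np.max(a, axis=0)     # shape (3,): [1.0, 1.2, 0.5]
row_max = np.max(a, axis=1)     # shape (2,): [1.0, 1.2]

# np.maximum compares element-wise (0 is broadcast), so the shape is kept
clipped = np.maximum(a, 0)      # shape (2, 3)

print(total_max, col_max, row_max)
print(clipped)
```

In short, np.max is a reduction, while np.maximum is an element-wise comparison of two inputs, which is why only the latter is suitable for ReLU.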

Next, feed the test data through the forward pass and see how well it does.

y_test_preds = np.argmax(forward(X_test.reshape((-1,28*28))),axis=1)
(Y_test == y_test_preds).mean()

The result is 0.9477. Keep this number in mind and check whether it matches the accuracy obtained earlier with torch.
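The accuracy expression works because comparing two integer arrays yields a boolean array, and the mean of a boolean array is the fraction of True entries. A toy sketch with made-up logits and labels:

```python
import numpy as np

# Made-up scores for 4 samples and 3 classes
logits = np.array([[0.1, 2.0, -1.0],
                   [1.5, 0.2, 0.3],
                   [0.0, 0.1, 3.0],
                   [2.0, 2.5, 0.0]])
labels = np.array([1, 0, 2, 0])

preds = np.argmax(logits, axis=1)       # predicted class per row: [1, 0, 2, 1]
accuracy = (preds == labels).mean()     # 3 of 4 correct

print(preds, accuracy)                  # [1 0 2 1] 0.75
```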

Break the computation apart and store the forward-pass output, as an intermediate result, in y_test_preds_out.

y_test_preds_out = forward(X_test.reshape((-1,28*28)))
y_test_preds = np.argmax(y_test_preds_out,axis=1)

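This is the cross-entropy loss. A brief explanation: class is the ground-truth label of the sample. The network output is a 10-dimensional vector in which each element scores how likely the sample is to belong to the corresponding class. These scores normally have to be turned into a probability distribution, that is, passed through softmax. The term $-x[class]$ is the output value at the dimension corresponding to the label. We start with the first test sample, whose label is Y_test[0].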

$$L = -x[class] + \log \left( \sum_j \exp(x[j]) \right)$$

-y_test_preds_out[0,7] + np.log(np.exp(y_test_preds_out[0]).sum())

1.9073486e-05
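The loss formula is the same as taking the negative log of the softmax probability of the true class. The sketch below checks this on a made-up logit vector. The helper names cross_entropy and softmax are assumptions for this example, and the max-shift inside softmax is a common stability trick that the article's code does not use.

```python
import numpy as np

def cross_entropy(logits, label):
    """L = -x[class] + log(sum_j exp(x[j])) for a single sample."""
    return -logits[label] + np.log(np.exp(logits).sum())

def softmax(logits):
    """Softmax with the maximum subtracted first to avoid overflow in exp."""
    shifted = logits - logits.max()
    e = np.exp(shifted)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # made-up scores for 3 classes
label = 0

loss_direct = cross_entropy(logits, label)
loss_softmax = -np.log(softmax(logits)[label])

print(loss_direct, loss_softmax)      # both are approximately 0.417
```

A loss close to 0, like the value printed for the first test sample, means the softmax probability assigned to the true class is close to 1.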

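Make the sample index a variable, sample, so that the output value for the true class is looked up with Y_test[sample]:

sample = 1
y_test_preds_out[sample,Y_test[sample]]

-y_test_preds_out[sample,Y_test[sample]] + np.log(np.exp(y_test_preds_out[sample]).sum())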

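Next we compute the loss from the predictions, producing a (sample index, loss) pair for each sample, and then sort to find the samples with the largest loss and see whether they really are handwritten digits that are hard to tell apart. What we need here is the cross-entropy loss of every sample.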

ret = -y_test_preds_out[range(y_test_preds_out.shape[0]),Y_test] + np.log(np.exp(y_test_preds_out).sum(axis=1))

np.argmax(ret)  # 3520
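The vectorized expression relies on integer array indexing: indexing with a pair of index arrays picks one element per row. The following sketch, with made-up logits and labels, compares it against an explicit loop over the samples.

```python
import numpy as np

# Made-up scores for 3 samples and 3 classes
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0],
                   [1.0, 4.0, 0.0]])
labels = np.array([0, 2, 0])

# Vectorized: logits[rows, labels] selects logits[i, labels[i]] for every i
rows = np.arange(logits.shape[0])
losses = -logits[rows, labels] + np.log(np.exp(logits).sum(axis=1))

# Reference: the single-sample formula applied one row at a time
losses_loop = np.array([
    -logits[i, labels[i]] + np.log(np.exp(logits[i]).sum())
    for i in range(len(labels))
])

hardest = int(np.argmax(losses))   # index of the sample with the largest loss

print(losses)
print(np.allclose(losses, losses_loop), hardest)
```

Here the third sample has the largest loss, because its highest score belongs to class 1 while its label is 0.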

imshow(X_test[np.argmax(ret)])

As you can see, this is the image on which the model's prediction is furthest off. It really is hard to recognize, even for us humans.

sorted(list(zip(ret,range(ret.shape[0]))),reverse=True)

grid = sorted(list(zip(ret,range(ret.shape[0]))),reverse=True)[0:16]
hard_classification_img = X_test[[x[1] for x in grid]]
#hard_classification_img.reshape(4,28*4,28).shape
hard_classification_img.shape
imshow(np.concatenate(hard_classification_img.reshape((4,28*4,28)),axis=1))

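First, grid is a list in which every element is a tuple, for example (30.342394, 2607): the first value is the loss and the second is the index of the corresponding image. We use that index to fetch the image. The shape of hard_classification_img is $(16 \times 28 \times 28)$.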
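The same top-k selection can also be written with np.argsort, which returns indices directly instead of (loss, index) tuples. This is an alternative sketch rather than the article's method; the loss values and the name k are made up.

```python
import numpy as np

losses = np.array([0.2, 3.1, 0.05, 1.7, 2.4])   # made-up per-sample losses
k = 3

# Article's approach: sort (loss, index) tuples in descending order
pairs = sorted(zip(losses, range(losses.shape[0])), reverse=True)[:k]
idx_from_pairs = [i for _, i in pairs]

# Alternative: argsort ascending, reverse, keep the first k indices
idx_from_argsort = np.argsort(losses)[::-1][:k]

print(idx_from_pairs)               # [1, 4, 3]
print(idx_from_argsort.tolist())    # [1, 4, 3]
```

The two approaches can order samples with exactly equal losses differently, which does not matter for picking out the hardest images.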

np.concatenate(hard_classification_img.reshape((4,28*4,28)),axis=1)

The key point is how the code tiles 16 images of size $28 \times 28$ into a $4 \times 4$ grid. Next, we turn G into a parameter so that grids of different sizes can be displayed.

G = 16
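# zip pairs each loss value in ret with its sample index: (loss, sample index)
# reverse=False sorts in ascending order, so this picks the G*G samples with the smallest loss; use reverse=True for the hardest ones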
grid = sorted(list(zip(ret,range(ret.shape[0]))),reverse=False)[0:G*G]
hard_classification_img = X_test[[x[1] for x in grid]]
#hard_classification_img.reshape(4,28*4,28).shape
hard_classification_img.shape
imshow(np.concatenate(hard_classification_img.reshape((G,28*G,28)),axis=1))
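To see how the reshape and concatenate steps arrange the images, the sketch below applies the same two calls to tiny 2x2 "images" filled with their own index. The function name tile_images is an assumption for this example. Reshaping stacks each group of G consecutive images vertically, and concatenating along axis 1 places those groups side by side, so the images fill the grid column by column.

```python
import numpy as np

def tile_images(imgs, G):
    """Tile G*G images of shape (h, w) into one (G*h, G*w) array."""
    n, h, w = imgs.shape
    assert n == G * G, "need exactly G*G images"
    # (G*G, h, w) -> (G, G*h, w): each group is a vertical strip of G images
    strips = imgs.reshape((G, h * G, w))
    # Iterating over the first axis yields G strips; join them left to right
    return np.concatenate(strips, axis=1)

G = 2
# Image i is a 2x2 block filled with the value i
imgs = np.stack([np.full((2, 2), i) for i in range(G * G)])
grid = tile_images(imgs, G)

print(grid)
# [[0 0 2 2]
#  [0 0 2 2]
#  [1 1 3 3]
#  [1 1 3 3]]
```

With G = 4 and 28x28 inputs this produces the same (112, 112) array as the expression passed to imshow above.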