Background
This article does not explain the scientific side of Artificial Neural Networks (ANNs). It provides a simple C++ class that hides the complications of the backpropagation algorithm's mathematical calculations. If you already have good experience with ANNs, you can skip to the next section; otherwise, you can review these very good resources about ANNs. I have provided two use cases to make the code as easy to use as possible.
- Introduction to Neural Networks
- Neural Network Applications
- Backpropagation Mathematics
- Backpropagation Algorithm
Introduction
Today, Artificial Neural Networks (ANNs) have become dominant in many areas of life, whether in industry or at home. ANNs enable machines to learn and to simulate the human brain to recognize patterns, make predictions, and solve problems in every business sector. The smartphones and computers that we use daily rely on ANNs in some of their applications; for example, the fingerprint and face unlock services in smartphones and computers use ANNs, and so does handwritten signature verification. I have written a simple implementation of an Artificial Neural Network C++ class that handles the backpropagation algorithm. The code depends on the open-source Eigen template library to handle matrix mathematics. I made the code as simple and as fast as possible.
NeuralNetwork Class
NeuralNetwork is a simple C++ class with the following structure:
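The class declaration itself is not reproduced here. The following is a minimal sketch of its interface, reconstructed from the members and methods used throughout this article; treat it as an outline, not the exact declaration:

C++
#include <vector>
#include <Eigen/Dense>

typedef Eigen::RowVectorXd RowVector; // assumed alias, matching the article's usage

class NeuralNetwork {
public:
    enum Activation { TANH, SIGMOID };

    NeuralNetwork(std::vector<int> architecture, double learningRate,
                  Activation activation = TANH);

    void train(RowVector& input, RowVector& output);   // forward + backward
    void test(RowVector& input, RowVector& output);    // forward + error only
    void forward(RowVector& input);
    void backward(RowVector& output);
    double activation(double x);
    double activationDerivative(double x);
    double mse();                                      // mean squared error
    void evaluate(RowVector& output);                  // fill confusion matrix
    void confusionMatrix(RowVector*& precision, RowVector*& recall);
    int vote(RowVector& v, double& value);             // index of strongest neuron (assumed)
    void save(const char* filename);

    std::vector<int> mArchitecture;                    // neurons per layer
    double mLearningRate;
    Activation mActivation;
    std::vector<RowVector*> mNeurons;                  // layer values (last entry is a bias)
    std::vector<RowVector*> mErrors;                   // per-layer errors
    std::vector<Eigen::MatrixXd*> mWeights;            // connection weight matrices
    Eigen::MatrixXd* mConfusion;                       // confusion matrix for evaluation
};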
The code uses RowVectorXd and MatrixXd from the Eigen template library. The main functions, "train" and "test", take the input and desired output in RowVector format. Both of them call the "forward" function, which uses vector multiplication.
forward
C++
void NeuralNetwork::forward(RowVector& input) {
    // set first layer input
    mNeurons.front()->block(0, 0, 1, input.size()) = input;
    // propagate forward (vector multiplication)
    for (unsigned int i = 1; i < mArchitecture.size(); i++) {
        // copy values, ignoring the last neuron as it is a bias
        mNeurons[i]->block(0, 0, 1, mArchitecture[i]) =
            (*mNeurons[i - 1] * *mWeights[i - 1]).block(0, 0, 1, mArchitecture[i]);
        // apply the activation function to each neuron
        for (int col = 0; col < mArchitecture[i]; col++)
            mNeurons[i]->coeffRef(col) = activation(mNeurons[i]->coeffRef(col));
    }
}
The function propagates the input through the network layers to get the output from the last layer. Each neuron in a hidden layer first computes a weighted sum of its inputs, then applies an activation function (e.g., sigmoid) to this sum to derive its output. This function affects neuron values only; it does not affect connection weights or errors. The weighted sum is computed with vector multiplication:
C++
(*mNeurons[i - 1] * *mWeights[i - 1])
Then, the resultant values are passed through the activation function.
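Putting both steps together in matrix notation, with $a^{(i)}$ the row vector of layer $i$'s neuron values (bias neuron fixed at 1), $W^{(i-1)}$ the weight matrix, and $\sigma$ the activation function, each layer is computed as:

$$a^{(i)} = \sigma\left(a^{(i-1)} \, W^{(i-1)}\right)$$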
C++
double NeuralNetwork::activation(double x) {
    if (mActivation == TANH)
        return tanh(x);
    if (mActivation == SIGMOID)
        return 1.0 / (1.0 + exp(-x));
    return 0;
}
[Figure: tanh activation curve]
[Figure: sigmoid activation curve]
backward
C++
void NeuralNetwork::backward(RowVector& output) {
    // calculate last layer errors
    *mErrors.back() = output - *mNeurons.back();
    // calculate hidden layers' errors (vector multiplication)
    for (size_t i = mErrors.size() - 2; i > 0; i--)
        *mErrors[i] = *mErrors[i + 1] * mWeights[i]->transpose();
    // update weights
    size_t size = mWeights.size();
    for (size_t i = 0; i < size; i++)
        for (int col = 0, cols = (int)mWeights[i]->cols(); col < cols; col++)
            for (int row = 0; row < mWeights[i]->rows(); row++) {
                mWeights[i]->coeffRef(row, col) +=
                    mLearningRate *
                    mErrors[i + 1]->coeffRef(col) *
                    activationDerivative(mNeurons[i + 1]->coeffRef(col)) *
                    mNeurons[i]->coeffRef(row);
            }
}
This function is the key to the backpropagation algorithm. It takes the desired output of the last layer, propagates backward through the network layers, calculates each layer's errors, and updates the connection weights according to the rule:
new weight = old weight + learningRate * next error * activationDerivative(next neuron value) * previous neuron value
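Written for a single connection, with $\eta$ the learning rate, $\delta_j$ the error of the destination neuron $j$, $\sigma'$ the activation derivative, and $a_i$, $a_j$ the source and destination neuron values, the update performed by the loop above is:

$$w_{ij} \leftarrow w_{ij} + \eta \, \delta_j \, \sigma'(a_j) \, a_i$$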
C++
double NeuralNetwork::activationDerivative(double x) {
    if (mActivation == TANH)
        return 1 - tanh(x) * tanh(x);
    if (mActivation == SIGMOID)
        // here x is the neuron's already-activated output, so the derivative is x * (1 - x)
        return x * (1.0 - x);
    return 0;
}
[Figure: tanh derivative curve]
[Figure: sigmoid derivative curve]
Note: The curve of the sigmoid derivative has a big significance. As its input (the neuron value) ranges from 0 to 1, there are three possible cases:
- Neuron value near 0: the derivative is close to zero, so the weight gets almost no adjustment.
- Neuron value near 0.5: the derivative is at its maximum, so the weight gets the largest adjustment.
- Neuron value near 1: the derivative is again close to zero, so the weight gets almost no adjustment.
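Plugging sample neuron values into the sigmoid-derivative expression x * (1 - x) from the code above makes the three cases concrete:

$$\sigma'(0.1) = 0.1 \times 0.9 = 0.09, \qquad \sigma'(0.5) = 0.5 \times 0.5 = 0.25, \qquad \sigma'(0.9) = 0.9 \times 0.1 = 0.09$$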
train
C++
void NeuralNetwork::train(RowVector& input, RowVector& output) {
    forward(input);
    backward(output);
}
The function propagates the input in the forward direction, then propagates backward with the desired output to adjust the connection weights.
test
C++
void NeuralNetwork::test(RowVector& input, RowVector& output) {
    forward(input);
    // calculate last layer errors
    *mErrors.back() = output - *mNeurons.back();
}
The function propagates the input in the forward direction, then calculates the error between the resultant output and the desired output.
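The use cases below also call net.mse() after each train or test call. That function is not listed in the article; a plausible sketch, assuming it simply averages the squared entries of the output-layer error vector, is:

C++
// Sketch only: the article does not show mse(); this assumes it averages
// the squared output-layer errors computed by backward() or test().
double NeuralNetwork::mse() {
    return mErrors.back()->squaredNorm() / mErrors.back()->size();
}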
evaluate
There are various ways to evaluate the performance of a neural network model, such as the confusion matrix, accuracy, precision, recall, and F1 score. I have added a confusion matrix calculation to the code through the evaluate function, called after each testing call.
C++
void NeuralNetwork::evaluate(RowVector& output) {
    double desired = 0, actual = 0;
    // vote() (not shown) presumably returns the index of the strongest
    // neuron, i.e., the class: row = desired class, column = predicted class
    mConfusion->coeffRef(
        vote(output, desired),
        vote(*mNeurons.back(), actual)
    )++;
}
This function simply fills the right cell in the confusion matrix depending on the match between the actual and desired outputs.
After the whole testing run, the confusion matrix can be used to calculate precision, recall, and F1 score.
C++
void NeuralNetwork::confusionMatrix(RowVector*& precision, RowVector*& recall) {
    int rows = (int)mConfusion->rows();
    int cols = (int)mConfusion->cols();
    // precision: diagonal value divided by its column sum (per predicted class)
    precision = new RowVector(cols);
    for (int col = 0; col < cols; col++) {
        double colSum = 0;
        for (int row = 0; row < rows; row++)
            colSum += mConfusion->coeffRef(row, col);
        precision->coeffRef(col) = mConfusion->coeffRef(col, col) / colSum;
    }
    // recall: diagonal value divided by its row sum (per actual class)
    recall = new RowVector(rows);
    for (int row = 0; row < rows; row++) {
        double rowSum = 0;
        for (int col = 0; col < cols; col++)
            rowSum += mConfusion->coeffRef(row, col);
        recall->coeffRef(row) = mConfusion->coeffRef(row, row) / rowSum;
    }
    ...
}
This calculation will become clear in the second use case, Handwritten Digits Recognition.
Use Cases
Simple Counter
The neural network takes a binary input (3 bits) and generates an output equal to the input + 1; the output is then fed back as input to the network. If the input equals 7 (111 in binary), the output should be 0. The network is trained using the backpropagation algorithm to adjust its connection weights. The training process takes about 2 minutes to minimize the error between the desired output and the actual network output.
| Input | Output |
|---|---|
| 0 0 0 | 0 0 1 |
| 0 0 1 | 0 1 0 |
| 0 1 0 | 0 1 1 |
| 0 1 1 | 1 0 0 |
| 1 0 0 | 1 0 1 |
| 1 0 1 | 1 1 0 |
| 1 1 0 | 1 1 1 |
| 1 1 1 | 0 0 0 |
Simply construct the NeuralNetwork class with the required architecture and learningRate:
C++
NeuralNetwork net({ 3, 5, 3 }, 0.05, NeuralNetwork::Activation::TANH);
This gives 3 neurons in the input layer, 5 neurons in the hidden layer, and 3 neurons in the output layer, with a learning rate of 0.05.
The following figure describes the full training process for the network over 50,000 trials.
Train Network
C++
void train(NeuralNetwork& net) {
    cout << "Training:" << endl;
    RowVector input(3), output(3);
    int stop = 0;
    for (int i = 0; stop < 8 && i < 50000; i++) {
        cout << i + 1 << endl;
        for (int num = 0; stop < 8 && num < 8; num++) {
            // encode num as 3 input bits
            input.coeffRef(0) = (num >> 2) & 1;
            input.coeffRef(1) = (num >> 1) & 1;
            input.coeffRef(2) = num & 1;
            // desired output is num + 1 (wraps to 0 after 7)
            output.coeffRef(0) = ((num + 1) >> 2) & 1;
            output.coeffRef(1) = ((num + 1) >> 1) & 1;
            output.coeffRef(2) = (num + 1) & 1;
            net.train(input, output);
            double mse = net.mse();
            cout << "In [" << input << "] "
                 << " Desired [" << output << "] "
                 << " Out [" << net.mNeurons.back()->unaryExpr(ptr_fun(unary)) << "] "
                 << " MSE [" << mse << "]" << endl;
            // stop after 8 consecutive samples with MSE below 0.1
            stop = mse < 0.1 ? stop + 1 : 0;
        }
    }
}
The function takes a network with architecture { 3, 5, 3 } and performs up to 50,000 × 8 training calls until it reaches an acceptable error margin. After each training call, it displays the input, actual output, and desired output.
- In the first stages of training, the MSE (mean squared error) is large, and the output is far from the desired output.
- After many rounds of training, the MSE decreases, and the output comes closer to the desired output.
- Finally, after 788 rounds, the MSE drops below 0.1 and the output is close to the desired output.
Test Network
C++
void test(NeuralNetwork& net) {
    cout << "Testing:" << endl;
    RowVector input(3), output(3);
    for (int num = 0; num < 8; num++) {
        input.coeffRef(0) = (num >> 2) & 1;
        input.coeffRef(1) = (num >> 1) & 1;
        input.coeffRef(2) = num & 1;
        output.coeffRef(0) = ((num + 1) >> 2) & 1;
        output.coeffRef(1) = ((num + 1) >> 1) & 1;
        output.coeffRef(2) = (num + 1) & 1;
        net.test(input, output);
        double mse = net.mse();
        cout << "In [" << input << "] "
             << " Desired [" << output << "] "
             << " Out [" << net.mNeurons.back()->unaryExpr(ptr_fun(unary)) << "] "
             << " MSE [" << mse << "]" << endl;
    }
}
This function tests some inputs with the pre-trained network and prints the resultant output and MSE.
Save Network
C++
int main() {
    NeuralNetwork net({ 3, 5, 3 }, 0.05);
    train(net);
    test(net);
    net.save("params.txt"); // save architecture and weights
    return 0;
}
After training and testing the network, we can save the network structure in a file to be loaded later, so the network can be used without retraining.
For our case, the resultant file contains:
learningRate: 0.05
architecture: 3,5,3
activation: 0
weights:
-1.34013 0.811848 0.314629 1.85447 -0.343212 0.151176
0.98971 -0.684254 1.20649 0.260128 -6.50245 -2.31706
0.702027 -3.15824 -0.80735 1.07841 -2.57619 -2.17761
0.13025 3.17894 0.594173 -3.18092 -0.0574412 -2.39394,
-2.67379 0.467493 0.403606
-1.22918 1.67581 1.60877
1.1605 -1.95284 0.942444
-1.92978 -0.704029 -1.12284
-1.34765 -2.8206 1.44205
-0.996246 -1.52939 0.205469
The first line in the weights section represents the weights between the first neuron in the input layer and all neurons of the next layer:
-1.34013 0.811848 0.314629 1.85447 -0.343212 0.151176
The second line in the weights section represents the weights between the second neuron in the input layer and all neurons of the next layer:
0.98971 -0.684254 1.20649 0.260128 -6.50245 -2.31706
and so on ...
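The save function itself is not listed in the article. A minimal sketch that would write a file in the format shown above, reusing the member names from the class outline earlier, is:

C++
#include <fstream>

// Sketch only: writes the learning rate, architecture, activation, and
// weight matrices in the format shown above; the article's actual
// implementation is not reproduced here.
void NeuralNetwork::save(const char* filename) {
    std::ofstream file(filename);
    file << "learningRate: " << mLearningRate << "\n";
    file << "architecture: ";
    for (size_t i = 0; i < mArchitecture.size(); i++)
        file << (i ? "," : "") << mArchitecture[i];
    file << "\nactivation: " << mActivation << "\n";
    file << "weights:\n";
    // Eigen prints each matrix row on its own line; a comma separates matrices
    for (size_t i = 0; i < mWeights.size(); i++)
        file << *mWeights[i] << (i + 1 < mWeights.size() ? "," : "") << "\n";
}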
Handwritten Digits Recognition
Handwritten digit recognition is one of the most successful applications of Artificial Neural Networks; it is the "Hello world" of neural network study. In the previous use case, I used a shallow neural network, which has three layers of neurons that process inputs and generate outputs. Shallow neural networks can in principle handle equally complex problems, but for handwritten recognition we need more accuracy and nonlinearity. Therefore, I use a Deep Neural Network (DNN), which has two or more hidden layers of neurons that process inputs.
Network Architecture
Using a network architecture of {784, 64, 16, 10} (input layer, two hidden layers, output layer), I achieved a success rate of 93.16%.
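The construction call for this network is not shown in the article; by analogy with the first use case, and since the saved parameters below record activation: 1 (which corresponds to SIGMOID if the enum orders TANH before SIGMOID), it would presumably be:

C++
NeuralNetwork net({ 784, 64, 16, 10 }, 0.05, NeuralNetwork::Activation::SIGMOID);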
Activity Diagram
The following figure illustrates the activity diagram of the whole process.
Used Libraries
This project uses:
- MNIST dataset for network training and testing. You have to download the MNIST dataset files and put them in the project execution path.
- libpng library for reading PNG files. You can download the libpng16 (lib - h) files and put them in the project build path.
- zlib library, used internally by libpng16 to decompress images.
The MNIST dataset contains 60,000 training images of handwritten digits from zero to nine and 10,000 images for testing, so the MNIST dataset has 10 different classes. The handwritten digit images are represented as 28×28 matrices where each cell contains a grayscale pixel value (0 to 1).
Training and Testing
During training and testing, each digit is read from its PNG file and converted from a 28×28 image to a vector of 784 double grayscale values. This vector represents the input to the input layer of the neural network.
C++
void readPng(const char* filepath, RowVector*& data) {
    pngwriter image;
    image.readfromfile(filepath);
    int width = image.getwidth();   // 28
    int height = image.getheight(); // 28
    data = new RowVector(width * height); // 784
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            data->coeffRef(0, y * width + x) = image.dread(x, y);
}
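For illustration, a hypothetical snippet that reads one digit image and runs it through a trained network net might look like this; the file name and the one-hot target vector are placeholders, not from the article:

C++
RowVector* data = nullptr;
readPng("digit_5.png", data);          // hypothetical file name
RowVector desired = RowVector::Zero(10);
desired.coeffRef(5) = 1;               // one-hot target for the digit 5
net.test(*data, desired);              // forward pass + output-layer error
delete data;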
The following figure describes the full training and testing process for the network with 60,000 images (50,000 for training, 10,000 for testing).
- In the first stages of training, the error is large and the output is far from the desired output.
- After many rounds of training, the MSE decreases and the output comes closer to the desired output.
- After testing 10,000 images:
- Display of training and testing cost and error percentage:
Save Network
C++
int main() {
    .......
    if (!testOnly)
        net.save("params.txt");
    return 0;
}
After training and testing the network, we can save the network structure in a file to be loaded later, so the network can be used without retraining. If you are going to retrain, you have to delete the file "params.txt" from the build path.
For our case, the resultant file contains:
learningRate: 0.05
architecture: 784,64,16,10
activation: 1
weights:
-0.997497 -0.307718 -0.0558184 0.124485 -0.188635 0.557909 0.242286
-0.898618 -0.942442 0.355693 0.284951 0.100192 0.724357 -0.998474
0.763909 -0.127537 0.893246 -0.956969 -0.492111 -0.775506 -0.603442
-0.907712 -0.987793 -0.0556963 -0.510117 0.450484 0.644276 0.951292
0.105869 -0.76458 0.586596 0.480819 0.253029 -0.672964 -0.418134
0.117222 0.121494 0.439985 -0.459639 -0.514145 0.458296 0.639027
-0.926817 -0.581164 0.774529 -0.392315 -0.985656 0.405133 -0.0527665
-0.0163884 -0.00704978 0.138768 -0.2219 -0.927671 -0.880856 0.977355
-0.927854 0.253273 -0.154149 -0.877621 0.797845 0.388653 0.0682699
0.3361 -0.108066
0.127171 -0.962889 0.39848 -0.457381 0.470931 -0.574816 -0.820429
-0.851558 -0.925108 0.224769 0.575488 0.975402 -0.688955 0.78692
0.0274972 -0.218848 -0.790765 0.708121 0.144139 -0.574694 0.749809
0.781732 0.362285 -0.662099 -0.903134 0.375225 0.581286 -0.679678
0.0863369 0.295511 -0.418195 0.241249 -0.720573 -0.794733 0.0434278
-0.81109 0.895749 0.652699 0.970824 0.643422 -0.0625935 0.776421
-0.656117 0.23075 -0.18247 -0.250649 -0.197546 0.621632 0.804376
-0.976745 0.178747 0.137059 -0.404828 -0.564013 -0.309915 -0.376385
-0.66924 0.245216 -0.3961 0.160741 0.364788 0.150121 -0.811396
-0.837397 -0.901669
....
Evaluation
After testing the network, we can calculate the evaluation metrics precision, recall, and F1 score from the confusion matrix.
Precision is the ratio of correct recognitions (true positives) to all predictions of a digit.
C++
Precision = (0.95+0.97+0.95+0.95+0.92+0.93+0.96+0.95+0.94+0.89)/10 = 94%
Recall is the ratio of correct recognitions (true positives) to actual occurrences of a digit.
C++
Recall = (0.98+0.98+0.93+0.93+0.92+0.94+0.93+0.94+0.92+0.92)/10 = 94%
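In terms of the confusion matrix $C$, where row $k$ holds the actual digit and column $k$ the predicted digit, the per-class values computed by confusionMatrix earlier are:

$$\text{Precision}_k = \frac{C_{kk}}{\sum_r C_{rk}}, \qquad \text{Recall}_k = \frac{C_{kk}}{\sum_c C_{kc}}, \qquad F_1 = \frac{2 \, P \, R}{P + R}$$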
C++
void evaluate(NeuralNetwork& net) {
    RowVector* precision, * recall;
    net.confusionMatrix(precision, recall);
    // macro-average the per-class precision and recall
    double precisionVal = precision->sum() / precision->cols();
    double recallVal = recall->sum() / recall->cols();
    double f1score = 2 * precisionVal * recallVal / (precisionVal + recallVal);
    cout << "Confusion matrix:" << endl;
    cout << *net.mConfusion << endl;
    cout << "Precision: " << (int)(precisionVal * 100) << '%' << endl;
    cout << *precision << endl;
    cout << "Recall: " << (int)(recallVal * 100) << '%' << endl;
    cout << *recall << endl;
    cout << "F1 score: " << (int)(f1score * 100) << '%' << endl;
    delete precision;
    delete recall;
}
The resultant values are as follows:
Confusion matrix:
98.6735 0.102041 0 0.102041 0 0.204082 0.306122 0.306122 0.306122 0
5.659e-313 98.2379 0.264317 0.176211 0 0 0.264317 0.176211 0.792952 0.0881057
1.06589 0.290698 93.5078 0.387597 1.45349 0.290698 0.484496 1.16279 0.968992 0.387597
0 0.29703 1.18812 93.5644 0.0990099 1.48515 0.29703 0.990099 0.891089 1.18812
0.101833 0 0.407332 0.101833 92.9735 0 0.916497 0.203666 0 5.29532
0.44843 0.112108 0.112108 2.01794 0.336323 94.2825 0.44843 0.560538 1.00897 0.672646
1.46138 0.313152 0.417537 0 1.04384 2.71399 93.6326 0.104384 0.313152 0
0 1.07004 1.16732 0.194553 0.583658 0.194553 0 94.5525 0.0972763 2.14008
0.616016 0.513347 0.616016 1.12936 1.12936 0.718686 0.821355 0.821355 92.4025 1.23203
0.99108 0.396432 0 0.891972 3.07235 0.396432 0.099108 0.693756 0.49554 92.9633
Precision: 94%
0.95459 0.972949 0.958292 0.951662 0.922222 0.934444 0.961415 0.951076 0.948367 0.895893
Recall: 94%
0.986735 0.982379 0.935078 0.935644 0.929735 0.942825 0.936326 0.945525 0.924025 0.929633
F1 score: 94%
We can visualize the confusion matrix in the following table:
This table shows how often the model classified each digit correctly (in blue), and which digits were most often confused with that label (in gray).