
Deep Learning by Example (Part 2)

Original: annas-archive.org/md5/81c037237f3318d7e4e398047d4d8413

Translator: 飞龙

License: CC BY-NC-SA 4.0

Chapter 4: Get Up and Running with TensorFlow

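In this chapter, we give an overview of one of the most widely used deep learning frameworks. TensorFlow has a large and growing community, which makes it a good choice for building complex deep learning applications. From the TensorFlow website:

"TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well."

This chapter covers the following topics:

TensorFlow installation
The TensorFlow environment
Computational graphs
TensorFlow data types, variables, and placeholders
Getting output from TensorFlow
TensorBoard: visualizing learning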

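TensorFlow installation

TensorFlow can be installed in two modes: CPU and GPU. We will start with the GPU mode.

TensorFlow GPU installation for Ubuntu 16.04

The GPU build of TensorFlow currently supports only CUDA, so installing it requires an up-to-date NVIDIA driver. The following sections walk you through installing the NVIDIA driver and CUDA 8 step by step.

Installing NVIDIA drivers and CUDA 8

First, install the NVIDIA driver that matches your GPU. I have a GeForce GTX 960M, so I will install nvidia-375 (if you have a different GPU, the NVIDIA search tool at www.nvidia.com/Download/index.aspx can help you find the right driver version). To find out which GPU your machine has, run the following command in the terminal:

lspci | grep -i nvidia

The output should list the NVIDIA device installed in your machine.

Next, add the proprietary NVIDIA driver repository so that the driver can be installed with apt-get:

sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
sudo apt-get install nvidia-375

After the driver installs successfully, restart the machine. To verify that the driver is installed correctly, run the following command in the terminal:

cat /proc/driver/nvidia/version

The output should report the version of the installed NVIDIA kernel module.

Next, install CUDA 8. Open the CUDA download page at developer.nvidia.com/cuda-downloads and select your operating system, architecture, distribution, and version, and finally the installer type. For the commands below, that means Linux, x86_64, Ubuntu, 16.04, and the local deb installer.

The installer file is about 2 GB. Run the following installation commands: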

sudo dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda

Next, add the CUDA binaries and libraries to your environment by appending the following lines to the .bashrc file:

echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc

echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc

source ~/.bashrc

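Next, verify the CUDA 8 installation by running the following command:

nvcc -V

The output should report the CUDA compiler version, release 8.0.

Finally in this section, we need to install cuDNN 6.0. The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library for deep neural networks. You can download it from NVIDIA's web page. Run the following commands to extract and install cuDNN: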

cd ~/Downloads/

tar xvf cudnn*.tgz

cd cuda

sudo cp */*.h /usr/local/cuda/include/

sudo cp */libcudnn* /usr/local/cuda/lib64/

sudo chmod a+r /usr/local/cuda/lib64/libcudnn*

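To make sure the installation succeeded, you can use the nvidia-smi tool in the terminal. If the installation was successful, the tool reports monitoring information about the GPU, such as memory usage and running processes.

Installing TensorFlow

With the GPU environment ready, we can now install the GPU version of TensorFlow. Before doing so, you may want to install some useful Python packages that will help in the following chapters and make your development environment more convenient.

We can install some data manipulation, analysis, and visualization libraries by running the following commands:

sudo apt-get update && sudo apt-get install -y python-numpy python-scipy python-nose python-h5py python-skimage python-matplotlib python-pandas python-sklearn python-sympy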

sudo apt-get clean && sudo apt-get autoremove

sudo rm -rf /var/lib/apt/lists/*

Next, you can install more useful tools, such as virtualenv and Jupyter Notebook:

sudo apt-get update

sudo apt-get install git python-dev python3-dev python-numpy python3-numpy build-essential  python-pip python3-pip python-virtualenv swig python-wheel libcurl3-dev

sudo apt-get install -y libfreetype6-dev libpng12-dev

pip3 install -U matplotlib ipython[all] jupyter pandas scikit-image

Finally, we can install the GPU version of TensorFlow by running the following command:

pip3 install --upgrade tensorflow-gpu

You can verify that TensorFlow was installed successfully from Python:

python3
>>> import tensorflow as tf
>>> a = tf.constant(5)
>>> b = tf.constant(6)
>>> sess = tf.Session()
>>> sess.run(a+b)
# This prints a number of messages showing device status and so on.
# If everything went well, you should see the GPU listed among the devices.
>>> sess.close()

After the device messages, the call to sess.run(a+b) should return 11.

TensorFlow CPU installation for Ubuntu 16.04

In this section, we install the CPU version, which needs no drivers beforehand. First, let's install some useful packages for data manipulation and visualization:

sudo apt-get update && sudo apt-get install -y python-numpy python-scipy python-nose python-h5py python-skimage python-matplotlib python-pandas python-sklearn python-sympy

sudo apt-get clean && sudo apt-get autoremove

sudo rm -rf /var/lib/apt/lists/*

Next, you can install some useful tools, such as virtualenv and Jupyter Notebook:

sudo apt-get update

sudo apt-get install git python-dev python3-dev python-numpy python3-numpy build-essential  python-pip python3-pip python-virtualenv swig python-wheel libcurl3-dev

sudo apt-get install -y libfreetype6-dev libpng12-dev

pip3 install -U matplotlib ipython[all] jupyter pandas scikit-image

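Finally, you can install the latest CPU version of TensorFlow by running the following command:

pip3 install --upgrade tensorflow

You can check that TensorFlow was installed successfully by running the following TensorFlow statements: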

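python3
>>> import tensorflow as tf
>>> a = tf.constant(5)
>>> b = tf.constant(6)
>>> sess = tf.Session()
>>> sess.run(a+b)
>>> sess.close()

The call to sess.run(a+b) should return 11.

TensorFlow CPU installation for macOS X

In this section, we install TensorFlow for macOS X using virtualenv. First, let's install the pip tool by running the following command:

sudo easy_install pip

Next, we need to install the virtual environment library:

sudo pip install --upgrade virtualenv

After installing the virtual environment library, we need to create a container, or virtual environment, that will host the TensorFlow installation and any other packages you may want to install, without affecting the underlying host system: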

virtualenv --system-site-packages targetDirectory # for Python 2.7

virtualenv --system-site-packages -p python3 targetDirectory # for Python 3.n

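Here, targetDirectory is assumed to be ~/tensorflow.

Now that the virtual environment exists, you can enter it with the following command:

source ~/tensorflow/bin/activate

Once you issue this command, you are inside the virtual environment you just created. Any package you install here is installed only in this environment and does not affect the underlying host system.

To leave the environment, you can issue the following command:

deactivate

Note that for now we do need to stay inside the virtual environment, so keep it active and leave it with deactivate only once you have finished working with TensorFlow. If you have already left it, activate it again:

source ~/tensorflow/bin/activate

To install the CPU version of TensorFlow, issue one of the following commands, which also install every dependency that TensorFlow requires: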

(tensorflow)$ pip install --upgrade tensorflow      # for Python 2.7

(tensorflow)$ pip3 install --upgrade tensorflow     # for Python 3.n

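TensorFlow GPU/CPU installation for Windows

We assume that Python 3 is already installed on your system. To install TensorFlow, start a terminal as administrator: open the Start menu, search for cmd, right-click it, and choose Run as administrator.

Once the command window is open, you can install TensorFlow in GPU mode with the command below. You need pip or pip3 (depending on your Python version) installed before issuing it.

C:\> pip3 install --upgrade tensorflow-gpu

Issue the following command to install TensorFlow in CPU mode: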

C:\> pip3 install --upgrade tensorflow

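The TensorFlow environment

TensorFlow is another deep learning framework, from Google. As the name suggests, it derives from the operations that neural networks perform on multidimensional data arrays, or tensors. It is literally a flow of tensors.

But first, why use a deep learning framework in this book at all?

It scales machine learning code: much of the research in deep learning and machine learning can be applied, or is attributable to, these frameworks. They let data scientists iterate extremely quickly and make deep learning and other machine learning algorithms far more accessible to practitioners. Large companies such as Google and Facebook use such frameworks to scale to billions of users.
It computes gradients: deep learning frameworks can compute gradients automatically. If you trace a gradient computation step by step, you will find that it is not trivial, and writing a bug-free implementation yourself can be tricky.
It standardizes machine learning applications for sharing: pretrained models are available online and can be used across deep learning frameworks. They help people with limited GPU resources, who then do not have to start from scratch every time. We can stand on the shoulders of giants and take it from there.
It interfaces with GPUs for parallel processing: computing on GPUs is a very attractive feature, because GPUs have far more cores and parallelism than CPUs and can speed up your code considerably.
There are many deep learning frameworks available, with different strengths, paradigms, levels of abstraction, programming languages, and so on.

That is why a framework such as TensorFlow is all but required to make progress in deep learning: it facilitates your projects.

So, in short, what is TensorFlow?

TensorFlow is Google's deep learning framework, an open source tool for numerical computation using data flow graphs.
It was originally developed by the Google Brain team to facilitate their machine learning research.
TensorFlow is an interface for expressing machine learning algorithms and an implementation for executing them.

How does TensorFlow work, and what is its underlying paradigm?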

计算图

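The most important of the big ideas behind TensorFlow is that numerical computation is expressed as a computational graph. The backbone of any TensorFlow program is therefore a computational graph in which the following holds:

Graph nodes are operations with any number of inputs and outputs.
Graph edges between the nodes are the tensors that flow between these operations; the best way to think of a tensor in practice is as an n-dimensional array.

The advantage of using such a flow graph as the backbone of a deep learning framework is that it lets you build complex models out of small, simple operations. It will also make gradient computation very simple, as we will see when we discuss it later.

Another way of thinking about a TensorFlow graph is that each operation is a function that can be evaluated at that point.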
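To make the idea of "operations as nodes, values flowing along edges" concrete, here is a minimal sketch in plain Python. It is not how TensorFlow is implemented; the `Node` class and its `evaluate` method are hypothetical names invented for this illustration.

```python
# A toy computational graph: each node is an operation applied to input nodes.
class Node:
    def __init__(self, op, *inputs):
        self.op = op          # function applied to the input values (None = placeholder)
        self.inputs = inputs  # upstream nodes, i.e. the incoming graph edges

    def evaluate(self, feeds):
        # A placeholder-like node has no op; its value is supplied in `feeds`.
        if self.op is None:
            return feeds[self]
        args = [node.evaluate(feeds) for node in self.inputs]
        return self.op(*args)


x = Node(None)                             # input supplied at evaluation time
two = Node(lambda: 2.0)                    # constant node with no inputs
prod = Node(lambda a, b: a * b, x, two)    # multiplication node
out = Node(lambda a: max(a, 0.0), prod)    # ReLU node

# Nothing is computed until evaluate() is called on a node.
result = out.evaluate({x: 3.5})
print(result)  # 7.0
```

Building the graph and evaluating it are separate steps here, which mirrors the split between graph construction and session execution described later in the chapter.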

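TensorFlow data types, variables, and placeholders

Understanding computational graphs helps us think of complex models in terms of small subgraphs and operations.

Let's look at an example of a neural network with a single hidden layer and what its computational graph might look like in TensorFlow.

We have a hidden layer that we want to compute as the ReLU activation of a parameter matrix W times an input x plus a bias term b, that is, h = ReLU(Wx + b). The ReLU function returns the larger of its input and zero.

The TensorFlow graph for this computation has the following parts.

In this graph, we have variables for b and W and a placeholder for x; we also have a node for each operation in the graph. Next, we look at these node types in detail.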
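As a quick check of what h = ReLU(xW + b) computes, the following plain-Python sketch works through one small example. The helper names `matmul` and `relu_layer` and all the numbers are assumptions chosen for illustration; the TensorFlow version appears later in this section.

```python
def matmul(a, b):
    # Ordinary matrix product of two nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def relu_layer(x, W, b):
    # z = xW, then add the bias to every row and clip negatives to zero.
    z = matmul(x, W)
    return [[max(0.0, z[i][j] + b[j]) for j in range(len(b))]
            for i in range(len(z))]


x = [[1.0, -2.0]]                  # one example with two features
W = [[0.5, -1.0, 2.0],             # 2 x 3 weight matrix
     [1.0, 0.5, 0.0]]
b = [2.0, 0.5, -1.0]               # one bias per hidden unit

h = relu_layer(x, W, b)
print(h)  # [[0.5, 0.0, 1.0]]
```

The middle unit receives a negative pre-activation (-1.5), so ReLU outputs zero for it.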

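Variables

Variables are stateful nodes that output their current value; in this example, they are b and W. By stateful we mean that they retain their current value across multiple executions, and that it is easy to restore saved values to them.

Variables have other useful properties as well. For example, they can be saved to disk during and after training, which enables the sharing we mentioned earlier: people from different companies and groups can save, store, and send their model parameters to others. Variables are also the things you adjust in order to minimize a loss, and we will see how to do that soon.

It is important to know that variables in the graph, such as b and W, are still operations, because by definition all nodes in the graph are operations. So when you evaluate the operations that hold the values of b and W at runtime, you get the values of those variables.

We can use the Variable() function of TensorFlow to define a variable and give it an initial value:

var = tf.Variable(tf.random_normal((2,2)),name='random_values')

This line defines a 2 by 2 variable and initializes it from the standard normal distribution. You can also give the variable a name.

Placeholders

The next type of node is the placeholder. Placeholders are nodes whose values are fed in at execution time.

If your computational graph has inputs that depend on external data, those inputs are placeholders, whose values we add to the computation during training. For placeholders, therefore, we do not provide initial values. We only specify the data type and shape of the tensor, so that the graph knows what to compute even though it does not yet hold any stored values.

We can use the placeholder function of TensorFlow to create a placeholder: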

ph_var1 = tf.placeholder(tf.float32,shape=(2,3))
ph_var2 = tf.placeholder(tf.float32,shape=(3,2))
result = tf.matmul(ph_var1,ph_var2)

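These lines define two placeholders of a given shape and an operation (see the next section) that multiplies the two values together.

Mathematical operations

The third type of node is the mathematical operation: our matrix multiplication (MatMul), addition (Add), and ReLU. All of these are nodes in your TensorFlow graph, and they are very similar to NumPy operations.

Let's see what this graph looks like in code.

We perform the following steps to produce the graph:

Create the weights W and b, including initialization. We can initialize the weight matrix W by sampling from a uniform distribution, W ~ Uniform(-1,1), and initialize b to 0.
Create the input placeholder x, which has the shape of an m * 784 input matrix (m = 100 in the code below).
Build the flow graph.

Let's follow these steps to build the flow graph: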

# import TensorFlow package
import tensorflow as tf
# build a TensorFlow variable b taking in initial zeros of size 100
# ( a vector of 100 values)
b  = tf.Variable(tf.zeros((100,)))
# TensorFlow variable uniformly distributed values between -1 and 1
# of shape 784 by 100
W = tf.Variable(tf.random_uniform((784, 100),-1,1))
# TensorFlow placeholder for our input data that doesn't take in
# any initial values, it just takes a data type 32 bit floats as
# well as its shape
x = tf.placeholder(tf.float32, (100, 784))
# express h as Tensorflow ReLU of the TensorFlow matrix
#Multiplication of x and W and we add b
h = tf.nn.relu(tf.matmul(x,W) + b )
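
Note that we cannot print h and see its value until we run this graph, so this code snippet only builds the backbone of our model. If you try to print W or b in the preceding code, Python shows only a description of the variable object (its name, shape, and data type), not its values.

So far we have defined our graph; now we need to actually run it.

Getting output from TensorFlow

In the previous section, we saw how to build a computational graph, but we still need to run it and get its values.

We deploy and run the graph through something called a session, which is a binding to a particular execution context such as a CPU or a GPU. In other words, we take the graph we have built and deploy it to a CPU or GPU context.

To run the graph, we define a session object called sess and call its run function, which takes two arguments:

sess.run(fetches, feeds)

Here:

fetches is a list of graph nodes whose outputs are returned. These are the nodes whose computed values we are interested in.
feeds is a dictionary mapping graph nodes to the actual values we want to run through the model. This is where we fill in the placeholders mentioned earlier.

So, let's go ahead and run our graph: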

# importing the numpy package for generating random variables for
# our placeholder x
import numpy as np
# build a TensorFlow session object which takes a default execution
# environment which will be most likely a CPU
sess = tf.Session()
# calling the run function of the sess object to initialize all the
# variables.
sess.run(tf.global_variables_initializer())
# calling the run function on the node that we are interested in,
# the h, and we feed in our second argument which is a dictionary
# for our placeholder x with the values that we are interested in.
sess.run(h, {x: np.random.random((100,784))})

Running the graph through the sess object returns the value of h as a NumPy array of shape (100, 100).

TensorFlow uses lazy evaluation: the graph is only ever evaluated at runtime, and runtime in TensorFlow means the session. Running global_variables_initializer() in the session therefore initializes every variable in the graph, such as W and b in our case.

We can also use the session in a with block to make sure it is closed after the graph has been executed:

ph_var1 = tf.placeholder(tf.float32,shape=(2,3))
ph_var2 = tf.placeholder(tf.float32,shape=(3,2))
result = tf.matmul(ph_var1,ph_var2)
with tf.Session() as sess:
    print(sess.run([result],feed_dict={ph_var1:[[1.,3.,4.],[1.,3.,4.]],ph_var2:[[1., 3.],[3.,1.],[.1,4.]]}))

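Output:
[array([[10.4, 22. ],
       [10.4, 22. ]], dtype=float32)]

TensorBoard: visualizing learning

The computations you will use TensorFlow for, such as training a massive deep neural network, can be complex and confusing, and the corresponding computational graph will be complex as well. To make TensorFlow programs easier to understand, debug, and optimize, the TensorFlow team provides a suite of visualization tools called TensorBoard, a set of web applications that run in your browser. TensorBoard can be used to visualize your TensorFlow graph, plot quantitative metrics about its execution, and show additional data such as the images that pass through it.

To understand how TensorBoard works, we are going to build a computational graph that acts as a classifier for the MNIST dataset, a dataset of handwritten digit images.

You do not have to understand every detail of this model, but it shows the general pipeline of a machine learning model implemented in TensorFlow.

Let's start by importing TensorFlow and loading the required dataset with the TensorFlow helper functions; these helpers check whether you have already downloaded the dataset and download it for you otherwise: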

import tensorflow as tf

# Using TensorFlow helper function to get the MNIST dataset
from tensorflow.examples.tutorials.mnist import input_data
mnist_dataset = input_data.read_data_sets("/tmp/data/", one_hot=True)

Output:
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz

Next, we need to define the hyperparameters (parameters used to fine-tune the performance of the model) and the inputs of the model:

# hyperparameters of the model (you don't have to understand the functionality of each parameter)
learning_rate = 0.01
num_training_epochs = 25
train_batch_size = 100
display_epoch = 1
logs_path = '/tmp/tensorflow_tensorboard/'

# Define the computational graph input which will be a vector of the image pixels
# Images of MNIST has dimensions of 28 by 28 which will multiply to 784
input_values = tf.placeholder(tf.float32, [None, 784], name='input_values')

# Define the target of the model which will be a classification problem of 10 classes from 0 to 9
target_values = tf.placeholder(tf.float32, [None, 10], name='target_values')

# Define some variables for the weights and biases of the model
weights = tf.Variable(tf.zeros([784, 10]), name='weights')
biases = tf.Variable(tf.zeros([10]), name='biases')

Now we need to build the model and define the cost function that we are going to optimize:

# Create the computational graph and encapsulating different operations to different scopes
# which will make it easier for us to understand the visualizations of TensorBoard
with tf.name_scope('Model'):
 # Defining the model
 predicted_values = tf.nn.softmax(tf.matmul(input_values, weights) + biases)

with tf.name_scope('Loss'):
 # Minimizing the model error using cross entropy criteria
 model_cost = tf.reduce_mean(-tf.reduce_sum(target_values*tf.log(predicted_values), reduction_indices=1))

with tf.name_scope('SGD'):
 # using Gradient Descent as an optimization method for the model cost above
 model_optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(model_cost)

with tf.name_scope('Accuracy'):
 #Calculating the accuracy
 model_accuracy = tf.equal(tf.argmax(predicted_values, 1), tf.argmax(target_values, 1))
 model_accuracy = tf.reduce_mean(tf.cast(model_accuracy, tf.float32))

# TensorFlow use the lazy evaluation strategy while defining the variables
# So actually till now none of the above variable got created or initialized
init = tf.global_variables_initializer()

We define summary operations to monitor how specific quantities, such as the loss, change and improve during training:

# Create a summary to monitor the model cost tensor
tf.summary.scalar("model loss", model_cost)

# Create another summary to monitor the model accuracy tensor
tf.summary.scalar("model accuracy", model_accuracy)

# Merging the summaries to single operation
merged_summary_operation = tf.summary.merge_all()

Finally, we run the model by defining a session variable, which is used to execute the computational graph we have built:

# kick off the training process
with tf.Session() as sess:

    # Initialize the variables
    sess.run(init)

    # operation to feed logs to TensorBoard
    summary_writer = tf.summary.FileWriter(logs_path, graph=tf.get_default_graph())

    # Starting the training cycle by feeding the model one batch at a time
    for train_epoch in range(num_training_epochs):

        average_cost = 0.
        total_num_batch = int(mnist_dataset.train.num_examples/train_batch_size)

        # iterate through all training batches
        for i in range(total_num_batch):
            batch_xs, batch_ys = mnist_dataset.train.next_batch(train_batch_size)

            # Run the optimizer with gradient descent and cost to get the loss
            # and the merged summary operations for the TensorBoard
            _, c, summary = sess.run([model_optimizer, model_cost, merged_summary_operation],
                                     feed_dict={input_values: batch_xs, target_values: batch_ys})

            # write statistics to the log at every iteration
            summary_writer.add_summary(summary, train_epoch * total_num_batch + i)

            # computing average loss
            average_cost += c / total_num_batch

        # Display logs per epoch step
        if (train_epoch+1) % display_epoch == 0:
            print("Epoch:", '%03d' % (train_epoch+1), "cost=", "{:.9f}".format(average_cost))

    print("Optimization Finished!")

    # Testing the trained model on the test set and getting the accuracy compared to the actual labels of the test set
    print("Accuracy:", model_accuracy.eval({input_values: mnist_dataset.test.images, target_values: mnist_dataset.test.labels}))

    print("To view summaries in the Tensorboard, run the command line:\n" \
          "--> tensorboard --logdir=/tmp/tensorflow_tensorboard " \
          "\nThen open http://0.0.0.0:6006/ into your web browser")

The output of the training process should be similar to this:

Epoch: 001 cost= 1.183109128
Epoch: 002 cost= 0.665210275
Epoch: 003 cost= 0.552693334
Epoch: 004 cost= 0.498636444
Epoch: 005 cost= 0.465516675
Epoch: 006 cost= 0.442618381
Epoch: 007 cost= 0.425522513
Epoch: 008 cost= 0.412194222
Epoch: 009 cost= 0.401408134
Epoch: 010 cost= 0.392437336
Epoch: 011 cost= 0.384816745
Epoch: 012 cost= 0.378183398
Epoch: 013 cost= 0.372455584
Epoch: 014 cost= 0.367275238
Epoch: 015 cost= 0.362772711
Epoch: 016 cost= 0.358591895
Epoch: 017 cost= 0.354892231
Epoch: 018 cost= 0.351451424
Epoch: 019 cost= 0.348337946
Epoch: 020 cost= 0.345453095
Epoch: 021 cost= 0.342769080
Epoch: 022 cost= 0.340236065
Epoch: 023 cost= 0.337953151
Epoch: 024 cost= 0.335739001
Epoch: 025 cost= 0.333702818
Optimization Finished!
Accuracy: 0.9146
To view summaries in the Tensorboard, run the command line:
--> tensorboard --logdir=/tmp/tensorflow_tensorboard 
Then open http://0.0.0.0:6006/ into your web browser

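To see the summarized statistics in TensorBoard, follow the message at the end of the output and issue the following command in the terminal:

tensorboard --logdir=/tmp/tensorflow_tensorboard

Then, open http://0.0.0.0:6006/ in your web browser.

TensorBoard shows the quantities we were monitoring, such as the model accuracy and how it climbs, and the model loss and how it falls over the course of training. What you see here is a normal learning process. Sometimes, however, the accuracy and loss change erratically, or you want to track certain variables and how they change during the session; this is where TensorBoard is very useful for spotting randomness or mistakes.

If you switch to the GRAPHS tab in TensorBoard, you will see the computational graph that we built in the preceding code.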

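Summary

In this chapter, we covered the installation process for Ubuntu, macOS, and Windows, introduced the TensorFlow programming model, and explained the different types of simple nodes that can be combined into complex operations, as well as how to get output from TensorFlow through a session object. We also covered TensorBoard and why it is helpful for debugging and analyzing complex deep learning applications.

Next, we will give a basic explanation of neural networks and the intuition behind multilayer networks. We will also cover some basic TensorFlow examples and demonstrate how it can be used for regression and classification problems.

Chapter 5: TensorFlow in Action - Some Basic Examples

In this chapter, we explain the main computational concept behind TensorFlow, the computational graph model, and show how to get started by implementing linear regression and logistic regression.

This chapter covers the following topics:

Capacity of a single neuron and activation functions
Activation functions
Feed-forward neural networks
The need for multilayer networks
TensorFlow terminologies: recap
Linear regression model: building and training
Logistic regression model: building and training

We start by explaining what a single neuron can actually do or model, and use that to motivate the need for multilayer networks. Next, we elaborate on the main concepts and tools available in TensorFlow and show how to use them to build simple examples such as linear regression and logistic regression.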

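Capacity of a single neuron

A neural network is a computational model mainly inspired by the way the biological neural networks of the human brain process incoming information. Neural networks have driven major breakthroughs in machine learning research (deep learning in particular) and in industrial applications such as computer vision, speech recognition, and text processing. In this chapter, we try to understand one particular type of neural network, the multilayer perceptron.

Biological motivation and connections

The basic computational unit of the brain is the neuron. There are approximately 86 billion neurons in our nervous system, connected by approximately 10^14 to 10^15 synapses.

Figure 1 shows a biological neuron, and Figure 2 shows the corresponding mathematical model. In the drawing of the biological neuron, each neuron receives incoming signals through its dendrites and then produces an output signal along its axon, which branches out and connects to other neurons through synapses.

In the corresponding computational model of a neuron, the signal that travels along the axon of another neuron interacts multiplicatively with the dendrite, according to the synaptic strength at that synapse, denoted by w. The idea is that the synaptic weights, or strengths, are learned by the network, and they control the influence of one neuron on another.

In Figure 2, the dendrites also carry the signals to the cell body, which sums them. If the final result is above a certain threshold, the neuron fires in the computational model.

It is also worth mentioning that we need to control the firing rate of the output along the axon, so we use something called an activation function. In practice, a common choice of activation function is the sigmoid function σ, since it takes a real-valued input (the signal strength after the sum) and squashes it to the range between 0 and 1. We will see these activation functions in detail in the following sections:

Figure 1: Computational unit of the brain (cs231n.github.io/assets/nn1/…)

Here is the basic mathematical model that corresponds to the biological one:

Figure 2: Mathematical model of the brain's computational unit (cs231n.github.io/assets/nn1/…)

The basic unit of computation in a neural network is the neuron, often called a node or unit. It receives input from other nodes or from an external source and computes an output. Each input has an associated weight (w), which is assigned on the basis of the importance of that input relative to the others. The node applies a function f (defined later) to the weighted sum of its inputs.

So the basic computational unit of neural networks in general is called a neuron, node, or unit.

This neuron receives its input from previous neurons or from an external source and processes that input to produce a so-called activation. Each input to the neuron has its own weight, which represents the strength of the connection and hence the importance of that input.

The final output of this basic building block is therefore the weighted sum of its inputs, passed through the activation function.

Figure 3: A single neuron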
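The computation of a single neuron described above can be sketched in a few lines of plain Python. The function name `neuron` and the example inputs, weights, and bias are illustrative assumptions, not values from the text.

```python
import math


def neuron(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus the bias, passed through the activation.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Two inputs whose weighted sum is exactly zero: 1.0*0.5 + 2.0*(-0.25) = 0.
out = neuron([1.0, 2.0], [0.5, -0.25], 0.0, sigmoid)
print(out)  # 0.5
```

Swapping `sigmoid` for another function changes the neuron's non-linearity without touching the weighted-sum step.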

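Activation functions

The output of the neuron is computed as shown in Figure 3 and passed through an activation function, which introduces non-linearity into the output. This f is called the activation function. The main purposes of an activation function are to:

Introduce non-linearity into the output of a neuron. This is important because most real-world data is non-linear, and we want neurons to learn these non-linear representations.
Squash the output into a specific range.

Every activation function (or non-linearity) takes a single number and performs a fixed mathematical operation on it. There are several activation functions you may encounter in practice.

So, we briefly cover the most common ones.

Sigmoid

Historically, the sigmoid activation function was widely used among researchers. It takes a real-valued input and squashes it to the range between 0 and 1:

σ(x) = 1 / (1 + exp(−x))

Figure 4: Sigmoid activation function

Tanh

Tanh is another activation function, one whose output can be negative. Tanh takes a real-valued input and squashes it to the range [-1, 1]:

tanh(x) = 2σ(2x) − 1

Figure 5: Tanh activation function

ReLU

The rectified linear unit (ReLU) never outputs negative values: it takes a real-valued input and thresholds it at zero, replacing negative values with zero:

f(x) = max(0, x)

Figure 6: ReLU activation function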
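The three formulas above translate directly into code. The sketch below uses only the standard library and checks the identity tanh(x) = 2σ(2x) − 1 against `math.tanh`; the function names are chosen for this illustration.

```python
import math


def sigmoid(x):
    # Squashes any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x):
    # tanh expressed through the sigmoid, as in the formula above.
    return 2.0 * sigmoid(2.0 * x) - 1.0


def relu(x):
    # Thresholds the input at zero.
    return max(0.0, x)


for value in (-2.0, 0.0, 0.7):
    print(value, sigmoid(value), tanh_via_sigmoid(value), relu(value))

# The identity agrees with the library implementation up to rounding error.
difference = abs(tanh_via_sigmoid(0.7) - math.tanh(0.7))
```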

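Importance of bias: the main function of the bias is to provide every node with a trainable constant value (in addition to the normal inputs that the node receives). See stackoverflow.com/questions/2480650/role-of-bias-in-neural-networks to learn more about the role of bias in a neuron.

Feed-forward neural networks

The feed-forward neural network was the first and simplest type of artificial neural network devised. It contains multiple neurons (nodes) arranged in layers. Nodes in adjacent layers have connections, or edges, between them, and every connection has a weight associated with it.

An example of a feed-forward neural network is shown in Figure 7:

Figure 7: An example feed-forward neural network

In a feed-forward network, information moves in one direction only: from the input nodes, through the hidden nodes (if any), to the output nodes. There are no cycles or loops in the network (this property distinguishes feed-forward networks from recurrent neural networks, in which the connections between nodes form a cycle).

The need for multilayer networks

A multilayer perceptron (MLP) contains one or more hidden layers (apart from one input layer and one output layer). While a single-layer perceptron can only learn linear functions, an MLP can also learn non-linear functions.

Figure 8 shows an MLP with a single hidden layer. Note that all connections have weights associated with them, but only three weights (w0, w1, and w2) are shown in the figure.

Input layer: the input layer has three nodes. The bias node has a value of 1. The other two nodes take X1 and X2 as external inputs (numerical values that depend on the input dataset). As discussed earlier, no computation is performed in the input layer, so the outputs of its nodes are 1, X1, and X2 respectively, and these are fed into the hidden layer.

Hidden layer: the hidden layer also has three nodes, with the bias node having an output of 1. The outputs of the other two nodes depend on the outputs of the input layer (1, X1, and X2) and on the weights associated with the connections (edges). Remember that f refers to the activation function. These outputs are then fed to the nodes in the output layer.

Figure 8: A multilayer perceptron with one hidden layer

Output layer: the output layer has two nodes; they take their inputs from the hidden layer and perform computations similar to those shown for the highlighted hidden node. The computed values (Y1 and Y2) are the outputs of the multilayer perceptron.

Given a set of features X = (x1, x2, …) and a target y, a multilayer perceptron can learn the relationship between the features and the target, for either classification or regression.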

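Let's take an example to understand multilayer perceptrons better. Suppose we have the following student marks dataset:

Table 1 – Sample student marks dataset

Hours studied	Mid-term marks	Final-term result
35	67	Pass
12	75	Fail
16	89	Pass
45	56	Pass
10	90	Fail

The two input columns show the number of hours the student studied and the marks the student obtained in the mid-term. The final-result column can take two values, pass (1) or fail (0), indicating whether the student passed the final term. For example, we can see that a student who studied 35 hours and obtained 67 marks in the mid-term ended up passing the final term.

Now, suppose we want to predict whether a student who studied 25 hours and obtained 70 marks in the mid-term will pass the final term:

Table 2 – Sample student with an unknown final-term result

Hours studied	Mid-term marks	Final-term result
25	70	?

This is a binary classification problem in which a multilayer perceptron can learn from the given examples (the training data) and make an informed prediction for a new data point. We will soon see how a multilayer perceptron learns such relationships.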

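Training our MLP – the backpropagation algorithm

The process by which a multilayer perceptron learns is called the backpropagation algorithm. I recommend reading this Quora answer by Hemanth Kumar, www.quora.com/How-do-you-explain-back-propagation-algorithm-to-a-beginner-in-neural-network/answer/Hemanth-Kumar-Mantri (quoted below), which explains backpropagation clearly.

"Backward propagation of errors, often abbreviated as BackProp, is one of the several ways in which an artificial neural network (ANN) can be trained. It is a supervised training scheme, which means it learns from labeled training data (there is a supervisor to guide its learning).

To put it in simple terms, BackProp is like "learning from mistakes". The supervisor corrects the ANN whenever it makes mistakes.

An ANN consists of nodes in different layers: the input layer, the hidden layer(s), and the output layer. The connections between nodes of adjacent layers have "weights" associated with them. The goal of learning is to assign correct weights to these edges. Given an input vector, these weights determine what the output vector is.

In supervised learning, the training set is labeled. This means that for some given inputs, we know the desired/expected output (label).

BackProp algorithm:

Initially, all the edge weights are randomly assigned. For every input in the training dataset, the ANN is activated and its output is observed. This output is compared with the desired output that we already know, and the error is "propagated" back to the previous layer. This error is noted and the weights are "adjusted" accordingly. This process is repeated until the output error is below a predetermined threshold.

Once the above algorithm terminates, we have a "learned" ANN, which we consider ready to work with "new" inputs. This ANN is said to have learned from several examples (labeled data) and from its mistakes (error propagation)."

—Hemanth Kumar.

Now that we have an idea of how backpropagation works, let's return to our student marks dataset.

The MLP shown in Figure 8 has two nodes in the input layer (apart from the bias node), which take the inputs hours studied and mid-term marks. It also has a hidden layer with two nodes. The output layer has two nodes as well; the upper node outputs the probability of pass, while the lower node outputs the probability of fail.

In classification applications, we widely use the softmax function (cs231n.github.io/linear-classify/#softmax) as the activation function in the output layer of the MLP to ensure that the outputs are probabilities and that they add up to 1. The softmax function takes a vector of arbitrary real-valued scores and squashes it to a vector of values between 0 and 1 that sum to 1. So, in this case, P(pass) + P(fail) = 1.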
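The softmax function described above can be sketched as follows. This is a generic illustration with made-up scores; subtracting the maximum score before exponentiating is a common numerical-stability trick and does not change the result.

```python
import math


def softmax(scores):
    # Shift by the maximum score so that exp() cannot overflow.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for the "pass" and "fail" output nodes.
probs = softmax([2.0, 1.0])
print(probs, sum(probs))
```

Whatever the raw scores are, every entry of the result lies between 0 and 1, and the entries sum to 1 up to rounding.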

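Step 1 – forward propagation

All the weights in the network are initialized randomly. Let's consider one specific hidden-layer node and call it V. Assume that the weights of the connections from the inputs to that node are w1, w2, and w3 (as shown in the figure).

The network then takes the first training example as input (we know that for inputs 35 and 67, the probability of passing is 1):

Input to the network = [35, 67]
Desired output from the network (target) = [1, 0]

Then, the output V of the node under consideration can be calculated as follows (f is an activation function such as sigmoid):

V = f(1*w1 + 35*w2 + 67*w3)

Similarly, the output of the other node in the hidden layer is also calculated. The outputs of the two hidden-layer nodes act as inputs to the two nodes in the output layer, which lets us calculate the output probabilities of the two output nodes.

Suppose the output probabilities of the two nodes in the output layer are 0.4 and 0.6 respectively (since the weights are randomly assigned, the outputs will also be random). We can see that the calculated probabilities (0.4 and 0.6) are very far from the desired probabilities (1 and 0 respectively), so the network is said to have produced an incorrect output.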
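The forward pass just described can be sketched for the 2-2-2 network with a bias input of 1 in each layer. All weight values below are hypothetical stand-ins for the "random" initial weights, and the function names are invented for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(inputs, hidden_weights, output_weights):
    # Prepend the bias input 1, matching V = f(1*w1 + 35*w2 + 67*w3).
    x = [1.0] + inputs
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)))
              for ws in hidden_weights]
    # The hidden layer also feeds a bias of 1 into the output layer.
    h = [1.0] + hidden
    scores = [sum(w * hi for w, hi in zip(ws, h)) for ws in output_weights]
    return hidden, softmax(scores)


# Hypothetical weights: one row [w1, w2, w3] per node.
hidden_weights = [[0.1, 0.01, -0.02], [-0.3, 0.02, 0.01]]
output_weights = [[0.2, -0.5, 0.4], [-0.1, 0.3, 0.6]]

hidden, probs = forward([35.0, 67.0], hidden_weights, output_weights)
print("hidden outputs:", hidden)
print("P(pass), P(fail):", probs)
```

With untrained weights such as these, the two probabilities bear no relation to the target [1, 0]; that gap is what the next step corrects.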

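Step 2 – backpropagation and weight update

We calculate the total error at the output nodes and propagate these errors back through the network using backpropagation to calculate the gradients. Then, we use an optimization method such as gradient descent to adjust all the weights in the network, with the aim of reducing the error at the output layer.

Suppose that the new weights associated with the node under consideration are w4, w5, and w6 (after backpropagation and adjusting the weights).

If we now feed the same example to the network again, it should perform better than in the initial run, since the weights have been adjusted to reduce the prediction error. The errors at the output nodes are now [0.2, -0.2], compared with [0.6, -0.6] earlier. This means that our network has moved closer to classifying our first training example correctly.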
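To illustrate how a weight update shrinks the error, here is a deliberately simplified sketch: a single sigmoid output neuron trained with gradient descent on one example, rather than the full two-layer backpropagation. The input scaling, learning rate, and number of steps are assumptions made for the illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def gradient_step(weights, x, target, learning_rate):
    # For a sigmoid output with cross-entropy loss, the gradient with
    # respect to weight i is (prediction - target) * x_i.
    error = predict(weights, x) - target
    return [w - learning_rate * error * xi for w, xi in zip(weights, x)]


# Bias input 1 plus the two features scaled to [0, 1] (scaling is an assumption).
x = [1.0, 0.35, 0.67]
target = 1.0                      # the student passed
weights = [0.0, 0.0, 0.0]

error_before = abs(target - predict(weights, x))
for _ in range(20):
    weights = gradient_step(weights, x, target, learning_rate=0.5)
error_after = abs(target - predict(weights, x))

print(error_before, error_after)
```

The error on this example falls with every step, which is the same effect the text describes for the full network after one round of backpropagation.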

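We repeat this process with all the other training examples in our dataset. Then, our network is said to have learned those examples.

If we now want to predict whether a student who studied 25 hours and obtained 70 marks in the mid-term will pass the final term, we go through the forward propagation step and find the output probabilities for pass and fail.

I have avoided mathematical equations and the explanation of concepts such as gradient descent here, and have instead tried to develop an intuition for the algorithm. For a more mathematically involved discussion of the backpropagation algorithm, see home.agh.edu.pl/%7Evlsi/AI/backp_t_en/backprop.html.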

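TensorFlow terminologies – recap

This section gives an overview of the TensorFlow library and of the structure of a basic TensorFlow application. TensorFlow is an open source library for creating large-scale machine learning applications; it can model computations on a wide variety of hardware, from Android devices to heterogeneous multi-GPU systems.

TensorFlow uses a special structure to execute code on different devices such as CPUs and GPUs. Computations are defined as a graph, and each graph is made up of operations, also known as ops, so whenever we work with TensorFlow we define a series of operations in a graph.

To run these operations, we need to launch the graph in a session. The session translates the operations and passes them to a device for execution.

For example, consider a TensorFlow graph in which W, x, and b are tensors on the edges. MatMul is an operation on the tensors W and x; after that, Add is called and adds the result of the previous operation to b. The resulting tensor of each operation is passed to the next one until, in the end, we get the desired result.

Figure 9: Sample TensorFlow computational graph

To use TensorFlow, we need to import the library; we give it the name tf so that we can access its modules by writing tf, a dot, and the module's name:

import tensorflow as tf

To create our first graph, we start with source operations, which do not require any input. These source operations, or source ops, pass their information to other operations, which actually run the computations.

Let's create two source operations that output numbers. We define them as A and B, as you can see in the following snippet: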

A = tf.constant([2])

B = tf.constant([3])

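After that, we define a simple computational operation, tf.add(), that sums two elements. You can also write C = A + B, as noted in the comment in the following code: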

C = tf.add(A,B)

#C = A + B is also a way to define the sum of the terms

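Since graphs need to be executed in the context of a session, we need to create a session object:

session = tf.Session()

To evaluate the graph, let's run the session to get the result of the previously defined C operation: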

result = session.run(C)
print(result)

Output:
[5]

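You may think that this was a lot of work just to add two numbers, but it is extremely important to understand the basic structure of TensorFlow. Once you do, you can define any computation you want; again, the structure of TensorFlow allows it to handle computations on different devices (CPU or GPU), and even in clusters. If you want to learn more about this, see tf.device().

Feel free to experiment with the structure of TensorFlow to get a better idea of how it works. If you want a list of all the mathematical operations that TensorFlow supports, you can check the documentation.

By now, you should understand the structure of TensorFlow and how to create a basic application.

Defining multidimensional arrays using TensorFlow

Now we will define such arrays (a scalar, a vector, a matrix, and a 3D tensor) using TensorFlow:

scalar_var = tf.constant([4])
vector_var = tf.constant([5,4,2])
matrix_var = tf.constant([[1,2,3],[2,2,4],[3,5,5]])
tensor = tf.constant( [ [[1,2,3],[2,3,4],[3,4,5]] , [[4,5,6],[5,6,7],[6,7,8]] , [[7,8,9],[8,9,10],[9,10,11]] ] )
with tf.Session() as session:
    result = session.run(scalar_var)
    print("Scalar (1 entry):\n %s \n" % result)
    result = session.run(vector_var)
    print("Vector (3 entries) :\n %s \n" % result)
    result = session.run(matrix_var)
    print("Matrix (3x3 entries):\n %s \n" % result)
    result = session.run(tensor)
    print("Tensor (3x3x3 entries) :\n %s \n" % result)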

Output:
Scalar (1 entry):
 [4] 

Vector (3 entries) :
 [5 4 2] 

Matrix (3x3 entries):
 [[1 2 3]
 [2 2 4]
 [3 5 5]] 

Tensor (3x3x3 entries) :
 [[[ 1  2  3]
  [ 2  3  4]
  [ 3  4  5]]

 [[ 4  5  6]
  [ 5  6  7]
  [ 6  7  8]]

 [[ 7  8  9]
  [ 8  9 10]
  [ 9 10 11]]]

Now that you understand these data structures, I encourage you to play with them using some of the previous functions to see how they behave according to their structure type:

Matrix_one = tf.constant([[1,2,3],[2,3,4],[3,4,5]])
Matrix_two = tf.constant([[2,2,2],[2,2,2],[2,2,2]])
first_operation = tf.add(Matrix_one, Matrix_two)
second_operation = Matrix_one + Matrix_two
with tf.Session() as session:
    result = session.run(first_operation)
    print("Defined using tensorflow function :")
    print(result)
    result = session.run(second_operation)
    print("Defined using normal expressions :")
    print(result)

Output:
Defined using tensorflow function :
[[3 4 5]
 [4 5 6]
 [5 6 7]]
Defined using normal expressions :
[[3 4 5]
 [4 5 6]
 [5 6 7]]

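With both the regular symbol definition and the tensorflow function, we get element-wise addition. (Writing Matrix_one * Matrix_two would likewise give the element-wise product, also known as the Hadamard product.) But what if we want the regular matrix product? For that, we need another TensorFlow function, tf.matmul():

Matrix_one = tf.constant([[2,3],[3,4]])
Matrix_two = tf.constant([[2,3],[3,4]])
first_operation = tf.matmul(Matrix_one, Matrix_two)
with tf.Session() as session:
    result = session.run(first_operation)
    print("Defined using tensorflow function :")
    print(result)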

Output:
Defined using tensorflow function :
[[13 18]
 [18 25]]

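We could also define this multiplication ourselves, but there is already a function that does it, so there is no need to reinvent the wheel.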
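To see how the element-wise operations differ from the matrix product, the sketch below computes all three for the 2 by 2 matrices used above, in plain Python with nested lists. The helper names are invented for this illustration.

```python
def elementwise(a, b, op):
    # Apply op to matching entries of two equally shaped matrices.
    return [[op(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def matmul(a, b):
    # Row-by-column matrix product.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


matrix_one = [[2, 3], [3, 4]]
matrix_two = [[2, 3], [3, 4]]

added = elementwise(matrix_one, matrix_two, lambda x, y: x + y)
hadamard = elementwise(matrix_one, matrix_two, lambda x, y: x * y)
product = matmul(matrix_one, matrix_two)

print(added)     # [[4, 6], [6, 8]]
print(hadamard)  # [[4, 9], [9, 16]]
print(product)   # [[13, 18], [18, 25]]
```

The last result matches the tf.matmul() output shown above, while the Hadamard product multiplies each pair of entries independently.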

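Why tensors?

The tensor structure helps us by giving us the freedom to shape a dataset the way we want.

This is particularly helpful when dealing with images, because of the way information is encoded in them.

When we think of an image, it is easy to see that it has a height and a width, so it makes sense to represent its information with a two-dimensional structure (a matrix)... until you remember that images have colors. To add the color information we need another dimension, and that is where tensors become particularly helpful.

Images are encoded into color channels; the image data is represented by the intensity of each color of a channel at a given point, the most common scheme being RGB (red, green, and blue). The information contained in an image is the intensity of each channel color across the width and height of the image:

Figure 10: Different color channels for a specific image

So the intensity of the red channel at each point, across the width and height, can be represented as a matrix; the same goes for the blue and green channels. We end up with three matrices, and when they are combined, they form a tensor.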
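As a small illustration of three channel matrices combining into one tensor, the following sketch stacks made-up red, green, and blue matrices for a 2 by 3 image into a height x width x channel structure. The pixel values are arbitrary.

```python
height, width = 2, 3

# One intensity matrix per color channel (values are arbitrary examples).
red = [[255, 0, 0],
       [0, 255, 0]]
green = [[0, 255, 0],
         [0, 0, 255]]
blue = [[10, 20, 30],
        [40, 50, 60]]

# Combine the matrices so that each pixel holds its [R, G, B] intensities.
image = [[[red[r][c], green[r][c], blue[r][c]] for c in range(width)]
         for r in range(height)]

shape = (len(image), len(image[0]), len(image[0][0]))
print(shape)        # (2, 3, 3)
print(image[0][0])  # [255, 0, 10]
```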

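Variables

Now that we are more familiar with the structure of data, we will look at how TensorFlow handles variables.

To define variables, we use tf.Variable(). To be able to use variables in a computational graph, they must be initialized before the graph is run in a session. This is done by running tf.global_variables_initializer().

To update the value of a variable, we simply run an assign operation that assigns a value to it. We start from a variable initialized to zero:

state = tf.Variable(0)

Let's first create a simple counter, a variable that increases by one unit at a time: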

one = tf.constant(1)
new_value = tf.add(state, one)
update = tf.assign(state, new_value)

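Variables must be initialized by running an initialization operation after the graph has been launched. We first have to add the initialization operation to the graph:

init_op = tf.global_variables_initializer()

We then start a session to run the graph.

We first initialize the variables, then print the initial value of the state variable, and finally run the operation that updates the state variable, printing the result after each update: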

with tf.Session() as session:
 session.run(init_op)
 print(session.run(state))
 for _ in range(3):
    session.run(update)
    print(session.run(state))

Output:
0
1
2
3

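Placeholders

Now we know how to manipulate variables inside TensorFlow, but what about feeding data from outside a TensorFlow model?

If you want to feed data to a TensorFlow model from outside the model, you need to use placeholders.

So, what are placeholders, and what do they do? Placeholders can be seen as holes in your model, holes through which you pass data in. You can create them with tf.placeholder(datatype), where datatype specifies the type of the data (integers, floating points, strings, and Booleans) along with its precision (8, 16, 32, and 64 bits).

The definition of each data type with its respective Python syntax is as follows:

Table 3 – Definition of different TensorFlow data types

Data type	Python type	Description
`DT_FLOAT`	`tf.float32`	32-bit floating point.
`DT_DOUBLE`	`tf.float64`	64-bit floating point.
`DT_INT8`	`tf.int8`	8-bit signed integer.
`DT_INT16`	`tf.int16`	16-bit signed integer.
`DT_INT32`	`tf.int32`	32-bit signed integer.
`DT_INT64`	`tf.int64`	64-bit signed integer.
`DT_UINT8`	`tf.uint8`	8-bit unsigned integer.
`DT_STRING`	`tf.string`	Variable-length byte array. Each element of a tensor is a byte array.
`DT_BOOL`	`tf.bool`	Boolean.
`DT_COMPLEX64`	`tf.complex64`	Complex number made of two 32-bit floating points: real and imaginary parts.
`DT_COMPLEX128`	`tf.complex128`	Complex number made of two 64-bit floating points: real and imaginary parts.
`DT_QINT8`	`tf.qint8`	8-bit signed integer used in quantized ops.
`DT_QINT32`	`tf.qint32`	32-bit signed integer used in quantized ops.
`DT_QUINT8`	`tf.quint8`	8-bit unsigned integer used in quantized ops.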

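So, let's create a placeholder:

a=tf.placeholder(tf.float32)

And define a simple multiplication operation:

b=a*2

Now we need to define and run the session, but since we created a hole in the model through which data is passed, we must supply that data when we run the session; otherwise we get an error.

To pass the data to the model, we call the session with an extra argument, feed_dict, in which we pass a dictionary that maps each placeholder to its data, like this:

with tf.Session() as sess:
    result = sess.run(b,feed_dict={a:3.5})
    print(result)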

Output:
7.0

Since data in TensorFlow is passed in the form of multidimensional arrays, we can pass any kind of tensor through the placeholder and get the result of the simple multiplication operation:

dictionary={a: [ [ [1,2,3],[4,5,6],[7,8,9],[10,11,12] ] , [ [13,14,15],[16,17,18],[19,20,21],[22,23,24] ] ] }
with tf.Session() as sess:
    result = sess.run(b,feed_dict=dictionary)
    print(result)

Output:
[[[  2.   4.   6.]
  [  8.  10.  12.]
  [ 14.  16.  18.]
  [ 20.  22.  24.]]

 [[ 26.  28.  30.]
  [ 32.  34.  36.]
  [ 38.  40.  42.]
  [ 44.  46.  48.]]]

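Operations

Operations are nodes that represent mathematical operations over the tensors in a graph. These operations can be any kind of function, such as adding and subtracting tensors, or perhaps an activation function.

tf.matmul, tf.add, and tf.nn.sigmoid are some of the operations in TensorFlow. They are like functions in Python, but they operate directly on tensors, and each one does a specific thing.

Other operations can be found at www.tensorflow.org/api_guides/python/math_ops.

Let's play around with some of these operations:

a = tf.constant([5])
b = tf.constant([2])
c = tf.add(a,b)
d = tf.subtract(a,b)
with tf.Session() as session:
    result = session.run(c)
    print('c =: %s' % result)
    result = session.run(d)
    print('d =: %s' % result)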

Output:
c =: [7]
d =: [3]

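tf.nn.sigmoid is an activation function: it is a little more complicated, but it helps learning models evaluate which information is useful and which is not.

Linear regression model – building and training

Based on our explanation of linear regression in Chapter 2, Data Modeling in Action - The Titanic Example, we rely on that definition to build a simple linear regression model.

Let's start by importing the packages needed for this implementation: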

import numpy as np
import tensorflow as tf
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10, 6)

Let's define an independent variable:

input_values = np.arange(0.0, 5.0, 0.1)
input_values

Output:
array([ 0. ,  0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9,  1. ,
        1.1,  1.2,  1.3,  1.4,  1.5,  1.6,  1.7,  1.8,  1.9,  2. ,  2.1,
        2.2,  2.3,  2.4,  2.5,  2.6,  2.7,  2.8,  2.9,  3. ,  3.1,  3.2,
        3.3,  3.4,  3.5,  3.6,  3.7,  3.8,  3.9,  4. ,  4.1,  4.2,  4.3,
        4.4,  4.5,  4.6,  4.7,  4.8,  4.9])

##You can adjust the slope and intercept to verify the changes in the graph
weight=1
bias=0
output = weight*input_values + bias
plt.plot(input_values,output)
plt.ylabel('Dependent Variable')
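plt.xlabel('Independent Variable')
plt.show()

Output:

Figure 11: Visualization of the dependent variable versus the independent one

Now, let's see how this translates into TensorFlow code.

Linear regression with TensorFlow

For the first part, we generate random data points and define a linear relation; we then use TensorFlow to adjust the parameters and recover the right ones:

input_values = np.random.rand(100).astype(np.float32)

The model equation used in this example is y = 2x + 3.

There is nothing special about this equation; it is just the model we use to generate the data points. In fact, you can change the parameters to whatever you want, as you will do later. We add some Gaussian noise to the points to make them a bit more interesting: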

output_values = input_values * 2 + 3
output_values = np.vectorize(lambda y: y + np.random.normal(loc=0.0, scale=0.1))(output_values)

Here is a sample of the data:

list(zip(input_values,output_values))[5:10]

Output:
[(0.25240293, 3.474361759429548), 
(0.946697, 4.980617375175061), 
(0.37582186, 3.650345806087635), 
(0.64025956, 4.271037640404975), 
(0.62555283, 4.37001850440196)]

First, we initialize the variables weight and bias with arbitrary guesses, and then we define the linear function:

weight = tf.Variable(1.0)
bias = tf.Variable(0.2)
predicted_vals = weight * input_values + bias

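In a typical linear regression model, we minimize the squared error between the values predicted by the equation we want to adjust and the target values (the data we have), so we define this quantity as the loss to be minimized.

To find the loss value, we use tf.reduce_mean(). This function computes the mean of a multidimensional tensor, and the result can have a different dimension:

model_loss = tf.reduce_mean(tf.square(predicted_vals - output_values))

Then, we define the optimizer method. Here, we use simple gradient descent with a learning rate of 0.5.

Now we define the training method for our graph, but which method do we use to minimize the loss? The answer is tf.train.GradientDescentOptimizer.

The .minimize() function of the optimizer minimizes the error function, resulting in a better model: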

model_optimizer = tf.train.GradientDescentOptimizer(0.5)
train = model_optimizer.minimize(model_loss)
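To show what the optimizer does on each step, here is a hand-written gradient descent loop for the same mean squared error loss, in plain Python. It is a sketch rather than TensorFlow's implementation: the function name `fit_line`, the noise-free data, and the number of steps are assumptions made so that the result is reproducible.

```python
def fit_line(xs, ys, learning_rate=0.5, steps=500):
    # Same initial guesses as the TensorFlow variables above.
    weight, bias = 1.0, 0.2
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every data point.
        errors = [weight * x + bias - y for x, y in zip(xs, ys)]
        # Gradients of mean((weight*x + bias - y)^2) w.r.t. weight and bias.
        grad_weight = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_bias = 2.0 * sum(errors) / n
        # Move both parameters a small step against the gradient.
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias


# Noise-free points on the line y = 2x + 3.
xs = [i / 10.0 for i in range(10)]
ys = [2.0 * x + 3.0 for x in xs]

weight, bias = fit_line(xs, ys)
print(weight, bias)
```

Because the data here lies exactly on a line, the loop recovers a weight close to 2 and a bias close to 3; with noisy data, the parameters settle near those values instead.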

Don't forget to initialize the variables before executing the graph:

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)

Now we are ready to start the optimization and run the graph:

train_data = []
for step in range(100):
    evals = sess.run([train,weight,bias])[1:]
    if step % 5 == 0:
       print(step, evals)
       train_data.append(evals)

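Output (this listing appears to come from a run whose data was generated with slope 3 and intercept 2; with the data defined above, expect the weight to approach 2 and the bias to approach 3):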
(0, [2.5176678, 2.9857566])
(5, [2.4192538, 2.3015416])
(10, [2.5731843, 2.221911])
(15, [2.6890132, 2.1613526])
(20, [2.7763696, 2.1156814])
(25, [2.8422525, 2.0812368])
(30, [2.8919399, 2.0552595])
(35, [2.9294133, 2.0356679])
(40, [2.957675, 2.0208921])
(45, [2.9789894, 2.0097487])
(50, [2.9950645, 2.0013444])
(55, [3.0071881, 1.995006])
(60, [3.0163314, 1.9902257])
(65, [3.0232272, 1.9866205])
(70, [3.0284278, 1.9839015])
(75, [3.0323503, 1.9818509])
(80, [3.0353084, 1.9803041])
(85, [3.0375392, 1.9791379])
(90, [3.039222, 1.9782581])
(95, [3.0404909, 1.9775947])

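Let's visualize the training process of fitting the line to the data points:

print('Plotting the data points with their corresponding fitted line...')
# Line color starts at yellow and shifts toward magenta as training progresses
cr, cg, cb = (1.0, 1.0, 0.0)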

for f in train_data:

    cb += 1.0 / len(train_data)
    cg -= 1.0 / len(train_data)

    if cb > 1.0: cb = 1.0

    if cg < 0.0: cg = 0.0

    [a, b] = f
    f_y = np.vectorize(lambda x: a*x + b)(input_values)
    line = plt.plot(input_values, f_y)
    plt.setp(line, color=(cr,cg,cb))

plt.plot(input_values, output_values, 'ro')
green_line = mpatches.Patch(color='red', label='Data Points')
plt.legend(handles=[green_line])
plt.show()

Output:

图 12：回归线拟合数据点的可视化

逻辑回归模型——构建与训练

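Building on the explanation of logistic regression in Chapter 2, Data Modeling in Action - The Titanic Example, we will now implement the logistic regression algorithm in TensorFlow. In short, logistic regression passes the input through the logistic (sigmoid) function and treats the result as a probability: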

图 13：区分两个线性可分类别，0 和 1

在 TensorFlow 中使用逻辑回归

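To use logistic regression in TensorFlow, we first import the libraries we are going to use:

import tensorflow as tf
import pandas as pd
import numpy as np
import time
from sklearn.datasets import load_iris
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt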

接下来，我们将加载我们要使用的数据集。在这种情况下，我们使用内置的鸢尾花数据集。因此，不需要进行任何预处理，我们可以直接开始操作它。我们将数据集分成 x 和 y，然后再分成训练集的 x 和 y 以及测试集的 x 和 y，（伪）随机地：

iris_dataset = load_iris()
iris_input_values, iris_output_values = iris_dataset.data[:-1,:], iris_dataset.target[:-1]
iris_output_values= pd.get_dummies(iris_output_values).values
train_input_values, test_input_values, train_target_values, test_target_values = train_test_split(iris_input_values, iris_output_values, test_size=0.33, random_state=42)

现在，我们定义了 x 和 y。这些占位符将存储我们的鸢尾花数据（包括特征和标签矩阵），并帮助将它们传递到算法的不同部分。你可以把占位符看作是空的壳子，我们将数据插入到这些壳子里。我们还需要给它们指定与数据形状相对应的形状。稍后，我们将通过 feed_dict（数据字典）将数据插入到这些占位符中：

为什么使用占位符？

TensorFlow 的这一特性使得我们可以创建一个接受数据并且知道数据形状的算法，而不需要知道进入的数据量。在训练时，当我们插入 batch 数据时，我们可以轻松调整每次训练步骤中训练样本的数量，而无需改变整个算法：

# num_explanatory_features is the number of features in our input data.
# In the iris dataset, this number is 4.
num_explanatory_features = train_input_values.shape[1]

# num_target_values is the number of classes our data points can belong to.
# In the iris dataset, this number is 3.
num_target_values = train_target_values.shape[1]

# Placeholders
# 'None' means TensorFlow shouldn't expect a fixed number in that dimension
# Iris has 4 features, so input_values holds a [num_samples, 4] matrix of data.
input_values = tf.placeholder(tf.float32, [None, num_explanatory_features])
# One-hot matrix of correct answers for the 3 classes.
output_values = tf.placeholder(tf.float32, [None, num_target_values])

设置模型的权重和偏置

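As with linear regression, logistic regression needs a shared weight matrix. We initialize both W and b with small random values drawn from a normal distribution. Because W and b are learned, their exact initial values matter little. These variables define the structure of our regression model, and we can save them after training so that they can be reused later.

We define two TensorFlow variables as our parameters. They hold the weights and biases of the logistic regression and are updated continually during training.

Note that W has shape [4, 3] because we want to multiply 4-dimensional input vectors by it to produce 3-dimensional evidence vectors, with one entry per class. b has shape [1, 3], so it can be added to every row of the output. Also, unlike placeholders (which are essentially empty shells waiting for data), TensorFlow variables must be initialized with values, here small random ones:

# Randomly sample from a normal distribution with standard deviation 0.01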

weights = tf.Variable(tf.random_normal([num_explanatory_features,num_target_values],
                                      mean=0,
                                      stddev=0.01,
                                      name="weights"))

biases = tf.Variable(tf.random_normal([1,num_target_values],
                                   mean=0,
                                   stddev=0.01,
                                   name="biases"))

逻辑回归模型

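We now define the operations needed to run logistic regression. Logistic regression is usually written as a single equation:

$$\hat{y} = \mathrm{sigmoid}(XW + b)$$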

然而，为了清晰起见，我们可以将其拆分为三个主要部分：

一个加权特征矩阵乘法操作
对加权特征和偏置项的求和
最后，应用 Sigmoid 函数

因此，您将会发现这些组件被定义为三个独立的操作：

# Three-component breakdown of the Logistic Regression equation.
# Note that these feed into each other.
apply_weights = tf.matmul(input_values, weights, name="apply_weights")
add_bias = tf.add(apply_weights, biases, name="add_bias")
activation_output = tf.nn.sigmoid(add_bias, name="activation")

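As we saw earlier, the function we use is the logistic function, which receives the input data after the weights and bias have been applied. In TensorFlow, it is implemented as tf.nn.sigmoid. In effect, it squashes the weighted, biased input onto a curve between 0 and 1 (0% to 100%), which is the probability function we want.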
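The three-part breakdown above (weighted sum, bias, sigmoid) can be sketched without TensorFlow for a single sample. The helper names sigmoid and logistic_forward and the sample values are hypothetical and only serve this illustration.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_forward(sample, weights, biases):
    """Return one sigmoid activation per class for a single input sample.

    weights has one row per feature and one column per class.
    """
    activations = []
    for class_index in range(len(biases)):
        # Steps 1 and 2: weighted sum of the features, then add the bias term
        z = sum(x * row[class_index] for x, row in zip(sample, weights)) + biases[class_index]
        # Step 3: squash the result with the sigmoid
        activations.append(sigmoid(z))
    return activations

sample = [5.1, 3.5, 1.4, 0.2]                 # one iris-like sample with 4 features (made-up values)
zero_weights = [[0.0] * 3 for _ in range(4)]  # shape [4, 3]
zero_biases = [0.0] * 3                       # one bias per class
outputs = logistic_forward(sample, zero_weights, zero_biases)
print(outputs)  # [0.5, 0.5, 0.5], because sigmoid(0) = 0.5
```

With all-zero parameters every class gets 0.5; training moves the weights so that the activation of the correct class approaches 1.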

训练

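The learning algorithm is how we search for the best weight vector (w). This search is an optimization problem: we look for the hypothesis that optimizes an error/cost measure.

The cost or loss function of the model tells us how bad the model is, and we need to minimize it. Different loss or cost criteria are possible. In this implementation we use a squared-error loss, tf.nn.l2_loss, which returns half the sum of the squared differences (a summed variant of the mean squared error, MSE).

To minimize the loss function, we use the gradient descent algorithm.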

成本函数

在定义我们的成本函数之前，我们需要定义我们将要训练多长时间以及我们应该如何定义学习速率：

# Number of training epochs
num_epochs = 700
# Learning rate with exponential decay. Note that global_step is fixed at 1 here,
# so with staircase=True the rate effectively stays at its initial value.
learning_rate = tf.train.exponential_decay(learning_rate=0.0008,
                                          global_step=1,
                                          decay_steps=train_input_values.shape[0],
                                          decay_rate=0.95,
                                          staircase=True)

# Cost function: squared error (tf.nn.l2_loss returns sum(t ** 2) / 2)
model_cost = tf.nn.l2_loss(activation_output - output_values, name="squared_error_cost")
# Gradient descent training op
model_train = tf.train.GradientDescentOptimizer(learning_rate).minimize(model_cost)
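As a rough illustration of the decay schedule, the following pure-Python sketch applies the exponential decay formula, rate = initial_rate * decay_rate ** (global_step / decay_steps), with integer division when staircase is set. The function is a hypothetical stand-in for illustration, and the 99 training samples are an assumption based on the iris split used above.

```python
def exponential_decay(initial_rate, global_step, decay_steps, decay_rate, staircase=False):
    """Decayed rate = initial_rate * decay_rate ** (global_step / decay_steps)."""
    exponent = global_step / decay_steps
    if staircase:
        # Integer division: the rate drops in discrete jumps, once per decay_steps steps
        exponent = global_step // decay_steps
    return initial_rate * decay_rate ** exponent

# Assuming 99 training samples, as decay_steps in the code above
rate_fixed_step = exponential_decay(0.0008, 1, 99, 0.95, staircase=True)
rate_after_one_period = exponential_decay(0.0008, 99, 99, 0.95, staircase=True)
print(rate_fixed_step, rate_after_one_period)
```

With global_step held at 1, the staircase exponent is 0 and the rate never changes; the rate only decays if global_step is a counter that increases with every training step.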

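Now it's time to execute our computational graph through a session.

First, we need to initialize our weights and biases using tf.global_variables_initializer(). This initialization step becomes a node in the computational graph; when we run it in a session, the operation executes and assigns the variables their initial values: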

# tensorflow session
sess = tf.Session()

# Initialize our variables.
init = tf.global_variables_initializer()
sess.run(init)

#We also want some additional operations to keep track of our model's efficiency over time. We can do this like so:
# argmax(activation_output, 1) returns the label with the most probability
# argmax(output_values, 1) is the correct label
correct_predictions = tf.equal(tf.argmax(activation_output,1),tf.argmax(output_values,1))

# If every false prediction is 0 and every true prediction is 1, the average returns us the accuracy
model_accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"))

# Summary op for regression output
activation_summary = tf.summary.histogram("output", activation_output)

# Summary op for accuracy
accuracy_summary = tf.summary.scalar("accuracy", model_accuracy)

# Summary op for cost
cost_summary = tf.summary.scalar("cost", model_cost)

# Summary ops to check how the weights and biases are updated after each iteration, to be visualized in TensorBoard.
# Pass the variables themselves: weights.eval(session=sess) would freeze the histogram at the initial values.
weight_summary = tf.summary.histogram("weights", weights)
bias_summary = tf.summary.histogram("biases", biases)

merged = tf.summary.merge([activation_summary, accuracy_summary, cost_summary, weight_summary, bias_summary])
writer = tf.summary.FileWriter("summary_logs", sess.graph)

# Now we can define and run the actual training loop:
# Initialize reporting variables
initial_cost = 0
diff = 1
epoch_vals = []
accuracy_vals = []
costs = []

# Training epochs
for i in range(num_epochs):
    if i > 1 and diff < .0001:
        print("change in cost %g; convergence." % diff)
        break
    else:
        # Run training step
        step = sess.run(model_train, feed_dict={input_values: train_input_values, output_values: train_target_values})

        # Report some stats every 10 epochs
        if i % 10 == 0:
            # Add epoch to epoch_vals
            epoch_vals.append(i)

            # Generate the accuracy stats of the model
            train_accuracy, new_cost = sess.run([model_accuracy, model_cost], feed_dict={input_values: train_input_values, output_values: train_target_values})

            # Add accuracy to live graphing variable
            accuracy_vals.append(train_accuracy)

            # Add cost to live graphing variable
            costs.append(new_cost)

            # Remember the cost so that the next report measures the change since this one
            diff = abs(new_cost - initial_cost)
            initial_cost = new_cost

            print("Training step %d, accuracy %g, cost %g, cost change %g" % (i, train_accuracy, new_cost, diff))

Output (from a run in which the previous cost was left at 0, so the cost change column repeats the cost):
Training step 0, accuracy 0.343434, cost 34.6022, cost change 34.6022
Training step 10, accuracy 0.434343, cost 30.3272, cost change 30.3272
Training step 20, accuracy 0.646465, cost 28.3478, cost change 28.3478
Training step 30, accuracy 0.646465, cost 26.6752, cost change 26.6752
Training step 40, accuracy 0.646465, cost 25.2844, cost change 25.2844
Training step 50, accuracy 0.646465, cost 24.1349, cost change 24.1349
Training step 60, accuracy 0.646465, cost 23.1835, cost change 23.1835
Training step 70, accuracy 0.646465, cost 22.3911, cost change 22.3911
Training step 80, accuracy 0.646465, cost 21.7254, cost change 21.7254
Training step 90, accuracy 0.646465, cost 21.1607, cost change 21.1607
Training step 100, accuracy 0.666667, cost 20.677, cost change 20.677
Training step 110, accuracy 0.666667, cost 20.2583, cost change 20.2583
Training step 120, accuracy 0.666667, cost 19.8927, cost change 19.8927
Training step 130, accuracy 0.666667, cost 19.5705, cost change 19.5705
Training step 140, accuracy 0.666667, cost 19.2842, cost change 19.2842
Training step 150, accuracy 0.666667, cost 19.0278, cost change 19.0278
Training step 160, accuracy 0.676768, cost 18.7966, cost change 18.7966
Training step 170, accuracy 0.69697, cost 18.5867, cost change 18.5867
Training step 180, accuracy 0.69697, cost 18.3951, cost change 18.3951
Training step 190, accuracy 0.717172, cost 18.2191, cost change 18.2191
Training step 200, accuracy 0.717172, cost 18.0567, cost change 18.0567
Training step 210, accuracy 0.737374, cost 17.906, cost change 17.906
Training step 220, accuracy 0.747475, cost 17.7657, cost change 17.7657
Training step 230, accuracy 0.747475, cost 17.6345, cost change 17.6345
Training step 240, accuracy 0.757576, cost 17.5113, cost change 17.5113
Training step 250, accuracy 0.787879, cost 17.3954, cost change 17.3954
Training step 260, accuracy 0.787879, cost 17.2858, cost change 17.2858
Training step 270, accuracy 0.787879, cost 17.182, cost change 17.182
Training step 280, accuracy 0.787879, cost 17.0834, cost change 17.0834
Training step 290, accuracy 0.787879, cost 16.9895, cost change 16.9895
Training step 300, accuracy 0.79798, cost 16.8999, cost change 16.8999
Training step 310, accuracy 0.79798, cost 16.8141, cost change 16.8141
Training step 320, accuracy 0.79798, cost 16.732, cost change 16.732
Training step 330, accuracy 0.79798, cost 16.6531, cost change 16.6531
Training step 340, accuracy 0.808081, cost 16.5772, cost change 16.5772
Training step 350, accuracy 0.818182, cost 16.5041, cost change 16.5041
Training step 360, accuracy 0.838384, cost 16.4336, cost change 16.4336
Training step 370, accuracy 0.838384, cost 16.3655, cost change 16.3655
Training step 380, accuracy 0.838384, cost 16.2997, cost change 16.2997
Training step 390, accuracy 0.838384, cost 16.2359, cost change 16.2359
Training step 400, accuracy 0.848485, cost 16.1741, cost change 16.1741
Training step 410, accuracy 0.848485, cost 16.1141, cost change 16.1141
Training step 420, accuracy 0.848485, cost 16.0558, cost change 16.0558
Training step 430, accuracy 0.858586, cost 15.9991, cost change 15.9991
Training step 440, accuracy 0.858586, cost 15.944, cost change 15.944
Training step 450, accuracy 0.858586, cost 15.8903, cost change 15.8903
Training step 460, accuracy 0.868687, cost 15.8379, cost change 15.8379
Training step 470, accuracy 0.878788, cost 15.7869, cost change 15.7869
Training step 480, accuracy 0.878788, cost 15.7371, cost change 15.7371
Training step 490, accuracy 0.878788, cost 15.6884, cost change 15.6884
Training step 500, accuracy 0.878788, cost 15.6409, cost change 15.6409
Training step 510, accuracy 0.878788, cost 15.5944, cost change 15.5944
Training step 520, accuracy 0.878788, cost 15.549, cost change 15.549
Training step 530, accuracy 0.888889, cost 15.5045, cost change 15.5045
Training step 540, accuracy 0.888889, cost 15.4609, cost change 15.4609
Training step 550, accuracy 0.89899, cost 15.4182, cost change 15.4182
Training step 560, accuracy 0.89899, cost 15.3764, cost change 15.3764
Training step 570, accuracy 0.89899, cost 15.3354, cost change 15.3354
Training step 580, accuracy 0.89899, cost 15.2952, cost change 15.2952
Training step 590, accuracy 0.909091, cost 15.2558, cost change 15.2558
Training step 600, accuracy 0.909091, cost 15.217, cost change 15.217
Training step 610, accuracy 0.909091, cost 15.179, cost change 15.179
Training step 620, accuracy 0.909091, cost 15.1417, cost change 15.1417
Training step 630, accuracy 0.909091, cost 15.105, cost change 15.105
Training step 640, accuracy 0.909091, cost 15.0689, cost change 15.0689
Training step 650, accuracy 0.909091, cost 15.0335, cost change 15.0335
Training step 660, accuracy 0.909091, cost 14.9987, cost change 14.9987
Training step 670, accuracy 0.909091, cost 14.9644, cost change 14.9644
Training step 680, accuracy 0.909091, cost 14.9307, cost change 14.9307
Training step 690, accuracy 0.909091, cost 14.8975, cost change 14.8975

现在，是时候看看我们训练好的模型在iris数据集上的表现了，让我们将训练好的模型与测试集进行测试：

# test the model against the test set
print("final accuracy on test set: %s" % str(sess.run(model_accuracy,
                                                     feed_dict={input_values: test_input_values,
                                                                output_values: test_target_values})))

Output:
final accuracy on test set: 0.9

An accuracy of 0.9 on the test set is a good result for such a simple model; you can try to improve it by changing the number of epochs.

摘要

在本章中，我们对神经网络进行了基本解释，并讨论了多层神经网络的需求。我们还涵盖了 TensorFlow 的计算图模型，并举了一些基本的例子，如线性回归和逻辑回归。

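Next, we will go through more advanced examples and show how to use TensorFlow to build things such as handwritten character recognition. We will also discuss the core idea of architecture engineering, which in deep learning has taken the place that feature engineering holds in traditional machine learning.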

第六章：深度前馈神经网络 - 实现数字分类

前馈神经网络（FNN）是一种特殊类型的神经网络，其中神经元之间的连接/链接不形成循环。因此，它不同于我们在本书后面将学习的其他神经网络架构（如递归神经网络）。FNN 是广泛使用的架构，也是最早和最简单的神经网络类型。

本章中，我们将讲解典型的前馈神经网络（FNN）架构，并使用 TensorFlow 库进行实现。掌握这些概念后，我们将通过一个实际的数字分类示例进行说明。这个示例的问题是，给定一组包含手写数字的图像，你如何将这些图像分类为 10 个不同的类别（0-9）？

本章将涵盖以下主题：

隐藏单元和架构设计
MNIST 数据集分析
数字分类 - 模型构建与训练

隐藏单元和架构设计

In this section, we revisit artificial neural networks; they perform well on classification tasks such as classifying handwritten digits.

假设我们有如下所示的网络，参见图 1：

图 1：具有一个隐藏层的简单 FNN

如前所述，这个网络中最左侧的层被称为输入层，这一层内的神经元被称为输入神经元。最右侧的层或输出层包含输出神经元，或者在本例中，仅包含一个输出神经元。中间的层被称为隐藏层，因为这一层中的神经元既不是输入神经元，也不是输出神经元。术语“隐藏”可能听起来有些神秘——我第一次听到这个词时，觉得它一定有某种深奥的哲学或数学意义——但它实际上仅仅意味着既不是输入也不是输出。就这么简单。前面的网络只有一个隐藏层，但有些网络有多个隐藏层。例如，下面这个四层的网络就有两个隐藏层：

图 2：具有更多隐藏层的人工神经网络

输入层、隐藏层和输出层的架构非常简单明了。例如，我们通过一个实际的例子来看一下，如何判断一张手写图像是否包含数字 9。

首先，我们将输入图像的像素传递给输入层；例如，在 MNIST 数据集中，我们有单色图像。每一张图像的尺寸为 28×28，因此我们需要在输入层中有 28×28 = 784 个神经元来接收这个输入图像。

在输出层，我们只需要一个神经元，该神经元输出一个概率（或得分），表示该图像是否包含数字 9。例如，输出值大于 0.5 表示该图像包含数字 9，如果小于 0.5，则表示该输入图像不包含数字 9。

所以这种类型的网络，其中一个层的输出作为输入传递给下一层，称为 FNN（前馈神经网络）。这种层与层之间的顺序性意味着网络中没有循环。

MNIST 数据集分析

在这一部分，我们将亲自动手实现一个手写图像的分类器。这种实现可以被看作是神经网络的 Hello world!。

MNIST 是一个广泛使用的数据集，用于基准测试机器学习技术。该数据集包含一组手写数字，像这里展示的这些：

图 3：MNIST 数据集中的样本数字

所以，数据集包括手写图像及其对应的标签。

在这一部分，我们将基于这些图像训练一个基本的模型，目标是识别输入图像中的手写数字。

此外，您会发现我们可以通过非常少的代码行来完成这个分类任务，但这个实现的核心思想是理解构建神经网络解决方案的基本组件。此外，我们还将涵盖在此实现中神经网络的主要概念。

MNIST 数据

MNIST 数据托管在 Yann LeCun 的网站上 (yann.lecun.com/exdb/mnist/)。幸运的是，TensorFlow 提供了一些辅助函数来下载数据集，所以让我们先用以下两行代码下载数据集：

from tensorflow.examples.tutorials.mnist import input_data
mnist_dataset = input_data.read_data_sets("MNIST_data/", one_hot=True)

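The MNIST data is split into three parts: 55,000 training data points (mnist_dataset.train), 10,000 test data points (mnist_dataset.test), and 5,000 validation data points (mnist_dataset.validation). This split is very important: in machine learning, we must hold out separate data that we do not learn from, so that we can check that what we have learned actually generalizes.

As mentioned earlier, every MNIST sample has two parts: an image of a handwritten digit and its corresponding label. Both the training set and the test set contain images and their labels. For example, the training images are mnist_dataset.train.images and the training labels are mnist_dataset.train.labels.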

每张图片的尺寸为 28 像素 x 28 像素。我们可以将其解释为一个包含数字的大数组：

图 4：MNIST 数字的矩阵表示（强度值）

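To feed this matrix of pixel values into the input layer of the neural network, we flatten it into a vector of 784 values. Each image thus becomes a point in a 784-dimensional vector space.

As a result, mnist_dataset.train.images is a tensor with shape (55000, 784). The first dimension indexes the images, and the second indexes the pixels within each image. Each entry is the intensity of a particular pixel in a particular image, with a value between 0 and 1: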

图 5：MNIST 数据分析
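The flattening step can be sketched in a few lines of plain Python. The helper name flatten and the blank test image are assumptions for this illustration; the sketch uses row-major order, so the pixel at row r and column c lands at index r * 28 + c.

```python
def flatten(image):
    """Concatenate the rows of a 2-D image into one flat list (row-major order)."""
    return [pixel for row in image for pixel in row]

# A blank 28 x 28 image with a single bright pixel at row 3, column 5
image = [[0.0] * 28 for _ in range(28)]
image[3][5] = 1.0

vector = flatten(image)
print(len(vector), vector.index(1.0))  # 784 89
```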

如前所述，数据集中的每个图像都有一个对应的标签，范围从 0 到 9。

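For this implementation, we encode the labels as one-hot vectors. A one-hot vector is 0 everywhere except at the index of the digit it represents, where it is 1. For example, 3 becomes [0,0,0,1,0,0,0,0,0,0]. Consequently, mnist_dataset.train.labels is an array of floats with shape (55000, 10):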

图 6：MNIST 数据分析
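One-hot encoding and its inverse are easy to sketch in plain Python. The helper names one_hot and decode are hypothetical and exist only for this illustration.

```python
def one_hot(digit, num_classes=10):
    """Return a vector of zeros with a single 1.0 at the index of the digit."""
    vector = [0.0] * num_classes
    vector[digit] = 1.0
    return vector

def decode(vector):
    """Recover the digit: the index of the largest entry (argmax)."""
    return vector.index(max(vector))

encoded = one_hot(3)
print(encoded)          # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(decode(encoded))  # 3
```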

数字分类 – 模型构建与训练

现在，让我们开始构建我们的模型。所以，我们的数据集有 10 个类别，分别是 0 到 9，目标是将任何输入图像分类为其中一个类别。我们不会仅仅给出输入图像属于哪个类别的硬性判断，而是将输出一个包含 10 个可能值的向量（因为我们有 10 个类别）。它将表示每个数字从 0 到 9 为输入图像的正确类别的概率。

例如，假设我们输入一个特定的图像。模型可能 70% 确定这个图像是 9，10% 确定这个图像是 8，依此类推。所以，我们将在这里使用 softmax 回归，它将产生介于 0 和 1 之间的值。

Softmax 回归有两个步骤：首先我们将输入属于某些类别的证据加总，然后将这些证据转换为概率。

为了统计某个图像属于特定类别的证据，我们对像素强度进行加权求和。如果某个像素强度高则反映该图像不属于该类别，则权重为负；如果它支持该图像属于该类别，则权重为正。

图 7 显示了模型为每个类别学到的权重。红色代表负权重，蓝色代表正权重：

图 7：模型为每个 MNIST 类别学到的权重

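We also add some extra evidence called a bias. Basically, we want to be able to say that some things are more likely independent of the input. As a result, the evidence for class i given an input x is:

$$\text{evidence}_i = \sum_j W_{i,j}\, x_j + b_i$$

Where:

W_i is the vector of weights for class i, with entries W_{i,j}
b_i is the bias for class i
j is an index for summing over the pixels of the input image x.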

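We then convert the evidence tallies into our predicted probabilities y using the softmax function:

y = softmax(evidence)

Here softmax serves as an activation or link function, shaping the output of our linear function into the form we want: a probability distribution over 10 classes (because we have 10 possible classes, from 0 to 9). You can think of it as converting tallies of evidence into probabilities of the input being in each class. It is defined as:

softmax(evidence) = normalize(exp(evidence))

If you expand this equation, you get:

$$\mathrm{softmax}(\text{evidence})_i = \frac{\exp(\text{evidence}_i)}{\sum_j \exp(\text{evidence}_j)}$$

But it is often more helpful to think of softmax the first way: exponentiate the inputs and then normalize them. The exponentiation means that one more unit of evidence increases the weight given to any hypothesis multiplicatively. Conversely, one less unit of evidence means that the hypothesis gets a fraction of its earlier weight. No hypothesis ever has zero or negative weight. Softmax then normalizes these weights so that they add up to 1, forming a valid probability distribution.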
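The "exponentiate, then normalize" view translates directly into a short pure-Python sketch. Subtracting the maximum before exponentiating is a common numerical safeguard added here as an assumption; it does not change the result.

```python
import math

def softmax(evidence):
    """Exponentiate each evidence value, then normalize so the results sum to 1."""
    # Shifting by the maximum leaves the result unchanged but avoids overflow in exp
    largest = max(evidence)
    weights = [math.exp(e - largest) for e in evidence]
    total = sum(weights)
    return [w / total for w in weights]

probabilities = softmax([1.0, 2.0, 3.0])
print(probabilities)
print(probabilities[2] / probabilities[1])  # one extra unit of evidence multiplies the weight by e
```

Each additional unit of evidence multiplies the resulting probability by e (about 2.718) relative to the other classes, which is the multiplicative effect described above.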

你可以把我们的 softmax 回归想象成以下的样子，尽管它会有更多的 x's。对于每个输出，我们计算 x's 的加权和，加入偏置，然后应用 softmax：

图 8：softmax 回归的可视化

如果我们将其写成方程式，我们得到：

图 9：softmax 回归的方程表示

我们可以使用向量表示法来处理这个过程。这意味着我们将其转换为矩阵乘法和向量加法。这对于计算效率和可读性非常有帮助：

图 10：softmax 回归方程的向量化表示

更简洁地，我们可以写成：

y = softmax(Wx + b)

现在，让我们将其转换为 TensorFlow 可以使用的形式。

数据分析

那么，让我们开始实现我们的分类器。我们首先导入实现所需的包：

import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import random as ran

接下来，我们将定义一些辅助函数，以便从我们下载的原始数据集中进行子集选择：

#Define some helper functions 
# to assign the size of training and test data we will take from MNIST dataset
def train_size(size):
    print ('Total Training Images in Dataset = ' + str(mnist_dataset.train.images.shape))
    print ('############################################')
    input_values_train = mnist_dataset.train.images[:size,:]
    print ('input_values_train Samples Loaded = ' + str(input_values_train.shape))
    target_values_train = mnist_dataset.train.labels[:size,:]
    print ('target_values_train Samples Loaded = ' + str(target_values_train.shape))
    return input_values_train, target_values_train

def test_size(size):
    print ('Total Test Samples in MNIST Dataset = ' + str(mnist_dataset.test.images.shape))
    print ('############################################')
    input_values_test = mnist_dataset.test.images[:size,:]
    print ('input_values_test Samples Loaded = ' + str(input_values_test.shape))
    target_values_test = mnist_dataset.test.labels[:size,:]
    print ('target_values_test Samples Loaded = ' + str(target_values_test.shape))
    return input_values_test, target_values_test

此外，我们还将定义两个辅助函数，用于显示数据集中的特定数字，或者甚至显示某个图像子集的平铺版本：

#Define a couple of helper functions for digit images visualization
def visualize_digit(ind):
    print(target_values_train[ind])
    target = target_values_train[ind].argmax(axis=0)
    true_image = input_values_train[ind].reshape([28,28])
    plt.title('Sample: %d Label: %d' % (ind, target))
    plt.imshow(true_image, cmap=plt.get_cmap('gray_r'))
    plt.show()

def visualize_mult_imgs_flat(start, stop):
    imgs = input_values_train[start].reshape([1,784])
    for i in range(start+1,stop):
        imgs = np.concatenate((imgs, input_values_train[i].reshape([1,784])))
    plt.imshow(imgs, cmap=plt.get_cmap('gray_r'))
    plt.show()

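Now let's start working with the dataset. First, we specify how many training and test examples we want to load from the original dataset. For now we load all of the training data; later we will change this value to save resources: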

input_values_train, target_values_train = train_size(55000)

Output:
Total Training Images in Dataset = (55000, 784)
############################################
input_values_train Samples Loaded = (55000, 784)
target_values_train Samples Loaded = (55000, 10)

所以现在，我们有一个包含 55,000 个手写数字样本的训练集，每个样本是 28×28 像素的图像，经过展平成为 784 维的向量。我们还拥有这些样本对应的标签，采用 one-hot 编码格式。

target_values_train数据是所有input_values_train样本的关联标签。在以下示例中，数组代表数字 7 的独热编码格式：

图 11：数字 7 的独热编码

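So let's pick a random image from the dataset and see what it looks like, using the helper function defined earlier (ran.randint includes both endpoints, hence the - 1):

visualize_digit(ran.randint(0, input_values_train.shape[0] - 1))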

Output:

Figure 12: Output digit of the visualize_digit method

我们还可以使用之前定义的辅助函数来可视化一堆展平后的图片。展平向量中的每个值代表一个像素的强度，因此可视化这些像素将是这样的：

visualize_mult_imgs_flat(0,400)

图 13：前 400 个训练样本

构建模型

到目前为止，我们还没有开始为这个分类器构建计算图。让我们先创建一个会负责执行我们将要构建的计算图的会话变量：

sess = tf.Session()

接下来，我们将定义我们模型的占位符，这些占位符将用于将数据传递到计算图中：

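input_values = tf.placeholder(tf.float32, shape=[None, 784])

Specifying None for the first dimension of the placeholder means that it can accept any number of samples. In this case, our placeholder can be fed any number of samples, each consisting of 784 values.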

现在，我们需要定义另一个占位符来传入图片标签。我们将在之后使用这个占位符来将模型的预测与图像的实际标签进行比较：

output_values = tf.placeholder(tf.float32, shape=[None, 10])

接下来，我们将定义weights和biases。这两个变量将成为我们网络的可训练参数，它们将是进行未知数据预测时所需的唯一两个变量：

weights = tf.Variable(tf.zeros([784,10]))
biases = tf.Variable(tf.zeros([10]))

我喜欢把这些weights看作是每个数字的 10 张备忘单。这类似于老师用备忘单来给多选考试打分。

现在我们将定义我们的 softmax 回归，它是我们的分类器函数。这个特殊的分类器叫做多项式逻辑回归，我们通过将数字的展平版本与权重相乘然后加上偏差来做出预测：

softmax_layer = tf.nn.softmax(tf.matmul(input_values,weights) + biases)

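First, let's ignore the softmax and look at what is inside the softmax function. matmul is the TensorFlow function for matrix multiplication. If you know matrix multiplication (en.wikipedia.org/wiki/Matrix_multiplication), you will see that this computes properly and that:

tf.matmul(input_values, weights)

will result in a matrix of shape number of training examples fed (m) × number of classes (n):

Figure 14: Simple matrix multiplication.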

你可以通过评估softmax_layer来确认这一点：

print(softmax_layer)
Output:
Tensor("Softmax:0", shape=(?, 10), dtype=float32)

现在，让我们用之前定义的计算图，使用训练集中的三个样本来进行实验，看看它是如何工作的。为了执行计算图，我们需要使用之前定义的会话变量。并且，我们需要使用tf.global_variables_initializer()来初始化变量。

现在，我们仅向计算图输入三个样本进行实验：

input_values_train, target_values_train = train_size(3)
sess.run(tf.global_variables_initializer())
#If using TensorFlow prior to 0.12 use:
#sess.run(tf.initialize_all_variables())
print(sess.run(softmax_layer, feed_dict={input_values: input_values_train}))

Output:

[[ 0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1]
 [ 0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1]
 [ 0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1  0.1]]

在这里，你可以看到模型对于输入的三个训练样本的预测结果。目前，模型还没有学到任何关于我们任务的东西，因为我们还没有经过训练过程，所以它只是输出每个数字为输入样本正确类别的 10% 概率。

如前所述，softmax 是一种激活函数，它将输出压缩到 0 到 1 之间，TensorFlow 对 softmax 的实现确保单个输入样本的所有概率加起来为 1。
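To see why an untrained model outputs 10% for every digit, the sketch below evaluates softmax(Wx + b) by hand with all-zero weights and biases. The helper names softmax and predict and the dummy pixel values are assumptions for this illustration; the weights are stored here as one row per class rather than as the [784, 10] matrix used in the TensorFlow code.

```python
import math

def softmax(values):
    """Exponentiate and normalize so that the outputs sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def predict(pixels, weights, biases):
    """Evidence per class is a weighted sum of the pixels plus a bias; softmax turns it into probabilities."""
    evidence = [sum(w * x for w, x in zip(class_weights, pixels)) + b
                for class_weights, b in zip(weights, biases)]
    return softmax(evidence)

pixels = [0.5] * 784                        # a dummy flattened image (made-up values)
weights = [[0.0] * 784 for _ in range(10)]  # all-zero weights, one row per class
biases = [0.0] * 10
prediction = predict(pixels, weights, biases)
print(prediction)
```

Because every class receives the same evidence (zero), the normalization spreads the probability evenly, giving 1/10 per class regardless of the input image.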

让我们稍微实验一下 TensorFlow 的 softmax 函数：

sess.run(tf.nn.softmax(tf.zeros([4])))
sess.run(tf.nn.softmax(tf.constant([0.1, 0.005, 2])))

Output:
array([0.11634309, 0.10579926, 0.7778576 ], dtype=float32)

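Next, we need to define a loss function for this model, which measures how well or badly the classifier does when it assigns a class to an input image. The accuracy of the model is computed by comparing the actual values in the dataset with the predictions that the model outputs.

The goal is to reduce the mismatch between the actual and the predicted values. A common loss function for this purpose is the cross-entropy.

The cross-entropy is defined as:

$$H_{y'}(y) = -\sum_i y'_i \log(y_i)$$

Where:

y is our predicted probability distribution
y' is the true distribution (the one-hot vector with the digit labels)

In some rough sense, the cross-entropy measures how inefficient our predictions are for describing the actual input.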

我们可以实现交叉熵函数：

model_cross_entropy = tf.reduce_mean(-tf.reduce_sum(output_values * tf.log(softmax_layer), reduction_indices=[1]))

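This function takes the log of all the predictions from softmax_layer (whose values range from 0 to 1) and multiplies them element-wise (en.wikipedia.org/wiki/Hadamard_product_%28matrices%29) by the true values of the examples, output_values. If a predicted value is close to zero, its log is a large negative number, so the penalty is large (-np.log(0.01) = 4.6); if it is close to one, its log is a small negative number, so the penalty is small (-np.log(0.99) = 0.01):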

图 15：Y = log(x) 的可视化

本质上，如果预测结果自信地错误，我们会用一个非常大的数字来惩罚分类器；如果预测结果自信地正确，我们则用一个非常小的数字来惩罚。

这里是一个简单的 Python 示例，展示了一个对数字为 3 的预测非常自信的 softmax 预测：

j = [0.03, 0.03, 0.01, 0.9, 0.01, 0.01, 0.0025,0.0025, 0.0025, 0.0025]

让我们创建一个值为 3 的数组标签作为真实值，以便与 softmax 函数进行比较：

k = [0,0,0,1,0,0,0,0,0,0]

你能猜到我们的损失函数给出的值是什么吗？你能看到 j 的对数如何用一个大的负数惩罚错误答案吗？试试这个来理解：

-np.log(j)
-np.multiply(np.log(j),k)

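This returns nine zeros and the value 0.1053; summed up, the loss is therefore about 0.1053, which we can consider a good prediction. Notice what happens when we make the very same prediction for an image that is actually a 2: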

k = [0,0,1,0,0,0,0,0,0,0]
np.sum(-np.multiply(np.log(j),k))

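Now our cross-entropy computation gives 4.6051, a heavy penalty for a poor prediction. The classifier was penalized heavily because it was very confident that the digit was a 3 when it actually was a 2.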
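The two cases can be reproduced with the standard library alone. The function below is a minimal sketch of the cross-entropy for a single example; the names cross_entropy, label_3, and label_2 are introduced only for this illustration.

```python
import math

def cross_entropy(predicted, actual):
    """Cross-entropy for one example: -sum(actual_i * log(predicted_i))."""
    return -sum(a * math.log(p) for p, a in zip(predicted, actual))

# The confident "it is a 3" prediction from the text
j = [0.03, 0.03, 0.01, 0.9, 0.01, 0.01, 0.0025, 0.0025, 0.0025, 0.0025]
label_3 = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]  # the image really is a 3
label_2 = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # the image is actually a 2

loss_correct = cross_entropy(j, label_3)
loss_wrong = cross_entropy(j, label_2)
print(loss_correct, loss_wrong)
```

Only the predicted probability at the true class contributes to the loss, so the result is -log(0.9) in the first case and -log(0.01) in the second.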

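Next, we start training our classifier. To train it, we have to find values for W and b that give the lowest possible loss.

Now, if we wish, we can assign custom values for training. All of the values below can be changed and experimented with. In fact, I encourage it! First, use these values, and then notice what happens when you use too few training examples or a learning rate that is too high or too low: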

input_values_train, target_values_train = train_size(5500)
input_values_test, target_values_test = test_size(10000)
learning_rate = 0.1
num_iterations = 2500

现在，我们可以初始化所有变量，以便它们可以被我们的 TensorFlow 图使用：

init = tf.global_variables_initializer()
#If using TensorFlow prior to 0.12 use:
#init = tf.initialize_all_variables()
sess.run(init)

接下来，我们需要使用梯度下降算法训练分类器。因此，我们首先定义我们的训练方法和一些用于测量模型准确性的变量。变量train将执行梯度下降优化器，选择一个学习率来最小化模型损失函数model_cross_entropy：

train = tf.train.GradientDescentOptimizer(learning_rate).minimize(model_cross_entropy)
model_correct_prediction = tf.equal(tf.argmax(softmax_layer,1), tf.argmax(output_values,1))
model_accuracy = tf.reduce_mean(tf.cast(model_correct_prediction, tf.float32))
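The accuracy computation (compare the argmax of each prediction with the argmax of its one-hot label, then average the 0/1 results) can be sketched in plain Python. The helper names and the four made-up predictions are assumptions for this illustration.

```python
def argmax(values):
    """Index of the largest entry."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(predictions, labels):
    """Fraction of samples whose most probable class matches the one-hot label."""
    correct = [1.0 if argmax(p) == argmax(l) else 0.0
               for p, l in zip(predictions, labels)]
    return sum(correct) / len(correct)

# Four made-up predictions over three classes; the third one is wrong
predictions = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
labels = [[0, 1, 0], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
acc = accuracy(predictions, labels)
print(acc)  # 0.75
```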

模型训练

现在，我们将定义一个循环，它将迭代num_iterations次。对于每个循环，它都会运行训练，使用feed_dict从input_values_train和target_values_train中提供值。

为了计算准确性，它将测试模型对input_values_test中的未见数据的表现：

for i in range(num_iterations+1):
    sess.run(train, feed_dict={input_values: input_values_train, output_values: target_values_train})
    if i%100 == 0:
        print('Training Step:' + str(i) + ' Accuracy = ' + str(sess.run(model_accuracy, feed_dict={input_values: input_values_test, output_values: target_values_test})) + ' Loss = ' + str(sess.run(model_cross_entropy, {input_values: input_values_train, output_values: target_values_train})))

Output:
Training Step:0 Accuracy = 0.5988 Loss = 2.1881988
Training Step:100 Accuracy = 0.8647 Loss = 0.58029664
Training Step:200 Accuracy = 0.879 Loss = 0.45982164
Training Step:300 Accuracy = 0.8866 Loss = 0.40857208
Training Step:400 Accuracy = 0.8904 Loss = 0.37808096
Training Step:500 Accuracy = 0.8943 Loss = 0.35697535
Training Step:600 Accuracy = 0.8974 Loss = 0.34104997
Training Step:700 Accuracy = 0.8984 Loss = 0.32834956
Training Step:800 Accuracy = 0.9 Loss = 0.31782663
Training Step:900 Accuracy = 0.9005 Loss = 0.30886236
Training Step:1000 Accuracy = 0.9009 Loss = 0.3010645
Training Step:1100 Accuracy = 0.9023 Loss = 0.29417014
Training Step:1200 Accuracy = 0.9029 Loss = 0.28799513
Training Step:1300 Accuracy = 0.9033 Loss = 0.28240603
Training Step:1400 Accuracy = 0.9039 Loss = 0.27730304
Training Step:1500 Accuracy = 0.9048 Loss = 0.27260992
Training Step:1600 Accuracy = 0.9057 Loss = 0.26826677
Training Step:1700 Accuracy = 0.9062 Loss = 0.2642261
Training Step:1800 Accuracy = 0.9061 Loss = 0.26044932
Training Step:1900 Accuracy = 0.9063 Loss = 0.25690478
Training Step:2000 Accuracy = 0.9066 Loss = 0.2535662
Training Step:2100 Accuracy = 0.9072 Loss = 0.25041154
Training Step:2200 Accuracy = 0.9073 Loss = 0.24742197
Training Step:2300 Accuracy = 0.9071 Loss = 0.24458146
Training Step:2400 Accuracy = 0.9066 Loss = 0.24187621
Training Step:2500 Accuracy = 0.9067 Loss = 0.23929419

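Notice that the loss was still decreasing near the end while our accuracy dipped slightly! This shows that we can keep minimizing the loss, and thereby keep fitting the training data better, without that helping us predict the test data used to measure accuracy. This is known as overfitting (a failure to generalize). With the default settings, we got an accuracy of about 91%. If I wanted to cheat to get 94% accuracy, I could have set the number of test examples to 100. This shows how having too few test examples can give you a biased sense of accuracy.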

请记住，这种方式计算我们分类器的性能非常不准确。但是，出于学习和实验的目的，我们特意这样做了。理想情况下，当使用大型数据集进行训练时，您应该一次使用小批量的训练数据，而不是全部一起。
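As a sketch of the mini-batch idea mentioned above, the generator below shuffles the sample indices once and then yields small batches of inputs with their matching targets. The name minibatches, the fixed seed, and the toy data are assumptions for this illustration; each yielded batch would be passed to one training step through feed_dict.

```python
import random

def minibatches(inputs, targets, batch_size, seed=0):
    """Yield (inputs, targets) batches that together cover the data once, in shuffled order."""
    indices = list(range(len(inputs)))
    random.Random(seed).shuffle(indices)  # a fixed seed keeps the sketch reproducible
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        yield [inputs[i] for i in batch], [targets[i] for i in batch]

# Toy data: 10 samples whose target is twice the input
inputs = list(range(10))
targets = [x * 2 for x in inputs]
batches = list(minibatches(inputs, targets, batch_size=4))
print([len(batch_inputs) for batch_inputs, _ in batches])  # [4, 4, 2]
```

The last batch is smaller when the dataset size is not a multiple of the batch size, which is why the placeholders leave the first dimension as None.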

这是有趣的部分。现在我们已经计算出了我们的权重备忘单，我们可以用以下代码创建一个图表：

for i in range(10):
    plt.subplot(2, 5, i+1)
    weight = sess.run(weights)[:,i]
    plt.title(i)
    plt.imshow(weight.reshape([28,28]), cmap=plt.get_cmap('seismic'))
    frame = plt.gca()
    frame.axes.get_xaxis().set_visible(False)
    frame.axes.get_yaxis().set_visible(False)

Figure 16: Visualization of our weights for the digits 0 to 9

上图显示了 0 到 9 的模型权重，这是我们分类器最重要的一部分。所有这些机器学习的工作都是为了找出最优的权重。一旦根据优化标准计算出这些权重，你就拥有了备忘单，并且可以轻松地利用学习到的权重找到答案。

学到的模型通过比较输入数字样本与红色和蓝色权重的相似度或差异来做出预测。红色越深，命中越好；白色表示中立，蓝色表示未命中。

现在，让我们使用备忘单，看看我们的模型在其上的表现：

input_values_train, target_values_train = train_size(1)
visualize_digit(0)

Output:
Total Training Images in Dataset = (55000, 784)
############################################
input_values_train Samples Loaded = (1, 784)
target_values_train Samples Loaded = (1, 10)
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]

让我们看看我们的 softmax 预测器：

answer = sess.run(softmax_layer, feed_dict={input_values: input_values_train})
print(answer)

上述代码会给我们一个 10 维向量，每一列包含一个概率：

[[2.1248012e-05 1.1646927e-05 8.9631692e-02 1.9201526e-02 8.2086492e-04
  1.2516821e-05 3.8538201e-05 8.5374612e-01 6.9188857e-03 2.9596921e-02]]

我们可以使用argmax函数来找出最有可能的数字作为我们输入图像的正确分类：

answer.argmax()

Output:
7

现在，我们从网络中得到了一个正确的分类结果。

让我们运用我们的知识定义一个辅助函数，能够从数据集中随机选择一张图像，并将模型应用于其上进行测试：

def display_result(ind):

    # Loading a training sample
    input_values_train = mnist_dataset.train.images[ind,:].reshape(1,784)
    target_values_train = mnist_dataset.train.labels[ind,:]

    # getting the label as an integer instead of one-hot encoded vector
    label = target_values_train.argmax()

    # Getting the prediction as an integer
    prediction = sess.run(softmax_layer, feed_dict={input_values: input_values_train}).argmax()
    plt.title('Prediction: %d Label: %d' % (prediction, label))
    plt.imshow(input_values_train.reshape([28,28]), cmap=plt.get_cmap('gray_r'))
    plt.show()

现在，试试看：

display_result(ran.randint(0, mnist_dataset.train.images.shape[0] - 1))

Output:

我们再次得到了一个正确的分类结果！

总结

在本章中，我们介绍了用于数字分类任务的 FNN（前馈神经网络）的基本实现。我们还回顾了神经网络领域中使用的术语。

接下来，我们将构建一个更复杂的数字分类模型，使用一些现代最佳实践和技巧来提升模型的表现。

第七章：卷积神经网络简介

在数据科学中，卷积神经网络（CNN）是一种特定的深度学习架构，它使用卷积操作来提取输入图像的相关解释特征。CNN 层以前馈神经网络的方式连接，同时使用此卷积操作来模拟人类大脑在试图识别物体时的工作方式。个别皮层神经元对在一个限制区域内的刺激做出反应，这个区域被称为感受野。特别地，生物医学成像问题有时可能会很有挑战性，但在本章中，我们将看到如何使用 CNN 来发现图像中的模式。

本章将涵盖以下主题：

卷积操作
动机
CNN 的不同层
CNN 基本示例：MNIST 数字分类

卷积操作

CNN 在计算机视觉领域得到了广泛应用，并且它们在很多方面超越了我们一直在使用的传统计算机视觉技术。CNN 结合了著名的卷积操作和神经网络，因此得名卷积神经网络。因此，在深入探讨 CNN 的神经网络部分之前，我们将介绍卷积操作并了解它的工作原理。

卷积操作的主要目的是从图像中提取信息或特征。任何图像都可以看作是一个值矩阵，其中矩阵中的特定值组将形成一个特征。卷积操作的目的是扫描这个矩阵，尝试提取与该图像相关或具有解释性的特征。例如，考虑一个 5x5 的图像，其对应的强度或像素值显示为零和一：

图 9.1：像素值矩阵

并考虑以下 3 x 3 的矩阵：

图 9.2：像素值矩阵

我们可以使用 3 x 3 的卷积核对 5 x 5 的图像进行卷积，方法如下：

图 9.3：卷积操作。输出矩阵称为卷积特征或特征图

上述图可以总结如下。为了使用 3 x 3 的卷积核对原始 5 x 5 图像进行卷积，我们需要执行以下操作：

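Scan the original green image with the orange matrix, moving it only 1 pixel at a time (the stride)
At each position of the orange matrix, multiply the values of the orange matrix element-wise by the corresponding pixel values of the green matrix
Sum the results of these element-wise multiplications to get a single number, which becomes a single value in the pink output matrix.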

如前图所示，橙色的 3 x 3 矩阵每次只对原始绿色图像的一个部分进行操作（步幅），或者它每次只看到图像的一部分。

那么，让我们将前面的解释放到 CNN 术语的背景下：

橙色的 3 x 3 矩阵被称为核、特征检测器或滤波器。
输出的粉色矩阵，其中包含逐元素相乘的结果，称为特征图。

因为我们是通过核与原始输入图像中对应像素的逐元素相乘来获得特征图，所以改变核或滤波器的值每次都会生成不同的特征图。

因此，我们可能会认为，在卷积神经网络的训练过程中，我们需要自己确定特征检测器的值，但事实并非如此。CNN 在学习过程中自动确定这些值。所以，如果我们有更多的滤波器，就意味着我们可以从图像中提取更多的特征。

在进入下一部分之前，让我们介绍一些在 CNN 上下文中通常使用的术语：

步幅：我们之前简要提到了这个术语。一般来说，步幅是指我们在卷积输入矩阵时，特征检测器或滤波器在输入矩阵上移动的像素数。例如，步幅为 1 意味着每次移动一个像素，而步幅为 2 意味着每次移动两个像素。步幅越大，生成的特征图就越小。
零填充：如果我们想包含输入图像的边缘像素，那么部分滤波器将超出输入图像的范围。零填充通过在输入矩阵的边缘周围填充零来解决这个问题。
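The scan, multiply, and sum procedure, together with stride and zero padding, can be sketched in plain Python. The 5 x 5 image and 3 x 3 kernel below are hypothetical binary values chosen for this sketch, not necessarily those of the figures, and the function assumes a square kernel.

```python
def convolve2d(image, kernel, stride=1, padding=0):
    """Slide a square kernel over the image; each output is the sum of element-wise products."""
    if padding > 0:
        # Zero padding: surround the image with a border of zeros
        width = len(image[0]) + 2 * padding
        padded = [[0] * width for _ in range(padding)]
        padded += [[0] * padding + list(row) + [0] * padding for row in image]
        padded += [[0] * width for _ in range(padding)]
        image = padded
    k = len(kernel)
    out_rows = (len(image) - k) // stride + 1
    out_cols = (len(image[0]) - k) // stride + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r * stride + i][c * stride + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]

print(convolve2d(image, kernel))             # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
print(convolve2d(image, kernel, stride=2))   # [[4, 4], [2, 4]]
print(len(convolve2d(image, kernel, padding=1)))  # 5: padding keeps the 5 x 5 size
```

A larger stride shrinks the feature map, while a padding of 1 with a 3 x 3 kernel keeps the output the same size as the input.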

动机

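Traditional computer vision techniques were used for most computer vision tasks, such as object detection and segmentation. Although their performance was decent, it never came close to what real-time applications such as autonomous cars require. In 2012, Alex Krizhevsky and his colleagues entered a CNN in the ImageNet competition and achieved a breakthrough, reducing the object classification error from 26% to 15%. Since then, CNNs have been widely adopted and many variants have been developed. They have even surpassed the human classification error in the ImageNet competition, as shown in the following figure: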

图 9.4：随着时间推移的分类错误，其中人类级别的错误用红色标出

CNN 的应用

自从 CNN 在计算机视觉甚至自然语言处理的不同领域取得突破以来，大多数公司已经将这一深度学习解决方案集成到他们的计算机视觉生态系统中。例如，谷歌在其图像搜索引擎中使用该架构，Facebook 则在自动标记等功能中使用它：

图 9.5：典型的用于物体识别的 CNN 一般架构

CNN 之所以能取得突破，正是因为它们的架构，直观地使用卷积操作从图像中提取特征。稍后你会发现，这与人脑的工作方式非常相似。

CNN 的不同层

典型的 CNN 架构由多个执行不同任务的层组成，如上图所示。在本节中，我们将详细了解它们，并看到将所有这些层以特定方式连接起来的好处，这使得计算机视觉取得了这样的突破。

输入层

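This is the first layer in any CNN architecture. All subsequent convolution and pooling layers expect the input to be in a specific format. The input variables are tensors with the following shape:

[batch_size, image_height, image_width, channels]

Here:

batch_size is the number of samples in the random subset of the original training set that is used for each step of stochastic gradient descent.
image_height is the height of the images fed to the network.
image_width is the width of the images fed to the network.
channels is the number of color channels of the input images. This number is 3 for RGB images and 1 for monochrome images.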

例如，考虑我们著名的 MNIST 数据集。假设我们将使用 CNN 进行数字分类，使用这个数据集。

如果数据集由 28 x 28 像素的单色图像组成，如 MNIST 数据集，那么我们输入层所需的形状如下：

[batch_size, 28, 28, 1].

为了改变输入特征的形状，我们可以执行以下重塑操作：

input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

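As you can see, we specified -1 for the batch size, which means that this number is determined dynamically from the number of input values in features["x"]. This lets us treat the batch size as a hyperparameter that we can tune.

As an example of the reshape operation, suppose we feed the input samples in batches of five. The features["x"] array then holds 3,920 values (5 × 28 × 28), each corresponding to one pixel of an image. In this case, the input layer has the following shape: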

[5, 28, 28, 1]
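How the -1 is resolved can be sketched as simple arithmetic: the unknown dimension is the total number of values divided by the product of the known dimensions. The helper infer_shape is hypothetical and assumes at most one -1 in the requested shape.

```python
def infer_shape(num_values, shape):
    """Replace a single -1 in shape so that the product of all dimensions equals num_values."""
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim
    if num_values % known != 0:
        raise ValueError("cannot reshape %d values into %r" % (num_values, shape))
    return [num_values // known if dim == -1 else dim for dim in shape]

print(infer_shape(3920, [-1, 28, 28, 1]))  # [5, 28, 28, 1]
```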

卷积步骤

如前所述，卷积步骤得名于卷积操作。进行这些卷积步骤的主要目的是从输入图像中提取特征，然后将这些特征输入到线性分类器中。

在自然图像中，特征可能出现在图像的任何位置。例如，边缘可能出现在图像的中间或角落，因此堆叠一系列卷积步骤的整个目的是能够在图像的任何地方检测到这些特征。

在 TensorFlow 中定义卷积步骤非常简单。例如，如果我们想对输入层应用 20 个大小为 5x5 的滤波器，并使用 ReLU 激活函数，那么可以使用以下代码来实现：

conv_layer1 = tf.layers.conv2d(
 inputs=input_layer,
 filters=20,
 kernel_size=[5, 5],
 padding="same",
 activation=tf.nn.relu)

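The first argument of conv2d is the input layer defined in the preceding code, which has the appropriate shape. The second argument, filters, specifies the number of filters to apply to the image; the more filters, the more features are extracted from the input image. The third argument, kernel_size, is the size of the filter or feature detector. The padding argument selects the zero-padding scheme; here we use "same", which pads the borders of the input with zeros so that the output keeps the height and width of the input. The last argument specifies the activation function applied to the output of the convolution operation.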

因此，在我们的 MNIST 示例中，输入张量将具有以下形状：

[batch_size, 28, 28, 1]

该卷积步骤的输出张量将具有以下形状：

[batch_size, 28, 28, 20]

输出张量的维度与输入图像相同，但现在我们有 20 个通道，表示应用了 20 个滤波器到输入图像。
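The spatial size of the output follows from the padding scheme. The sketch below encodes the usual rules for "same" and "valid" padding along one dimension; the function name conv_output_size is an assumption for this illustration.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="same"):
    """Output length along one spatial dimension for 'same' or 'valid' padding."""
    if padding == "same":
        # Zeros are added so that only the stride shrinks the output
        return math.ceil(input_size / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")

print(conv_output_size(28, 5, padding="same"))   # 28
print(conv_output_size(28, 5, padding="valid"))  # 24
```

With "same" padding and a stride of 1, a 28 x 28 input stays 28 x 28, which is why only the channel dimension changes (from 1 to 20) in the example above.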

引入非线性

在卷积步骤中，我们提到过将卷积步骤的输出传递给 ReLU 激活函数以引入非线性：

图 9.6: ReLU 激活函数

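The ReLU activation function replaces all negative values with zero. The purpose of feeding the output of the convolution step to this activation function is to introduce nonlinearity, because the data we work with is usually nonlinear. To see the effect of ReLU clearly, look at the following figure, which shows the raw output of a convolution step and its rectified version: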

图 9.7: 对输入特征图应用 ReLU 的结果
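Applied to a feature map, ReLU is an element-wise max(0, value). A minimal sketch with made-up feature map values:

```python
def relu(feature_map):
    """Replace every negative value in a 2-D feature map with zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]

raw = [[-3.0, 2.0, -1.5],
       [0.5, -0.2, 4.0]]
rectified = relu(raw)
print(rectified)  # [[0.0, 2.0, 0.0], [0.5, 0.0, 4.0]]
```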

池化步骤

我们学习过程中的一个重要步骤是池化步骤，有时也叫做下采样或子采样步骤。这个步骤主要是为了减少卷积步骤输出的特征图（feature map）的维度。池化步骤的优点是，在减小特征图的大小的同时，保留了新版本中重要的信息。

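The following figure shows the steps of scanning the image with a 2 x 2 filter and a stride of 2, keeping the maximum value in each window. This kind of pooling operation is called max pooling: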

图 9.8：使用 2 x 2 窗口在经过卷积和 ReLU 操作后的修正特征图上进行最大池化操作的示例（来源：textminingonline.com/wp-content/…

我们可以使用以下代码行将卷积步骤的输出连接到池化层：

pool_layer1 = tf.layers.max_pooling2d(inputs=conv_layer1, pool_size=[2, 2], strides=2)

池化层接收来自卷积步骤的输入，形状如下：

[batch_size, image_height, image_width, channels]

例如，在我们的数字分类任务中，池化层的输入将具有以下形状：

[batch_size, 28, 28, 20]

池化操作的输出将具有以下形状：

[batch_size, 14, 14, 20]

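In this example, we halved the width and the height of the output of the convolution step, which leaves a quarter of the values in each channel. This step is very useful because it keeps the most important information while reducing the complexity of the model, which helps to reduce overfitting.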
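Max pooling with a 2 x 2 window and a stride of 2 can be sketched in plain Python as follows. The function name max_pool2d and the 4 x 4 feature map values are assumptions for this illustration.

```python
def max_pool2d(feature_map, pool_size=2, stride=2):
    """Keep the maximum of each pool_size x pool_size window, moving by stride."""
    rows = (len(feature_map) - pool_size) // stride + 1
    cols = (len(feature_map[0]) - pool_size) // stride + 1
    return [[max(feature_map[r * stride + i][c * stride + j]
                 for i in range(pool_size) for j in range(pool_size))
             for c in range(cols)]
            for r in range(rows)]

feature_map = [[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]
pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 8], [3, 4]]
```

A 4 x 4 map becomes 2 x 2, just as the 28 x 28 feature maps in the text become 14 x 14.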

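Fully connected layer

After stacking a number of convolution and pooling steps, we add a fully connected layer, which takes the high-level features extracted from the input image and uses them to do the actual classification:

Figure 9.9: Fully connected layer - each node is connected to every node in the adjacent layer

For example, in the digit classification task, we can follow the convolution and pooling steps with a fully connected layer of 1,024 neurons and a ReLU activation to perform the actual classification. This fully connected layer accepts input in the following format:

[batch_size, features]

So, we need to reshape, or flatten, the feature map from pool_layer1 to match this format. We can use the following line of code to reshape the output: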

pool1_flat = tf.reshape(pool_layer1, [-1, 14 * 14 * 20])

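In this reshape call, we use -1 to indicate that the batch size is determined dynamically, and each example in the pool_layer1 output has a width of 14, a height of 14, and 20 channels.

So, the final output of this reshape operation will be as follows:

[batch_size, 3920]

Finally, we can use TensorFlow's dense() function to define our fully connected layer with the required number of neurons (units) and its activation function: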

dense_layer = tf.layers.dense(inputs=pool1_flat, units=1024, activation=tf.nn.relu)
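Under the hood, a dense layer is a matrix multiplication plus a bias, followed by the activation. The NumPy sketch below reproduces the flatten and dense steps with random stand-in data; the batch size of 4, the weight scale, and the zero biases are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pooling output: [batch_size, 14, 14, 20].
batch_size, height, width, channels = 4, 14, 14, 20
pooled = rng.standard_normal((batch_size, height, width, channels))

# Flatten everything except the batch dimension.
flat = pooled.reshape(batch_size, -1)

# Hypothetical weights and biases for a 1,024-unit layer.
units = 1024
weights = rng.standard_normal((flat.shape[1], units)) * 0.05
biases = np.zeros(units)

# Dense layer: affine transform followed by ReLU.
dense_out = np.maximum(flat @ weights + biases, 0)

print(flat.shape)       # (4, 3920)
print(dense_out.shape)  # (4, 1024)
```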

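Logits layer

Finally, we need the logits layer, which takes the output of the fully connected layer and produces the raw prediction values. For example, in the digit classification task, the output will be a tensor of 10 values, where each value is the score for one of the classes 0-9. So, let's define this logits layer for the digit classification example; we only need 10 outputs and a linear activation, which is the default for TensorFlow's dense() function: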

logits_layer = tf.layers.dense(inputs=dense_layer, units=10)

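Figure 9.10: Training a ConvNet

The final output of this logits layer will be a tensor with the following shape:

[batch_size, 10]

As mentioned previously, the logits layer of the model returns the raw predictions for our batch. But we need to convert these values into an interpretable format:

The predicted class for the input sample, a digit from 0 to 9.
The score or probability for each possible class. For example, the probability that the sample is a 0, the probability that it is a 1, and so on.

Figure 9.11: A visualization of the different layers of a CNN (source: cs231n.github.io/assets/cnn/…)

So, our predicted class is the one with the highest of the 10 scores. We can get this value using the argmax function as follows: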

tf.argmax(input=logits_layer, axis=1)

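Remember that logits_layer has the following shape:

[batch_size, 10]

So, we need to find the largest value along the class dimension of our predictions, which is the dimension with index 1.

Finally, we can get the second item, the probability of each target class, by applying a softmax activation to the output of logits_layer, which squashes each value to be between 0 and 1 so that the values for each sample sum to 1: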

tf.nn.softmax(logits_layer, name="softmax_tensor")
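To make the relationship between logits, probabilities, and the predicted class concrete, here is a NumPy sketch of softmax and argmax. The logits are made up; subtracting the row maximum is a common trick for numerical stability and does not change the result.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability; the result is unchanged.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

# Made-up logits for a batch of 2 samples and 10 classes.
logits = np.array([[1.0, 2.0, 0.5, 0.0, 8.0, 0.1, 0.2, 0.3, 1.5, 0.0],
                   [4.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

probabilities = softmax(logits)
predicted_classes = probabilities.argmax(axis=1)

print(probabilities.sum(axis=1))  # each row sums to 1
print(predicted_classes)          # [4 0]
```

Because softmax preserves the ordering of the scores, taking argmax of the logits or of the probabilities gives the same class.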

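Basic CNN example: MNIST digit classification

In this section, we will work through a complete example of implementing a CNN for digit classification using the MNIST dataset. We will build a simple model with two convolutional layers followed by fully connected layers.

Let's start by importing the libraries needed for this implementation: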

%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
from sklearn.metrics import confusion_matrix
import math

Next, we'll use TensorFlow's helper functions to download and preprocess the MNIST dataset as follows:

from tensorflow.examples.tutorials.mnist import input_data
mnist_data = input_data.read_data_sets('data/MNIST/', one_hot=True)

Output:
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting data/MNIST/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting data/MNIST/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting data/MNIST/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting data/MNIST/t10k-labels-idx1-ubyte.gz

The dataset is split into three disjoint sets: training, validation, and testing. So, let's print the number of images in each set:

print("- Number of images in the training set:\t\t{}".format(len(mnist_data.train.labels)))
print("- Number of images in the test set:\t\t{}".format(len(mnist_data.test.labels)))
print("- Number of images in the validation set:\t{}".format(len(mnist_data.validation.labels)))

- Number of images in the training set: 55000
- Number of images in the test set: 10000
- Number of images in the validation set: 5000

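The true labels of the images are stored in one-hot encoded format, so each label is an array of 10 values that are all zero except at the index of the class the image belongs to. For later use, we need the class numbers of the test set as integers: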

mnist_data.test.cls_integer = np.argmax(mnist_data.test.labels, axis=1)
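The conversion is easy to verify on a few hand-written labels. The sketch below uses made-up one-hot rows rather than the MNIST arrays, and also shows the reverse direction by indexing an identity matrix.

```python
import numpy as np

# Hypothetical one-hot labels for the digits 7, 2 and 0.
labels = np.array([[0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
                   [0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
                   [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

# The position of the single 1 in each row is the class number.
cls_integer = np.argmax(labels, axis=1)
print(cls_integer)  # [7 2 0]

# Going back: index an identity matrix with the integer classes.
one_hot_again = np.eye(10, dtype=int)[cls_integer]
```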

Let's define some known variables to be used later in our implementation:

# Default size of the monochrome input images of MNIST
image_size = 28

# Each image is stored as a vector of this size.
image_size_flat = image_size * image_size

# The shape of each image
image_shape = (image_size, image_size)

# All the images in the MNIST dataset are monochrome, with only 1 channel
num_channels = 1

# Number of classes in the MNIST dataset: the digits 0 through 9
num_classes = 10

Next, we need to define a helper function to plot some images from the dataset. This helper function plots the images in a grid of nine subplots:

def plot_imgs(imgs, cls_actual, cls_predicted=None):
    assert len(imgs) == len(cls_actual) == 9

    # create a figure with 9 subplots to plot the images.
    fig, axes = plt.subplots(3, 3)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # plot the image at the ith index
        ax.imshow(imgs[i].reshape(image_shape), cmap='binary')

        # labeling the images with the actual and predicted classes.
        if cls_predicted is None:
            xlabel = "True: {0}".format(cls_actual[i])
        else:
            xlabel = "True: {0}, Pred: {1}".format(cls_actual[i], cls_predicted[i])

        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])

        # Show the classes as the label on the x-axis.
        ax.set_xlabel(xlabel)

    plt.show()

Let's plot some images from the test set and see what they look like:

# Visualizing 9 images from the test set.
imgs = mnist_data.test.images[0:9]

# getting the actual classes of these 9 images
cls_actual = mnist_data.test.cls_integer[0:9]

#plotting the images
plot_imgs(imgs=imgs, cls_actual=cls_actual)

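Here is the output:

Figure 9.12: A visualization of some examples from the MNIST dataset

Building the model

Now it's time to build the core of the model. The computational graph includes all the layers we mentioned earlier in this chapter. We'll start by defining some functions that create variables of a given shape and initialize them, the weights randomly and the biases to a small constant: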

def new_weights(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.05))

def new_biases(length):
    return tf.Variable(tf.constant(0.05, shape=[length]))
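tf.truncated_normal draws values from a normal distribution and re-draws any value that lies more than two standard deviations from the mean, which keeps the initial weights small. The NumPy sketch below imitates that behavior; the function and its seed argument are illustrative stand-ins, not TensorFlow code.

```python
import numpy as np

def truncated_normal(shape, stddev=0.05, seed=0):
    """Draw from N(0, stddev^2), re-drawing values beyond 2 standard deviations."""
    rng = np.random.default_rng(seed)
    values = rng.normal(0.0, stddev, size=shape)
    out_of_range = np.abs(values) > 2 * stddev
    while out_of_range.any():
        # Re-sample only the entries that fell outside the allowed range.
        values[out_of_range] = rng.normal(0.0, stddev, size=int(out_of_range.sum()))
        out_of_range = np.abs(values) > 2 * stddev
    return values

# Example: weights for 16 filters of size 5x5 over 1 input channel.
weights = truncated_normal((5, 5, 1, 16))
print(weights.shape)                 # (5, 5, 1, 16)
print(np.abs(weights).max() <= 0.1)  # True
```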

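Now, let's define a function that creates a new convolutional layer from an input layer, the number of input channels, the filter size, the number of filters, and a flag indicating whether to use pooling: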

def conv_layer(input,  # the output of the previous layer.
               input_channels,
               filter_size,
               filters,
               use_pooling=True):  # Use 2x2 max-pooling.

    # Shape of the filter weights, in the format expected by tf.nn.conv2d.
    shape = [filter_size, filter_size, input_channels, filters]

    # Create the weights, that is, the filters, with the given shape.
    filters_weights = new_weights(shape=shape)

    # Create new biases, one for each filter.
    filters_biases = new_biases(length=filters)

    # Call conv2d as explained above. The strides parameter has four values:
    # the first is the stride over images in the batch and the last is the
    # stride over input channels; the middle two say how many pixels the
    # filter moves along the height and width of the image.
    layer = tf.nn.conv2d(input=input,
                         filter=filters_weights,
                         strides=[1, 1, 1, 1],
                         padding='SAME')

    # Add the biases to the output of the convolution.
    layer += filters_biases

    # Use pooling to down-sample the image resolution?
    if use_pooling:
        # Reduce the output feature map with a 2x2 max-pooling layer.
        layer = tf.nn.max_pool(value=layer,
                               ksize=[1, 2, 2, 1],
                               strides=[1, 2, 2, 1],
                               padding='SAME')

    # Feed the output to a ReLU activation function. This also works when
    # pooling is disabled, because layer is always defined.
    relu_layer = tf.nn.relu(layer)

    # Return the final result after applying ReLU, and the filter weights.
    return relu_layer, filters_weights
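Note that this helper applies ReLU after max pooling, while the earlier description applies it right after the convolution. The two orders give the same result, because taking a maximum and clipping at zero commute, and pooling first means ReLU runs on a quarter as many values. The NumPy sketch below checks this on random data; it is an illustration, not TensorFlow code.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def max_pool_2x2(x):
    # Non-overlapping 2x2 max pooling on a [height, width] array.
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

rng = np.random.default_rng(0)
feature_map = rng.standard_normal((28, 28))

pool_then_relu = relu(max_pool_2x2(feature_map))
relu_then_pool = max_pool_2x2(relu(feature_map))

print(np.array_equal(pool_then_relu, relu_then_pool))  # True
```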

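As mentioned previously, the pooling layer produces a 4D tensor. We need to flatten this 4D tensor into a 2D tensor so it can be fed to the fully connected layer: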

def flatten_layer(layer):
    # Get the shape of layer.
    shape = layer.get_shape()

    # The layer has the shape [num_images, image_height, image_width, num_channels].
    # We flatten it to [batch_size, num_features], where num_features is
    # image_height * image_width * num_channels.

    number_features = shape[1:4].num_elements()

    # Reshaping that to be fed to the fully connected layer
    flatten_layer = tf.reshape(layer, [-1, number_features])

    # Return both the flattened layer and the number of features.
    return flatten_layer, number_features

The following function creates a fully connected layer, assuming that the input is a 2D tensor:

def fc_layer(input, # the flatten output.
                 num_inputs, # Number of inputs from previous layer
                 num_outputs, # Number of outputs
                 use_relu=True): # Use ReLU on the output to remove negative values

    # Creating the weights for the neurons of this fc_layer
    fc_weights = new_weights(shape=[num_inputs, num_outputs])
    fc_biases = new_biases(length=num_outputs)

    # Calculate the layer values by doing matrix multiplication of
    # the input values and fc_weights, and then add the fc_bias-values.
    fc_layer = tf.matmul(input, fc_weights) + fc_biases

    # if use RelU parameter is true
    if use_relu:
        relu_layer = tf.nn.relu(fc_layer)
        return relu_layer

    return fc_layer

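Before building the network, let's define a placeholder for the input images, where the first dimension is None to allow an arbitrary number of images: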

input_values = tf.placeholder(tf.float32, shape=[None, image_size_flat], name='input_values')

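As we mentioned previously, the convolution step expects its input images as a 4D tensor. So, we need to reshape the input images to the following shape:

[num_images, image_height, image_width, num_channels]

So, let's reshape the input values to match this format: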

input_image = tf.reshape(input_values, [-1, image_size, image_size, num_channels])

Next, we need to define another placeholder for the true class values, in one-hot encoded format:

y_actual = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_actual')

We also need the true classes as integers. This is not another placeholder but an operation that takes the argmax of y_actual:

y_actual_cls_integer = tf.argmax(y_actual, axis=1)
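The layer-building calls that follow refer to filter_size_1, filters_1, filter_size_2, filters_2, and fc_num_neurons, which are not defined anywhere in this section. The output shapes shown later imply 16 filters in the first layer, 36 in the second, and 128 neurons in the first fully connected layer. The 5x5 filter size is an assumption: with 'SAME' padding it does not affect the output shapes, so it cannot be recovered from them. One possible set of definitions is:

```python
# Convolutional layer 1.
filter_size_1 = 5     # assumed: 5x5 filters (not recoverable from the shapes)
filters_1 = 16        # implied by the (?, 14, 14, 16) output shape

# Convolutional layer 2.
filter_size_2 = 5     # assumed, as above
filters_2 = 36        # implied by the (?, 7, 7, 36) output shape

# First fully connected layer.
fc_num_neurons = 128  # implied by the (?, 128) output shape

# Sanity check of the flattened size: two 2x2 poolings take 28 -> 14 -> 7.
expected_features = (28 // 4) ** 2 * filters_2
print(expected_features)  # 1764
```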

So, let's start by building the first convolutional layer:

conv_layer_1, conv1_weights = \
        conv_layer(input=input_image,
                   input_channels=num_channels,
                   filter_size=filter_size_1,
                   filters=filters_1,
                   use_pooling=True)

Let's check the shape of the output tensor that the first convolutional layer will produce:

conv_layer_1

Output:
<tf.Tensor 'Relu:0' shape=(?, 14, 14, 16) dtype=float32>

Next, we will create the second convolutional layer and feed it the output of the first one:

conv_layer_2, conv2_weights = \
         conv_layer(input=conv_layer_1,
                   input_channels=filters_1,
                   filter_size=filter_size_2,
                   filters=filters_2,
                   use_pooling=True)

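Also, we need to double-check the shape of the output tensor of the second convolutional layer. The shape should be (?, 7, 7, 36), where the ? stands for an arbitrary number of images.

Next, we need to flatten the 4D tensor to match the format expected by the fully connected layer, which is a 2D tensor: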

flatten_layer, number_features = flatten_layer(conv_layer_2)

We need to double-check the shape of the output tensor of the flatten layer:

flatten_layer

Output:
<tf.Tensor 'Reshape_1:0' shape=(?, 1764) dtype=float32>

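Next, we will create a fully connected layer and feed it the output of the flatten layer. We will also pass the output of this fully connected layer through a ReLU activation before feeding it to the second fully connected layer: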

fc_layer_1 = fc_layer(input=flatten_layer,
                         num_inputs=number_features,
                         num_outputs=fc_num_neurons,
                         use_relu=True)

Let's double-check the shape of the output tensor of the first fully connected layer:

fc_layer_1

Output:
<tf.Tensor 'Relu_2:0' shape=(?, 128) dtype=float32>

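Next, we need to add another fully connected layer, which takes the output of the first fully connected layer and produces, for each image, an array of size 10 holding the score for each target class being the correct one: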

fc_layer_2 = fc_layer(input=fc_layer_1,
                         num_inputs=fc_num_neurons,
                         num_outputs=num_classes,
                         use_relu=False)

fc_layer_2

Output:
<tf.Tensor 'add_3:0' shape=(?, 10) dtype=float32>

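Next, we normalize the scores from the second fully connected layer by feeding them to a softmax activation function, which squashes the values to be between 0 and 1: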

y_predicted = tf.nn.softmax(fc_layer_2)

Then, we use TensorFlow's argmax function to pick the target class with the highest probability:

y_predicted_cls_integer = tf.argmax(y_predicted, axis=1)

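Cost function

Next, we need to define our performance measure, which is the cross-entropy. The cross-entropy is never negative, and it approaches 0 as the probability the model assigns to the correct class approaches 1. Note that this function applies softmax internally, so we pass it the raw scores in fc_layer_2 rather than y_predicted: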

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=fc_layer_2,
                                                        labels=y_actual)

Next, we average the cross-entropy values from the previous step over all images in the batch to get a single performance measure:

model_cost = tf.reduce_mean(cross_entropy)
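To get a feel for the numbers, the following NumPy sketch computes the same quantity for two made-up samples with three classes instead of ten. The helper name softmax_cross_entropy is hypothetical; it is written with a log-sum-exp shift for numerical stability.

```python
import numpy as np

def softmax_cross_entropy(logits, one_hot_labels):
    """Per-sample cross-entropy computed from raw logits."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(one_hot_labels * log_probs).sum(axis=1)

# Both samples truly belong to class 1.
labels = np.array([[0.0, 1.0, 0.0],
                   [0.0, 1.0, 0.0]])
logits = np.array([[0.0, 10.0, 0.0],    # confident and correct
                   [10.0, 0.0, 0.0]])   # confident and wrong

losses = softmax_cross_entropy(logits, labels)
cost = losses.mean()  # the single number that the optimizer minimizes

print(losses)
print(cost)
```

The first loss is close to zero and the second is close to 10, so a confident wrong prediction dominates the averaged cost.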

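Now we have a cost function to minimize, so we will use AdamOptimizer, a gradient-based optimization method that, unlike plain gradient descent, adapts the step size of each parameter using running averages of the gradient and of its square: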

model_optimizer = tf.train.AdamOptimizer(learning_rate=1e-4).minimize(model_cost)
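The Adam update rule itself is short. The sketch below applies it to a toy one-dimensional cost, f(x) = x^2, in plain Python. The function name adam_minimize is hypothetical, the beta and epsilon values are the commonly used defaults, and the learning rate of 0.1 is chosen for the toy problem rather than taken from the model above.

```python
import math

def adam_minimize(grad_fn, x, steps, learning_rate=0.1,
                  beta1=0.9, beta2=0.999, epsilon=1e-8):
    """Minimize a scalar function of one variable with Adam updates."""
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for m
        v_hat = v / (1 - beta2 ** t)           # bias correction for v
        x -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return x

# Toy cost f(x) = x^2 with gradient 2x, starting from x = 5.
x_start = 5.0
x_end = adam_minimize(lambda x: 2 * x, x_start, steps=100)
print(x_end ** 2 < x_start ** 2)  # True: the cost has decreased
```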

Performance measures

To report results, let's define a variable that checks whether the predicted class equals the true class:

model_correct_prediction = tf.equal(y_predicted_cls_integer, y_actual_cls_integer)

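We compute the model's accuracy by casting the boolean values to floats and averaging them, which gives the fraction of correctly classified images: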

model_accuracy = tf.reduce_mean(tf.cast(model_correct_prediction, tf.float32))
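The same cast-then-average idea can be tried with NumPy on a handful of made-up predictions:

```python
import numpy as np

# Made-up predicted and true classes for six images.
predicted = np.array([7, 2, 1, 0, 4, 1])
actual = np.array([7, 2, 1, 0, 9, 5])

correct = (predicted == actual)               # array of booleans
accuracy = correct.astype(np.float32).mean()  # True -> 1.0, False -> 0.0

print(correct.sum())  # 4
print(accuracy)       # about 0.667
```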

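Model training

Let's kick off the training process by creating a session variable that will be responsible for executing the computational graph we defined earlier: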

session = tf.Session()

Also, we need to initialize the variables that we have defined so far:

session.run(tf.global_variables_initializer())

We will feed the images in batches to avoid out-of-memory errors:

train_batch_size = 64

Before starting the training process, we define a helper function that performs the optimization by iterating through training batches:

# number of optimization iterations performed so far
total_iterations = 0

def optimize(num_iterations):
    # Update globally the total number of iterations performed so far.
    global total_iterations

    for i in range(total_iterations,
                   total_iterations + num_iterations):

        # Generating a random batch for the training process
        # input_batch now contains a bunch of images from the training set and
        # y_actual_batch are the actual labels for the images in the input batch.
        input_batch, y_actual_batch = mnist_data.train.next_batch(train_batch_size)

        # Putting the previous values in a dict format for Tensorflow to automatically assign them to the input
        # placeholders that we defined above
        feed_dict = {input_values: input_batch,
                           y_actual: y_actual_batch}

        # Next up, we run the model optimizer on this batch of images
        session.run(model_optimizer, feed_dict=feed_dict)

        # Print the training status every 100 iterations.
        if i % 100 == 0:
            # measuring the accuracy over the current training batch.
            acc_training_set = session.run(model_accuracy, feed_dict=feed_dict)

            #Printing the accuracy over the training set
            print("Iteration: {0:>6}, Accuracy Over the training set: {1:>6.1%}".format(i + 1, acc_training_set))

    # Update the number of iterations performed so far
    total_iterations += num_iterations

We will also define some helper functions to visualize the model's results and see which images the model misclassifies:

def plot_errors(cls_predicted, correct):

    # cls_predicted is an array of the predicted class number of each image in the test set.
    # correct is a boolean array marking which test images were classified correctly.

    # Extracting the incorrect images.
    incorrect = (correct == False)

    # Get the images from the test-set that have been
    # incorrectly classified.
    images = mnist_data.test.images[incorrect]

    # Get the predicted classes for those incorrect images.
    cls_pred = cls_predicted[incorrect]

    # Get the actual classes for those incorrect images.
    cls_true = mnist_data.test.cls_integer[incorrect]

    # Plot the first 9 of these misclassified images
    plot_imgs(imgs=images[0:9],
              cls_actual=cls_true[0:9],
              cls_predicted=cls_pred[0:9])

We can also plot the confusion matrix of the predicted results against the true classes:

def plot_confusionMatrix(cls_predicted):

    # cls_predicted is an array of the predicted class number of each image in the test set.

    # Get the actual classes for the test-set.
    cls_actual = mnist_data.test.cls_integer

    # Generate the confusion matrix using sklearn.
    conf_matrix = confusion_matrix(y_true=cls_actual,
                                   y_pred=cls_predicted)

    # Print the matrix.
    print(conf_matrix)

    # visualizing the confusion matrix.
    plt.matshow(conf_matrix)

    plt.colorbar()
    tick_marks = np.arange(num_classes)
    plt.xticks(tick_marks, range(num_classes))
    plt.yticks(tick_marks, range(num_classes))
    plt.xlabel('Predicted class')
    plt.ylabel('True class')

    # Showing the plot
    plt.show()

Finally, we define a helper function to measure the accuracy of the trained model on the test set:

# measuring the accuracy of the trained model over the test set by splitting it into small batches
test_batch_size = 256

def test_accuracy(show_errors=False,
                        show_confusionMatrix=False):

    #number of test images 
    number_test = len(mnist_data.test.images)

    # define an array of zeros for the predicted classes of the test set, which
    # will be computed in mini batches and stored here.
    # (np.int was removed from recent NumPy releases, so use np.int64.)
    cls_predicted = np.zeros(shape=number_test, dtype=np.int64)

    # measuring the predicted classes for the testing batches.

    # Starting by the batch at index 0.
    i = 0

    while i < number_test:
        # The ending index for the next batch to be processed is j.
        j = min(i + test_batch_size, number_test)

        # Getting all the images from the test set between the start and end indices
        input_images = mnist_data.test.images[i:j, :]

        # Get the actual labels for those images.
        actual_labels = mnist_data.test.labels[i:j, :]

        # Create a feed-dict with the corresponding values for the input placeholder values
        feed_dict = {input_values: input_images,
                     y_actual: actual_labels}

        cls_predicted[i:j] = session.run(y_predicted_cls_integer, feed_dict=feed_dict)

        # Setting the start of the next batch to be the end of the one that we just processed j
        i = j

    # Get the actual class numbers of the test images.
    cls_actual = mnist_data.test.cls_integer

    # Check if the model predictions are correct or not
    correct = (cls_actual == cls_predicted)

    # Summing up the correct examples
    correct_number_images = correct.sum()

    # measuring the accuracy by dividing the number of correctly classified images by the total number of images in the test set.
    testset_accuracy = float(correct_number_images) / number_test

    # showing the accuracy.
    print("Accuracy on Test-Set: {0:.1%} ({1} / {2})".format(testset_accuracy, correct_number_images, number_test))

    # showing some examples from the incorrect ones.
    if show_errors:
        print("Example errors:")
        plot_errors(cls_predicted=cls_predicted, correct=correct)

    # Showing the confusion matrix of the test set predictions
    if show_confusionMatrix:
        print("Confusion Matrix:")
        plot_confusionMatrix(cls_predicted=cls_predicted)
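The while loop in test_accuracy walks through the test set in consecutive slices, with a shorter final slice when the batch size does not divide the number of images. The plain Python sketch below isolates that index arithmetic; batch_ranges is a hypothetical helper, not part of the code above.

```python
def batch_ranges(total, batch_size):
    """Yield (start, end) index pairs covering range(total) in order."""
    start = 0
    while start < total:
        # The last batch may be shorter than batch_size.
        end = min(start + batch_size, total)
        yield start, end
        start = end

ranges = list(batch_ranges(total=10000, batch_size=256))
print(len(ranges))  # 40
print(ranges[0])    # (0, 256)
print(ranges[-1])   # (9984, 10000)
```

With 10,000 test images and a batch size of 256, this gives 39 full batches and a final batch of 16 images.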

Let's print the accuracy on the test set of the model before any optimization:

test_accuracy()

Output:
Accuracy on Test-Set: 4.1% (410 / 10000)

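Let's get a feel for how the optimization process improves the model's ability to assign images to their correct classes by running a single optimization iteration:

optimize(num_iterations=1)

Output:
Iteration: 1, Accuracy Over the training set: 4.7%

test_accuracy()

Output:
Accuracy on Test-Set: 4.4% (437 / 10000)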

Now, let's run a long optimization process, bringing the total to 10,000 iterations:

optimize(num_iterations=9999) #We have already performed 1 iteration.

At the end of the output, you should see something very close to the following:

Iteration: 7301, Accuracy Over the training set: 96.9%
Iteration: 7401, Accuracy Over the training set: 100.0%
Iteration: 7501, Accuracy Over the training set: 98.4%
Iteration: 7601, Accuracy Over the training set: 98.4%
Iteration: 7701, Accuracy Over the training set: 96.9%
Iteration: 7801, Accuracy Over the training set: 96.9%
Iteration: 7901, Accuracy Over the training set: 100.0%
Iteration: 8001, Accuracy Over the training set: 98.4%
Iteration: 8101, Accuracy Over the training set: 96.9%
Iteration: 8201, Accuracy Over the training set: 100.0%
Iteration: 8301, Accuracy Over the training set: 98.4%
Iteration: 8401, Accuracy Over the training set: 98.4%
Iteration: 8501, Accuracy Over the training set: 96.9%
Iteration: 8601, Accuracy Over the training set: 100.0%
Iteration: 8701, Accuracy Over the training set: 98.4%
Iteration: 8801, Accuracy Over the training set: 100.0%
Iteration: 8901, Accuracy Over the training set: 98.4%
Iteration: 9001, Accuracy Over the training set: 100.0%
Iteration: 9101, Accuracy Over the training set: 96.9%
Iteration: 9201, Accuracy Over the training set: 98.4%
Iteration: 9301, Accuracy Over the training set: 98.4%
Iteration: 9401, Accuracy Over the training set: 100.0%
Iteration: 9501, Accuracy Over the training set: 100.0%
Iteration: 9601, Accuracy Over the training set: 98.4%
Iteration: 9701, Accuracy Over the training set: 100.0%
Iteration: 9801, Accuracy Over the training set: 100.0%
Iteration: 9901, Accuracy Over the training set: 100.0%
Iteration: 10001, Accuracy Over the training set: 98.4%

Now, let's check how well the model generalizes to the test set:

test_accuracy(show_errors=True,
                    show_confusionMatrix=True)

Output:
Accuracy on Test-Set: 92.8% (9281 / 10000)
Example errors:

Figure 9.13: Accuracy over the test set

Confusion Matrix:
[[ 971    0    2    2    0    4    0    1    0    0]
 [   0 1110    4    2    1    2    3    0   13    0]
 [  12    2  949   15   16    3    4   17   14    0]
 [   5    3   14  932    0   34    0   13    6    3]
 [   1    2    3    0  931    1    8    2    3   31]
 [  12    1    4   13    3  852    2    1    3    1]
 [  21    4    5    2   18   34  871    1    2    0]
 [   1   10   26    5    5    0    0  943    2   36]
 [  16    5   10   27   16   48    5   13  815   19]
 [  12    5    5   11   38   10    0   18    3  907]]

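Here is the output:

Figure 9.14: Confusion matrix of the test set.

It is interesting that we reach nearly 93% accuracy on the test set with such a basic convolutional network. This implementation and its results show what a simple convolutional network can do.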
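The confusion matrix printed above can also be summarized numerically. The sketch below copies the matrix from the output and recomputes the overall accuracy and the per-digit recall with NumPy; rows are true classes and columns are predicted classes, as in sklearn's confusion_matrix.

```python
import numpy as np

conf_matrix = np.array([
    [971,    0,   2,   2,   0,   4,   0,   1,   0,   0],
    [  0, 1110,   4,   2,   1,   2,   3,   0,  13,   0],
    [ 12,    2, 949,  15,  16,   3,   4,  17,  14,   0],
    [  5,    3,  14, 932,   0,  34,   0,  13,   6,   3],
    [  1,    2,   3,   0, 931,   1,   8,   2,   3,  31],
    [ 12,    1,   4,  13,   3, 852,   2,   1,   3,   1],
    [ 21,    4,   5,   2,  18,  34, 871,   1,   2,   0],
    [  1,   10,  26,   5,   5,   0,   0, 943,   2,  36],
    [ 16,    5,  10,  27,  16,  48,   5,  13, 815,  19],
    [ 12,    5,   5,  11,  38,  10,   0,  18,   3, 907],
])

# Correct predictions lie on the diagonal.
accuracy = np.trace(conf_matrix) / conf_matrix.sum()

# Recall per digit: correct predictions divided by the number of true examples.
recall = np.diag(conf_matrix) / conf_matrix.sum(axis=1)

print(np.trace(conf_matrix), conf_matrix.sum())  # 9281 10000
print(accuracy)                                   # 0.9281
print(recall.argmin())                            # 8
```

The digit 8 has the lowest recall in this run; its row shows that it is most often mistaken for a 5.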

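Summary

In this chapter, we covered the intuition and technical details of how CNNs work, and we saw how to implement a basic CNN architecture in TensorFlow.

In the next chapter, we'll look at more advanced architectures that can be used to detect objects in one of the image datasets widely used by data scientists. We'll also see the beauty of CNNs and how they come to mimic human understanding of objects by first recognizing the basic features of objects, then building more advanced semantic features on top of them to arrive at a classification. Although this process happens very quickly in our minds, it is what actually happens when we recognize objects.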