1 强化学习与深度学习的关系

强化学习的问题可以拆分成两类问题，即预测和控制。预测的主要目的是根据环境的状态和动作来预测状态价值和动作价值，而控制的主要目的是根据状态价值和动作价值来选择动作。换句话说，预测主要是告诉我们当前状态下采取什么动作比较好，而控制则是按照某种方式决策。就好比军师与主公的关系，军师提供他认为最佳的策略，而主公则决定是否采纳这个策略。

深度学习就是用来提高强化学习中预测的效果的，因为深度学习本身就是一个目前预测和分类效果俱佳的工具。比如 Q-learning 的 𝑄 表就完全可以用神经网络来拟合。注意，深度学习只是一种非常广泛的应用，但并不是强化学习的必要条件，也可以是一些传统的预测模型，例如决策树、贝叶斯模型等等，因此读者在研究相关问题时需要充分打开思路。类似地，在控制问题中，也可以利用深度学习或者其他的方法来提高性能，例如结合进化算法来提高强化学习的探索能力。

从训练模式上来看，深度学习和强化学习，尤其是结合了深度学习的深度强化学习，都是基于大量的样本来对相应算法进行迭代更新并且达到最优的，这个过程我们称之为训练。但与另外两者不同的是，强化学习是在交互中产生样本的，是一个产生样本、算法更新、再次产生样本、再次算法更新的动态循环训练过程，而不是一个准备样本、算法更新的静态训练过程。

这本质上还是跟要解决的问题不同有关，强化学习解决的是序列决策问题，而深度学习解决的是“打标签”问题，即给定一张图片，我们需要判断这张图片是猫还是狗，这里的猫和狗就是标签，当然也可以让算法自动打标签，这就是监督学习与无监督学习的区别。而强化学习解决的是“打分数”问题，即给定一个状态，我们需要判断这个状态是好还是坏，这里的好和坏就是分数。当然，这只是一个比喻，实际上强化学习也可以解决“打标签”问题，只不过这个标签是一个连续的值，而不是离散的值，比如我们可以给定一张图片，然后判断这张图片的美观程度，这里的美观程度就是一个连续的值，而不是离散的值。

如图 1 所示，除了训练生成模型之外，强化学习相当于在深度学习的基础上增加了一条回路，即继续与环境交互产生样本。相信学过控制系统的读者很快会意识到，这个回路就是一个典型的反馈系统机制，模型的输出一开始并不能达到预期的值，因此通过动态地不断与环境交互来产生一些反馈信息，从而训练出一个更好的模型。

图 1 深度学习与强化学习示例

2.逻辑回归

简单介绍完梯度下降之后，我们就可以继续介绍一些模型了，现在是逻辑回归。注意，虽然逻辑回归名字中带有回归，但是它是用来解决分类问题的，而不是回归问题（即预测问题）。在分类问题中，我们的目标是预测样本的类别，而不是预测一个连续的值。例如，我们要预测一封邮件是否是垃圾邮件，这就是一个二分类问题，通常输出 0 和 1 等离散的数字来表示对应的类别。在形式上，逻辑回归和线性回归非常相似，如图 2 所示，就是在线性模型的后面增加一个 sigmoid 函数，我们一般称之为激活函数。

图 2 逻辑回归结构

sigmoid 函数定义为:

如图 3 所示，sigmoid 函数可以将输入的任意实数映射到 (0,1) 的区间内，对其输出的值进行判断，例如小于 0.5 我们认为预测的是类别 0，反之是类别 1 ，这样一来通过梯度下降来求解模型参数就可以用于实现二分类问题了。注意，虽然逻辑回归只是在线性回归模型基础上增加了一个激活函数，但两个模型是完全不同的，包括损失函数等等。线性回归的损失函数是均方差损失，而逻辑回归模型一般是交叉熵损失，这两种损失函数在深度学习和深度强化学习中都很常见。

图 3 sigmoid 函数图像

逻辑回归的主要优点在于增加了模型的非线性能力，同时模型的参数也比较容易求解，但是它也有一些缺点，例如它的非线性能力还是比较弱的，而且它只能解决二分类问题，不能解决多分类问题。在实际应用中，我们一般会将多个二分类问题组合成一个多分类问题，例如将 sigmoid 函数换成 softmax 回归函数等。

其实，逻辑回归的模型结构已经跟生物神经网络的最小单位神经元很相似了。如图 4 所示，我们知道神经元之间是通过生物电信号来传递信息的，在每个神经元的末端会有一个叫做突触的结构，会根据信号的不同来激活不同的受体并传递给下一个神经元。当然，每个神经元也会同时接收来自不同神经元的信号并通过细胞核处理，人工神经网络中这个处理过程就相当于线性加权处理，即 𝑤𝑇𝑥, 然后通过激活函数来判断是否激活。

图 4 生物神经网络与人工神经网络的对比

同时，逻辑回归这类模型的结构也比较灵活多变，可以通过横向堆叠的形式来增加模型的复杂度，例如增加隐藏层等，这样就能解决更复杂的问题，这就是接下来要讲的神经网络模型。并且，我们可以认为逻辑回归就是一个最简单的人工神经网络模型。

3.全连接网络

如图 5 所示，将线性层横向堆叠起来，前一层网络的所有神经元的输出都会输入到下一层的所有神经元中，这样就可以得到一个全连接网络。其中，每个线性层的输出都会经过一个激活函数（图中已略去），这样就可以增加模型的非线性能力。

图 5 全连接网络

我们把这样的网络叫做全连接网络（fully connected network），也称作多层感知机（，multi-layer perceptron，MLP），是最基础的深度神经网络模型。

4 卷积神经网络 (以下引用英文)

Originally developed by Yann LeCun decades ago, better known as CNNs (ConvNets) are one of the state of the art, Artificial Neural Network design architecture, which has proven its effectiveness in areas such as image recognition and classification. The Basic Principle behind the working of CNN is the idea of Convolution, producing filtered Feature Maps stacked over each other.

A convolutional neural network consists of several layers. Implicit explanation about each of these layers is given below.

1. Convolution Layer (Conv Layer)

The Conv layer is the core building block of a Convolutional Neural Network. The primary purpose of Conv layer is to extract features from the input image.

Fig 1 Convolution Mechanism

The Conv Layer parameters consist of a set of learnable filters (kernels or feature detector). Filters are used for recognizing patterns throughout the entire input image. Convolution works by sliding the filter over the input image and along the way we take the dot product between the filter and chunks of the input image.

2. Pooling Layer (Sub-sampling or Down-sampling)

Pooling layer reduce the size of feature maps by using some functions to summarize sub-regions, such as taking the average or the maximum value. Pooling works by sliding a window across the input and feeding the content of the window to a pooling function.

Fig.2 Max-Pooling and Average-Pooling

The purpose of pooling is to reduce the number of parameters in our network (hence called down-sampling) and to make learned features more robust by making it more invariant to scale and orientation changes.

3. ReLU Layer

ReLU stands for Rectified Linear Unit and is a non-linear operation. ReLU is an element wise operation (applied per pixel) and replaces all negative pixel values in the feature map by zero.

Fig. 3 ReLU Layer

The purpose of ReLU is to introduce non-linearity in our ConvNet, since most of the real-world data we would want our ConvNet to learn would be non-linear.

Other non linear functions such as tanh or sigmoid can also be used instead of ReLU, but ReLU has been found to perform better in most cases.

4. Fully Connected Layer

The Fully Connected layer is configured exactly the way its name implies: it is fully connected with the output of the previous layer. A fully connected layer takes all neurons in the previous layer (be it fully connected, pooling, or convolutional) and connects it to every single neuron it has.

Fig. 4 Fully Connected Layer

Adding a fully-connected layer is also a cheap way of learning non-linear combinations of these features. Most of the features learned from convolutional and pooling layers may be good, but combinations of those features might be even better.

循环神经网络(以下引用英文)

Recurrent neural networks is a type of deep learning-oriented algorithm, which follows a sequential approach. In neural networks, we always assume that each input and output is independent of all other layers. These type of neural networks are called recurrent because they perform mathematical computations in sequential manner.

Consider the following steps to train a recurrent neural network −

Step 1 − Input a specific example from dataset.

Step 2 − Network will take an example and compute some calculations using randomly initialized variables.

Step 3 − A predicted result is then computed.

Step 4 − The comparison of actual result generated with the expected value will produce an error.

Step 5 − To trace the error, it is propagated through same path where the variables are also adjusted.

Step 6 − The steps from 1 to 5 are repeated until we are confident that the variables declared to get the output are defined properly.

Step 7 − A systematic prediction is made by applying these variables to get new unseen input.

The schematic approach of representing recurrent neural networks is described below −

引用参考

datawhalechina.github.io/joyrl-book/… qffc.uic.edu.cn/home/conten… qffc.uic.edu.cn/home/conten…

深度强化学习基础-深度学习基础