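1 TensorFlow Basic Architecture

1.1 Processing Structure

1.1.1 Computation Graph

TensorFlow requires you to define the structure of the neural network first, and only then feed data into that structure for computation and training.

TensorFlow computes with data flow graphs, so the first step is to build a data flow graph and then run our data, which takes the form of tensors, through it. Nodes in the graph represent mathematical operations, and the edges represent the multidimensional data arrays, i.e. tensors, passed between nodes. During training, tensors keep flowing from one node of the graph to another, which is where the name TensorFlow comes from.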
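
The build-then-run idea can be sketched in plain Python. This is only a toy illustration, not TensorFlow's API: the Node class and the constant, placeholder, add and multiply helpers are hypothetical names made up for this sketch.

```python
# A toy data flow graph in plain Python (illustrative only, not TensorFlow's API).
class Node:
    """One operation in the graph; its inputs are other nodes (the edges)."""
    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs

    def run(self, feed):
        # Evaluate the upstream nodes first, then apply this node's operation.
        values = [n.run(feed) for n in self.inputs]
        return self.op(feed, *values)

def constant(value):
    return Node(lambda feed: value)

def placeholder(name):
    return Node(lambda feed: feed[name])

def add(a, b):
    return Node(lambda feed, x, y: x + y, a, b)

def multiply(a, b):
    return Node(lambda feed, x, y: x * y, a, b)

# Step 1: build the structure. Nothing is computed here.
x = placeholder("x")
y = add(multiply(constant(0.5), x), constant(2.0))

# Step 2: push data through the graph.
result = y.run({"x": 4.0})
print(result)  # 4.0
```

Defining y only records which operations depend on which; the arithmetic happens when run() is called with concrete data.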

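1.1.2 What Tensors Mean

Tensor:

There are several kinds of tensor. A rank-0 tensor is a scalar, i.e. a single number, for example 1
A rank-1 tensor is a vector, for example the one-dimensional [1, 2, 3]
A rank-2 tensor is a matrix, for example the two-dimensional [[1, 2, 3],[4, 5, 6],[7, 8, 9]]
And so on for rank-3 (three-dimensional) tensors …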
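
As a rough illustration of rank, the sketch below counts nesting levels of plain Python lists. The helper name rank is hypothetical, and it assumes regular, non-empty nested lists rather than real TensorFlow tensors.

```python
def rank(value):
    """Return the number of nesting levels of a regular, non-empty nested list."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]
    return depth

scalar = 1
vector = [1, 2, 3]
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

ranks = [rank(t) for t in (scalar, vector, matrix, cube)]
print(ranks)  # [0, 1, 2, 3]
```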

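1.2 Example 1

TensorFlow puts a lot of weight on structure: we have to build the structure of the neural network before we can put numbers into it and run it.

This example briefly shows how to use code to run the structure we have built in TensorFlow.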

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
import numpy as np
# TensorFlow works on a data flow graph:
# build the structure first, then feed in the data

# 1. create the data
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data*0.1 + 0.3  # target weight 0.1, target bias 0.3


###### start: build the TensorFlow structure ######
# set initial values
Weights = tf.Variable(tf.random_uniform([1], -1.0, 1.0))  # shape [1], random value between -1 and 1
biases = tf.Variable(tf.zeros([1]))  # one-dimensional zero
y = Weights*x_data + biases
# compute the error
loss = tf.reduce_mean(tf.square(y-y_data))
# backpropagate the error
# learning_rate=0.5
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
# initialize all the Variables defined above
init = tf.global_variables_initializer()
###### end: build the TensorFlow structure ######

# activate the structure by running init
sess = tf.Session()
sess.run(init)  # use the Session to run the init step

# training
for step in range(201):
    sess.run(train)  # use the Session to run one training step
    if step%20 == 0:
        print(step,sess.run(Weights),sess.run(biases))




# training results
# 0 [0.09017247] [0.4566294]
# 20 [0.08102754] [0.31115255]
# 40 [0.09524692] [0.302794]
# 60 [0.09880923] [0.30069998]
# 80 [0.09970169] [0.30017537]
# 100 [0.09992526] [0.30004394]
# 120 [0.09998128] [0.300011]
# 140 [0.09999532] [0.30000275]
# 160 [0.09999885] [0.3000007]
# 180 [0.09999971] [0.3000002]
# 200 [0.09999991] [0.30000007]
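
To make the training step less of a black box, here is a plain-Python sketch of what gradient descent on the mean squared error does for this same line-fitting problem. It is an assumption-laden illustration, not what GradientDescentOptimizer runs internally: the gradients are written out by hand and the names w, b and lr are chosen for this sketch.

```python
import random

random.seed(0)
xs = [random.random() for _ in range(100)]
ys = [0.1 * x + 0.3 for x in xs]  # same target line as in the example above

w, b = random.uniform(-1.0, 1.0), 0.0  # random weight, zero bias
lr = 0.5                               # learning rate
n = len(xs)

for step in range(201):
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # Gradients of mean((w*x + b - y)^2) with respect to w and b.
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2.0 * sum(errors) / n
    # Step against the gradient.
    w -= lr * grad_w
    b -= lr * grad_b
    if step % 20 == 0:
        print(step, w, b)
```

As in the TensorFlow run, w and b should drift toward 0.1 and 0.3.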

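1.3 Session Control

A Session is the object TensorFlow uses to control execution and to produce outputs. Calling session.run() returns the result you want to know, or runs just the part of the graph you want computed.

# Session control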
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

matrix1 = tf.constant([[3, 3]])  # 1×2
matrix2 = tf.constant([[2],
                       [2]])  # 2×1
product = tf.matmul(matrix1, matrix2)  # matrix multiply, like np.dot(m1, m2)

# product is not computed when it is defined, so we use a Session to evaluate it and get the result.
# There are two ways to use a Session.

# method 1: the session must be closed manually
sess = tf.Session()
result = sess.run(product)
print(result)  # [[12]]
sess.close()

# method 2: closed automatically, like opening a file with `with`
with tf.Session() as sess:
    result2 = sess.run(product)
    print(result2)  # [[12]]

1.4 Variables

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

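# define a variable in TensorFlow
# xx = tf.Variable(value, name='')
state = tf.Variable(0, name='counter')  # a variable with value 0
print(state)  # shows only the Variable object, not its value; use sess.run for the value
# <tf.Variable 'counter:0' shape=() dtype=int32_ref>
print(state.name)
# counter:0

one = tf.constant(1)  # a constant

# define the addition step (note: nothing is computed at this point)
# the addition only runs when it is evaluated through sess.run
new_value = tf.add(state, one)
# assign: update state to new_value
update = tf.assign(state, new_value)

# Important: after defining variables, you must define the initializer
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)  # the initializer must also be run before the variables are used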
    for i in range(3):
        sess.run(update)
        print(sess.run(state))
# 1
# 2
# 3

Summary:

Syntax for defining a variable in tf: xx = tf.Variable(value, name='')
Define the variable -> define the initializer -> run the initializer -> use the variable (e.g. print it)

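1.5 Placeholders: Feeding in Values

A placeholder is TensorFlow's stand-in for a value that will be supplied later; it holds a position in the graph rather than storing data.

To pass data into TensorFlow from outside, use tf.placeholder() and then supply the data as a dictionary: sess.run(***, feed_dict={input: **}).

A placeholder is not a variable, so there is no initializer to define or run for it.

How should the [None,] in tf.placeholder(tf.float32,[None,]) be read?

None leaves that dimension unconstrained. For example, [None,2] means a shape with any number of rows and 2 columns.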
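
One way to picture how None behaves in a shape is a small compatibility check. This is a hypothetical helper written for illustration (the name matches is not part of TensorFlow), and it only mimics the idea that None accepts any size in that dimension.

```python
def matches(shape, spec):
    """True if a concrete shape fits a placeholder spec; None accepts any size."""
    if len(shape) != len(spec):
        return False
    return all(s is None or s == d for d, s in zip(shape, spec))

print(matches((5, 2), [None, 2]))    # True
print(matches((100, 2), [None, 2]))  # True
print(matches((5, 3), [None, 2]))    # False
print(matches((7,), [None]))         # True
```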

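Why use a placeholder?

TensorFlow's design is built around the computation graph. When you write a program you first construct the graph of the whole system, and that code does not take effect immediately, which differs from other Python numerical libraries such as NumPy. The graph is static, much like an image in Docker. The program only really runs later, when a session is started. The benefit is that it avoids repeatedly switching the context in which the underlying program actually runs, and lets TensorFlow optimize the code of the whole system. Many Python programs are implemented underneath in C or another language, and each script line executed means one switch, which has a cost. By using a computation graph, TensorFlow can optimize everything the session needs to execute, which is a real advantage.

So placeholder() holds a position in the model while the graph of the neural network is being built. At that point the input data has not been passed to the model; only the slot for it, with its type and shape, is reserved. Once a session exists and the model is run inside it, data is fed to the placeholder through the feed_dict argument.

Source: Zhihu, zhuanlan.zhihu.com/p/78432879. Copyright belongs to the author. For commercial reuse, contact the author for authorization; for non-commercial reuse, credit the source.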

# placeholder: feeding in values
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# tf.placeholder(dtype, shape)  # dtype is usually float32; shape gives the matrix rows and columns
# not a variable
input1 = tf.placeholder(tf.float32)
input2 = tf.placeholder(tf.float32)

output = tf.multiply(input1, input2)  # element-wise multiplication

# Passing the values is now the job of sess.run(); the values to feed go in feed_dict={}
with tf.Session() as sess:
    res = sess.run(output, feed_dict={input1: [7], input2: [2]})
    print(res)
    # [14.]

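1.6 Activation Functions

Introduction

Activation functions exist to handle the problems from everyday life that a linear equation cannot describe.

Their purpose is to make up for the limited expressive power of a linear model by adding a nonlinear element.

We can reduce the whole network to a single formula, y = Wx, where W is the parameter we want to find, y is the predicted value, and x is the input. With this formula a linear problem is easy to describe, because W can be solved for as a fixed number.

How do we describe a nonlinear problem?

y = AF(Wx)

AF: relu, sigmoid, tanh

This gives the output y nonlinear characteristics as well.

You can create your own activation function for your own problem, but it must be differentiable, because during backpropagation only a differentiable activation function can pass the error back.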
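
As a small sketch of the three functions named above, the following plain-Python code applies each one to a single scalar product. The neuron helper and the chosen numbers are illustrative assumptions; a real layer works on whole tensors.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def neuron(w, x, af):
    """Compute AF(w * x) for one scalar weight and one scalar input."""
    return af(w * x)

# Same weighted input (2.0 * -1.5 = -3.0), three different nonlinearities.
outputs = {af.__name__: neuron(2.0, -1.5, af) for af in (relu, sigmoid, tanh)}
for name, value in outputs.items():
    print(name, value)
```

relu clips the negative input to zero, sigmoid squeezes it into the range 0 to 1, and tanh into the range -1 to 1.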

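Activation functions serve two purposes.

First, real data contains noise. To keep that noise from strongly affecting the next layer, an activation function can suppress extreme, outlying values.

Second, it constrains the output of the previous layer. Think of people: our height lies between 1 and 3 meters, so if a computation produces a wildly exaggerated number, an activation function can pull it back into the 1 to 3 meter range (ill-conditioning is usually worst early in training). A commonly used bipolar function constrains all values to lie between -1 and 1.

How do we choose an activation function?

When the network has only two or three layers, any activation function can be used for the hidden layers without much effect on the result.

When there are many layers, the choice has to be made carefully to avoid exploding and vanishing gradients.

In the convolutional layers of convolutional neural networks, the recommended activation function is relu.

In recurrent neural networks, the recommended one is tanh.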
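
The vanishing-gradient concern with deep networks can be illustrated with a simplified calculation. The sketch assumes every layer uses a sigmoid, every weight is 1, and every pre-activation is 0, which is the point where the sigmoid derivative is largest (0.25); chained_grad is a hypothetical helper name.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def chained_grad(depth, z=0.0):
    """Product of `depth` sigmoid derivatives, all evaluated at z (weights assumed to be 1)."""
    return sigmoid_grad(z) ** depth

for depth in (2, 10, 30):
    print(depth, chained_grad(depth))
```

Even in this best case the backpropagated factor shrinks by a quarter per layer, which is one reason the choice of activation matters more as depth grows.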

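2 Building a Neural Network: Example 2

2.1 Adding a Layer: def add_layer()

The usual parameters of a neural layer are weights, biases, and an activation function.

add_layer() takes four arguments: the input values, the input size, the output size, and the activation function.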

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

def add_layer(inputs,in_size,out_size,activation_function=None):
    Weights = tf.Variable(tf.random_normal([in_size,out_size]))
    # random values (normal distribution) work much better than all zeros
    biases = tf.Variable(tf.zeros([1,out_size])+0.1)
    # a nonzero value is recommended for biases, so we add 0.1 to the zero vector
    Wx_plus_b = tf.matmul(inputs,Weights) + biases
    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs
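
To show what the layer computes shape by shape, here is a plain-Python sketch of the same matmul-plus-bias-plus-activation step using nested lists. The function name dense and the sample numbers are assumptions for illustration only.

```python
def dense(inputs, weights, biases, activation=None):
    """inputs: [batch][in_size], weights: [in_size][out_size], biases: [out_size]."""
    outputs = []
    for row in inputs:
        out_row = []
        for j, b in enumerate(biases):
            # One output unit: dot product of the input row with column j, plus its bias.
            z = sum(x * weights[i][j] for i, x in enumerate(row)) + b
            out_row.append(activation(z) if activation else z)
        outputs.append(out_row)
    return outputs

def relu(z):
    return max(0.0, z)

inputs = [[1.0, 2.0], [-1.0, 0.5]]              # batch of 2 samples, in_size = 2
weights = [[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]]  # in_size = 2, out_size = 3
biases = [0.1, 0.1, 0.1]

layer_out = dense(inputs, weights, biases, relu)
print(layer_out)
```

The output has one row per sample and one column per output unit, matching the [None, out_size] shape that add_layer() produces.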

2.2 Complete Neural Network Code: Example 2

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
import numpy as np

def add_layer(inputs,in_size,out_size,activation_function=None):
    Weights = tf.Variable(tf.random_normal([in_size,out_size]))
    # random values (normal distribution) work much better than all zeros
    biases = tf.Variable(tf.zeros([1,out_size])+0.1)
    # a nonzero value is recommended for biases, so we add 0.1 to the zero vector
    Wx_plus_b = tf.matmul(inputs,Weights) + biases
    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs


#            architecture
#           input   hidden  output
#  neurons:   1      10       1

# 1. create data
x_data = np.linspace(-1,1,300)[:,np.newaxis]
noise = np.random.normal(0,0.05,x_data.shape).astype(np.float32)  # np.random.normal(mean, standard deviation, size)
y_data = np.square(x_data) - 0.5 + noise

# Use placeholders to define the inputs the network needs.
# tf.placeholder() creates a placeholder.
# None means any number of samples is accepted; each input has a single feature, hence the 1.
xs = tf.placeholder(tf.float32,[None,1])  # one feature per sample
# reduction_indices=[1] sums across the columns of each row, e.g. [[1, 2]] -> [3]
ys = tf.placeholder(tf.float32,[None,1])

# 2. hidden layer
l1 = add_layer(xs,1,10,activation_function=tf.nn.relu)
# 3. output layer
prediction = add_layer(l1,10,1,activation_function=None)

# 4. compute the loss
loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction),
                                    reduction_indices=[1]))  # see the supplementary question on reduction_indices; would [1,0] be OK? no

# 5. train
train = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# define the variable initializer
init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    for i in range(2000):
        sess.run(train,feed_dict={xs:x_data,ys:y_data})
        if i%200 == 0:
            print(sess.run(loss,feed_dict={xs:x_data,ys:y_data}))
  


# loss during training:
# 0.20815682
# 0.0036451495
# 0.003249791
# 0.003177435
# 0.0031459925
# 0.0031128605
# 0.0030276207
# 0.0029023623
# 0.0027435538
# 0.0026034769

2.3 Visualizing Results with matplotlib.pyplot

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
import numpy as np

## added for visualization ##
import matplotlib.pyplot as plt
## added for visualization ##

def add_layer(inputs, in_size, out_size, activation_function=None):
    Weights = tf.Variable(tf.random_normal([in_size, out_size]))
    # random values (normal distribution) work much better than all zeros
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    # a nonzero value is recommended for biases, so we add 0.1 to the zero vector
    Wx_plus_b = tf.matmul(inputs, Weights) + biases
    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    return outputs


#            architecture
#           input   hidden  output
#  neurons:   1      10       1

# create data
x_data = np.linspace(-1, 1, 300)[:, np.newaxis]
noise = np.random.normal(0, 0.05, x_data.shape).astype(np.float32)  # np.random.normal(mean, standard deviation, size)
y_data = np.square(x_data) - 0.5 + noise

# Use placeholders to define the inputs the network needs.
# tf.placeholder() creates a placeholder.
# None means any number of samples is accepted; each input has a single feature, hence the 1.
xs = tf.placeholder(tf.float32, [None, 1])  # one feature per sample
# reduction_indices=[1] sums across the columns of each row, e.g. [[1, 2]] -> [3]
ys = tf.placeholder(tf.float32, [None, 1])

# hidden layer
l1 = add_layer(xs, 1, 10, activation_function=tf.nn.relu)
# output layer
prediction = add_layer(l1, 10, 1, activation_function=None)

loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction),
                                    reduction_indices=[1]))  # see the supplementary question on reduction_indices; would [1,0] be OK? no
train = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# define the variable initializer
init = tf.global_variables_initializer()

## added for visualization ##
# plot the real data
fig = plt.figure()  # create a figure window
ax = fig.add_subplot(1,1,1)  # add a subplot at grid position (1,1,1)
ax.scatter(x_data,y_data)  # scatter draws the data points
plt.ion()
plt.show()  # show() normally blocks; with ion() enabled the program keeps running
## added for visualization ##

with tf.Session() as sess:
    sess.run(init)
    for i in range(200000):
        sess.run(train, feed_dict={xs: x_data, ys: y_data})
        if i % 2 == 0:
            #print(sess.run(loss, feed_dict={xs: x_data, ys: y_data}))

            ## added for visualization ##
            # remove the previous line before drawing a new one so the plot stays clean;
            # on the first pass `lines` does not exist yet, so the error is caught and ignored
            try:
                lines[0].remove()  # Artist.remove() takes the old line off the axes
            except Exception:
                pass
            prediction_value=sess.run(prediction,feed_dict={xs: x_data, ys: y_data})
            lines=ax.plot(x_data,prediction_value,'r-',lw=5)  # red line, width 5
            plt.pause(0.4)
            ## added for visualization ##

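2.4 Optimizers

An optimizer is the algorithm or formula that updates the weights W.

Gradient Descent (Batch Gradient Descent, BGD)

Gradient descent is the most primitive and most basic algorithm. It loads the entire dataset, computes the gradient over all of it, and then takes a step (i.e. updates the weights in the direction opposite to the gradient).

Its advantage is that on a convex function it converges to the global minimum. The obvious drawback is the computational cost: if the dataset is large, an ordinary GPU cannot handle it at all. It also cannot escape saddle points and easily converges to a local minimum.

Stochastic Gradient Descent

Compared with BGD, stochastic gradient descent differs only in the data used to compute the gradient: SGD uses a random subset of the whole dataset, and it updates more often. Because the randomly chosen mini-batch is small, each computation is cheap. In the same amount of time it performs many more updates than BGD, and so it tends to converge faster.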
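
The difference between the two schemes can be sketched in plain Python on a toy line-fitting problem. Everything here is an illustrative assumption: the noiseless dataset, the learning rate, the batch sizes, and the helper names gradient and train. With both runs touching the same total number of samples, the mini-batch run gets ten times as many updates.

```python
import random

random.seed(1)
# Toy dataset on the line y = 0.1 * x + 0.3, with x in [0, 1).
data = [(i / 100.0, 0.1 * (i / 100.0) + 0.3) for i in range(100)]

def gradient(w, b, batch):
    """Mean-squared-error gradient of y = w*x + b over the given batch."""
    n = len(batch)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / n
    gb = sum(2.0 * (w * x + b - y) for x, y in batch) / n
    return gw, gb

def train(batch_size, steps, lr=0.5):
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Full batch: use all the data. Mini-batch: use a random subset.
        batch = data if batch_size >= len(data) else random.sample(data, batch_size)
        gw, gb = gradient(w, b, batch)
        w, b = w - lr * gw, b - lr * gb
    return w, b

# Same number of samples processed: 20 full passes vs 200 batches of 10.
bgd = train(batch_size=100, steps=20)
sgd = train(batch_size=10, steps=200)
print("BGD:", bgd)
print("SGD:", sgd)
```

Real data is noisy, so mini-batch updates usually jitter around the optimum instead of landing on it; this toy problem hides that effect because every sample lies exactly on the line.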

Stochastic Gradient Descent (SGD)
Momentum
AdaGrad
RMSProp
Adam

3 TensorBoard Visualization Tool

Graph display:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
import numpy as np
def add_layer(inputs, in_size, out_size, activation_function=None):
    with tf.name_scope('layer'):
        with tf.name_scope('weights'):
            Weights = tf.Variable(tf.random_normal([in_size, out_size]),name='W')
        with tf.name_scope('biases'):
            biases = tf.Variable(tf.zeros([1, out_size]) + 0.1,name='b')
        with tf.name_scope('Wx_plus_b'):
            Wx_plus_b = tf.add(tf.matmul(inputs, Weights),biases)
        # by default the activation function gets its own block in the TensorBoard graph
        if activation_function is None:
            outputs = Wx_plus_b
        else:
            outputs = activation_function(Wx_plus_b)
        return outputs


#            architecture
#           input   hidden  output
#  neurons:   1      10       1

# create data
x_data = np.linspace(-1, 1, 300)[:, np.newaxis]
noise = np.random.normal(0, 0.05, x_data.shape).astype(np.float32)  # np.random.normal(mean, standard deviation, size)
y_data = np.square(x_data) - 0.5 + noise

with tf.name_scope('inputs'):
    xs = tf.placeholder(tf.float32, [None, 1],name='x_input')  # one feature per sample
    # reduction_indices=[1] sums across the columns of each row, e.g. [[1, 2]] -> [3]
    ys = tf.placeholder(tf.float32, [None, 1],name='y_input')

# hidden layer
l1 = add_layer(xs, 1, 10, activation_function=tf.nn.relu)
# output layer
prediction = add_layer(l1, 10, 1, activation_function=None)

with tf.name_scope('loss'):
    loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction),
                                        reduction_indices=[1]))  # see the supplementary question on reduction_indices; would [1,0] be OK? no
with tf.name_scope('train'):
    train = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# define the variable initializer
init = tf.global_variables_initializer()

sess = tf.Session()

sess.run(init)

# TensorBoard setup: write the graph to the "logs" directory
writer = tf.summary.FileWriter("logs",sess.graph)

(base) C:~\learning_python>tensorboard --logdir='logs/'
2020-02-26 14:38:39.039790: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library
'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-02-26 14:38:39.046017: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not
have a GPU set up on your machine.
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.1.0 at http://localhost:6006/ (Press CTRL+C to quit)

4 Classification (MNIST handwritten digits)

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
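# note: tensorflow.examples.tutorials may be missing from TensorFlow 2.x installs, in which case this import needs a TF 1.x environment
from tensorflow.examples.tutorials.mnist import input_data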
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

def add_layer(inputs, in_size, out_size, activation_function=None):
    with tf.name_scope('layer'):
        with tf.name_scope('weights'):
            Weights = tf.Variable(tf.random_normal([in_size, out_size]),name='W')
        with tf.name_scope('biases'):
            biases = tf.Variable(tf.zeros([1, out_size]) + 0.1,name='b')
        with tf.name_scope('Wx_plus_b'):
            Wx_plus_b = tf.add(tf.matmul(inputs, Weights),biases)
        # by default the activation function gets its own block in the TensorBoard graph
        if activation_function is None:
            outputs = Wx_plus_b
        else:
            outputs = activation_function(Wx_plus_b)
        return outputs

def compute_accuracy(v_xs,v_ys):
    global prediction
    y_pre = sess.run(prediction,feed_dict={xs:v_xs})
    correct_prediction = tf.equal(tf.argmax(y_pre,1),tf.argmax(v_ys,1))  # check whether the predicted and true labels pick the same digit
    accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
    result = sess.run(accuracy,feed_dict={xs:v_xs,ys:v_ys})
    return result

# define placeholders for the inputs to the network
xs = tf.placeholder(tf.float32,[None,784])
# [None,784]: None leaves the number of samples open; 784 = 28×28 pixels per image
ys = tf.placeholder(tf.float32,[None,10])

# add output layer
prediction = add_layer(xs,784,10,activation_function=tf.nn.softmax)  # softmax is commonly used for classification

# The loss (the objective to optimize) is the cross-entropy.
# Cross-entropy measures how close the prediction is to the true value; with one-hot labels it is zero when they match exactly.
cross_entropy = tf.reduce_mean(-tf.reduce_sum(ys*tf.log(prediction),
                                              reduction_indices=[1]))
# Training (the optimization algorithm) uses gradient descent.
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

for i in range(1000):
    batch_xs,batch_ys = mnist.train.next_batch(100)  # mini-batch training: only 100 images per step, so training is not slowed by too much data
    sess.run(train_step,feed_dict={xs:batch_xs,ys:batch_ys})
    if i % 50 == 0:
        print(compute_accuracy(
            mnist.test.images,mnist.test.labels))  # accuracy on the test data
Supplementary Questions

What does TensorFlow's reduce_sum() function actually do?

See here for details

reduction_indices:
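
In tf.reduce_sum, reduction_indices is the older name for the axis argument: it selects which dimension gets summed away. The plain-Python sketch below mimics that behaviour for a 2-D nested list only; this reduce_sum is a stand-in written for illustration, not the TensorFlow function.

```python
def reduce_sum(matrix, axis=None):
    """Sum a 2-D nested list: axis=0 down the rows, axis=1 across the columns, None over everything."""
    if axis is None:
        return sum(sum(row) for row in matrix)
    if axis == 0:
        return [sum(col) for col in zip(*matrix)]
    if axis == 1:
        return [sum(row) for row in matrix]
    raise ValueError("axis must be None, 0 or 1 for a 2-D input")

x = [[1, 1, 1],
     [1, 1, 1]]

print(reduce_sum(x))          # 6
print(reduce_sum(x, axis=0))  # [2, 2, 2]
print(reduce_sum(x, axis=1))  # [3, 3]
```

In the loss used throughout these notes, reducing over index 1 gives one summed squared error per sample, and reduce_mean then averages those over the batch.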

TensorFlow study notes