
# [Dev Tips] A Concise Tutorial on Speeding Up Data Loading and Training with Generators in Deep Learning (TensorFlow, PyTorch, Keras)

## 1. Problem Description

Reading an entire dataset into one big array before training causes two problems:

• Excessive memory usage: we train with mini-batches anyway, so there is no need to load all the data at once and then slice out a portion.
• Longer total time: building an array that holds every sample means reading each element up front. All of that I/O time accumulates while the network sits idle, which can lengthen the overall training time considerably.
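The contrast can be sketched in a few lines (a toy illustration, not the article's code, with `read_sample` standing in for reading one sample from disk): a list comprehension materializes every sample up front, while a generator produces one batch at a time on demand.

```python
import sys

def read_sample(i):
    return [i * 10, i]  # stand-in for reading one sample from disk

# Eager: every sample is read and held in memory before training starts.
all_data = [read_sample(i) for i in range(100_000)]

# Lazy: nothing is read until the training loop asks for the next batch.
def batches(n, batch_size):
    for start in range(0, n, batch_size):
        yield [read_sample(i) for i in range(start, min(start + batch_size, n))]

lazy = batches(100_000, 32)
# The generator object itself is tiny compared to the fully built list.
assert sys.getsizeof(lazy) < sys.getsizeof(all_data)
```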

## 2. Coding in Practice

### 2.1 Generate some fake data for the demo

```python
import math

import numpy as np

data = np.array([[x * 10, x] for x in range(16)])
print(data)
```

```
[[  0   0]
 [ 10   1]
 [ 20   2]
 [ 30   3]
 [ 40   4]
 [ 50   5]
 [ 60   6]
 [ 70   7]
 [ 80   8]
 [ 90   9]
 [100  10]
 [110  11]
 [120  12]
 [130  13]
 [140  14]
 [150  15]]
```

### 2.2 Build the generator

```python
def xs_gen(data, batch_size):
    lists = data
    num_batch = math.ceil(len(lists) / batch_size)  # number of batches per epoch
    for i in range(num_batch):
        batch_list = lists[i * batch_size : i * batch_size + batch_size]
        np.random.shuffle(batch_list)
        batch_x = np.array([x for x in batch_list[:, 0]])
        batch_y = np.array([y for y in batch_list[:, 1]])
        yield batch_x, batch_y
```
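One subtlety worth noting (not spelled out in the original): `lists = data` does not copy the array, and a NumPy slice is a view, so `np.random.shuffle(batch_list)` reorders the corresponding rows of the caller's `data` in place. A quick sketch of the effect:

```python
import numpy as np

data = np.array([[x * 10, x] for x in range(6)])
batch = data[0:3]         # a view into `data`, not a copy
np.random.shuffle(batch)  # reorders rows 0..2 of `data` itself

# Rows 0..2 of `data` may now be in a new order, but they are still
# the same set of rows; rows 3..5 are untouched.
print(data)

# To leave the caller's array untouched, shuffle a copy instead:
# batch = data[0:3].copy()
```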

### 2.3 Demonstration output

```python
if __name__ == "__main__":
    # Iterate over the data twice; a new generator is created for each pass.
    for x, y in xs_gen(data, 5):
        print("item", x, y)
    for x, y in xs_gen(data, 5):
        print("item", x, y)
```

```
item [30 20 10  0 40] [3 2 1 0 4]
item [50 70 80 90 60] [5 7 8 9 6]
item [110 120 140 100 130] [11 12 14 10 13]
item [150] [15]
item [ 0 30 20 10 40] [0 3 2 1 4]
item [60 80 90 70 50] [6 8 9 7 5]
item [130 100 110 120 140] [13 10 11 12 14]
item [150] [15]
```
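Notice that the demo calls `xs_gen(data, 5)` once per pass. That is necessary because a Python generator is exhausted after one full iteration; reusing the old object yields nothing. A minimal illustration, independent of the article's data:

```python
def gen():
    yield from range(3)

g = gen()
print(list(g))  # first pass consumes the generator: [0, 1, 2]
print(list(g))  # a second pass over the SAME object is empty: []

# So for multi-epoch training, create a fresh generator each epoch:
# for epoch in range(num_epochs):
#     for x, y in xs_gen(data, batch_size):
#         ...
```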

### 2.4 Improved generator functions

In the output above, each pass produces the same batch groupings; only the order *within* a batch changes. Shuffling the whole dataset once per pass fixes this:

```python
def xs_gen_pro(data, batch_size):
    lists = data
    num_batch = math.ceil(len(lists) / batch_size)  # number of batches per epoch
    for i in range(num_batch):
        if i == 0:
            np.random.shuffle(lists)  # shuffle the whole dataset before the first batch
        batch_list = lists[i * batch_size : i * batch_size + batch_size]
        np.random.shuffle(batch_list)
        batch_x = np.array([x for x in batch_list[:, 0]])
        batch_y = np.array([y for y in batch_list[:, 1]])
        yield batch_x, batch_y
```

Equivalently, and a little more cleanly, shuffle once before entering the loop:

```python
def xs_gen_pro(data, batch_size):
    lists = data
    num_batch = math.ceil(len(lists) / batch_size)  # number of batches per epoch
    np.random.shuffle(lists)  # shuffle the whole dataset once per generator (i.e. per epoch)
    for i in range(num_batch):
        batch_list = lists[i * batch_size : i * batch_size + batch_size]
        np.random.shuffle(batch_list)
        batch_x = np.array([x for x in batch_list[:, 0]])
        batch_y = np.array([y for y in batch_list[:, 1]])
        yield batch_x, batch_y
```

```
item [50 30 20 90 80] [5 3 2 9 8]
item [ 60   0 100 110  40] [ 6  0 10 11  4]
item [120  10 140 130 150] [12  1 14 13 15]
item [70] [7]
item [120  90  70  80 130] [12  9  7  8 13]
item [ 10 150 100   0  50] [ 1 15 10  0  5]
item [140  30  60  20 110] [14  3  6  2 11]
item [40] [4]
```
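As noted in section 2.2, `np.random.shuffle` here mutates the caller's `data` in place. If that is undesirable, one alternative (an addition, not from the original article) is to shuffle an index array and use fancy indexing, which returns copies:

```python
import math

import numpy as np

def xs_gen_idx(data, batch_size):
    """Like xs_gen_pro, but leaves `data` itself untouched."""
    num_batch = math.ceil(len(data) / batch_size)
    order = np.random.permutation(len(data))  # shuffled row indices
    for i in range(num_batch):
        idx = order[i * batch_size : (i + 1) * batch_size]
        batch = data[idx]                     # fancy indexing returns a copy
        yield batch[:, 0], batch[:, 1]

data = np.array([[x * 10, x] for x in range(16)])
before = data.copy()
batches = list(xs_gen_idx(data, 5))
assert (data == before).all()  # the original array is unchanged
```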

## 3. Applying Generators in Deep Learning

### 3.1 Using a generator with TensorFlow or PyTorch

```python
# Pseudocode: create a fresh generator each epoch and train on each batch
for e in range(epochs):
    for x, y in xs_gen_pro(data, batch_size):
        train(x, y)
```
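Concretely, with the article's `xs_gen_pro` (condensed here so the sketch is self-contained) and a counter standing in for the framework's training step, the pattern looks like this:

```python
import math

import numpy as np

def xs_gen_pro(data, batch_size):
    lists = data
    num_batch = math.ceil(len(lists) / batch_size)
    np.random.shuffle(lists)
    for i in range(num_batch):
        batch_list = lists[i * batch_size : i * batch_size + batch_size]
        yield batch_list[:, 0], batch_list[:, 1]

data = np.array([[x * 10, x] for x in range(16)])
steps = 0

for epoch in range(3):                # 3 epochs
    for x, y in xs_gen_pro(data, 5):  # fresh generator per epoch
        steps += 1                    # stand-in for train(x, y)

# 16 samples at batch_size 5 -> ceil(16/5) = 4 batches per epoch, 12 steps total
print(steps)  # 12
```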

### 3.2 Using a generator with Keras

```python
def xs_gen_keras(data, batch_size):
    lists = data
    num_batch = math.ceil(len(lists) / batch_size)  # number of batches per epoch
    while True:  # Keras expects the generator to loop forever
        np.random.shuffle(lists)
        for i in range(num_batch):
            batch_list = lists[i * batch_size : i * batch_size + batch_size]
            np.random.shuffle(batch_list)
            batch_x = np.array([x for x in batch_list[:, 0]])
            batch_y = np.array([y for y in batch_list[:, 1]])
            yield batch_x, batch_y
```
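Because of the `while True`, this generator never raises `StopIteration`; Keras decides where an epoch ends via `steps_per_epoch`. The endless behaviour can be seen by pulling more batches than one epoch contains (a condensed copy of the generator is included so the sketch is self-contained):

```python
import itertools
import math

import numpy as np

def xs_gen_keras(data, batch_size):
    lists = data
    num_batch = math.ceil(len(lists) / batch_size)
    while True:  # loop forever; the framework stops each epoch after steps_per_epoch batches
        np.random.shuffle(lists)
        for i in range(num_batch):
            batch_list = lists[i * batch_size : i * batch_size + batch_size]
            yield batch_list[:, 0], batch_list[:, 1]

data = np.array([[x * 10, x] for x in range(16)])
# One "epoch" is ceil(16/5) = 4 batches, yet we can take 10 without exhausting it.
batches = list(itertools.islice(xs_gen_keras(data, 5), 10))
print(len(batches))  # 10
```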

Training in Keras with the generator:

```python
train_iter = xs_gen_keras(train_data, Batch_size)  # train_data / val_data: your dataset splits
val_iter = xs_gen_keras(val_data, Batch_size)

model.fit_generator(
    generator=train_iter,
    steps_per_epoch=Lens1 // Batch_size,    # Lens1: number of training samples
    epochs=10,
    validation_data=val_iter,
    validation_steps=Lens2 // Batch_size,   # Lens2: number of validation samples
)
```

Note: in recent versions of tf.keras, `fit_generator` is deprecated; `model.fit` accepts generators directly with the same `steps_per_epoch` / `validation_steps` arguments.