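Time-Series Forecasting with LSTM

This post walks through the complete workflow of training a neural network model, from loading the data to visualizing the results.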

1. Read the data
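The loading code is not shown in the post. Below is a minimal sketch, assuming the data is a CSV file with `Month` and `Passengers` columns (the column names the later steps rely on). The file name in the comment and the few inline rows are assumptions used only to keep the example self-contained.

```python
import io

import pandas as pd

# In practice this would likely be pd.read_csv("airline-passengers.csv");
# the file name is an assumption. A few inline rows stand in for the file here.
sample_csv = io.StringIO(
    "Month,Passengers\n"
    "1949-01,112\n"
    "1949-02,118\n"
    "1949-03,132\n"
    "1949-04,129\n"
    "1949-05,121\n"
)

training_set = pd.read_csv(sample_csv)

print(training_set.head())   # first rows, to check the columns were parsed
print(training_set.shape)    # (number of rows, number of columns)
```

The resulting `training_set` DataFrame is the object the plotting and normalization steps operate on.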

2. Inspect the data

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style("darkgrid")
plt.figure(figsize = (15,9))
plt.plot(training_set[['Passengers']])

plt.xticks(range(0,training_set.shape[0],20), training_set['Month'].loc[::20], rotation=45)  # x-axis: label every 20th month

plt.title("airline-passengers",fontsize=18, fontweight='bold')
plt.xlabel('Date',fontsize=18)
plt.ylabel('Passengers',fontsize=18)
plt.show()

3. Normalize the data

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(-1, 1))
scaler_data = scaler.fit_transform(training_set['Passengers'].values.reshape(-1,1))
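To make the scaling step concrete, here is a hand-written sketch of the arithmetic that min-max scaling to the range (-1, 1) performs, together with its inverse. The helper names `minmax_scale` and `minmax_inverse` and the sample values are illustrative assumptions; they are not part of scikit-learn or of the original code.

```python
import numpy as np

def minmax_scale(values, low=-1.0, high=1.0):
    """Linearly map values so that min -> low and max -> high."""
    v_min, v_max = values.min(), values.max()
    scaled = (values - v_min) / (v_max - v_min) * (high - low) + low
    return scaled, v_min, v_max

def minmax_inverse(scaled, v_min, v_max, low=-1.0, high=1.0):
    """Undo minmax_scale using the stored minimum and maximum."""
    return (scaled - low) / (high - low) * (v_max - v_min) + v_min

values = np.array([112.0, 118.0, 132.0, 129.0, 121.0])  # illustrative values
scaled, v_min, v_max = minmax_scale(values)
restored = minmax_inverse(scaled, v_min, v_max)

print(scaled)
print(restored)
```

The inverse step is what `scaler.inverse_transform` is used for later, when predictions are mapped back to passenger counts.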

4. Build a custom Dataset

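import torch
from torch.utils.data import Dataset, DataLoader, random_split

class MyDataset(Dataset):
    # Sliding-window dataset: each sample is seq_length consecutive values
    # plus the single value that follows them.
    def __init__(self, data_source, seq_length):
        self.data_source = data_source  # input data, a NumPy array of shape (N, 1)
        self.seq = seq_length           # window length (number of past values per sample)

    # Return x and y for an index, i.e. the input features and the matching label.
    # With seq_length=3, the first three values are used to predict the fourth.
    def __getitem__(self, index):
        x = self.data_source[index : index + self.seq]
        y = self.data_source[index + self.seq]

        x_Tensor = torch.from_numpy(x).type(torch.Tensor)
        y_Tensor = torch.from_numpy(y).type(torch.Tensor)  # the label is built from y, not x
        return (x_Tensor, y_Tensor)

    # Return the number of samples in the dataset
    def __len__(self):
        return len(self.data_source) - self.seq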
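The indexing used by `__getitem__` and `__len__` can be checked on a plain list. The sketch below mirrors that logic without PyTorch; `make_windows` and the sample series are hypothetical names and values chosen for illustration.

```python
def make_windows(series, seq_length):
    """Pair each run of seq_length values with the value that follows it."""
    n_samples = len(series) - seq_length  # same count as MyDataset.__len__
    return [
        (series[i : i + seq_length], series[i + seq_length])
        for i in range(n_samples)
    ]

series = [10, 20, 30, 40, 50, 60]  # illustrative values
pairs = make_windows(series, seq_length=3)

for x, y in pairs:
    print(x, "->", y)
```

A series of 6 values with a window of 3 yields 3 samples, because the last three values have no successor to serve as a label.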

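5. Load the dataset and read the data

The approach here uses the previous three values to predict the fourth, that is, x = data[0:3] and y = data[3], so x has shape (3, 1) and y holds a single value.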

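# Define the dataset
my_dataset = MyDataset(data_source=scaler_data, seq_length=3)

data_all_loader = DataLoader(dataset=my_dataset,  # the dataset to load from
                          batch_size=5,        # number of samples per mini-batch
                          shuffle=True,        # shuffle the samples; usually wanted for training, usually unnecessary for a test set
                          num_workers=0)       # number of worker subprocesses used for loading (0 loads in the main process)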

Inspect the source training data

Although the dataset is small, it is still loaded batch by batch, the way a large dataset would be, with 5 samples fed in per batch.
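A quick way to inspect what a loader yields is to pull one batch and print its shapes. The sketch below is self-contained and uses evenly spaced stand-in values instead of the real `scaler_data`; building the windows up front with `TensorDataset` is an assumption made for brevity, not the post's `MyDataset`.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the scaled series: 23 values in [-1, 1], shape (23, 1)
data = np.linspace(-1.0, 1.0, 23, dtype=np.float32).reshape(-1, 1)
seq_length = 3
n_samples = len(data) - seq_length  # 20 windows

xs = np.stack([data[i : i + seq_length] for i in range(n_samples)])  # (20, 3, 1)
ys = np.stack([data[i + seq_length] for i in range(n_samples)])      # (20, 1)

loader = DataLoader(
    TensorDataset(torch.from_numpy(xs), torch.from_numpy(ys)),
    batch_size=5,
    shuffle=True,
)

batch_x, batch_y = next(iter(loader))
print(batch_x.shape, batch_y.shape)  # (batch, seq_length, features) and (batch, 1)
print(len(loader))                   # number of batches per epoch
```

The leading dimension of each batch is the batch size, and the remaining dimensions match a single sample from the dataset.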

6. Split the data into training and test sets

Load the datasets with DataLoader

batch_size=5
train_loader = DataLoader(
        train_data, batch_size=batch_size, shuffle=True, num_workers=0
    )
test_loader = DataLoader(
    test_data, batch_size=batch_size, shuffle=False, num_workers=0  # the test set does not need shuffling
    )
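The `train_data` and `test_data` objects passed to the loaders above are not defined in the post. One plausible way to obtain them, assuming the `random_split` helper imported earlier and an 80/20 ratio (both the ratio and the seed are assumptions), is sketched here on a small stand-in dataset.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset with 20 samples; in the post this would be my_dataset
full_dataset = TensorDataset(torch.arange(20, dtype=torch.float32).reshape(-1, 1))

train_size = int(0.8 * len(full_dataset))    # 80% for training (assumed ratio)
test_size = len(full_dataset) - train_size   # the remainder for testing

train_data, test_data = random_split(
    full_dataset,
    [train_size, test_size],
    generator=torch.Generator().manual_seed(0),  # fixed seed for a repeatable split
)

print(len(train_data), len(test_data))
```

Note that a random split mixes neighbouring windows of the same series into both sets. For a stricter evaluation, a chronological split (for example with `torch.utils.data.Subset` over the first and last index ranges) may be preferable.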

Build the LSTM model

input_dim = 1     # dimension of the input x (number of features per time step)
hidden_dim = 32   # size of the LSTM hidden state
num_layers = 2    # number of stacked recurrent layers
output_dim = 1    # dimension of the output
num_epochs = 100  # number of training epochs


# 1. Instantiate the model
model = LSTM(input_dim=input_dim, hidden_dim=hidden_dim, output_dim=output_dim, num_layers=num_layers)
# 2. Set the loss function
criterion = torch.nn.MSELoss()
# 3. Set the learning rate
learning_rate = 0.01
# 4. Set the optimizer
optimiser = torch.optim.Adam(model.parameters(), lr=learning_rate)
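The `LSTM` class instantiated above is not shown in the post. The following is a minimal sketch of what such a class might look like, assuming a stacked `nn.LSTM` followed by a linear layer applied to the last time step; the actual class in the author's repository may differ.

```python
import torch
import torch.nn as nn

class LSTM(nn.Module):
    """Hypothetical model: stacked LSTM plus a linear head on the last time step."""

    def __init__(self, input_dim, hidden_dim, num_layers, output_dim):
        super().__init__()
        # batch_first=True means inputs are shaped (batch, seq_length, input_dim)
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq_length, hidden_dim)
        return self.fc(out[:, -1, :])  # keep only the last step: (batch, output_dim)

model = LSTM(input_dim=1, hidden_dim=32, num_layers=2, output_dim=1)
dummy_batch = torch.zeros(5, 3, 1)     # 5 samples, 3 time steps, 1 feature
pred = model(dummy_batch)
print(pred.shape)
```

Taking only the last time step gives one prediction per sample, which lines up with a label that holds the single next value of the series.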

Train the model



import time
import numpy as np

hist = np.zeros(num_epochs)
start_time = time.time()

for epoch in range(num_epochs):
    for batch_index, (batch_x_train, batch_y_train) in enumerate(train_loader):
        # The batches are already tensors
        y_train_pred = model(batch_x_train)
        # Debug: print(batch_y_train.shape, y_train_pred.shape) to confirm the two shapes match
        loss = criterion(y_train_pred, batch_y_train)
        # print("Epoch ", epoch, "MSE: ", loss.item())
        hist[epoch] = loss.item()  # overwritten every batch, so this keeps the last batch loss of the epoch
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()

    if epoch % 50 == 0:
        print('epoch {}, loss {}'.format(epoch, loss.item()))

training_time = time.time() - start_time
print("Training time: {}".format(training_time))

Visualize the model results

Validate with the test set

y_pred = []
y_test = []
for batch_index, (batch_x_test, batch_y_test) in enumerate(test_loader):
    # The batches are already tensors
    y_test_pred = model(batch_x_test)
    y_pred.append(y_test_pred)
    y_test.append(batch_y_test)


# Predicted y values on the test set, flattened into one list
y_p_all = [i.detach().numpy().reshape(-1).tolist() for i in y_pred]
y_p_all_list = sum(y_p_all, [])
len(y_p_all_list)

# Actual y values on the test set, flattened into one list
y_t_all = [i.detach().numpy().reshape(-1).tolist() for i in y_test]
y_t_all_list = sum(y_t_all, [])
len(y_t_all_list)

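Map the normalized data back to the original scale for comparison

import numpy as np
import pandas as pd

predict = pd.DataFrame(scaler.inverse_transform(np.array(y_p_all_list).reshape(-1, 1)))   # predicted values
original = pd.DataFrame(scaler.inverse_transform(np.array(y_t_all_list).reshape(-1, 1)))  # actual values (from the flattened list)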

import seaborn as sns
sns.set_style("darkgrid")    

fig = plt.figure()
fig.subplots_adjust(hspace=0.2, wspace=0.2)

plt.subplot(1, 2, 1)
ax = sns.lineplot(x = predict.index, y = predict[0], label="Test Prediction (LSTM)", color='tomato')
ax = sns.lineplot(x = original.index, y = original[0], label="Data original", color='royalblue')

ax.set_title('data value', size = 14, fontweight='bold')
ax.set_xlabel("Sample index", size = 14)
ax.set_ylabel("value y ", size = 14)
ax.set_xticklabels('', size=10)


plt.subplot(1, 2, 2)
ax = sns.lineplot(data=hist, color='royalblue')
ax.set_xlabel("Epoch", size = 14)
ax.set_ylabel("Loss", size = 14)
ax.set_title("Training Loss", size = 14, fontweight='bold')
fig.set_figheight(6)
fig.set_figwidth(16)
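Alongside the plot, the comparison can also be summarized with a single number. The sketch below computes the root-mean-square error between actual and predicted values on the original scale; the `rmse` helper and the sample numbers are illustrative assumptions and do not come from the post's results.

```python
import numpy as np

def rmse(actual, predicted):
    """Root-mean-square error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Illustrative values only; in the post these would be original[0] and predict[0]
actual = [100.0, 110.0, 120.0]
predicted = [103.0, 106.0, 120.0]

error = rmse(actual, predicted)
print(error)
```

Because the error is computed after the inverse transform, it is expressed in the same unit as the data (passengers), which makes it easier to interpret than the loss on the scaled values.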

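Finally, here is the LSTM architecture diagram.

blog.csdn.net/baidu_38963… (link to the original article on how LSTM works)

The code and data have been uploaded to GitHub; feedback is welcome. github.com/jevy146/tim…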
