
Machine Learning Mastery Deep Learning for Time Series Tutorials (Part 5)

Original: Machine Learning Mastery

License: CC BY-NC-SA 4.0

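How to Develop Convolutional Neural Networks for Multi-Step Time Series Forecasting

Original: machinelearningmastery.com/how-to-develop-convolutional-neural-networks-for-multi-step-time-series-forecasting/

Given the rise of smart electricity meters and the wide adoption of electricity generation technology like solar panels, there is a wealth of electricity usage data available.

This data represents a multivariate time series of power-related variables that in turn could be used to model and even forecast future electricity consumption.

Unlike other machine learning algorithms, convolutional neural networks are capable of automatically learning features from sequence data, support multivariate data, and can directly output a vector for multi-step forecasting. As such, one-dimensional CNNs have been demonstrated to perform well and even achieve state-of-the-art results on challenging sequence prediction problems.

In this tutorial, you will discover how to develop one-dimensional convolutional neural networks for multi-step time series forecasting.

After completing this tutorial, you will know:

How to develop a CNN for multi-step time series forecasting with univariate data.
How to develop a multichannel multi-step time series forecasting model for multivariate data.
How to develop a multi-headed multi-step time series forecasting model for multivariate data.

Let's get started.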

Photo by Banalities, some rights reserved.

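Tutorial Overview

This tutorial is divided into seven parts; they are:

Problem Description
Load and Prepare Dataset
Model Evaluation
CNNs for Multi-Step Forecasting
Multi-Step Time Series Forecasting With a Univariate CNN
Multi-Step Time Series Forecasting With a Multichannel CNN
Multi-Step Time Series Forecasting With a Multi-Headed CNN

Problem Description

The 'Household Power Consumption' dataset is a multivariate time series dataset that describes the electricity consumption of a single household over a period of nearly four years.

The data was collected between December 2006 and November 2010, and observations of power consumption within the household were collected every minute.

It is a multivariate series comprised of seven variables (besides the date and time); they are:

global_active_power: The total active power consumed by the household (kilowatts).
global_reactive_power: The total reactive power consumed by the household (kilowatts).
voltage: Average voltage (volts).
global_intensity: Average current intensity (amps).
sub_metering_1: Active energy for the kitchen (watt-hours of active energy).
sub_metering_2: Active energy for the laundry (watt-hours of active energy).
sub_metering_3: Active energy for climate control systems (watt-hours of active energy).

Active and reactive energy refer to the technical details of alternating current.

A fourth sub-metering variable can be created by subtracting the sum of the three defined sub-metering variables from the total active energy as follows: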

sub_metering_remainder = (global_active_power * 1000 / 60) - (sub_metering_1 + sub_metering_2 + sub_metering_3)
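The factor 1000 / 60 converts a one-minute average power in kilowatts into the energy used in that minute in watt-hours, which puts it in the same units as the sub-metering columns. A minimal sketch of the arithmetic follows; the function name and the reading values are hypothetical and chosen only for illustration.

```python
def sub_metering_remainder(global_active_power_kw, sub_1_wh, sub_2_wh, sub_3_wh):
    # kW averaged over one minute -> watts (x 1000) -> watt-hours per minute (/ 60)
    total_wh = global_active_power_kw * 1000 / 60
    return total_wh - (sub_1_wh + sub_2_wh + sub_3_wh)

# hypothetical one-minute reading: 6 kW average, sub-meters at 0, 1 and 17 Wh
remainder = sub_metering_remainder(6.0, 0.0, 1.0, 17.0)
print(remainder)  # 100 Wh in total, of which 82 Wh is not covered by a sub-meter
```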

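Load and Prepare Dataset

The dataset can be downloaded from the UCI Machine Learning Repository as a single 20-megabyte .zip file:

household_power_consumption.zip

Download the dataset and unzip it into your current working directory. You will now have the file 'household_power_consumption.txt', which is about 127 megabytes in size and contains all of the observations.

We can use the read_csv() function to load the data and combine the first two columns into a single date-time column that we can use as an index.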

# load all data
dataset = read_csv('household_power_consumption.txt', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0,1]}, index_col=['datetime'])

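Next, we can mark all missing values, which are indicated with a '?' character, with a NaN value, which is a float.

This will allow us to work with the data as one array of floating point values rather than mixed types, which is less efficient.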

# mark all missing values
dataset.replace('?', nan, inplace=True)
# make dataset numeric
dataset = dataset.astype('float32')

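We also need to fill in the missing values now that they have been marked.

A very simple approach is to copy the observation from the same time the day before. We can implement this in a function named fill_missing() that takes the NumPy array of the data and copies values from exactly 24 hours earlier.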

# fill missing values with a value at the same time one day ago
def fill_missing(values):
	one_day = 60 * 24
	for row in range(values.shape[0]):
		for col in range(values.shape[1]):
			if isnan(values[row, col]):
				values[row, col] = values[row - one_day, col]

We can apply this function directly to the data within the DataFrame.

# fill missing
fill_missing(dataset.values)
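The behavior of the fill is easier to see on a tiny array. The sketch below is not the tutorial's function: it assumes a hypothetical `period` argument of 3 rows standing in for the 1,440 rows (60 * 24) in a day, and the name `fill_missing_period` is illustrative.

```python
import numpy as np

def fill_missing_period(values, period):
    # copy each NaN from the same column `period` rows earlier, in place
    for row in range(values.shape[0]):
        for col in range(values.shape[1]):
            if np.isnan(values[row, col]):
                values[row, col] = values[row - period, col]

toy = np.array([[1.0], [2.0], [3.0], [np.nan], [5.0], [np.nan]])
fill_missing_period(toy, period=3)
print(toy.ravel())  # rows 3 and 5 now hold the values from rows 0 and 2
```

One thing worth knowing about this style of fill: a NaN within the first period has no earlier row to copy from, so the negative index wraps around and reads from the end of the array instead.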

Now we can create a new column that contains the remainder of the sub-metering, using the calculation from the previous section.

# add a column for the remainder of sub metering
values = dataset.values
dataset['sub_metering_4'] = (values[:,0] * 1000 / 60) - (values[:,4] + values[:,5] + values[:,6])

We can now save the cleaned-up version of the dataset to a new file; in this case we will just change the file extension to .csv and save the dataset as 'household_power_consumption.csv'.

# save updated dataset
dataset.to_csv('household_power_consumption.csv')

Tying all of this together, the complete example of loading, cleaning up, and saving the dataset is listed below.

# load and clean-up data
from numpy import nan
from numpy import isnan
from pandas import read_csv
from pandas import to_numeric

# fill missing values with a value at the same time one day ago
def fill_missing(values):
	one_day = 60 * 24
	for row in range(values.shape[0]):
		for col in range(values.shape[1]):
			if isnan(values[row, col]):
				values[row, col] = values[row - one_day, col]

# load all data
dataset = read_csv('household_power_consumption.txt', sep=';', header=0, low_memory=False, infer_datetime_format=True, parse_dates={'datetime':[0,1]}, index_col=['datetime'])
# mark all missing values
dataset.replace('?', nan, inplace=True)
# make dataset numeric
dataset = dataset.astype('float32')
# fill missing
fill_missing(dataset.values)
# add a column for the remainder of sub metering
values = dataset.values
dataset['sub_metering_4'] = (values[:,0] * 1000 / 60) - (values[:,4] + values[:,5] + values[:,6])
# save updated dataset
dataset.to_csv('household_power_consumption.csv')

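Running the example creates the new file 'household_power_consumption.csv' that we can use as the starting point for our modeling project.

Model Evaluation

In this section, we will consider how we can develop and evaluate predictive models for the household power dataset.

This section is divided into four parts; they are:

Problem Framing
Evaluation Metric
Train and Test Sets
Walk-Forward Validation

Problem Framing

There are many ways to harness and explore the household power consumption dataset.

In this tutorial, we will use the data to explore a very specific question; that is:

Given recent power consumption, what is the expected power consumption for the week ahead?

This requires that a predictive model forecast the total active power for each day over the next seven days.

Technically, this framing of the problem is referred to as a multi-step time series forecasting problem, given the multiple forecast steps. A model that makes use of multiple input variables may be referred to as a multivariate multi-step time series forecasting model.

A model of this type could be helpful within the household in planning expenditures. It could also be helpful on the supply side for planning electricity demand for a specific household.

This framing of the dataset also suggests that it would be useful to downsample the per-minute observations of power consumption to daily totals. This is not required, but makes sense, given that we are interested in total power per day.

We can achieve this easily using the resample() function on the pandas DataFrame. Calling this function with the argument 'D' allows the loaded data, indexed by date-time, to be grouped by day (see all offset aliases). We can then calculate the sum of all observations for each day and create a new dataset of daily power consumption data for each of the eight variables.

The complete example is listed below.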

# resample minute data to total for each day
from pandas import read_csv
# load the new file
dataset = read_csv('household_power_consumption.csv', header=0, infer_datetime_format=True, parse_dates=['datetime'], index_col=['datetime'])
# resample data to daily
daily_groups = dataset.resample('D')
daily_data = daily_groups.sum()
# summarize
print(daily_data.shape)
print(daily_data.head())
# save
daily_data.to_csv('household_power_consumption_days.csv')

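Running the example creates a new daily total power consumption dataset and saves the result into a separate file named 'household_power_consumption_days.csv'.

We can use this as the dataset for fitting and evaluating predictive models for the chosen framing of the problem.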
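To see what the daily resampling does without loading the full file, the sketch below applies the same resample('D').sum() call to two days of made-up minute-level readings. The column values are hypothetical; only the grouping behavior is the point.

```python
import numpy as np
import pandas as pd

# two days of hypothetical minute-level readings, every value equal to 1.0
index = pd.date_range('2010-01-01', periods=2 * 24 * 60, freq='min')
minute_data = pd.DataFrame({'global_active_power': np.ones(len(index))}, index=index)

# group rows by calendar day and sum within each day
daily = minute_data.resample('D').sum()
print(daily)  # one row per day, each the sum of 1,440 minute values
```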

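Evaluation Metric

A forecast will be comprised of seven values, one for each day of the week ahead.

It is common with multi-step forecasting problems to evaluate each forecasted time step separately. This is helpful for a few reasons:

To comment on the skill at a specific lead time (e.g. +1 day vs +3 days).
To contrast models based on their skill at different lead times (e.g. models good at +1 day vs models good at day +5).

The units of the total power are kilowatts, and it would be useful to have an error metric that is also in the same units. Both Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) fit this bill, although RMSE is more commonly used and will be adopted in this tutorial. Unlike MAE, RMSE punishes large forecast errors more heavily.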
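The difference between the two metrics shows up when the same total error is concentrated in one forecast. The sketch below uses two hypothetical error lists with the same MAE; the helper names are illustrative.

```python
from math import sqrt

def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    return sqrt(sum(e * e for e in errors) / len(errors))

even_errors = [10.0, 10.0, 10.0, 10.0]      # error spread evenly
one_large_error = [0.0, 0.0, 0.0, 40.0]     # same total error in a single forecast

print(mae(even_errors), rmse(even_errors))          # 10.0 10.0
print(mae(one_large_error), rmse(one_large_error))  # 10.0 20.0
```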

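The performance metric for this problem will be the RMSE for each lead time from day 1 to day 7.

As a shortcut, it may be useful to summarize the performance of a model using a single score in order to aid in model selection.

One possible score that could be used would be the RMSE across all forecast days.

The function evaluate_forecasts() below implements this behavior and returns the performance of a model based on multiple seven-day forecasts.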

# evaluate one or more weekly forecasts against expected values
def evaluate_forecasts(actual, predicted):
	scores = list()
	# calculate an RMSE score for each day
	for i in range(actual.shape[1]):
		# calculate mse
		mse = mean_squared_error(actual[:, i], predicted[:, i])
		# calculate rmse
		rmse = sqrt(mse)
		# store
		scores.append(rmse)
	# calculate overall RMSE
	s = 0
	for row in range(actual.shape[0]):
		for col in range(actual.shape[1]):
			s += (actual[row, col] - predicted[row, col])**2
	score = sqrt(s / (actual.shape[0] * actual.shape[1]))
	return score, scores

Running the function will first return the overall RMSE regardless of day, then a list of RMSE scores, one for each day.
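The same two quantities can also be computed with NumPy array operations. The sketch below is an alternative formulation, not the tutorial's function; the name `evaluate_forecasts_vectorized` and the three-day toy arrays are assumptions made to keep the example short.

```python
import numpy as np

def evaluate_forecasts_vectorized(actual, predicted):
    squared_error = (actual - predicted) ** 2
    # one RMSE per forecast day (column)
    scores = np.sqrt(squared_error.mean(axis=0)).tolist()
    # a single RMSE over every forecast value
    score = float(np.sqrt(squared_error.mean()))
    return score, scores

# hypothetical data: two forecasts of three days, where only day 3 is off (by 2)
actual = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
predicted = np.array([[1.0, 2.0, 5.0], [4.0, 5.0, 8.0]])
score, scores = evaluate_forecasts_vectorized(actual, predicted)
print(score, scores)
```

Note that the overall score is the square root of the mean squared error over all values, which is not the same as the average of the per-day RMSE scores.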

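Train and Test Sets

We will use the first three years of data for training predictive models and the final year for evaluating models.

The data in a given dataset will be divided into standard weeks. These are weeks that begin on a Sunday and end on a Saturday.

This is a realistic and useful way to use the chosen framing of the model, where the power consumption for the week ahead can be predicted. It is also helpful with modeling, where models can be used to predict a specific day (e.g. Wednesday) or the entire sequence.

We will split the data into standard weeks, working backwards from the test dataset.

The final year of the data is 2010, and the first Sunday of 2010 was January 3rd. The data ends in mid-November 2010, and the closest final Saturday in the data is November 20th. This gives 46 weeks of test data.

The first and last rows of daily data for the test dataset are provided below for confirmation.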

2010-01-03,2083.4539999999984,191.61000000000055,350992.12000000034,8703.600000000033,3842.0,4920.0,10074.0,15888.233355799992
...
2010-11-20,2197.006000000004,153.76800000000028,346475.9999999998,9320.20000000002,4367.0,2947.0,11433.0,17869.76663959999

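The daily data starts in late 2006.

The first Sunday in the dataset is December 17th, which is the second row of data.

Organizing the data into standard weeks gives 159 full standard weeks for training a predictive model.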

2006-12-17,3390.46,226.0059999999994,345725.32000000024,14398.59999999998,2033.0,4187.0,13341.0,36946.66673200004
...
2010-01-02,1309.2679999999998,199.54600000000016,352332.8399999997,5489.7999999999865,801.0,298.0,6425.0,14297.133406600002
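The week counts can be checked with calendar arithmetic alone. The sketch below assumes only the four boundary dates quoted above; the helper name `count_standard_weeks` is illustrative.

```python
from datetime import date

def count_standard_weeks(first_sunday, last_saturday):
    # both endpoints are inclusive; weekday() uses Monday = 0 ... Sunday = 6
    if first_sunday.weekday() != 6 or last_saturday.weekday() != 5:
        raise ValueError('expected a Sunday start and a Saturday end')
    n_days = (last_saturday - first_sunday).days + 1
    if n_days % 7 != 0:
        raise ValueError('range is not a whole number of weeks')
    return n_days // 7

train_weeks = count_standard_weeks(date(2006, 12, 17), date(2010, 1, 2))
test_weeks = count_standard_weeks(date(2010, 1, 3), date(2010, 11, 20))
print(train_weeks, test_weeks)  # 159 46
```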

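The function split_dataset() below splits the daily data into train and test sets and organizes each into standard weeks.

Specific row offsets are used to split the data, based on knowledge of the dataset. The split datasets are then organized into weekly data using the NumPy split() function.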

# split a univariate dataset into train/test sets
def split_dataset(data):
	# split into standard weeks
	train, test = data[1:-328], data[-328:-6]
	# restructure into windows of weekly data
	train = array(split(train, len(train)//7))
	test = array(split(test, len(test)//7))
	return train, test

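We can test this function by loading the daily dataset and printing the first and last rows of data from both the train and test sets to confirm they match the expectations above.

The complete code example is listed below.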

# split into standard weeks
from numpy import split
from numpy import array
from pandas import read_csv

# split a univariate dataset into train/test sets
def split_dataset(data):
	# split into standard weeks
	train, test = data[1:-328], data[-328:-6]
	# restructure into windows of weekly data
	train = array(split(train, len(train)//7))
	test = array(split(test, len(test)//7))
	return train, test

# load the new file
dataset = read_csv('household_power_consumption_days.csv', header=0, infer_datetime_format=True, parse_dates=['datetime'], index_col=['datetime'])
train, test = split_dataset(dataset.values)
# validate train data
print(train.shape)
print(train[0, 0, 0], train[-1, -1, 0])
# validate test
print(test.shape)
print(test[0, 0, 0], test[-1, -1, 0])

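Running the example shows that the train dataset does indeed have 159 weeks of data, whereas the test dataset has 46 weeks.

We can see that the total active power for the first and last rows of the train and test datasets matches the data for the specific dates that we defined as the bounds of the standard weeks for each set.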

(159, 7, 8)
3390.46 1309.2679999999998
(46, 7, 8)
2083.4539999999984 2197.006000000004

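Walk-Forward Validation

Models will be evaluated using a scheme called walk-forward validation.

This is where a model is required to make a one-week prediction, then the actual data for that week is made available to the model so that it can be used as the basis for making a prediction for the subsequent week. This is both realistic for how the model may be used in practice and beneficial to the models, allowing them to make use of the best available data.

We can demonstrate this below with a separation of input data and output/predicted data.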

Input, 						Predict
[Week1]						Week2
[Week1 + Week2]				Week3
[Week1 + Week2 + Week3]		Week4
...

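The walk-forward validation approach to evaluating predictive models on this dataset is provided below in a function named evaluate_model().

The train and test datasets in standard-week format are provided to the function as arguments. An additional argument, n_input, defines the number of prior observations that the model will use as input in order to make a prediction.

Two new functions are called: one to build a model from the training data, called build_model(), and another that uses the model to make forecasts for each new standard week, called forecast(). These will be covered in subsequent sections.

We are working with neural networks, and as such they are generally slow to train but fast to evaluate. This means that the preferred usage of the models is to build them once on historical data and to use them to forecast each step of the walk-forward validation. The models are static (i.e. not updated) during their evaluation.

This is different from other models that are faster to train, where a model may be re-fit or updated at each step of the walk-forward validation as new data is made available. With sufficient resources, it is possible to use neural networks this way, but we will not in this tutorial.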
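The mechanics of the loop can be sketched without a neural network. In the example below a simple "repeat the last observed week" rule stands in for the static trained model; that rule, the function names, and the toy data are all assumptions for illustration and are not part of the tutorial's code.

```python
import numpy as np

def naive_weekly_forecast(history):
    # stand-in for a trained model: repeat the most recent observed week
    return history[-1].copy()

def walk_forward(train_weeks, test_weeks):
    history = [week for week in train_weeks]
    predictions = []
    for week in test_weeks:
        # forecast using only what has been observed so far
        predictions.append(naive_weekly_forecast(history))
        # the real observations become available before the next forecast
        history.append(week)
    return np.array(predictions)

# hypothetical univariate data: 3 training weeks and 2 test weeks of 7 days each
days = np.arange(35, dtype=float).reshape(5, 7)
train_weeks, test_weeks = days[:3], days[3:]
predictions = walk_forward(train_weeks, test_weeks)
print(predictions)
```

The first forecast is built from the last training week, and the second from the first test week, which only entered the history after its own forecast had been made.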

The complete evaluate_model() function is listed below.

# evaluate a single model
def evaluate_model(train, test, n_input):
	# fit model
	model = build_model(train, n_input)
	# history is a list of weekly data
	history = [x for x in train]
	# walk-forward validation over each week
	predictions = list()
	for i in range(len(test)):
		# predict the week
		yhat_sequence = forecast(model, history, n_input)
		# store the predictions
		predictions.append(yhat_sequence)
		# get real observation and add to history for predicting the next week
		history.append(test[i, :])
	# evaluate predictions days for each week
	predictions = array(predictions)
	score, scores = evaluate_forecasts(test[:, :, 0], predictions)
	return score, scores

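Once we have the evaluation for a model, we can summarize the performance.

The function below, named summarize_scores(), displays the performance of a model as a single line for easy comparison with other models.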

# summarize scores
def summarize_scores(name, score, scores):
	s_scores = ', '.join(['%.1f' % s for s in scores])
	print('%s: [%.3f] %s' % (name, score, s_scores))

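We now have all of the elements to begin evaluating predictive models on the dataset.

CNNs for Multi-Step Forecasting

Convolutional neural network models, or CNNs for short, are a type of deep neural network that was developed for use with image data, such as handwriting recognition.

They have proven very effective on challenging computer vision problems when trained at scale, for tasks such as identifying and localizing objects in images and automatically describing the content of images.

They are models comprised of two main types of elements: convolutional layers and pooling layers.

Convolutional layers read an input, such as a 2D image or a 1D signal, using a kernel that reads in small segments at a time and steps across the entire input field. Each read results in an interpretation of the input that is projected onto a filter map.

Pooling layers take the feature map projections and distill them to the most essential elements, such as by using a signal averaging or signal maximizing process.

The convolution and pooling layers can be repeated at depth, providing multiple layers of abstraction of the input signals.

The output of these networks is often one or more fully connected layers that interpret what has been read and map this internal representation to a class value.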
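The two operations can be illustrated on a seven-step signal with plain NumPy. This is a sketch of the arithmetic only: it assumes no padding and a stride of 1, uses fixed hypothetical kernel weights instead of learned ones, and leaves out the bias and activation that a real layer would apply.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    # slide the kernel over the signal with stride 1 and no padding
    n_out = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel) for i in range(n_out)])

def max_pool1d(feature_map, pool_size):
    # keep the maximum of each non-overlapping window, dropping any remainder
    n_out = len(feature_map) // pool_size
    return np.array([feature_map[i * pool_size:(i + 1) * pool_size].max() for i in range(n_out)])

signal = np.array([1.0, 2.0, 4.0, 3.0, 1.0, 0.0, 2.0])   # seven time steps
kernel = np.array([1.0, 0.0, -1.0])                      # hypothetical fixed weights
feature_map = conv1d_valid(signal, kernel)               # 7 - 3 + 1 = 5 values
pooled = max_pool1d(feature_map, pool_size=2)            # 5 // 2 = 2 values
print(feature_map, pooled)
```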

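For more information on convolutional neural networks, you can see the post:

Crash Course in Convolutional Neural Networks for Machine Learning

Convolutional neural networks can be used for multi-step time series forecasting.

The convolutional layers can read sequences of input data and automatically extract features.
The pooling layers can distill the extracted features and focus attention on the most salient elements.
The fully connected layers can interpret the internal representation and output a vector representing multiple time steps.

The key benefits of the approach are the automatic feature learning and the ability of the model to output a multi-step vector directly.

CNNs can be used in either a recursive or a direct forecast strategy. In the recursive strategy, the model makes one-step predictions and its outputs are fed back as inputs for subsequent predictions; in the direct strategy, one model is developed for each time step to be predicted. Alternately, CNNs can be used to predict the entire output sequence as a one-step prediction of the entire vector. This is a general benefit of feed-forward neural networks.

An important secondary benefit of using CNNs is that they can support multiple 1D inputs in order to make a prediction. This is useful if the multi-step output sequence is a function of more than one input sequence. This can be achieved using two different model configurations.

Multiple Input Channels. This is where each input sequence is read as a separate channel, like the different channels of an image (e.g. red, green, and blue).
Multiple Input Heads. This is where each input sequence is read by a different CNN sub-model and the internal representations are combined before being interpreted and used to make a prediction.

In this tutorial, we will explore how to develop three different types of CNN models for multi-step time series forecasting; they are:

A CNN for multi-step time series forecasting with univariate input data.
A CNN for multi-step time series forecasting with multivariate input data via channels.
A CNN for multi-step time series forecasting with multivariate input data via sub-models.

The models will be developed and demonstrated on the household power prediction problem. A model is considered skillful if it achieves performance better than a naive model, which has an overall RMSE of about 465 kilowatts across a seven-day forecast.

We will not focus on tuning these models to achieve optimal performance; instead, we will stop short at skillful models as compared to a naive forecast. The chosen structures and hyperparameters were selected with a little trial and error.

Multi-Step Time Series Forecasting With a Univariate CNN

In this section, we will develop a convolutional neural network for multi-step time series forecasting using only the univariate sequence of daily power consumption.

Specifically, the framing of the problem is:

Given some number of prior days of total daily power consumption, predict the next standard week of daily power consumption.

The number of prior days used as input defines the one-dimensional (1D) subsequence of data that the CNN will read and learn to extract features from. Some ideas on the size and nature of this input include:

All prior days, up to years' worth of data.
The prior seven days.
The prior two weeks.
The prior one month.
The prior one year.
The prior week and the week to be predicted from one year ago.

There is no right answer; instead, each approach and more can be tested, and the performance of the model can be used to choose the nature of the input that results in the best model performance.

These choices define a few things about the implementation, such as:

How the training data must be prepared in order to fit the model.
How the test data must be prepared in order to evaluate the model.
How to use the model to make predictions with a final model in the future.

A good starting point would be to use the prior seven days.

A 1D CNN model expects data to have the shape of: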

[samples, timesteps, features]

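One sample will be comprised of seven time steps with one feature for the seven days of total daily power consumed.

The training dataset has 159 weeks of data, so the shape of the training dataset would be: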

[159, 7, 1]

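This is a good start. The data in this format would use the prior standard week to predict the next standard week. A problem is that 159 instances is not a lot for a neural network.

A way to create a lot more training data is to change the problem during training to predict the next seven days given the prior seven days, regardless of the standard week.

This only impacts the training data; the test problem remains the same: predict the daily power consumption for the next standard week given the prior standard week.

This will require a little preparation of the training data.

The training data is provided in standard weeks with eight variables, specifically in the shape [159, 7, 8]. The first step is to flatten the data so that we have eight time series sequences.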

# flatten data
data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))

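We then need to iterate over the time steps and divide the data into overlapping windows; each iteration moves along one time step and predicts the subsequent seven days.

For example: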

Input, Output
[d01, d02, d03, d04, d05, d06, d07],	[d08, d09, d10, d11, d12, d13, d14]
[d02, d03, d04, d05, d06, d07, d08],	[d09, d10, d11, d12, d13, d14, d15]
...

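We can do this by keeping track of start and end indexes for the inputs and outputs as we iterate across the length of the flattened data in terms of time steps.

We can also do this in a way where the number of inputs and outputs are parameterized (e.g. n_input, n_out) so that you can experiment with different values or adapt it for your own problem.

Below is a function named to_supervised() that takes a list of weeks (history) and the number of time steps to use as inputs and outputs and returns the data in the overlapping moving window format.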

# convert history into inputs and outputs
def to_supervised(train, n_input, n_out=7):
	# flatten data
	data = train.reshape((train.shape[0]*train.shape[1], train.shape[2]))
	X, y = list(), list()
	in_start = 0
	# step over the entire history one time step at a time
	for _ in range(len(data)):
		# define the end of the input sequence
		in_end = in_start + n_input
		out_end = in_end + n_out
		# ensure we have enough data for this instance
		if out_end < len(data):
			x_input = data[in_start:in_end, 0]
			x_input = x_input.reshape((len(x_input), 1))
			X.append(x_input)
			y.append(data[in_end:out_end, 0])
		# move along one time step
		in_start += 1
	return array(X), array(y)

When we run this function on the entire training dataset, we transform 159 samples into 1,099; specifically, the transformed dataset has the shapes X=[1099, 7, 1] and y=[1099, 7].
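The sample count follows directly from the window bounds. The sketch below mirrors only the index logic of to_supervised(), with no data involved; the helper name `count_windows` is illustrative.

```python
def count_windows(n_days, n_input, n_out=7):
    # mirror the loop in to_supervised(): keep a window only if out_end < n_days
    count = 0
    for in_start in range(n_days):
        out_end = in_start + n_input + n_out
        if out_end < n_days:
            count += 1
    return count

n_days = 159 * 7                          # 1,113 days of training data
print(count_windows(n_days, n_input=7))   # 1099
print(count_windows(n_days, n_input=14))  # 1092
```

Because the comparison is a strict less-than, the one window whose output ends exactly on the final training day is left out; a less-than-or-equal test would give one more sample in each case.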

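Next, we can define and fit the CNN model on the training data.

This multi-step time series forecasting problem is an autoregression. That means it is likely best modeled where the next seven days are some function of observations at prior time steps. This and the relatively small amount of data mean that a small model is required.

We will use a model with one convolution layer with 16 filters and a kernel size of 3. This means that the input sequence of seven days will be read with a convolutional operation three time steps at a time, and this operation will be performed 16 times. A pooling layer with a pool size of 2 then roughly halves the length of these feature maps before the internal representation is flattened to one long vector. This is then interpreted by a fully connected layer before the output layer predicts the next seven days in the sequence.

We will use the mean squared error loss function as it is a good match for our chosen error metric of RMSE. We will use the efficient Adam implementation of stochastic gradient descent and fit the model for 20 epochs with a batch size of 4.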
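The size of this model can be worked out by hand. The sketch below assumes unpadded convolutions with a stride of 1, a pool size of 2, and the 10-node fully connected layer used in the build_model() listing in this section; the helper names are illustrative.

```python
def conv1d_output_length(n_steps, kernel_size):
    return n_steps - kernel_size + 1        # no padding, stride 1

def pool1d_output_length(n_steps, pool_size):
    return n_steps // pool_size             # non-overlapping windows

n_timesteps, n_features, n_filters, kernel_size = 7, 1, 16, 3
after_conv = conv1d_output_length(n_timesteps, kernel_size)   # 5 time steps
after_pool = pool1d_output_length(after_conv, 2)              # 2 time steps
flat_size = after_pool * n_filters                            # 32 values

# weights plus biases for each trainable layer
conv_params = kernel_size * n_features * n_filters + n_filters   # 64
dense_params = flat_size * 10 + 10                               # 330
output_params = 10 * 7 + 7                                       # 77
total_params = conv_params + dense_params + output_params
print(after_conv, after_pool, flat_size, total_params)           # 5 2 32 471
```

A few hundred weights is a small model, which fits the reasoning above about the limited amount of training data.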

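The small batch size and the stochastic nature of the algorithm mean that the same model will learn a slightly different mapping of inputs to outputs each time it is trained. This means results may vary when the model is evaluated. You can try running the model multiple times and calculating an average of model performance.

The build_model() function below prepares the training data, defines the model, and fits the model on the training data, returning the fit model ready for making predictions.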

# train the model
def build_model(train, n_input):
	# prepare data
	train_x, train_y = to_supervised(train, n_input)
	# define parameters
	verbose, epochs, batch_size = 0, 20, 4
	n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
	# define model
	model = Sequential()
	model.add(Conv1D(filters=16, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
	model.add(MaxPooling1D(pool_size=2))
	model.add(Flatten())
	model.add(Dense(10, activation='relu'))
	model.add(Dense(n_outputs))
	model.compile(loss='mse', optimizer='adam')
	# fit network
	model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
	return model

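Now that we know how to fit the model, we can look at how the model can be used to make a prediction.

Generally, the model expects data to have the same three-dimensional shape when making a prediction.

In this case, the expected shape of an input pattern is one sample, seven days of one feature for the daily power consumed: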

[1, 7, 1]

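Data must have this shape when making predictions for the test set and when a final model is being used to make predictions in the future. If you change the number of input days to 14, then the shape of the training data and the shape of new samples when making predictions must be changed accordingly to have 14 time steps. It is a modeling choice that you must carry forward when using the model.

We are using walk-forward validation to evaluate the model as described in the previous section.

This means that we have the observations available for the prior week in order to predict the coming week. These are collected into an array of standard weeks, called history.

In order to predict the next standard week, we need to retrieve the last days of observations. As with the training data, we must first flatten the history data to remove the weekly structure so that we end up with eight parallel time series.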

# flatten data
data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))

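Next, we need to retrieve the last seven days of daily total power consumed (feature number 0). We will parameterize this as we did for the training data so that the number of prior days used as input by the model can be modified in the future.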

# retrieve last observations for input data
input_x = data[-n_input:, 0]

Next, we reshape the input into the expected three-dimensional structure.

# reshape into [1, n_input, 1]
input_x = input_x.reshape((1, len(input_x), 1))

We then make a prediction using the fit model and the input data and retrieve the vector of seven days of output.

# forecast the next week
yhat = model.predict(input_x, verbose=0)
# we only want the vector forecast
yhat = yhat[0]
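The shape changes in these steps can be traced on a small stand-in for the history. The sketch below assumes a hypothetical history of three standard weeks in which every value equals the week number; no model is involved, so it stops at the array that would be passed to model.predict().

```python
import numpy as np

# hypothetical history: 3 standard weeks, 7 days each, 8 variables per day
history = [np.full((7, 8), float(week)) for week in range(3)]
n_input = 7

data = np.array(history)                                              # (3, 7, 8)
data = data.reshape((data.shape[0] * data.shape[1], data.shape[2]))   # (21, 8)
input_x = data[-n_input:, 0]                                          # (7,)
input_x = input_x.reshape((1, len(input_x), 1))                       # (1, 7, 1)
print(data.shape, input_x.shape)
```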

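The forecast() function below implements this and takes as arguments the model fit on the training dataset, the history of data observed so far, and the number of input time steps expected by the model.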

# make a forecast
def forecast(model, history, n_input):
	# flatten data
	data = array(history)
	data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))
	# retrieve last observations for input data
	input_x = data[-n_input:, 0]
	# reshape into [1, n_input, 1]
	input_x = input_x.reshape((1, len(input_x), 1))
	# forecast the next week
	yhat = model.predict(input_x, verbose=0)
	# we only want the vector forecast
	yhat = yhat[0]
	return yhat

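That's it; we now have everything we need to make multi-step time series forecasts with a CNN model on the daily total power consumed univariate dataset.

We can tie all of this together. The complete example is listed below.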

# univariate multi-step cnn
from math import sqrt
from numpy import split
from numpy import array
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers.convolutional import Conv1D
from keras.layers.convolutional import MaxPooling1D

# split a univariate dataset into train/test sets
def split_dataset(data):
	# split into standard weeks
	train, test = data[1:-328], data[-328:-6]
	# restructure into windows of weekly data
	train = array(split(train, len(train)//7))
	test = array(split(test, len(test)//7))
	return train, test

# evaluate one or more weekly forecasts against expected values
def evaluate_forecasts(actual, predicted):
	scores = list()
	# calculate an RMSE score for each day
	for i in range(actual.shape[1]):
		# calculate mse
		mse = mean_squared_error(actual[:, i], predicted[:, i])
		# calculate rmse
		rmse = sqrt(mse)
		# store
		scores.append(rmse)
	# calculate overall RMSE
	s = 0
	for row in range(actual.shape[0]):
		for col in range(actual.shape[1]):
			s += (actual[row, col] - predicted[row, col])**2
	score = sqrt(s / (actual.shape[0] * actual.shape[1]))
	return score, scores

# summarize scores
def summarize_scores(name, score, scores):
	s_scores = ', '.join(['%.1f' % s for s in scores])
	print('%s: [%.3f] %s' % (name, score, s_scores))

# convert history into inputs and outputs
def to_supervised(train, n_input, n_out=7):
	# flatten data
	data = train.reshape((train.shape[0]*train.shape[1], train.shape[2]))
	X, y = list(), list()
	in_start = 0
	# step over the entire history one time step at a time
	for _ in range(len(data)):
		# define the end of the input sequence
		in_end = in_start + n_input
		out_end = in_end + n_out
		# ensure we have enough data for this instance
		if out_end < len(data):
			x_input = data[in_start:in_end, 0]
			x_input = x_input.reshape((len(x_input), 1))
			X.append(x_input)
			y.append(data[in_end:out_end, 0])
		# move along one time step
		in_start += 1
	return array(X), array(y)

# train the model
def build_model(train, n_input):
	# prepare data
	train_x, train_y = to_supervised(train, n_input)
	# define parameters
	verbose, epochs, batch_size = 0, 20, 4
	n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
	# define model
	model = Sequential()
	model.add(Conv1D(filters=16, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
	model.add(MaxPooling1D(pool_size=2))
	model.add(Flatten())
	model.add(Dense(10, activation='relu'))
	model.add(Dense(n_outputs))
	model.compile(loss='mse', optimizer='adam')
	# fit network
	model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
	return model

# make a forecast
def forecast(model, history, n_input):
	# flatten data
	data = array(history)
	data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))
	# retrieve last observations for input data
	input_x = data[-n_input:, 0]
	# reshape into [1, n_input, 1]
	input_x = input_x.reshape((1, len(input_x), 1))
	# forecast the next week
	yhat = model.predict(input_x, verbose=0)
	# we only want the vector forecast
	yhat = yhat[0]
	return yhat

# evaluate a single model
def evaluate_model(train, test, n_input):
	# fit model
	model = build_model(train, n_input)
	# history is a list of weekly data
	history = [x for x in train]
	# walk-forward validation over each week
	predictions = list()
	for i in range(len(test)):
		# predict the week
		yhat_sequence = forecast(model, history, n_input)
		# store the predictions
		predictions.append(yhat_sequence)
		# get real observation and add to history for predicting the next week
		history.append(test[i, :])
	# evaluate predictions days for each week
	predictions = array(predictions)
	score, scores = evaluate_forecasts(test[:, :, 0], predictions)
	return score, scores

# load the new file
dataset = read_csv('household_power_consumption_days.csv', header=0, infer_datetime_format=True, parse_dates=['datetime'], index_col=['datetime'])
# split into train and test
train, test = split_dataset(dataset.values)
# evaluate model and get scores
n_input = 7
score, scores = evaluate_model(train, test, n_input)
# summarize scores
summarize_scores('cnn', score, scores)
# plot scores
days = ['sun', 'mon', 'tue', 'wed', 'thr', 'fri', 'sat']
pyplot.plot(days, scores, marker='o', label='cnn')
pyplot.show()

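Running the example fits and evaluates the model, printing the overall RMSE across all seven days and the per-day RMSE for each lead time.

Your specific results may vary given the stochastic nature of the algorithm. You may want to try running the example a few times.

We can see that in this case, the model was skillful as compared to a naive forecast, achieving an overall RMSE of about 404 kilowatts, less than the 465 kilowatts achieved by a naive model.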

cnn: [404.411] 436.1, 400.6, 346.2, 388.2, 405.5, 326.0, 502.9

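A plot of the daily RMSE is also created. The plot shows that perhaps Tuesdays and Fridays are easier days to forecast than the other days and that perhaps Saturday, at the end of the standard week, is the hardest day to forecast.

Line Plot of RMSE per Day for Univariate CNN with 7-day Inputs

We can increase the number of prior days used as input from 7 to 14 by changing the n_input variable.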

# evaluate model and get scores
n_input = 14

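Re-running the example with this change first prints a summary of the performance of the model.

Your specific results may vary; try running the example a few times.

In this case, we can see a further drop in the overall RMSE, suggesting that further tuning of the input size and perhaps the kernel size of the model may result in better performance.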

cnn: [396.497] 392.2, 412.8, 384.0, 389.0, 387.3, 381.0, 427.1

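Comparing the per-day RMSE scores, we see some are better and some are worse than using seven-day inputs.

This may suggest a benefit in using the two different sized inputs in some way, such as an ensemble of the two approaches or perhaps a single model (e.g. a multi-headed model) that reads the training data in different ways.

Line Plot of RMSE per Day for Univariate CNN with 14-day Inputs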

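Multi-Step Time Series Forecasting With a Multichannel CNN

In this section, we will update the CNN developed in the previous section to use each of the eight time series variables to predict the next standard week of daily total power consumption.

We will do this by providing each one-dimensional time series to the model as a separate channel of input.

The CNN will then use a separate kernel and read each input sequence onto a separate set of filter maps, essentially learning features from each input time series variable.

This is helpful for those problems where the output sequence is some function of the observations at prior time steps from multiple different features, not just (or including) the feature being forecasted. It is unclear whether this is the case in the power consumption problem, but we can explore it nonetheless.

First, we must update the preparation of the training data to include all eight features, not just the one total daily power consumed. It requires a single line: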

X.append(data[in_start:in_end, :])

The complete to_supervised() function with this change is listed below.

# convert history into inputs and outputs
def to_supervised(train, n_input, n_out=7):
	# flatten data
	data = train.reshape((train.shape[0]*train.shape[1], train.shape[2]))
	X, y = list(), list()
	in_start = 0
	# step over the entire history one time step at a time
	for _ in range(len(data)):
		# define the end of the input sequence
		in_end = in_start + n_input
		out_end = in_end + n_out
		# ensure we have enough data for this instance
		if out_end < len(data):
			X.append(data[in_start:in_end, :])
			y.append(data[in_end:out_end, 0])
		# move along one time step
		in_start += 1
	return array(X), array(y)

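We also must update the function used to make forecasts with the fit model to use all eight features from the prior time steps. Again, another small change: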

# retrieve last observations for input data
input_x = data[-n_input:, :]
# reshape into [1, n_input, n]
input_x = input_x.reshape((1, input_x.shape[0], input_x.shape[1]))

The complete forecast() function with this change is listed below:

# make a forecast
def forecast(model, history, n_input):
	# flatten data
	data = array(history)
	data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))
	# retrieve last observations for input data
	input_x = data[-n_input:, :]
	# reshape into [1, n_input, n]
	input_x = input_x.reshape((1, input_x.shape[0], input_x.shape[1]))
	# forecast the next week
	yhat = model.predict(input_x, verbose=0)
	# we only want the vector forecast
	yhat = yhat[0]
	return yhat

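We will use 14 days of prior observations across the eight input variables, as we did in the final part of the prior section, which resulted in slightly better performance.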

n_input = 14

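Finally, the model used in the previous section does not perform well on this new framing of the problem.

The increase in the amount of data requires a larger and more sophisticated model that is trained for longer.

With a little trial and error, one model that performs well uses two convolutional layers with 32 filter maps followed by pooling, then another convolutional layer with 16 feature maps and pooling. The fully connected layer that interprets the features is increased to 100 nodes, and the model is fit for 70 epochs with a batch size of 16 samples.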
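It can help to trace how many time steps survive each layer of this deeper stack. The sketch below assumes unpadded convolutions with a kernel size of 3 and pooling with a pool size of 2, as in the build_model() listing for this section; the function name `trace_time_steps` is illustrative.

```python
def trace_time_steps(n_timesteps):
    # (layer kind, kernel or pool size) for the stack described above
    layers = [('conv', 3), ('conv', 3), ('pool', 2), ('conv', 3), ('pool', 2)]
    lengths = [n_timesteps]
    for kind, size in layers:
        current = lengths[-1]
        if kind == 'conv':
            lengths.append(current - size + 1)   # no padding, stride 1
        else:
            lengths.append(current // size)      # non-overlapping windows
    return lengths

print(trace_time_steps(14))  # [14, 12, 10, 5, 3, 1]
print(trace_time_steps(7))   # [7, 5, 3, 1, -1, -1]
```

Under these assumptions a 14-day input is reduced to a single time step of 16 feature maps before flattening, while a 7-day input runs out of time steps at the third convolution. By this arithmetic the stack needs at least 12 input time steps, so trying shorter inputs would also mean changing the layers or their padding.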

The updated build_model() function that defines and fits the model on the training dataset is listed below.

# train the model
def build_model(train, n_input):
	# prepare data
	train_x, train_y = to_supervised(train, n_input)
	# define parameters
	verbose, epochs, batch_size = 0, 70, 16
	n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
	# define model
	model = Sequential()
	model.add(Conv1D(filters=32, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
	model.add(Conv1D(filters=32, kernel_size=3, activation='relu'))
	model.add(MaxPooling1D(pool_size=2))
	model.add(Conv1D(filters=16, kernel_size=3, activation='relu'))
	model.add(MaxPooling1D(pool_size=2))
	model.add(Flatten())
	model.add(Dense(100, activation='relu'))
	model.add(Dense(n_outputs))
	model.compile(loss='mse', optimizer='adam')
	# fit network
	model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
	return model

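We now have all of the elements required to develop a multichannel CNN for multivariate input data to make multi-step time series forecasts.

The complete example is listed below.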

# multichannel multi-step cnn
from math import sqrt
from numpy import split
from numpy import array
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers.convolutional import Conv1D
from keras.layers.convolutional import MaxPooling1D

# split a univariate dataset into train/test sets
def split_dataset(data):
	# split into standard weeks
	train, test = data[1:-328], data[-328:-6]
	# restructure into windows of weekly data
	train = array(split(train, len(train)//7))
	test = array(split(test, len(test)//7))
	return train, test

# evaluate one or more weekly forecasts against expected values
def evaluate_forecasts(actual, predicted):
	scores = list()
	# calculate an RMSE score for each day
	for i in range(actual.shape[1]):
		# calculate mse
		mse = mean_squared_error(actual[:, i], predicted[:, i])
		# calculate rmse
		rmse = sqrt(mse)
		# store
		scores.append(rmse)
	# calculate overall RMSE
	s = 0
	for row in range(actual.shape[0]):
		for col in range(actual.shape[1]):
			s += (actual[row, col] - predicted[row, col])**2
	score = sqrt(s / (actual.shape[0] * actual.shape[1]))
	return score, scores

# summarize scores
def summarize_scores(name, score, scores):
	s_scores = ', '.join(['%.1f' % s for s in scores])
	print('%s: [%.3f] %s' % (name, score, s_scores))

# convert history into inputs and outputs
def to_supervised(train, n_input, n_out=7):
	# flatten data
	data = train.reshape((train.shape[0]*train.shape[1], train.shape[2]))
	X, y = list(), list()
	in_start = 0
	# step over the entire history one time step at a time
	for _ in range(len(data)):
		# define the end of the input sequence
		in_end = in_start + n_input
		out_end = in_end + n_out
		# ensure we have enough data for this instance
		if out_end < len(data):
			X.append(data[in_start:in_end, :])
			y.append(data[in_end:out_end, 0])
		# move along one time step
		in_start += 1
	return array(X), array(y)

# train the model
def build_model(train, n_input):
	# prepare data
	train_x, train_y = to_supervised(train, n_input)
	# define parameters
	verbose, epochs, batch_size = 0, 70, 16
	n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
	# define model
	model = Sequential()
	model.add(Conv1D(filters=32, kernel_size=3, activation='relu', input_shape=(n_timesteps,n_features)))
	model.add(Conv1D(filters=32, kernel_size=3, activation='relu'))
	model.add(MaxPooling1D(pool_size=2))
	model.add(Conv1D(filters=16, kernel_size=3, activation='relu'))
	model.add(MaxPooling1D(pool_size=2))
	model.add(Flatten())
	model.add(Dense(100, activation='relu'))
	model.add(Dense(n_outputs))
	model.compile(loss='mse', optimizer='adam')
	# fit network
	model.fit(train_x, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
	return model

# make a forecast
def forecast(model, history, n_input):
	# flatten data
	data = array(history)
	data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))
	# retrieve last observations for input data
	input_x = data[-n_input:, :]
	# reshape into [1, n_input, n]
	input_x = input_x.reshape((1, input_x.shape[0], input_x.shape[1]))
	# forecast the next week
	yhat = model.predict(input_x, verbose=0)
	# we only want the vector forecast
	yhat = yhat[0]
	return yhat

# evaluate a single model
def evaluate_model(train, test, n_input):
	# fit model
	model = build_model(train, n_input)
	# history is a list of weekly data
	history = [x for x in train]
	# walk-forward validation over each week
	predictions = list()
	for i in range(len(test)):
		# predict the week
		yhat_sequence = forecast(model, history, n_input)
		# store the predictions
		predictions.append(yhat_sequence)
		# get real observation and add to history for predicting the next week
		history.append(test[i, :])
	# evaluate predictions days for each week
	predictions = array(predictions)
	score, scores = evaluate_forecasts(test[:, :, 0], predictions)
	return score, scores

# load the new file
dataset = read_csv('household_power_consumption_days.csv', header=0, infer_datetime_format=True, parse_dates=['datetime'], index_col=['datetime'])
# split into train and test
train, test = split_dataset(dataset.values)
# evaluate model and get scores
n_input = 14
score, scores = evaluate_model(train, test, n_input)
# summarize scores
summarize_scores('cnn', score, scores)
# plot scores
days = ['sun', 'mon', 'tue', 'wed', 'thr', 'fri', 'sat']
pyplot.plot(days, scores, marker='o', label='cnn')
pyplot.show()

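Running the example fits and evaluates the model, printing the overall RMSE across all seven days and the per-day RMSE for each lead time.

Your specific results may vary given the stochastic nature of the algorithm. You may want to try running the example a few times.

We can see that in this case, the use of all eight input variables does result in another small drop in the overall RMSE score.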

cnn: [385.711] 422.2, 363.5, 349.8, 393.1, 357.1, 318.8, 474.3

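For the daily RMSE scores, we do see that some are better and some are worse than the univariate CNN from the previous section.

The final day, Saturday, remains a challenging day to forecast, and Friday an easy day to forecast. There may be some benefit in designing models that focus specifically on reducing the error of the harder-to-forecast days.

It may be interesting to see if the variance across daily scores could be further reduced with a tuned model or perhaps an ensemble of multiple different models. It may also be interesting to compare the performance of a model that uses seven or even 21 days of input data to see if further gains can be made.

Line Plot of RMSE per Day for a Multichannel CNN with 14-day Inputs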

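Multi-Step Time Series Forecasting With a Multi-Headed CNN

We can further extend the CNN model to have a separate sub-CNN model, or head, for each input variable, which we can refer to as a multi-headed CNN model.

This requires a modification to the preparation of the model and, in turn, a modification to the preparation of the training and test datasets.

Starting with the model, we must define a separate CNN model for each of the eight input variables.

The configuration of the model, including the number of layers and their hyperparameters, was also modified to better suit the new approach. The new configuration is not optimal and was found with a little trial and error.

The multi-headed model is specified using the more flexible functional API for defining Keras models.

We can loop over each variable and create a sub-model that takes a one-dimensional sequence of 14 days of data and outputs a flat vector containing a summary of the features learned from the sequence. Each of these vectors can be merged via concatenation to make one very long vector that is then interpreted by some fully connected layers before a prediction is made.

As we build up the sub-models, we keep track of the input layers and flatten layers in lists. This is so that we can specify the inputs in the definition of the model object and use the list of flatten layers in the merge layer.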

# create a channel for each variable
in_layers, out_layers = list(), list()
for i in range(n_features):
	inputs = Input(shape=(n_timesteps,1))
	conv1 = Conv1D(filters=32, kernel_size=3, activation='relu')(inputs)
	conv2 = Conv1D(filters=32, kernel_size=3, activation='relu')(conv1)
	pool1 = MaxPooling1D(pool_size=2)(conv2)
	flat = Flatten()(pool1)
	# store layers
	in_layers.append(inputs)
	out_layers.append(flat)
# merge heads
merged = concatenate(out_layers)
# interpretation
dense1 = Dense(200, activation='relu')(merged)
dense2 = Dense(100, activation='relu')(dense1)
outputs = Dense(n_outputs)(dense2)
model = Model(inputs=in_layers, outputs=outputs)
# compile model
model.compile(loss='mse', optimizer='adam')
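The length of the merged vector follows from the per-head layer sizes. The sketch below is plain arithmetic under the assumptions of 14 input time steps, eight input variables, unpadded convolutions, and a pool size of 2; the variable names are illustrative.

```python
n_timesteps, n_features, n_filters = 14, 8, 32

after_conv1 = n_timesteps - 3 + 1      # 12 time steps
after_conv2 = after_conv1 - 3 + 1      # 10 time steps
after_pool = after_conv2 // 2          # 5 time steps
per_head = after_pool * n_filters      # flattened size of one head
merged = per_head * n_features         # size after concatenating all heads

# weights plus biases of the first 200-node interpretation layer
dense1_params = merged * 200 + 200
print(per_head, merged, dense1_params)  # 160 1280 256200
```

Most of the weights in this model therefore sit in the first fully connected layer that reads the concatenated vector, not in the convolutional heads.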

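When the model is used, it will require eight arrays as input: one for each of the sub-models.

This is required when training the model, when evaluating the model, and when making predictions with a final model.

We can achieve this by creating a list of 3D arrays, where each 3D array has the shape [samples, timesteps, 1], with one feature.

We can prepare the training dataset in this format as follows: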

input_data = [train_x[:,:,i].reshape((train_x.shape[0],n_timesteps,1)) for i in range(n_features)]
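The effect of this list comprehension can be checked on a small array. The sketch below assumes hypothetical data in which every value equals its feature index, so each resulting array is easy to recognize; it also compares the result against NumPy's split() along the feature axis, which is offered here only as an equivalent alternative.

```python
import numpy as np

n_samples, n_timesteps, n_features = 5, 14, 8
# hypothetical data where every value equals its feature index
train_x = np.tile(np.arange(n_features, dtype=float), (n_samples, n_timesteps, 1))

# one [samples, timesteps, 1] array per input variable
input_data = [train_x[:, :, i].reshape((train_x.shape[0], n_timesteps, 1)) for i in range(n_features)]
print(len(input_data), input_data[0].shape)

# the same list built by splitting along the last axis
alternative = np.split(train_x, n_features, axis=2)
same = all(np.array_equal(a, b) for a, b in zip(input_data, alternative))
print(same)
```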

The updated build_model() function with these changes is listed below.

# train the model
def build_model(train, n_input):
	# prepare data
	train_x, train_y = to_supervised(train, n_input)
	# define parameters
	verbose, epochs, batch_size = 0, 25, 16
	n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
	# create a channel for each variable
	in_layers, out_layers = list(), list()
	for i in range(n_features):
		inputs = Input(shape=(n_timesteps,1))
		conv1 = Conv1D(filters=32, kernel_size=3, activation='relu')(inputs)
		conv2 = Conv1D(filters=32, kernel_size=3, activation='relu')(conv1)
		pool1 = MaxPooling1D(pool_size=2)(conv2)
		flat = Flatten()(pool1)
		# store layers
		in_layers.append(inputs)
		out_layers.append(flat)
	# merge heads
	merged = concatenate(out_layers)
	# interpretation
	dense1 = Dense(200, activation='relu')(merged)
	dense2 = Dense(100, activation='relu')(dense1)
	outputs = Dense(n_outputs)(dense2)
	model = Model(inputs=in_layers, outputs=outputs)
	# compile model
	model.compile(loss='mse', optimizer='adam')
	# plot the model
	plot_model(model, show_shapes=True, to_file='multiheaded_cnn.png')
	# fit network
	input_data = [train_x[:,:,i].reshape((train_x.shape[0],n_timesteps,1)) for i in range(n_features)]
	model.fit(input_data, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
	return model

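When the model is built, a diagram of the structure of the model is created and saved to file.

Note: the call to plot_model() requires that pygraphviz and pydot are installed. If this is a problem, you can comment out this line.

The structure of the network looks as follows.

Structure of the Multi-Headed Convolutional Neural Network

Next, we can update the preparation of input samples when making a prediction for the test dataset.

We must perform the same change, where an input array of [1, 14, 8] must be transformed into a list of eight 3D arrays each with the shape [1, 14, 1].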

input_x = [input_x[:,i].reshape((1,input_x.shape[0],1)) for i in range(input_x.shape[1])]

The forecast() function with this change is listed below.

# make a forecast
def forecast(model, history, n_input):
	# flatten data
	data = array(history)
	data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))
	# retrieve last observations for input data
	input_x = data[-n_input:, :]
	# reshape into n input arrays
	input_x = [input_x[:,i].reshape((1,input_x.shape[0],1)) for i in range(input_x.shape[1])]
	# forecast the next week
	yhat = model.predict(input_x, verbose=0)
	# we only want the vector forecast
	yhat = yhat[0]
	return yhat

That's it.

We can tie all of this together; the complete example is listed below.

# multi headed multi-step cnn
from math import sqrt
from numpy import split
from numpy import array
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Conv1D
from keras.layers import MaxPooling1D
from keras.models import Model
from keras.layers import Input
from keras.layers import concatenate

# split a univariate dataset into train/test sets
def split_dataset(data):
	# split into standard weeks
	train, test = data[1:-328], data[-328:-6]
	# restructure into windows of weekly data
	train = array(split(train, len(train)/7))
	test = array(split(test, len(test)/7))
	return train, test

# evaluate one or more weekly forecasts against expected values
def evaluate_forecasts(actual, predicted):
	scores = list()
	# calculate an RMSE score for each day
	for i in range(actual.shape[1]):
		# calculate mse
		mse = mean_squared_error(actual[:, i], predicted[:, i])
		# calculate rmse
		rmse = sqrt(mse)
		# store
		scores.append(rmse)
	# calculate overall RMSE
	s = 0
	for row in range(actual.shape[0]):
		for col in range(actual.shape[1]):
			s += (actual[row, col] - predicted[row, col])**2
	score = sqrt(s / (actual.shape[0] * actual.shape[1]))
	return score, scores

# summarize scores
def summarize_scores(name, score, scores):
	s_scores = ', '.join(['%.1f' % s for s in scores])
	print('%s: [%.3f] %s' % (name, score, s_scores))

# convert history into inputs and outputs
def to_supervised(train, n_input, n_out=7):
	# flatten data
	data = train.reshape((train.shape[0]*train.shape[1], train.shape[2]))
	X, y = list(), list()
	in_start = 0
	# step over the entire history one time step at a time
	for _ in range(len(data)):
		# define the end of the input sequence
		in_end = in_start + n_input
		out_end = in_end + n_out
		# ensure we have enough data for this instance
		if out_end < len(data):
			X.append(data[in_start:in_end, :])
			y.append(data[in_end:out_end, 0])
		# move along one time step
		in_start += 1
	return array(X), array(y)

# plot training history
def plot_history(history):
	# plot loss
	pyplot.subplot(2, 1, 1)
	pyplot.plot(history.history['loss'], label='train')
	pyplot.plot(history.history['val_loss'], label='test')
	pyplot.title('loss', y=0, loc='center')
	pyplot.legend()
	# plot rmse
	pyplot.subplot(2, 1, 2)
	pyplot.plot(history.history['rmse'], label='train')
	pyplot.plot(history.history['val_rmse'], label='test')
	pyplot.title('rmse', y=0, loc='center')
	pyplot.legend()
	pyplot.show()

# train the model
def build_model(train, n_input):
	# prepare data
	train_x, train_y = to_supervised(train, n_input)
	# define parameters
	verbose, epochs, batch_size = 0, 25, 16
	n_timesteps, n_features, n_outputs = train_x.shape[1], train_x.shape[2], train_y.shape[1]
	# create a channel for each variable
	in_layers, out_layers = list(), list()
	for i in range(n_features):
		inputs = Input(shape=(n_timesteps,1))
		conv1 = Conv1D(filters=32, kernel_size=3, activation='relu')(inputs)
		conv2 = Conv1D(filters=32, kernel_size=3, activation='relu')(conv1)
		pool1 = MaxPooling1D(pool_size=2)(conv2)
		flat = Flatten()(pool1)
		# store layers
		in_layers.append(inputs)
		out_layers.append(flat)
	# merge heads
	merged = concatenate(out_layers)
	# interpretation
	dense1 = Dense(200, activation='relu')(merged)
	dense2 = Dense(100, activation='relu')(dense1)
	outputs = Dense(n_outputs)(dense2)
	model = Model(inputs=in_layers, outputs=outputs)
	# compile model
	model.compile(loss='mse', optimizer='adam')
	# fit network
	input_data = [train_x[:,:,i].reshape((train_x.shape[0],n_timesteps,1)) for i in range(n_features)]
	model.fit(input_data, train_y, epochs=epochs, batch_size=batch_size, verbose=verbose)
	return model

# make a forecast
def forecast(model, history, n_input):
	# flatten data
	data = array(history)
	data = data.reshape((data.shape[0]*data.shape[1], data.shape[2]))
	# retrieve last observations for input data
	input_x = data[-n_input:, :]
	# reshape into n input arrays
	input_x = [input_x[:,i].reshape((1,input_x.shape[0],1)) for i in range(input_x.shape[1])]
	# forecast the next week
	yhat = model.predict(input_x, verbose=0)
	# we only want the vector forecast
	yhat = yhat[0]
	return yhat

# evaluate a single model
def evaluate_model(train, test, n_input):
	# fit model
	model = build_model(train, n_input)
	# history is a list of weekly data
	history = [x for x in train]
	# walk-forward validation over each week
	predictions = list()
	for i in range(len(test)):
		# predict the week
		yhat_sequence = forecast(model, history, n_input)
		# store the predictions
		predictions.append(yhat_sequence)
		# get real observation and add to history for predicting the next week
		history.append(test[i, :])
	# evaluate predictions days for each week
	predictions = array(predictions)
	score, scores = evaluate_forecasts(test[:, :, 0], predictions)
	return score, scores

# load the new file
dataset = read_csv('household_power_consumption_days.csv', header=0, infer_datetime_format=True, parse_dates=['datetime'], index_col=['datetime'])
# split into train and test
train, test = split_dataset(dataset.values)
# evaluate model and get scores
n_input = 14
score, scores = evaluate_model(train, test, n_input)
# summarize scores
summarize_scores('cnn', score, scores)
# plot scores
days = ['sun', 'mon', 'tue', 'wed', 'thu', 'fri', 'sat']
pyplot.plot(days, scores, marker='o', label='cnn')
pyplot.show()

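Running the example fits and evaluates the model, printing the overall RMSE across all seven days and the per-day RMSE for each lead time.

Given the stochastic nature of the algorithm, your specific results may vary. Consider running the example a few times.

We can see that in this case the overall RMSE is skillful compared to a naive forecast, but that with the chosen configuration the model may not perform better than the multi-channel model in the previous section.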

cnn: [396.116] 414.5, 385.5, 377.2, 412.1, 371.1, 380.6, 428.1

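We can also see a different, more pronounced profile for the daily RMSE scores, where perhaps Mon-Tue and Thu-Fri are easier for the model to predict than the other forecast days.

These results may be useful when combined with another forecast model.

It may be interesting to explore alternative methods in the architecture for merging the output of each sub-model.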

Line Plot of RMSE per Day for a Multi-head CNN with 14-day Inputs
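The relationship between the per-day RMSE values and the overall RMSE reported above can be sketched with NumPy. This is a vectorized illustration that assumes every forecast day has the same number of samples, not the tutorial's evaluate_forecasts() function; the name evaluate_forecasts_vectorized and the toy arrays are made up for the example.

```python
import numpy as np

def evaluate_forecasts_vectorized(actual, predicted):
    """Return (overall RMSE, per-day RMSE) for [weeks, days] arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    sq_err = (actual - predicted) ** 2
    per_day = np.sqrt(sq_err.mean(axis=0))   # one RMSE per forecast day
    overall = float(np.sqrt(sq_err.mean()))  # RMSE over every week and day
    return overall, per_day

# toy data: two weeks, two forecast days (arbitrary values)
toy_actual = np.array([[10.0, 20.0], [30.0, 40.0]])
toy_predicted = np.array([[12.0, 20.0], [30.0, 44.0]])
overall, per_day = evaluate_forecasts_vectorized(toy_actual, toy_predicted)

# with equal sample counts per day, the overall RMSE is the root of the
# mean of the squared per-day RMSE values
recombined = float(np.sqrt(np.mean(per_day ** 2)))

# the same combination applied to the per-day values printed above
printed_per_day = np.array([414.5, 385.5, 377.2, 412.1, 371.1, 380.6, 428.1])
combined = float(np.sqrt(np.mean(printed_per_day ** 2)))
print(overall, recombined, combined)
```

Combining the rounded per-day values printed by the example in this way gives a value close to the reported overall score of 396.116, with the small gap explained by rounding.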

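Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

Size of Input. Explore more or fewer days used as input for the model, such as three days, 21 days, 30 days, and more.
Model Tuning. Tune the structure and hyperparameters of a model and further lift model performance on average.
Data Scaling. Explore whether data scaling, such as standardization and normalization, can be used to improve the performance of any of the CNN models.
Learning Diagnostics. Use diagnostics such as learning curves for the train and validation loss and mean squared error to help tune the structure and hyperparameters of a CNN model.
Vary Kernel Size. Combine the multi-channel CNN with the multi-headed CNN and use a different kernel size for each head to see if this configuration can further improve performance.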
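As one possible starting point for the data scaling extension, the sketch below standardizes each variable using statistics computed on the training weeks only, then applies the same transform to the test weeks. The helper names (fit_standardizer, standardize, invert_target) and the random toy data are assumptions for illustration; they do not appear in the tutorial, and whether scaling helps these models is left open here.

```python
import numpy as np

def fit_standardizer(train):
    """Compute per-feature mean and std from [weeks, days, features] training data."""
    flat = train.reshape((-1, train.shape[-1]))
    mean = flat.mean(axis=0)
    std = flat.std(axis=0)
    std = np.where(std == 0.0, 1.0, std)  # guard against constant columns
    return mean, std

def standardize(data, mean, std):
    """Apply the training statistics to any split (train or test)."""
    return (data - mean) / std

def invert_target(yhat, mean, std, target_index=0):
    """Map scaled forecasts of the target variable back to original units."""
    return yhat * std[target_index] + mean[target_index]

# toy data: 10 training weeks and 3 test weeks, 7 days, 2 features (made-up values)
rng = np.random.default_rng(1)
toy_train = rng.normal(loc=[1500.0, 240.0], scale=[300.0, 2.0], size=(10, 7, 2))
toy_test = rng.normal(loc=[1500.0, 240.0], scale=[300.0, 2.0], size=(3, 7, 2))

mean, std = fit_standardizer(toy_train)
train_scaled = standardize(toy_train, mean, std)
test_scaled = standardize(toy_test, mean, std)
restored = invert_target(train_scaled[:, :, 0], mean, std)
```

Fitting the statistics on the training split alone avoids leaking information from the test weeks into the model, and forecasts must be inverted before RMSE is compared with the unscaled results above.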

If you explore any of these extensions, I'd love to know.


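Summary

In this tutorial, you discovered how to develop 1D convolutional neural networks for multi-step time series forecasting.

Specifically, you learned:

How to develop a CNN for multi-step time series forecasting with univariate data.
How to develop a multi-channel multi-step time series forecasting model for multivariate data.
How to develop a multi-headed multi-step time series forecasting model for multivariate data.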


How to Develop Deep Learning Models for Univariate Time Series Forecasting

Original: machinelearningmastery.com/how-to-develop-deep-learning-models-for-univariate-time-series-forecasting/

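Deep learning neural networks are capable of automatically learning and extracting features from raw data.

This property of neural networks can be used for time series forecasting problems, where models can be developed directly on the raw observations without the need to scale the data using normalization and standardization or to make the data stationary by differencing.

Impressively, simple deep learning neural network models are capable of making skillful forecasts compared with naive models and tuned SARIMA models on univariate time series forecasting problems that have both trend and seasonal components, with no pre-processing.

In this tutorial, you will discover how to develop a suite of deep learning models for univariate time series forecasting.

After completing this tutorial, you will know:

How to develop a robust test harness using walk-forward validation for evaluating the performance of neural network models.
How to develop and evaluate simple multilayer perceptron and convolutional neural networks for time series forecasting.
How to develop and evaluate LSTM, CNN-LSTM, and ConvLSTM neural network models for time series forecasting.

Let's get started.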

Photo by Nathaniel McQueen, some rights reserved.

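Tutorial Overview

This tutorial is divided into five parts; they are:

Problem Description
Model Evaluation Test Harness
Multilayer Perceptron Model
Convolutional Neural Network Model
Recurrent Neural Network Models

Problem Description

The 'monthly car sales' dataset summarizes the monthly car sales in Quebec, Canada between 1960 and 1968.

You can learn more about the dataset from DataMarket.

Download the dataset directly from here:

monthly-car-sales.csv

Save the file with the filename 'monthly-car-sales.csv' in your current working directory.

We can load this dataset as a single-column Pandas DataFrame using the function read_csv().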

# load
series = read_csv('monthly-car-sales.csv', header=0, index_col=0)

Once loaded, we can summarize the shape of the dataset in order to determine the number of observations.

# summarize shape
print(series.shape)

We can then create a line plot of the series to get an idea of its structure.

# plot
pyplot.plot(series)
pyplot.show()

We can tie all of this together; the complete example is listed below.

# load and plot dataset
from pandas import read_csv
from matplotlib import pyplot
# load
series = read_csv('monthly-car-sales.csv', header=0, index_col=0)
# summarize shape
print(series.shape)
# plot
pyplot.plot(series)
pyplot.show()

Running the example first prints the shape of the dataset.

(108, 1)

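The dataset is monthly and has nine years, or 108 observations. In our testing, we will use the last year, or 12 observations, as the test set.

A line plot is created. The dataset has an obvious trend and seasonal component. The period of the seasonal component could be six months or 12 months.

Line Plot of Monthly Car Sales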

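From prior experiments, we know that a naive model can achieve a root mean squared error, or RMSE, of 1841.155 by taking the median of the observations from the same month in the three prior years relative to the month being forecast; for example: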

yhat = median(-12, -24, -36)

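Here, the negative indexes refer to observations in the series relative to the end of the historical data for the month being forecast.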
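The negative indexing can be made concrete with a small sketch in plain Python. The function name naive_median_forecast and the toy history are assumptions for illustration; real sales values would be used in practice.

```python
from statistics import median

def naive_median_forecast(history, offsets=(12, 24, 36)):
    """Forecast the next month as the median of the observations
    12, 24 and 36 steps back from the end of the history."""
    return median(history[-offset] for offset in offsets)

# toy history: 36 'months' with values 1..36, so index -12 is 25,
# index -24 is 13 and index -36 is 1
toy_history = list(range(1, 37))
forecast = naive_median_forecast(toy_history)
print(forecast)
```

Because the next month sits one step past the end of the history, the observation at index -12 is the same calendar month one year earlier.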

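From prior experiments, we know that a SARIMA model can achieve an RMSE of 1551.842 with the configuration SARIMA(0, 0, 0),(1, 1, 0),12, where no elements are specified for the trend, a seasonal difference with a period of 12 is calculated, and an AR model of one season is used.

The performance of the naive model provides a lower bound on the skill required of a model. Any model that achieves an RMSE lower than 1841.155 on the last 12 months has skill.

The performance of the SARIMA model provides a measure of a good model on the problem. Any model that achieves an RMSE lower than 1551.842 on the last 12 months should be adopted over a SARIMA model.

Now that we have defined our problem and expectations of model skill, we can look at defining the test harness.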

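Model Evaluation Test Harness

In this section, we will develop a test harness for developing and evaluating different types of neural network models for univariate time series forecasting.

This section is divided into the following parts:

Train-Test Split
Series as Supervised Learning
Walk-Forward Validation
Repeat Evaluation
Summarize Performance
Worked Example

Train-Test Split

The first step is to split the loaded series into train and test sets.

We will use the first eight years (96 observations) for training and the last 12 for the test set.

The train_test_split() function below will split the series, taking the raw observations and the number of observations to use in the test set as arguments.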

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
	return data[:-n_test], data[-n_test:]

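Series as Supervised Learning

Next, we need to be able to frame the univariate series of observations as a supervised learning problem so that we can train neural network models.

A supervised learning framing of a series means that the data needs to be split into multiple examples that the model learns from and generalizes across.

Each sample must have both an input component and an output component.

The input component will be some number of prior observations, such as three years or 36 time steps.

The output component will be the total sales in the next month, because we are interested in developing a model to make one-step forecasts.

We can implement this using the shift() function on the pandas DataFrame. It allows us to shift a column down (forward in time) or up (backward in time). We can take the series as a column of data, then create multiple copies of the column, shifted forward or backward in time, in order to create the samples with the input and output elements we require.

When a series is shifted down, NaN values are introduced because we do not have values beyond the start of the series.

For example, the series defined as a column: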

(t)
1
2
3
4

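Can be shifted and inserted as a column beforehand:

(t-1),		(t)
NaN,		1
1,			2
2,			3
3,			4
4,			NaN

We can see that on the second row, the value 1 is provided as input as an observation at the prior time step, and 2 is the next value in the series that can be predicted, or learned by the model to be predicted when 1 is presented as input.

Rows with NaN values can be removed.

The series_to_supervised() function below implements this behavior, allowing you to specify the number of lag observations to use in the input and the number to use in the output for each sample. It will also remove rows that have NaN values, as they cannot be used to train or test a model.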

# transform list into supervised learning format
def series_to_supervised(data, n_in=1, n_out=1):
	df = DataFrame(data)
	cols = list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
	# put it all together
	agg = concat(cols, axis=1)
	# drop rows with NaN values
	agg.dropna(inplace=True)
	return agg.values
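For a univariate series with no missing values, the same rows can be produced without pandas. The following is an alternative sketch built on NumPy's sliding_window_view (available in NumPy 1.20 and later), not the tutorial's implementation; the name series_to_supervised_np is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def series_to_supervised_np(data, n_in=1, n_out=1):
    """Return rows of n_in lag observations followed by n_out outputs
    for a univariate series (assumes the series contains no NaN values)."""
    values = np.asarray(data, dtype=float).ravel()
    # each window holds the inputs and outputs of one sample
    return sliding_window_view(values, n_in + n_out).copy()

rows = series_to_supervised_np([1, 2, 3, 4, 5, 6], n_in=3)
X, y = rows[:, :-1], rows[:, -1]
print(rows)
```

Each row contains three lag observations followed by the value to predict, which is why the last column is split off as the output.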

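Walk-Forward Validation

Time series forecasting models can be evaluated on a test set using walk-forward validation.

Walk-forward validation is an approach where the model makes a forecast for each observation in the test dataset one at a time. After each forecast is made for a time step in the test dataset, the true observation for that time step is added to the history and made available to the model.

Simpler models can be refit with the observation prior to making the subsequent forecast. More complex models, such as neural networks, are not refit given the much greater computational cost.

Nevertheless, the true observation for the time step can then be used as part of the input for making the prediction on the next time step.

First, the dataset is split into train and test sets. We will call the train_test_split() function to perform this split and pass in the pre-specified number of observations to use as the test data.

A model will be fit once on the training dataset for a given configuration.

We will define a generic model_fit() function to perform this operation that can be filled in for the given type of neural network that we may be interested in later. The function takes the training dataset and the model configuration and returns the fit model ready for making predictions.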

# fit a model
def model_fit(train, config):
	return None

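Each time step of the test dataset is enumerated. A prediction is made using the fit model.

Again, we will define a generic function named model_predict() that takes the fit model, the history, and the model configuration and makes a single one-step prediction.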

# forecast with a pre-fit model
def model_predict(model, history, config):
	return 0.0

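The prediction is added to a list of predictions, and the true observation from the test set is added to a list of observations that was seeded with all observations from the training dataset. This list is built up during each step in the walk-forward validation, allowing the model to make a one-step prediction using the most recent history.

All of the predictions can then be compared to the true values in the test set and an error measure calculated.

We will calculate the root mean squared error, or RMSE, between predictions and the true values.

RMSE is calculated as the square root of the average of the squared differences between the forecasts and the actual values. The measure_rmse() function below implements this, using the mean_squared_error() scikit-learn function to first calculate the mean squared error, or MSE, before calculating the square root.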

# root mean squared error or rmse
def measure_rmse(actual, predicted):
	return sqrt(mean_squared_error(actual, predicted))
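To see the arithmetic behind the metric, the calculation can also be written without scikit-learn. This is a plain-Python sketch for illustration; the name rmse_plain and the example values are assumptions.

```python
from math import sqrt

def rmse_plain(actual, predicted):
    """Square root of the mean of the squared forecast errors."""
    if len(actual) != len(predicted):
        raise ValueError('actual and predicted must have the same length')
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(squared_errors))

# errors are 0, 0 and -2, so the mean squared error is 4/3
example = rmse_plain([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(example)
```

Because the errors are squared before averaging, a single large miss contributes more to the score than several small ones.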

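The complete walk_forward_validation() function that ties all of this together is listed below.

It takes the dataset, the number of observations to use as the test set, and the configuration for the model, and returns the RMSE for the model performance on the test set.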

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
	predictions = list()
	# split dataset
	train, test = train_test_split(data, n_test)
	# fit model
	model = model_fit(train, cfg)
	# seed history with training dataset
	history = [x for x in train]
	# step over each time-step in the test set
	for i in range(len(test)):
		# fit model and make forecast for history
		yhat = model_predict(model, history, cfg)
		# store forecast in list of predictions
		predictions.append(yhat)
		# add actual observation to history for the next loop
		history.append(test[i])
	# estimate prediction error
	error = measure_rmse(test, predictions)
	print(' > %.3f' % error)
	return error
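How the history grows during walk-forward validation can be traced on a toy series. The sketch below swaps in a stand-in model that simply repeats the last observed value, which is an assumption made to keep the example self-contained; the name walk_forward_demo is hypothetical.

```python
def walk_forward_demo(data, n_test):
    """Walk forward over the last n_test values, forecasting each one
    from the history before the true value is revealed."""
    train, test = data[:-n_test], data[-n_test:]
    history = list(train)
    predictions = []
    for observed in test:
        # stand-in model: forecast the most recent known observation
        predictions.append(history[-1])
        # only now is the true value added to the history
        history.append(observed)
    return predictions, test

toy_series = [10, 12, 11, 13, 15, 14]
predictions, expected = walk_forward_demo(toy_series, n_test=3)
print(predictions, expected)
```

Each forecast uses only values observed before the step being predicted, which is the property that makes the resulting error estimate honest.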

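Repeat Evaluation

Neural network models are stochastic.

This means that, given the same model configuration and the same training dataset, a different internal set of weights will result each time the model is trained, which in turn will give a different performance.

This is a benefit, allowing the model to be adaptive and find high-performing configurations for complex problems.

It is also a problem when evaluating the performance of a model and when choosing a final model to use to make predictions.

To address model evaluation, we will evaluate a model configuration multiple times via walk-forward validation and report the error as the average error across each evaluation.

This is not always possible for large neural networks and may only make sense for small networks that can be fit in minutes or hours.

The repeat_evaluate() function below implements this. It allows the number of repeats to be specified as an optional parameter that defaults to 30 and returns a list of model performance scores: in this case, RMSE values.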

# repeat evaluation of a config
def repeat_evaluate(data, config, n_test, n_repeats=30):
	# fit and evaluate the model n times
	scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
	return scores

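Summarize Performance

Finally, we need to summarize the performance of a model from the multiple repeats.

We will summarize the performance first using summary statistics, specifically the mean and the standard deviation.

We will also plot the distribution of model performance scores using a box and whisker plot to help get an idea of the spread of performance.

The summarize_scores() function below implements this, taking the name of the model that was evaluated and the list of scores from each repeated evaluation, printing the summary and showing a plot.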

# summarize model performance
def summarize_scores(name, scores):
	# print a summary
	scores_m, score_std = mean(scores), std(scores)
	print('%s: %.3f RMSE (+/- %.3f)' % (name, scores_m, score_std))
	# box and whisker plot
	pyplot.boxplot(scores)
	pyplot.show()

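Worked Example

Now that we have defined the elements of the test harness, we can tie them all together and define a simple persistence model.

Specifically, we will calculate the median of a subset of prior observations relative to the time being forecast.

We do not need to fit a model, so the model_fit() function will be implemented to simply return None.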

# fit a model
def model_fit(train, config):
	return None

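We will use the config to define a list of index offsets into the prior observations, relative to the time being forecast, that will be used to make the prediction. For example, 12 will use the observation 12 months ago (-12) relative to the time being forecast.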

# define config
config = [12, 24, 36]

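The model_predict() function can be implemented to use this configuration to collect the observations, then return the median of those observations.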

# forecast with a pre-fit model
def model_predict(model, history, config):
	values = list()
	for offset in config:
		values.append(history[-offset])
	return median(values)

The complete example of using the framework with a simple persistence model is listed below.

# persistence
from math import sqrt
from numpy import mean
from numpy import median
from numpy import std
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from matplotlib import pyplot

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
	return data[:-n_test], data[-n_test:]

# transform list into supervised learning format
def series_to_supervised(data, n_in=1, n_out=1):
	df = DataFrame(data)
	cols = list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
	# put it all together
	agg = concat(cols, axis=1)
	# drop rows with NaN values
	agg.dropna(inplace=True)
	return agg.values

# root mean squared error or rmse
def measure_rmse(actual, predicted):
	return sqrt(mean_squared_error(actual, predicted))

# difference dataset
def difference(data, interval):
	return [data[i] - data[i - interval] for i in range(interval, len(data))]

# fit a model
def model_fit(train, config):
	return None

# forecast with a pre-fit model
def model_predict(model, history, config):
	values = list()
	for offset in config:
		values.append(history[-offset])
	return median(values)

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
	predictions = list()
	# split dataset
	train, test = train_test_split(data, n_test)
	# fit model
	model = model_fit(train, cfg)
	# seed history with training dataset
	history = [x for x in train]
	# step over each time-step in the test set
	for i in range(len(test)):
		# fit model and make forecast for history
		yhat = model_predict(model, history, cfg)
		# store forecast in list of predictions
		predictions.append(yhat)
		# add actual observation to history for the next loop
		history.append(test[i])
	# estimate prediction error
	error = measure_rmse(test, predictions)
	print(' > %.3f' % error)
	return error

# repeat evaluation of a config
def repeat_evaluate(data, config, n_test, n_repeats=30):
	# fit and evaluate the model n times
	scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
	return scores

# summarize model performance
def summarize_scores(name, scores):
	# print a summary
	scores_m, score_std = mean(scores), std(scores)
	print('%s: %.3f RMSE (+/- %.3f)' % (name, scores_m, score_std))
	# box and whisker plot
	pyplot.boxplot(scores)
	pyplot.show()

series = read_csv('monthly-car-sales.csv', header=0, index_col=0)
data = series.values
# data split
n_test = 12
# define config
config = [12, 24, 36]
# grid search
scores = repeat_evaluate(data, config, n_test)
# summarize scores
summarize_scores('persistence', scores)

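Running the example prints the RMSE of the model evaluated using walk-forward validation on the final 12 months of data.

The model is evaluated 30 times, but because the model has no stochastic element, the score is the same each time.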

 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
 > 1841.156
persistence: 1841.156 RMSE (+/- 0.000)

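We can see that the RMSE of the model is 1841, providing a lower bound of performance by which we can evaluate whether a model is skillful or not on the problem.

Box and Whisker Plot of Persistence RMSE Forecasting Car Sales

Now that we have a robust test harness, we can use it to evaluate a suite of neural network models.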

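Multilayer Perceptron Model

The first network that we will evaluate is a multilayer perceptron, or MLP for short.

This is a simple feed-forward neural network model that should be evaluated before more elaborate models are considered.

MLPs can be used for time series forecasting by taking multiple observations at prior time steps, called lag observations, and using them as input features to predict one or more time steps from those observations.

This is exactly the framing of the problem provided by the series_to_supervised() function in the previous section.

The training dataset is therefore a list of samples, where each sample has some number of observations from the months prior to the time being forecast, and the forecast is the next month in the sequence. For example: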

X, 							y
month1, month2, month3,		month4
month2, month3, month4,		month5
month3, month4, month5,		month6
...

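The model will attempt to generalize over these samples, such that when a new sample is provided beyond what is known by the model, it can predict something useful; for example: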

X, 							y
month4, month5, month6,		???

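We will implement a simple MLP using the Keras deep learning library.

The model will have an input layer with some number of prior observations. This can be specified using the input_dim argument when we define the first hidden layer. The model will have a single hidden layer with some number of nodes, then a single output layer.

We will use the rectified linear activation function on the hidden layer as it performs well. We will use a linear activation function (the default) on the output layer because we are predicting a continuous value.

The loss function for the network will be the mean squared error loss, or MSE, and we will use the efficient Adam flavor of stochastic gradient descent to train the network.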

# define model
model = Sequential()
model.add(Dense(n_nodes, activation='relu', input_dim=n_input))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

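The model will be fit for some number of training epochs (exposures to the training data), and the batch size can be specified to define how often the weights are updated within each epoch.

The model_fit() function for fitting an MLP model on the training dataset is listed below.

The function expects the config to be a list with the following configuration hyperparameters:

n_input: The number of lag observations to use as input to the model.
n_nodes: The number of nodes to use in the hidden layer.
n_epochs: The number of times to expose the model to the whole training dataset.
n_batch: The number of samples within an epoch after which the weights are updated.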

# fit a model
def model_fit(train, config):
	# unpack config
	n_input, n_nodes, n_epochs, n_batch = config
	# prepare data
	data = series_to_supervised(train, n_in=n_input)
	train_x, train_y = data[:, :-1], data[:, -1]
	# define model
	model = Sequential()
	model.add(Dense(n_nodes, activation='relu', input_dim=n_input))
	model.add(Dense(1))
	model.compile(loss='mse', optimizer='adam')
	# fit
	model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
	return model

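Making a prediction with a fit MLP model is as straightforward as calling the predict() function and passing in one sample worth of input values required to make the prediction.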

yhat = model.predict(x_input, verbose=0)

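In order to make a prediction beyond the limit of known data, the last n known observations must be taken as an array and used as input.

The predict() function expects one or more samples of inputs when making a prediction, so providing a single sample requires the array to have the shape [1, n_input], where n_input is the number of time steps that the model expects as input.

Similarly, the predict() function returns an array of predictions, one for each sample provided as input. In the case of one prediction, there will be an array with one value.

The model_predict() function below implements this behavior, taking the model, the prior observations, and the model configuration as arguments, formulating an input sample and making a one-step prediction that is then returned.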

# forecast with a pre-fit model
def model_predict(model, history, config):
	# unpack config
	n_input, _, _, _ = config
	# prepare data
	x_input = array(history[-n_input:]).reshape(1, n_input)
	# forecast
	yhat = model.predict(x_input, verbose=0)
	return yhat[0]

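We now have everything we need to evaluate an MLP model on the monthly car sales dataset.

A simple grid search of model hyperparameters was performed and the configuration below was chosen. This may not be an optimal configuration, but it is the best that was found.

n_input: 24 (e.g. 24 months)
n_nodes: 500
n_epochs: 100
n_batch: 100

This configuration can be defined as a list: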

# define config
config = [24, 500, 100, 100]

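Note that when the training data is framed as a supervised learning problem, there are only 72 samples that can be used to train the model.

Using a batch size of 72 or more means that the model is being trained using batch gradient descent instead of mini-batch gradient descent. This is often used for small datasets and means that weight updates and gradient calculations are performed at the end of each epoch, instead of multiple times within each epoch.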
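The sample count and the number of weight updates per epoch follow from simple arithmetic, sketched below. The helper names supervised_samples and updates_per_epoch are assumptions for illustration; the values 96, 24 and 100 come from the tutorial, while the batch size of 16 is only a point of comparison.

```python
from math import ceil

def supervised_samples(n_train, n_input, n_out=1):
    """Rows left after framing n_train observations with n_input lags
    and n_out outputs (rows with missing values are dropped)."""
    return n_train - n_input - n_out + 1

def updates_per_epoch(n_samples, n_batch):
    """Number of weight updates performed in one pass over the data."""
    return ceil(n_samples / n_batch)

n_samples = supervised_samples(n_train=96, n_input=24)
batch_updates = updates_per_epoch(n_samples, n_batch=100)   # batch gradient descent
minibatch_updates = updates_per_epoch(n_samples, n_batch=16)  # mini-batch, for comparison
print(n_samples, batch_updates, minibatch_updates)
```

With 96 training observations and 24 lags there are 72 samples, so a batch size of 100 gives a single update per epoch.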

The complete code example is listed below.

# evaluate mlp
from math import sqrt
from numpy import array
from numpy import mean
from numpy import std
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from matplotlib import pyplot

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
	return data[:-n_test], data[-n_test:]

# transform list into supervised learning format
def series_to_supervised(data, n_in=1, n_out=1):
	df = DataFrame(data)
	cols = list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
	# put it all together
	agg = concat(cols, axis=1)
	# drop rows with NaN values
	agg.dropna(inplace=True)
	return agg.values

# root mean squared error or rmse
def measure_rmse(actual, predicted):
	return sqrt(mean_squared_error(actual, predicted))

# fit a model
def model_fit(train, config):
	# unpack config
	n_input, n_nodes, n_epochs, n_batch = config
	# prepare data
	data = series_to_supervised(train, n_in=n_input)
	train_x, train_y = data[:, :-1], data[:, -1]
	# define model
	model = Sequential()
	model.add(Dense(n_nodes, activation='relu', input_dim=n_input))
	model.add(Dense(1))
	model.compile(loss='mse', optimizer='adam')
	# fit
	model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
	return model

# forecast with a pre-fit model
def model_predict(model, history, config):
	# unpack config
	n_input, _, _, _ = config
	# prepare data
	x_input = array(history[-n_input:]).reshape(1, n_input)
	# forecast
	yhat = model.predict(x_input, verbose=0)
	return yhat[0]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
	predictions = list()
	# split dataset
	train, test = train_test_split(data, n_test)
	# fit model
	model = model_fit(train, cfg)
	# seed history with training dataset
	history = [x for x in train]
	# step over each time-step in the test set
	for i in range(len(test)):
		# fit model and make forecast for history
		yhat = model_predict(model, history, cfg)
		# store forecast in list of predictions
		predictions.append(yhat)
		# add actual observation to history for the next loop
		history.append(test[i])
	# estimate prediction error
	error = measure_rmse(test, predictions)
	print(' > %.3f' % error)
	return error

# repeat evaluation of a config
def repeat_evaluate(data, config, n_test, n_repeats=30):
	# fit and evaluate the model n times
	scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
	return scores

# summarize model performance
def summarize_scores(name, scores):
	# print a summary
	scores_m, score_std = mean(scores), std(scores)
	print('%s: %.3f RMSE (+/- %.3f)' % (name, scores_m, score_std))
	# box and whisker plot
	pyplot.boxplot(scores)
	pyplot.show()

series = read_csv('monthly-car-sales.csv', header=0, index_col=0)
data = series.values
# data split
n_test = 12
# define config
config = [24, 500, 100, 100]
# grid search
scores = repeat_evaluate(data, config, n_test)
# summarize scores
summarize_scores('mlp', scores)

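Running the example prints the RMSE for each of the 30 repeated evaluations of the model.

At the end of the run, the mean and standard deviation of the RMSE are reported; the mean is about 1,526 sales.

We can see that, on average, the chosen configuration has better performance than both the naive model (1841.155) and the SARIMA model (1551.842).

This is impressive given that the model operated on the raw data directly without scaling or the data being made stationary.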

 > 1629.203
 > 1642.219
 > 1472.483
 > 1662.055
 > 1452.480
 > 1465.535
 > 1116.253
 > 1682.667
 > 1642.626
 > 1700.183
 > 1444.481
 > 1673.217
 > 1602.342
 > 1655.895
 > 1319.387
 > 1591.972
 > 1592.574
 > 1361.607
 > 1450.348
 > 1314.529
 > 1549.505
 > 1569.750
 > 1427.897
 > 1478.926
 > 1474.990
 > 1458.993
 > 1643.383
 > 1457.925
 > 1558.934
 > 1708.278
mlp: 1526.688 RMSE (+/- 134.789)

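A box and whisker plot of the RMSE scores is created to summarize the spread of the performance for the model.

This helps to understand the spread of the scores. We can see that although on average the performance of the model is impressive, the spread is large. The standard deviation is a little more than 134 sales, meaning that a worst-case model run with an error 2 to 3 standard deviations above the mean may be worse than the naive model (1,526.688 + 3 × 134.789 ≈ 1,931, which exceeds 1841.155).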
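The comparison with the naive model can be worked through numerically. The sketch below uses the mean and standard deviation reported for the MLP above and the naive RMSE quoted earlier; the function name upper_bound is an assumption, and treating the scores as roughly symmetric around the mean is a simplification.

```python
# values reported in this tutorial
mlp_mean, mlp_std = 1526.688, 134.789
naive_rmse = 1841.155

def upper_bound(mean, std, k):
    """RMSE of a run that is k standard deviations worse than the mean."""
    return mean + k * std

for k in (1, 2, 3):
    bound = upper_bound(mlp_mean, mlp_std, k)
    print('%d std: %.3f (worse than naive: %s)' % (k, bound, bound > naive_rmse))

# number of standard deviations at which a run matches the naive model
k_break_even = (naive_rmse - mlp_mean) / mlp_std
print('break-even at %.2f standard deviations' % k_break_even)
```

Under these numbers, a run has to be a little more than two standard deviations worse than average before it falls behind the naive model.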

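A challenge in using the MLP model is in harnessing the higher skill and minimizing the variance of the model across multiple runs.

This problem applies generally to neural networks. There are many strategies that you could use, but perhaps the simplest is to train multiple final models on all available data and use them in an ensemble when making predictions, e.g. the prediction is the average of 10 to 30 models.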
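The averaging step of such an ensemble can be sketched independently of any particular model. In the example below, each ensemble member is represented by a plain callable that returns a single forecast; this interface, the name ensemble_forecast, and the toy members are assumptions for illustration, standing in for calls to fitted Keras models.

```python
from statistics import mean

def ensemble_forecast(predict_fns, x_input):
    """Average the one-step forecasts of several independently trained members."""
    forecasts = [float(predict(x_input)) for predict in predict_fns]
    return mean(forecasts)

# toy members: each repeats the last observation plus its own bias,
# mimicking models that differ because of random initialization
toy_members = [lambda x, bias=bias: x[-1] + bias for bias in (-10.0, 0.0, 10.0)]
ensemble_yhat = ensemble_forecast(toy_members, [100.0, 110.0])
print(ensemble_yhat)
```

Averaging leaves the expected forecast unchanged while reducing the run-to-run variance that comes from the random initialization of each member.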

Box and Whisker Plot of Multilayer Perceptron RMSE Forecasting Car Sales

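Convolutional Neural Network Model

Convolutional neural networks, or CNNs, are a type of neural network developed for two-dimensional image data, although they can be used for one-dimensional data such as sequences of text and time series.

When operating on one-dimensional data, the CNN reads across a sequence of lag observations and learns to extract features that are relevant for making a prediction.

We will define a CNN with two convolutional layers for extracting features from the input sequences. Each will have a configurable number of filters and kernel size and will use the rectified linear activation function. The number of filters determines the number of parallel fields on which the weighted inputs are read and projected. The kernel size defines the number of time steps read within each snapshot as the network reads along the input sequence.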

model.add(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu', input_shape=(n_input, 1)))
model.add(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu'))

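A max pooling layer is used after the convolutional layers to distill the weighted input features into those that are most salient; with a pool size of 2, it halves the length of each feature map. The pooled inputs are flattened to one long vector before being interpreted and used to make a one-step prediction.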

model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(1))
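The sequence length at each stage of this stack can be worked out by hand. The sketch below assumes the Keras defaults for these layers ('valid' padding, a convolution stride of 1, and a pooling stride equal to the pool size) and uses the configuration chosen later in this section (36 inputs, 256 filters, kernel size 3); the helper names are hypothetical.

```python
def conv1d_valid_length(n_steps, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' padding."""
    return (n_steps - kernel_size) // stride + 1

def maxpool1d_length(n_steps, pool_size):
    """Output length of 1D max pooling with stride equal to pool_size."""
    return n_steps // pool_size

n_input, n_filters, n_kernel = 36, 256, 3
after_conv1 = conv1d_valid_length(n_input, n_kernel)      # first Conv1D
after_conv2 = conv1d_valid_length(after_conv1, n_kernel)  # second Conv1D
after_pool = maxpool1d_length(after_conv2, 2)             # MaxPooling1D(pool_size=2)
flat_size = after_pool * n_filters                        # Flatten
print(after_conv1, after_conv2, after_pool, flat_size)
```

Each convolution trims kernel_size - 1 steps from the sequence and the pooling layer halves what remains, so the final Dense layer reads a vector of after_pool × n_filters values.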

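The CNN model expects input data to be in the form of multiple samples, where each sample has multiple input time steps, the same as the MLP in the previous section.

One difference is that the CNN can support multiple features or types of observations at each time step, which are interpreted as channels of an image. We only have a single feature at each time step, therefore the required three-dimensional shape of the input data will be [n_samples, n_input, 1].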

train_x = train_x.reshape((train_x.shape[0], train_x.shape[1], 1))

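The model_fit() function for fitting the CNN model on the training dataset is listed below.

The model takes the following five configuration parameters as a list:

n_input: The number of lag observations to use as input to the model.
n_filters: The number of parallel filters.
n_kernel: The number of time steps considered in each read of the input sequence.
n_epochs: The number of times to expose the model to the whole training dataset.
n_batch: The number of samples within an epoch after which the weights are updated.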

# fit a model
def model_fit(train, config):
	# unpack config
	n_input, n_filters, n_kernel, n_epochs, n_batch = config
	# prepare data
	data = series_to_supervised(train, n_in=n_input)
	train_x, train_y = data[:, :-1], data[:, -1]
	train_x = train_x.reshape((train_x.shape[0], train_x.shape[1], 1))
	# define model
	model = Sequential()
	model.add(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu', input_shape=(n_input, 1)))
	model.add(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu'))
	model.add(MaxPooling1D(pool_size=2))
	model.add(Flatten())
	model.add(Dense(1))
	model.compile(loss='mse', optimizer='adam')
	# fit
	model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
	return model

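Making a prediction with the fit CNN model is very much like making a prediction with the fit MLP model in the previous section.

The one difference is in the requirement that we specify the number of features observed at each time step, which in this case is 1. Therefore, when making a single one-step prediction, the shape of the input array must be:

[1, n_input, 1]

The model_predict() function below implements this behavior.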

# forecast with a pre-fit model
def model_predict(model, history, config):
	# unpack config
	n_input, _, _, _, _ = config
	# prepare data
	x_input = array(history[-n_input:]).reshape((1, n_input, 1))
	# forecast
	yhat = model.predict(x_input, verbose=0)
	return yhat[0]

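A simple grid search of model hyperparameters was performed and the configuration below was chosen. This is not an optimal configuration, but it is the best that was found.

The chosen configuration is as follows:

n_input: 36 (e.g. 3 years or 3 * 12)
n_filters: 256
n_kernel: 3
n_epochs: 100
n_batch: 100 (e.g. batch gradient descent)

This can be specified as a list as follows: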

# define config
config = [36, 256, 3, 100, 100]

Tying all of this together, the complete example is listed below.

# evaluate cnn
from math import sqrt
from numpy import array
from numpy import mean
from numpy import std
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import Conv1D
from keras.layers import MaxPooling1D
from matplotlib import pyplot

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
	return data[:-n_test], data[-n_test:]

# transform list into supervised learning format
def series_to_supervised(data, n_in=1, n_out=1):
	df = DataFrame(data)
	cols = list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
	# put it all together
	agg = concat(cols, axis=1)
	# drop rows with NaN values
	agg.dropna(inplace=True)
	return agg.values

# root mean squared error or rmse
def measure_rmse(actual, predicted):
	return sqrt(mean_squared_error(actual, predicted))

# fit a model
def model_fit(train, config):
	# unpack config
	n_input, n_filters, n_kernel, n_epochs, n_batch = config
	# prepare data
	data = series_to_supervised(train, n_in=n_input)
	train_x, train_y = data[:, :-1], data[:, -1]
	train_x = train_x.reshape((train_x.shape[0], train_x.shape[1], 1))
	# define model
	model = Sequential()
	model.add(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu', input_shape=(n_input, 1)))
	model.add(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu'))
	model.add(MaxPooling1D(pool_size=2))
	model.add(Flatten())
	model.add(Dense(1))
	model.compile(loss='mse', optimizer='adam')
	# fit
	model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
	return model

# forecast with a pre-fit model
def model_predict(model, history, config):
	# unpack config
	n_input, _, _, _, _ = config
	# prepare data
	x_input = array(history[-n_input:]).reshape((1, n_input, 1))
	# forecast
	yhat = model.predict(x_input, verbose=0)
	return yhat[0]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
	predictions = list()
	# split dataset
	train, test = train_test_split(data, n_test)
	# fit model
	model = model_fit(train, cfg)
	# seed history with training dataset
	history = [x for x in train]
	# step over each time-step in the test set
	for i in range(len(test)):
		# fit model and make forecast for history
		yhat = model_predict(model, history, cfg)
		# store forecast in list of predictions
		predictions.append(yhat)
		# add actual observation to history for the next loop
		history.append(test[i])
	# estimate prediction error
	error = measure_rmse(test, predictions)
	print(' > %.3f' % error)
	return error

# repeat evaluation of a config
def repeat_evaluate(data, config, n_test, n_repeats=30):
	# fit and evaluate the model n times
	scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
	return scores

# summarize model performance
def summarize_scores(name, scores):
	# print a summary
	scores_m, score_std = mean(scores), std(scores)
	print('%s: %.3f RMSE (+/- %.3f)' % (name, scores_m, score_std))
	# box and whisker plot
	pyplot.boxplot(scores)
	pyplot.show()

series = read_csv('monthly-car-sales.csv', header=0, index_col=0)
data = series.values
# data split
n_test = 12
# define config
config = [36, 256, 3, 100, 100]
# grid search
scores = repeat_evaluate(data, config, n_test)
# summarize scores
summarize_scores('cnn', scores)

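Running the example first prints the RMSE for each repeated evaluation of the model.

At the end of the run, we can see that the model is indeed skillful, achieving an average RMSE of 1,524.067, which is better than the naive model, the SARIMA model, and even the MLP model in the previous section.

This is impressive given that the model operated on the raw data directly without scaling or the data being made stationary.

The standard deviation of the score is large, at about 57 sales, but it is less than half of the standard deviation observed with the MLP model in the previous section (about 135 sales). We have some confidence that in a bad case (3 standard deviations above the mean, or about 1,696), the model RMSE will remain below (better than) the performance of the naive model.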

> 1551.031
> 1495.743
> 1449.408
> 1526.017
> 1466.118
> 1566.535
> 1649.204
> 1455.782
> 1574.214
> 1541.790
> 1489.140
> 1506.035
> 1513.197
> 1530.714
> 1511.328
> 1471.518
> 1555.596
> 1552.026
> 1531.727
> 1472.978
> 1620.242
> 1424.153
> 1456.393
> 1581.114
> 1539.286
> 1489.795
> 1652.620
> 1537.349
> 1443.777
> 1567.179
cnn: 1524.067 RMSE (+/- 57.148)

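A box and whisker plot of the scores is created to help understand the spread of error across the runs.

We can see that the spread does seem to be biased towards larger error values, as we would expect, although the upper whisker of the plot (in this case, the largest error that is not an outlier) is still limited at an RMSE of 1,650 sales.

Box and Whisker Plot of Convolutional Neural Network RMSE Forecasting Car Sales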

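Recurrent Neural Network Models

Recurrent neural networks, or RNNs, are those types of neural networks that use an output of the network from a prior step as an input in an attempt to automatically learn across sequence data.

The Long Short-Term Memory, or LSTM, network is a type of RNN whose implementation addresses the general difficulty of training RNNs on sequence data so that the result is a stable model. It achieves this by learning the weights for internal gates that control the recurrent connections within each node.

Although developed for sequence data, LSTMs have not proven effective on time series forecasting problems where the output is a function of recent observations, e.g. an autoregressive type forecasting problem, such as the car sales dataset.

Nevertheless, we can develop LSTM models for autoregressive problems and use them as a point of comparison with other neural network models.

In this section, we will explore three variations on the LSTM model for univariate time series forecasting; they are:

LSTM: The LSTM network as-is.
CNN-LSTM: A CNN network that learns input features and an LSTM that interprets them.
ConvLSTM: A combination of CNNs and LSTMs where the LSTM units read input data using the convolutional process of a CNN.

LSTM

The LSTM neural network can be used for univariate time series forecasting.

As an RNN, it will read each time step of an input sequence one step at a time. The LSTM has an internal memory allowing it to accumulate internal state as it reads across the steps of a given input sequence.

At the end of the sequence, each node in a layer of hidden LSTM units will output a single value. This vector of values summarizes what the LSTM learned or extracted from the input sequence. This can be interpreted by a fully connected layer before a final prediction is made.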

# define model
model = Sequential()
model.add(LSTM(n_nodes, activation='relu', input_shape=(n_input, 1)))
model.add(Dense(n_nodes, activation='relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam')

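Like the CNN, the LSTM can support multiple variables or features at each time step. As the car sales dataset only has one value at each time step, we can fix this at 1, both when defining the input to the network in the input_shape argument [n_input, 1] and when defining the shape of the input samples.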

train_x = train_x.reshape((train_x.shape[0], train_x.shape[1], 1))

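Unlike the MLP and CNN, which do not read the sequence data one step at a time, the LSTM does perform better if the data is stationary. This means that difference operations are performed to remove the trend and seasonal structure.

In the case of the car sales dataset, we can make the data stationary by performing a seasonal adjustment, that is, subtracting the value from one year ago from each observation.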

adjusted = value - value[-12]

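This can be performed systematically for the entire training dataset. It also means that the first year of observations must be discarded, as we have no prior year of data to difference them with.

The difference() function below will difference a provided dataset with a provided offset, called the difference order, e.g. 12 for one year of months prior.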

# difference dataset
def difference(data, interval):
	return [data[i] - data[i - interval] for i in range(interval, len(data))]
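
To make the effect of seasonal differencing concrete, here is a minimal sketch on a made-up series with a season length of 4 (the tutorial uses 12 for monthly data). It applies the same difference() function and then inverts the transform by adding back the observation from one season earlier.

```python
# difference dataset (same function as in the tutorial)
def difference(data, interval):
    return [data[i] - data[i - interval] for i in range(interval, len(data))]

# two toy "years" of 4 observations each, with a seasonal shape and a small rise
series = [10, 20, 30, 40, 12, 22, 32, 42]
interval = 4  # toy season length (illustrative value)

diffed = difference(series, interval)
# invert: each differenced value plus the observation one season earlier
restored = [diffed[i] + series[i] for i in range(len(diffed))]

print(diffed)
print(restored)
```

Note that the differenced series is shorter than the original by one season, which is why the first year of observations is lost.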

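We can make the difference order a hyperparameter of the model and only perform the operation if a value other than zero is provided.

The model_fit() function for fitting an LSTM model is provided below.

The model expects a list of five model hyperparameters; they are:

n_input: The number of lag observations to use as input to the model.
n_nodes: The number of LSTM units to use in the hidden layer.
n_epochs: The number of times to expose the model to the whole training dataset.
n_batch: The number of samples within an epoch after which the weights are updated.
n_diff: The difference order, or 0 if not used.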

# fit a model
def model_fit(train, config):
	# unpack config
	n_input, n_nodes, n_epochs, n_batch, n_diff = config
	# prepare data
	if n_diff > 0:
		train = difference(train, n_diff)
	data = series_to_supervised(train, n_in=n_input)
	train_x, train_y = data[:, :-1], data[:, -1]
	train_x = train_x.reshape((train_x.shape[0], train_x.shape[1], 1))
	# define model
	model = Sequential()
	model.add(LSTM(n_nodes, activation='relu', input_shape=(n_input, 1)))
	model.add(Dense(n_nodes, activation='relu'))
	model.add(Dense(1))
	model.compile(loss='mse', optimizer='adam')
	# fit
	model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
	return model
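
The data preparation step inside model_fit() frames the series as input/output pairs. The sketch below shows the same framing in plain Python on a toy series; sliding_window() is a hypothetical helper written for illustration and is not the tutorial's series_to_supervised() function, although it is intended to produce the same rows for a univariate series with n_out=1.

```python
def sliding_window(series, n_input):
    """Return (X, y): each X row holds n_input lag values, y is the next value."""
    X, y = [], []
    for i in range(n_input, len(series)):
        X.append(series[i - n_input:i])  # the n_input observations before step i
        y.append(series[i])              # the observation to predict
    return X, y

series = [1, 2, 3, 4, 5, 6]  # toy data
X, y = sliding_window(series, n_input=3)
for row, target in zip(X, y):
    print(row, '->', target)
```

The first n_input observations never appear as targets, so a series of length n yields n - n_input training samples.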

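Making a prediction with the fit LSTM model is the same as making a prediction with a CNN model.

A single input must have the three-dimensional structure of samples, time steps, and features, which in this case means 1 sample and 1 feature: [1, n_input, 1].

If the difference operation was performed, we must add back the value that was subtracted after the model has made a forecast. We must also difference the historical data before formulating the single input used to make the prediction.

The model_predict() function below implements this behavior.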

# forecast with a pre-fit model
def model_predict(model, history, config):
	# unpack config
	n_input, _, _, _, n_diff = config
	# prepare data
	correction = 0.0
	if n_diff > 0:
		correction = history[-n_diff]
		history = difference(history, n_diff)
	x_input = array(history[-n_input:]).reshape((1, n_input, 1))
	# forecast
	yhat = model.predict(x_input, verbose=0)
	return correction + yhat[0]
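
The correction step can be exercised without training a network. The sketch below is an illustration under stated assumptions: ZeroChangeModel is a hypothetical stand-in for a fitted Keras model that always predicts a seasonal change of zero, and the history and config values are made up. With a zero predicted change, the forecast should equal the observation from n_diff steps back.

```python
import numpy as np

class ZeroChangeModel:
    """Hypothetical stand-in for a fitted Keras model: always predicts 0."""
    def predict(self, x, verbose=0):
        # one prediction per input sample, shaped (samples, 1) like Keras output
        return np.zeros((x.shape[0], 1))

# difference dataset (same function as in the tutorial)
def difference(data, interval):
    return [data[i] - data[i - interval] for i in range(interval, len(data))]

# forecast with a pre-fit model (same logic as in the tutorial)
def model_predict(model, history, config):
    n_input, _, _, _, n_diff = config
    correction = 0.0
    if n_diff > 0:
        # value that the differencing subtracted from the step being forecast
        correction = history[-n_diff]
        history = difference(history, n_diff)
    x_input = np.array(history[-n_input:]).reshape((1, n_input, 1))
    yhat = model.predict(x_input, verbose=0)
    return correction + yhat[0]

history = [float(v) for v in range(1, 25)]  # toy series: 1.0 .. 24.0
config = [6, None, None, None, 12]          # only n_input and n_diff are used here
forecast = model_predict(ZeroChangeModel(), history, config)
print(forecast)
```

Here the correction is history[-12], the value twelve steps before the step being forecast, so the stand-in model reproduces a seasonal persistence forecast.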

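A simple grid search of model hyperparameters was performed and the configuration below was chosen. This may not be an optimal configuration, but it is the best that was found.

The chosen configuration is as follows:

n_input: 36 (i.e. 3 years, or 3 * 12)
n_nodes: 50
n_epochs: 100
n_batch: 100 (i.e. batch gradient descent)
n_diff: 12 (i.e. seasonal difference)

This can be specified as a list: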

# define config
config = [36, 50, 100, 100, 12]

Tying all of this together, the complete example is listed below.

# evaluate lstm
from math import sqrt
from numpy import array
from numpy import mean
from numpy import std
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from matplotlib import pyplot

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
	return data[:-n_test], data[-n_test:]

# transform list into supervised learning format
def series_to_supervised(data, n_in=1, n_out=1):
	df = DataFrame(data)
	cols = list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
	# put it all together
	agg = concat(cols, axis=1)
	# drop rows with NaN values
	agg.dropna(inplace=True)
	return agg.values

# root mean squared error or rmse
def measure_rmse(actual, predicted):
	return sqrt(mean_squared_error(actual, predicted))

# difference dataset
def difference(data, interval):
	return [data[i] - data[i - interval] for i in range(interval, len(data))]

# fit a model
def model_fit(train, config):
	# unpack config
	n_input, n_nodes, n_epochs, n_batch, n_diff = config
	# prepare data
	if n_diff > 0:
		train = difference(train, n_diff)
	data = series_to_supervised(train, n_in=n_input)
	train_x, train_y = data[:, :-1], data[:, -1]
	train_x = train_x.reshape((train_x.shape[0], train_x.shape[1], 1))
	# define model
	model = Sequential()
	model.add(LSTM(n_nodes, activation='relu', input_shape=(n_input, 1)))
	model.add(Dense(n_nodes, activation='relu'))
	model.add(Dense(1))
	model.compile(loss='mse', optimizer='adam')
	# fit
	model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
	return model

# forecast with a pre-fit model
def model_predict(model, history, config):
	# unpack config
	n_input, _, _, _, n_diff = config
	# prepare data
	correction = 0.0
	if n_diff > 0:
		correction = history[-n_diff]
		history = difference(history, n_diff)
	x_input = array(history[-n_input:]).reshape((1, n_input, 1))
	# forecast
	yhat = model.predict(x_input, verbose=0)
	return correction + yhat[0]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
	predictions = list()
	# split dataset
	train, test = train_test_split(data, n_test)
	# fit model
	model = model_fit(train, cfg)
	# seed history with training dataset
	history = [x for x in train]
	# step over each time-step in the test set
	for i in range(len(test)):
		# fit model and make forecast for history
		yhat = model_predict(model, history, cfg)
		# store forecast in list of predictions
		predictions.append(yhat)
		# add actual observation to history for the next loop
		history.append(test[i])
	# estimate prediction error
	error = measure_rmse(test, predictions)
	print(' > %.3f' % error)
	return error

# repeat evaluation of a config
def repeat_evaluate(data, config, n_test, n_repeats=30):
	# fit and evaluate the model n times
	scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
	return scores

# summarize model performance
def summarize_scores(name, scores):
	# print a summary
	scores_m, score_std = mean(scores), std(scores)
	print('%s: %.3f RMSE (+/- %.3f)' % (name, scores_m, score_std))
	# box and whisker plot
	pyplot.boxplot(scores)
	pyplot.show()

series = read_csv('monthly-car-sales.csv', header=0, index_col=0)
data = series.values
# data split
n_test = 12
# define config
config = [36, 50, 100, 100, 12]
# repeat the evaluation of the config
scores = repeat_evaluate(data, config, n_test)
# summarize scores
summarize_scores('lstm', scores)

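Running the example, we can see the RMSE for each repeated evaluation of the model.

At the end of the run, we can see that the average RMSE is about 2,109, which is worse than the naive model. This suggests that the chosen model is not skillful, even though it was the best that could be found given the same resources used to find model configurations in the previous sections.

This provides further evidence (although weak evidence) that LSTMs, at least alone, are perhaps a poor fit for autoregressive-type sequence prediction problems.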

> 2129.480
> 2169.109
> 2078.290
> 2257.222
> 2014.911
> 2197.283
> 2028.176
> 2110.718
> 2100.388
> 2157.271
> 1940.103
> 2086.588
> 1986.696
> 2168.784
> 2188.813
> 2086.759
> 2128.095
> 2126.467
> 2077.463
> 2057.679
> 2209.818
> 2067.082
> 1983.346
> 2157.749
> 2145.071
> 2266.130
> 2105.043
> 2128.549
> 1952.002
> 2188.287
lstm: 2109.779 RMSE (+/- 81.373)

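A box and whisker plot is also created, summarizing the distribution of RMSE scores.

Even the best case for the model does not achieve the performance of the naive model.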

Box and Whisker Plot of Long Short-Term Memory Neural Network RMSE Forecasting Car Sales

CNN LSTM

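We have seen that the CNN model is capable of automatically learning and extracting features from the raw sequence data without scaling or differencing.

We can combine this capability with the LSTM, where a CNN model is applied to subsequences of the input data, the results of which together form a time series of extracted features that can be interpreted by an LSTM model.

This combination of a CNN model that reads multiple subsequences, which are then read over time by an LSTM, is called a CNN-LSTM model.

The model requires that each input sequence, e.g. 36 months, is divided into multiple subsequences, each read by the CNN model, e.g. 3 subsequences of 12 time steps. It may make sense to divide the subsequences by year, but this is just a hypothesis, and other splits could be used, such as six subsequences of six time steps. Therefore, the split is parameterized with n_seq for the number of subsequences and n_steps for the number of steps per subsequence.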

train_x = train_x.reshape((train_x.shape[0], n_seq, n_steps, 1))

The number of lag observations per sample is simply (n_seq * n_steps).

This is a 4-dimensional input array, now with the dimensions:

[samples, subsequences, timesteps, features]
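
The following sketch shows how the 36 lag observations of one sample are laid out after the reshape, using the 3 subsequences of 12 time steps mentioned above. The input values are made up (a simple counter) so that the position of each lag is easy to follow.

```python
import numpy as np

n_seq, n_steps = 3, 12
n_input = n_seq * n_steps  # 36 lag observations per sample

# toy 2D inputs: 2 samples, values 0..71 so each position is recognizable
train_x = np.arange(2 * n_input).reshape((2, n_input))
# [samples, subsequences, timesteps, features]
train_x_4d = train_x.reshape((train_x.shape[0], n_seq, n_steps, 1))

print(train_x_4d.shape)
# first sample: subsequence 0 holds the oldest 12 lags, subsequence 2 the newest 12
print(train_x_4d[0, 0, :, 0])
print(train_x_4d[0, 2, :, 0])
```

Because reshape preserves order, the subsequences are contiguous, non-overlapping blocks of the original lag window.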

The same CNN model must be applied to each input subsequence.

We can achieve this by wrapping the entire CNN model in a TimeDistributed layer wrapper.

model = Sequential()
model.add(TimeDistributed(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu', input_shape=(None,n_steps,1))))
model.add(TimeDistributed(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu')))
model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
model.add(TimeDistributed(Flatten()))

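The output of one application of the CNN submodel is a vector. The outputs of the submodel across the input subsequences form a time series of interpretations that can be read by an LSTM model. This is followed by a fully connected layer that interprets the outcome of the LSTM, and finally an output layer for making one-step predictions.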

model.add(LSTM(n_nodes, activation='relu'))
model.add(Dense(n_nodes, activation='relu'))
model.add(Dense(1))
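
It can help to trace the length of one subsequence through the CNN submodel by hand. The sketch below does this arithmetic in plain Python, assuming the Keras defaults of 'valid' padding and stride 1 for Conv1D, and strides equal to pool_size for MaxPooling1D. The values of 12 steps, 64 filters, and a kernel size of 3 are illustrative; the helper names are hypothetical.

```python
def conv1d_len(length, kernel_size):
    # output length of a 1D convolution with 'valid' padding and stride 1
    return length - kernel_size + 1

def pool1d_len(length, pool_size):
    # output length of non-overlapping pooling with 'valid' padding
    return (length - pool_size) // pool_size + 1

n_steps, n_filters, n_kernel = 12, 64, 3  # illustrative values

after_conv1 = conv1d_len(n_steps, n_kernel)      # first Conv1D layer
after_conv2 = conv1d_len(after_conv1, n_kernel)  # second Conv1D layer
after_pool = pool1d_len(after_conv2, 2)          # MaxPooling1D(pool_size=2)
# Flatten turns (steps, filters) into one vector per subsequence
features_per_subsequence = after_pool * n_filters

print(after_conv1, after_conv2, after_pool, features_per_subsequence)
```

Under these assumptions, the LSTM receives one flattened feature vector per subsequence, so its input has n_seq time steps.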

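The complete model_fit() function is listed below.

The model expects a list of seven hyperparameters; they are:

n_seq: The number of subsequences within a sample.
n_steps: The number of time steps within each subsequence.
n_filters: The number of parallel filters.
n_kernel: The number of time steps considered in each read of the input sequence.
n_nodes: The number of LSTM units to use in the hidden layer.
n_epochs: The number of times to expose the model to the whole training dataset.
n_batch: The number of samples within an epoch after which the weights are updated.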

# fit a model
def model_fit(train, config):
	# unpack config
	n_seq, n_steps, n_filters, n_kernel, n_nodes, n_epochs, n_batch = config
	n_input = n_seq * n_steps
	# prepare data
	data = series_to_supervised(train, n_in=n_input)
	train_x, train_y = data[:, :-1], data[:, -1]
	train_x = train_x.reshape((train_x.shape[0], n_seq, n_steps, 1))
	# define model
	model = Sequential()
	model.add(TimeDistributed(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu', input_shape=(None,n_steps,1))))
	model.add(TimeDistributed(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu')))
	model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
	model.add(TimeDistributed(Flatten()))
	model.add(LSTM(n_nodes, activation='relu'))
	model.add(Dense(n_nodes, activation='relu'))
	model.add(Dense(1))
	model.compile(loss='mse', optimizer='adam')
	# fit
	model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
	return model

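Making a prediction with the fit model is much the same as with the LSTM or CNN, with the addition of splitting each sample into subsequences with a given number of time steps.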

# prepare data
x_input = array(history[-n_input:]).reshape((1, n_seq, n_steps, 1))

The updated model_predict() function is listed below.

# forecast with a pre-fit model
def model_predict(model, history, config):
	# unpack config
	n_seq, n_steps, _, _, _, _, _ = config
	n_input = n_seq * n_steps
	# prepare data
	x_input = array(history[-n_input:]).reshape((1, n_seq, n_steps, 1))
	# forecast
	yhat = model.predict(x_input, verbose=0)
	return yhat[0]

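A simple grid search of model hyperparameters was performed and the configuration below was chosen. This may not be an optimal configuration, but it is the best that was found.

n_seq: 3 (i.e. 3 years)
n_steps: 12 (i.e. 1 year of months)
n_filters: 64
n_kernel: 3
n_nodes: 100
n_epochs: 200
n_batch: 100 (i.e. batch gradient descent)

We can define the configuration as a list; for example: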

# define config
config = [3, 12, 64, 3, 100, 200, 100]

The complete example of evaluating the CNN-LSTM model for forecasting the univariate monthly car sales is listed below.

# evaluate cnn lstm
from math import sqrt
from numpy import array
from numpy import mean
from numpy import std
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import TimeDistributed
from keras.layers import Flatten
from keras.layers import Conv1D
from keras.layers import MaxPooling1D
from matplotlib import pyplot

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
	return data[:-n_test], data[-n_test:]

# transform list into supervised learning format
def series_to_supervised(data, n_in=1, n_out=1):
	df = DataFrame(data)
	cols = list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
	# put it all together
	agg = concat(cols, axis=1)
	# drop rows with NaN values
	agg.dropna(inplace=True)
	return agg.values

# root mean squared error or rmse
def measure_rmse(actual, predicted):
	return sqrt(mean_squared_error(actual, predicted))

# fit a model
def model_fit(train, config):
	# unpack config
	n_seq, n_steps, n_filters, n_kernel, n_nodes, n_epochs, n_batch = config
	n_input = n_seq * n_steps
	# prepare data
	data = series_to_supervised(train, n_in=n_input)
	train_x, train_y = data[:, :-1], data[:, -1]
	train_x = train_x.reshape((train_x.shape[0], n_seq, n_steps, 1))
	# define model
	model = Sequential()
	model.add(TimeDistributed(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu', input_shape=(None,n_steps,1))))
	model.add(TimeDistributed(Conv1D(filters=n_filters, kernel_size=n_kernel, activation='relu')))
	model.add(TimeDistributed(MaxPooling1D(pool_size=2)))
	model.add(TimeDistributed(Flatten()))
	model.add(LSTM(n_nodes, activation='relu'))
	model.add(Dense(n_nodes, activation='relu'))
	model.add(Dense(1))
	model.compile(loss='mse', optimizer='adam')
	# fit
	model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
	return model

# forecast with a pre-fit model
def model_predict(model, history, config):
	# unpack config
	n_seq, n_steps, _, _, _, _, _ = config
	n_input = n_seq * n_steps
	# prepare data
	x_input = array(history[-n_input:]).reshape((1, n_seq, n_steps, 1))
	# forecast
	yhat = model.predict(x_input, verbose=0)
	return yhat[0]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
	predictions = list()
	# split dataset
	train, test = train_test_split(data, n_test)
	# fit model
	model = model_fit(train, cfg)
	# seed history with training dataset
	history = [x for x in train]
	# step over each time-step in the test set
	for i in range(len(test)):
		# fit model and make forecast for history
		yhat = model_predict(model, history, cfg)
		# store forecast in list of predictions
		predictions.append(yhat)
		# add actual observation to history for the next loop
		history.append(test[i])
	# estimate prediction error
	error = measure_rmse(test, predictions)
	print(' > %.3f' % error)
	return error

# repeat evaluation of a config
def repeat_evaluate(data, config, n_test, n_repeats=30):
	# fit and evaluate the model n times
	scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
	return scores

# summarize model performance
def summarize_scores(name, scores):
	# print a summary
	scores_m, score_std = mean(scores), std(scores)
	print('%s: %.3f RMSE (+/- %.3f)' % (name, scores_m, score_std))
	# box and whisker plot
	pyplot.boxplot(scores)
	pyplot.show()

series = read_csv('monthly-car-sales.csv', header=0, index_col=0)
data = series.values
# data split
n_test = 12
# define config
config = [3, 12, 64, 3, 100, 200, 100]
# repeat the evaluation of the config
scores = repeat_evaluate(data, config, n_test)
# summarize scores
summarize_scores('cnn-lstm', scores)

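Running the example prints the RMSE for each repeated evaluation of the model.

The final averaged RMSE is reported at the end as about 1,626, which is lower than the naive model but still higher than the SARIMA model. The standard deviation of this score is also very large, suggesting that the chosen configuration may not be as stable as the standalone CNN model.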

 > 1543.533
 > 1421.895
 > 1467.927
 > 1441.125
 > 1750.995
 > 1321.498
 > 1571.657
 > 1845.298
 > 1621.589
 > 1425.065
 > 1675.232
 > 1807.288
 > 2922.295
 > 1391.861
 > 1626.655
 > 1633.177
 > 1667.572
 > 1577.285
 > 1590.235
 > 1557.385
 > 1784.982
 > 1664.839
 > 1741.729
 > 1437.992
 > 1772.076
 > 1289.794
 > 1685.976
 > 1498.123
 > 1618.627
 > 1448.361
cnn-lstm: 1626.735 RMSE (+/- 279.850)

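A box and whisker plot is also created, summarizing the distribution of RMSE scores.

The plot shows a single outlier with very poor performance, just below 3,000 sales.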
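
To see how much that single run affects the summary statistics, the sketch below recomputes them from the printed scores with and without the largest value. This is an illustrative check rather than part of the original evaluation; dropping a run is done here only to show its influence, not as a recommendation for reporting results.

```python
from statistics import mean, median, pstdev

# RMSE scores copied from the 30 CNN-LSTM repeats printed above
scores = [
    1543.533, 1421.895, 1467.927, 1441.125, 1750.995, 1321.498,
    1571.657, 1845.298, 1621.589, 1425.065, 1675.232, 1807.288,
    2922.295, 1391.861, 1626.655, 1633.177, 1667.572, 1577.285,
    1590.235, 1557.385, 1784.982, 1664.839, 1741.729, 1437.992,
    1772.076, 1289.794, 1685.976, 1498.123, 1618.627, 1448.361,
]

# copy of the scores with the single largest value removed
trimmed = sorted(scores)[:-1]

print('all runs:        mean=%.3f std=%.3f' % (mean(scores), pstdev(scores)))
print('without largest: mean=%.3f std=%.3f' % (mean(trimmed), pstdev(trimmed)))
# the median is less sensitive to a single extreme run than the mean
print('median of all runs: %.3f' % median(scores))
```

A large drop in the standard deviation after removing one run suggests that much of the reported instability comes from occasional poor fits rather than from consistently noisy results.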

Box and Whisker Plot of CNN-LSTM RMSE Forecasting Car Sales

ConvLSTM

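It is possible to perform a convolutional operation as part of reading the input sequence within each LSTM unit.

This means that, rather than reading a sequence one step at a time, the LSTM reads a block or subsequence of observations at a time using a convolutional process, like a CNN.

This differs from first extracting features with a CNN and then interpreting the result with an LSTM; here the CNN operation is performed at each time step as part of the LSTM.

This type of model is called a Convolutional LSTM, or ConvLSTM for short. It is provided in Keras as a layer called ConvLSTM2D for 2D data. We can configure it for use with 1D sequence data by assuming that we have one row with multiple columns.

As with the CNN-LSTM, the input data is split into subsequences where each subsequence has a fixed number of time steps, although we must also specify the number of rows in each subsequence, which in this case is fixed at 1.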

train_x = train_x.reshape((train_x.shape[0], n_seq, 1, n_steps, 1))

The shape is five-dimensional, with the dimensions:

[samples, subsequences, rows, columns, features]
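
The five-dimensional layout is the CNN-LSTM layout with one extra rows axis of size 1. The sketch below illustrates this on made-up data, using 3 subsequences of 12 time steps, and checks that reshaping directly to five dimensions gives the same array as inserting the rows axis into the four-dimensional version.

```python
import numpy as np

n_seq, n_steps = 3, 12
n_input = n_seq * n_steps

# toy 2D inputs: 2 samples of 36 lag values each
train_x = np.arange(2 * n_input).reshape((2, n_input))

# [samples, subsequences, rows, columns, features] with rows fixed at 1
train_x_5d = train_x.reshape((train_x.shape[0], n_seq, 1, n_steps, 1))

# the CNN-LSTM layout [samples, subsequences, timesteps, features]
train_x_4d = train_x.reshape((train_x.shape[0], n_seq, n_steps, 1))
# inserting a size-1 axis at position 2 yields the same 5D array
same = np.array_equal(train_x_5d, np.expand_dims(train_x_4d, axis=2))

print(train_x_5d.shape, same)
```

In other words, the time steps of each subsequence become the columns of a one-row "image".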

As with the CNN, the ConvLSTM layer allows us to specify the number of filter maps and the size of the kernel used when reading the input sequences.

model.add(ConvLSTM2D(filters=n_filters, kernel_size=(1,n_kernel), activation='relu', input_shape=(n_seq, 1, n_steps, 1)))

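The output of the layer is a sequence of filter maps that must first be flattened before it can be interpreted, and is then followed by an output layer.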
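
The size of the flattened vector can be worked out by hand. The sketch below does the arithmetic, assuming the ConvLSTM2D defaults of 'valid' padding, stride 1, and return_sequences=False, so that only the final output of shape (rows, columns, filters) is passed on. The values of 12 columns, a kernel of (1, 3), and 256 filters are illustrative, and the helper name is hypothetical.

```python
def valid_conv_len(length, kernel_size):
    # output length along one axis for 'valid' padding and stride 1
    return length - kernel_size + 1

rows, n_steps = 1, 12          # one row, 12 columns per subsequence
kernel_rows, n_kernel = 1, 3   # kernel_size=(1, n_kernel)
n_filters = 256                # illustrative value

out_rows = valid_conv_len(rows, kernel_rows)
out_cols = valid_conv_len(n_steps, n_kernel)
# Flatten collapses (rows, columns, filters) into a single vector
flattened = out_rows * out_cols * n_filters

print(out_rows, out_cols, flattened)
```

Under these assumptions the fully connected layer that follows receives a vector whose length grows linearly with the number of filters.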

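The model expects a list of seven hyperparameters, the same as the CNN-LSTM; they are:

n_seq: The number of subsequences within a sample.
n_steps: The number of time steps within each subsequence.
n_filters: The number of parallel filters.
n_kernel: The number of time steps considered in each read of the input sequence.
n_nodes: The number of nodes in the fully connected hidden layer that follows the ConvLSTM layer.
n_epochs: The number of times to expose the model to the whole training dataset.
n_batch: The number of samples within an epoch after which the weights are updated.

The model_fit() function that implements all of this is listed below.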

# fit a model
def model_fit(train, config):
	# unpack config
	n_seq, n_steps, n_filters, n_kernel, n_nodes, n_epochs, n_batch = config
	n_input = n_seq * n_steps
	# prepare data
	data = series_to_supervised(train, n_in=n_input)
	train_x, train_y = data[:, :-1], data[:, -1]
	train_x = train_x.reshape((train_x.shape[0], n_seq, 1, n_steps, 1))
	# define model
	model = Sequential()
	model.add(ConvLSTM2D(filters=n_filters, kernel_size=(1,n_kernel), activation='relu', input_shape=(n_seq, 1, n_steps, 1)))
	model.add(Flatten())
	model.add(Dense(n_nodes, activation='relu'))
	model.add(Dense(1))
	model.compile(loss='mse', optimizer='adam')
	# fit
	model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
	return model

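Making a prediction with the fit model works in the same way as with the CNN-LSTM, although with the additional rows dimension, which we fix at 1.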

# prepare data
x_input = array(history[-n_input:]).reshape((1, n_seq, 1, n_steps, 1))

The model_predict() function for making a single one-step prediction is listed below.

# forecast with a pre-fit model
def model_predict(model, history, config):
	# unpack config
	n_seq, n_steps, _, _, _, _, _ = config
	n_input = n_seq * n_steps
	# prepare data
	x_input = array(history[-n_input:]).reshape((1, n_seq, 1, n_steps, 1))
	# forecast
	yhat = model.predict(x_input, verbose=0)
	return yhat[0]

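A simple grid search of model hyperparameters was performed and the configuration below was chosen.

This may not be an optimal configuration, but it is the best that was found.

n_seq: 3 (i.e. 3 years)
n_steps: 12 (i.e. 1 year of months)
n_filters: 256
n_kernel: 3
n_nodes: 200
n_epochs: 200
n_batch: 100 (i.e. batch gradient descent)

We can define the configuration as a list; for example: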

# define config
config = [3, 12, 256, 3, 200, 200, 100]

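We can tie all of this together. The complete code listing for evaluating the ConvLSTM model for one-step forecasting of the monthly car sales dataset is given below.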

# evaluate convlstm
from math import sqrt
from numpy import array
from numpy import mean
from numpy import std
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.metrics import mean_squared_error
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import ConvLSTM2D
from matplotlib import pyplot

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
	return data[:-n_test], data[-n_test:]

# transform list into supervised learning format
def series_to_supervised(data, n_in=1, n_out=1):
	df = DataFrame(data)
	cols = list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
	# put it all together
	agg = concat(cols, axis=1)
	# drop rows with NaN values
	agg.dropna(inplace=True)
	return agg.values

# root mean squared error or rmse
def measure_rmse(actual, predicted):
	return sqrt(mean_squared_error(actual, predicted))

# difference dataset
def difference(data, interval):
	return [data[i] - data[i - interval] for i in range(interval, len(data))]

# fit a model
def model_fit(train, config):
	# unpack config
	n_seq, n_steps, n_filters, n_kernel, n_nodes, n_epochs, n_batch = config
	n_input = n_seq * n_steps
	# prepare data
	data = series_to_supervised(train, n_in=n_input)
	train_x, train_y = data[:, :-1], data[:, -1]
	train_x = train_x.reshape((train_x.shape[0], n_seq, 1, n_steps, 1))
	# define model
	model = Sequential()
	model.add(ConvLSTM2D(filters=n_filters, kernel_size=(1,n_kernel), activation='relu', input_shape=(n_seq, 1, n_steps, 1)))
	model.add(Flatten())
	model.add(Dense(n_nodes, activation='relu'))
	model.add(Dense(1))
	model.compile(loss='mse', optimizer='adam')
	# fit
	model.fit(train_x, train_y, epochs=n_epochs, batch_size=n_batch, verbose=0)
	return model

# forecast with a pre-fit model
def model_predict(model, history, config):
	# unpack config
	n_seq, n_steps, _, _, _, _, _ = config
	n_input = n_seq * n_steps
	# prepare data
	x_input = array(history[-n_input:]).reshape((1, n_seq, 1, n_steps, 1))
	# forecast
	yhat = model.predict(x_input, verbose=0)
	return yhat[0]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
	predictions = list()
	# split dataset
	train, test = train_test_split(data, n_test)
	# fit model
	model = model_fit(train, cfg)
	# seed history with training dataset
	history = [x for x in train]
	# step over each time-step in the test set
	for i in range(len(test)):
		# fit model and make forecast for history
		yhat = model_predict(model, history, cfg)
		# store forecast in list of predictions
		predictions.append(yhat)
		# add actual observation to history for the next loop
		history.append(test[i])
	# estimate prediction error
	error = measure_rmse(test, predictions)
	print(' > %.3f' % error)
	return error

# repeat evaluation of a config
def repeat_evaluate(data, config, n_test, n_repeats=30):
	# fit and evaluate the model n times
	scores = [walk_forward_validation(data, n_test, config) for _ in range(n_repeats)]
	return scores

# summarize model performance
def summarize_scores(name, scores):
	# print a summary
	scores_m, score_std = mean(scores), std(scores)
	print('%s: %.3f RMSE (+/- %.3f)' % (name, scores_m, score_std))
	# box and whisker plot
	pyplot.boxplot(scores)
	pyplot.show()

series = read_csv('monthly-car-sales.csv', header=0, index_col=0)
data = series.values
# data split
n_test = 12
# define config
config = [3, 12, 256, 3, 200, 200, 100]
# repeat the evaluation of the config
scores = repeat_evaluate(data, config, n_test)
# summarize scores
summarize_scores('convlstm', scores)

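Running the example prints the RMSE for each repeated evaluation of the model.

The final averaged RMSE is reported at the end as about 1,660, which is lower than the naive model but still higher than the SARIMA model.

This result is perhaps on par with the CNN-LSTM model. The standard deviation of this score is also very large, suggesting that the chosen configuration may not be as stable as the standalone CNN model.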

 > 1825.246
 > 1862.674
 > 1684.313
 > 1310.448
 > 2109.668
 > 1507.912
 > 1431.118
 > 1442.692
 > 1400.548
 > 1732.381
 > 1523.824
 > 1611.898
 > 1805.970
 > 1616.015
 > 1649.466
 > 1521.884
 > 2025.655
 > 1622.886
 > 2536.448
 > 1526.532
 > 1866.631
 > 1562.625
 > 1491.386
 > 1506.270
 > 1843.981
 > 1653.084
 > 1650.430
 > 1291.353
 > 1558.616
 > 1653.231
convlstm: 1660.840 RMSE (+/- 248.826)

A box and whisker plot is also created, summarizing the distribution of RMSE scores.

Box and Whisker Plot of ConvLSTM RMSE Forecasting Car Sales

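Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

Data Preparation. Explore whether data preparation, such as normalization, standardization, and/or differencing, can lift the performance of any of the models.
Grid Search Hyperparameters. Implement a grid search of the hyperparameters for one model to see if you can further lift performance.
Learning Curve Diagnostics. Create a single fit of one model and review the learning curves on train and validation splits of the dataset, then use the diagnostics of the learning curves to further tune the model hyperparameters and improve model performance.
History Size. Explore different amounts of historical data (lag inputs) for one model to see if you can further improve model performance.
Reduce Variance of Final Model. Explore one or more strategies to reduce the variance of one of the neural network models.
Update During Walk-Forward. Explore whether re-fitting or updating a neural network model as part of walk-forward validation can further improve model performance.
More Parameterization. Explore adding further model parameterization for one model, such as the use of additional layers.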
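
As a possible starting point for the data preparation extension, the sketch below standardizes a series using statistics estimated on the training portion only and then inverts the transform. The helper names and the toy values are assumptions made for illustration; they are not part of the tutorial's code, and in practice the inverse would be applied to model forecasts before computing RMSE.

```python
from statistics import mean, pstdev

def fit_standardizer(train):
    # estimate the statistics on the training data only, to avoid leaking test data
    return mean(train), pstdev(train)

def standardize(values, mu, sigma):
    return [(v - mu) / sigma for v in values]

def invert_standardize(values, mu, sigma):
    return [v * sigma + mu for v in values]

train = [10.0, 12.0, 14.0, 16.0]  # toy training observations
test = [18.0, 20.0]               # toy test observations

mu, sigma = fit_standardizer(train)
train_s = standardize(train, mu, sigma)
test_s = standardize(test, mu, sigma)
# forecasts made on the scaled data would be mapped back like this
restored = invert_standardize(test_s, mu, sigma)

print(train_s)
print(test_s)
print(restored)
```

Fitting the scaler inside the walk-forward loop, on the history available at each step, would be a stricter variant of the same idea.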

If you explore any of these extensions, I would love to know.

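Summary

In this tutorial, you discovered how to develop a suite of deep learning models for univariate time series forecasting.

Specifically, you learned:

How to develop a robust test harness using walk-forward validation for evaluating the performance of neural network models.
How to develop and evaluate simple multilayer Perceptron and convolutional neural networks for time series forecasting.
How to develop and evaluate LSTM, CNN-LSTM, and ConvLSTM neural network models for time series forecasting.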
