Time Series Forecasting (Python): ARIMA, LSTM, Prophet

lwnylslwnyls

2019-06-24

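This article forecasts time series data. We build three different models in Python and compare their results. The models are ARIMA (autoregressive integrated moving average), LSTM (long short-term memory neural network), and Facebook Prophet.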

ARIMA

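ARIMA is a time series model used to forecast future values of a series. It is a form of regression analysis.

AR (Autoregression): a model in which the variable is regressed on its own lagged (prior) values.
I (Integrated): differencing of the raw observations to make the time series stationary.
MA (Moving average): models the dependence between an observation and the residual errors of a moving-average model applied to lagged observations.

The standard notation for an ARIMA model is ARIMA(p, d, q), where integer values replace the parameters to indicate the specific ARIMA model in use.

p: the autoregressive order (number of lag observations)
d: the degree of differencing (number of times the raw observations are differenced)
q: the moving-average order (size of the moving-average window)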
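To make the role of d concrete, here is a minimal sketch of differencing in plain Python. The helper name difference and the quadratic toy series are assumptions made for illustration; they are not part of the article's dataset or of any library.

```python
def difference(series, d=1):
    """Apply first-order differencing d times: y'_t = y_t - y_{t-1}."""
    result = list(series)
    for _ in range(d):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result

# Hypothetical series with a quadratic trend, so it is not stationary.
quadratic = [t * t for t in range(6)]   # [0, 1, 4, 9, 16, 25]
once = difference(quadratic, d=1)       # still trending upward
twice = difference(quadratic, d=2)      # constant after two rounds
print(once)
print(twice)
```

In this toy case one round of differencing still leaves a trend, while two rounds produce a constant series, which is the situation d=2 is meant for.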

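LSTM Neural Networks

LSTM stands for long short-term memory. It is an architecture that extends the memory of recurrent neural networks. Ordinary recurrent networks have only "short-term memory": they carry information from earlier steps forward to the current step, but that information fades quickly, so the network does not have access to everything it has seen before. LSTM adds long-term memory to recurrent networks. It mitigates the vanishing gradient problem, in which a network stops learning because the updates to its weights become smaller and smaller. It does this with a series of "gates", which sit inside memory blocks that are connected through layers.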

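An LSTM cell has three types of gates: the input gate, the output gate, and the forget gate. Each gate acts like a switch that controls reading from and writing to the cell, which is how long-term memory is built into the model.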
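The gate mechanism can be sketched for a single LSTM unit with scalar input and state. This follows the standard LSTM update equations, but the function lstm_step, the weight layout, and the hand-picked weights are assumptions for illustration, not how Keras stores or learns its parameters.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One time step of a single-unit LSTM cell.

    w maps each gate name to an (input weight, recurrent weight, bias) tuple.
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))    # how much of the old cell state to keep
    i = sigmoid(pre("input"))     # how much of the candidate to write
    o = sigmoid(pre("output"))    # how much of the cell state to expose
    g = math.tanh(pre("cand"))    # candidate value for the cell state
    c = f * c_prev + i * g        # updated long-term memory
    h = o * math.tanh(c)          # new hidden state
    return h, c

# Hypothetical weights chosen by hand, not learned.
weights = {
    "forget": (0.0, 0.0, 10.0),   # forget gate near 1: keep the old memory
    "input": (0.0, 0.0, -10.0),   # input gate near 0: ignore the new input
    "output": (0.0, 0.0, 10.0),   # output gate near 1: expose the memory
    "cand": (1.0, 0.0, 0.0),
}
h, c = lstm_step(x=5.0, h_prev=0.0, c_prev=0.7, w=weights)
print(c, h)
```

With the forget gate held open and the input gate held shut, the cell state stays close to its previous value of 0.7 even though the input is large, which is the "long-term memory" behaviour described above.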

Prophet

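Prophet is a procedure for forecasting time series data based on an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and it typically handles outliers well.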
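The additive model behind this description is usually written as a sum of components. The symbols g, s, h, and ε below follow the notation commonly used for Prophet and do not appear elsewhere in this article:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here g(t) is the trend, s(t) collects the periodic (yearly, weekly, daily) seasonality, h(t) captures holiday effects, and ε_t is the error term that the model does not explain.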

Python Implementation

Loading the dataset

Dataset: https://www.kaggle.com/shenba/time-series-datasets

import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf 
from statsmodels.tsa.seasonal import seasonal_decompose 
from pmdarima import auto_arima 
from sklearn.metrics import mean_squared_error
from statsmodels.tools.eval_measures import rmse
import warnings
warnings.filterwarnings("ignore")
df = pd.read_csv('monthly-beer-production-in-austr.csv')
df.head()

df.info()

df.Month = pd.to_datetime(df.Month)
df = df.set_index("Month")
df.head()

df.index.freq = 'MS'
ax = df['Monthly beer production'].plot(figsize = (16,5), title = "Monthly Beer Production")
ax.set(xlabel='Dates', ylabel='Total Production');

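The plot shows that the data is seasonal, which is why we use SARIMA (seasonal ARIMA) rather than plain ARIMA.

SARIMA is an extension of ARIMA that explicitly supports univariate time series with a seasonal component. It adds three new hyperparameters that specify the autoregression (AR), differencing (I), and moving average (MA) of the seasonal component, plus one more parameter for the seasonal period.

Four seasonal elements that are not part of ARIMA must be configured:

P: seasonal autoregressive order.
D: seasonal differencing order.
Q: seasonal moving-average order.
m: the number of time steps in a single seasonal period.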
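As a rough illustration of what D and m control, the sketch below applies one round of seasonal differencing to a toy series. The helper seasonal_difference and the quarterly data (m = 4) are assumptions for illustration; the beer series in this article is monthly, so m = 12 there.

```python
def seasonal_difference(series, m):
    """Seasonal differencing with period m: y'_t = y_t - y_{t-m}."""
    return [series[t] - series[t - m] for t in range(m, len(series))]

# Hypothetical quarterly data: a repeating pattern plus a rise of 10 per year.
pattern = [5, 20, 40, 10]
series = [pattern[t % 4] + 10 * (t // 4) for t in range(12)]
seasonal_diff = seasonal_difference(series, m=4)
print(series)
print(seasonal_diff)
```

Subtracting the value from one season earlier removes the repeating pattern and leaves only the year-over-year change, which is constant in this toy example.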

a = seasonal_decompose(df["Monthly beer production"], model = "add")
a.plot();

import matplotlib.pyplot as plt
plt.figure(figsize = (16,7))
a.seasonal.plot();

ARIMA Forecast

Let's run auto_arima() to find the best values of p, d, q, P, D, and Q.

auto_arima(df['Monthly beer production'], seasonal=True, m=12,max_p=7, max_d=5,max_q=7, max_P=4, max_D=4,max_Q=4).summary()

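auto_arima() selects SARIMAX(2,1,1)x(4,0,3,12) as the best model.

Next, split the dataset into a training set and a test set, holding out the last 12 months for testing.

train_data = df[:len(df)-12]
# Copy the test slice so prediction columns can be added to it later
test_data = df[len(df)-12:].copy()
arima_model = SARIMAX(train_data['Monthly beer production'], order = (2,1,1), seasonal_order = (4,0,3,12))
arima_result = arima_model.fit()
arima_result.summary()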

# SARIMAX predictions are already on the original scale, so no typ argument is needed
arima_pred = arima_result.predict(start = len(train_data), end = len(df)-1).rename("ARIMA Predictions")
arima_pred

test_data['Monthly beer production'].plot(figsize = (16,5), legend=True)
arima_pred.plot(legend = True);

arima_rmse_error = rmse(test_data['Monthly beer production'], arima_pred)
arima_mse_error = arima_rmse_error**2
mean_value = df['Monthly beer production'].mean()
print(f'MSE Error: {arima_mse_error}\nRMSE Error: {arima_rmse_error}\nMean: {mean_value}')
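For readers who want to see what these metrics compute, here is a minimal sketch in plain Python. The function names and the four-point toy data are assumptions for illustration; the article itself uses rmse from statsmodels.

```python
import math

def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse_manual(actual, predicted):
    """Square root of the MSE, expressed in the same units as the data."""
    return math.sqrt(mse(actual, predicted))

# Hypothetical values, not taken from the beer dataset.
actual = [100.0, 120.0, 140.0, 160.0]
predicted = [110.0, 110.0, 150.0, 150.0]
error = rmse_manual(actual, predicted)
relative = error / (sum(actual) / len(actual))   # RMSE as a fraction of the mean
print(error, relative)
```

Printing the mean next to the RMSE, as the article does, serves the same purpose as the relative value here: it shows how large the typical error is compared with the level of the series.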

test_data['ARIMA_Predictions'] = arima_pred

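LSTM Forecast

First, we scale the training and test data with MinMaxScaler.

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# Fit on the training data only, then apply the same scaling to the test values
scaler.fit(train_data)
scaled_train_data = scaler.transform(train_data)
# Select the target column explicitly: test_data now also holds the ARIMA predictions
scaled_test_data = scaler.transform(test_data[['Monthly beer production']])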
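The scaling step can be sketched without scikit-learn to show what is being fitted and why the scaler is fitted on the training data only. The helper names and the toy numbers are assumptions for illustration.

```python
def fit_min_max(values):
    """Learn the scaling range from the training values only."""
    return min(values), max(values)

def transform(values, lo, hi):
    """Map values so that lo becomes 0 and hi becomes 1."""
    return [(v - lo) / (hi - lo) for v in values]

def inverse_transform(scaled, lo, hi):
    """Undo the scaling to get back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]

# Hypothetical values, not taken from the beer dataset.
train = [100.0, 150.0, 200.0]
test = [125.0, 250.0]
lo, hi = fit_min_max(train)
scaled_test = transform(test, lo, hi)
restored = inverse_transform(scaled_test, lo, hi)
print(scaled_test, restored)
```

Because the range comes from the training set, a test value above the training maximum scales to more than 1. That is expected and avoids leaking information about the test period into the model.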

Before creating the LSTM model, we need a time series generator object.

from keras.preprocessing.sequence import TimeseriesGenerator
n_input = 12
n_features= 1
generator = TimeseriesGenerator(scaled_train_data, scaled_train_data, length=n_input, batch_size=1)
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
lstm_model = Sequential()
lstm_model.add(LSTM(200, activation='relu', input_shape=(n_input, n_features)))
lstm_model.add(Dense(1))
lstm_model.compile(optimizer='adam', loss='mse')
lstm_model.summary()
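Conceptually, the generator created above turns the series into pairs of an input window and the value that follows it. A minimal sketch of that idea in plain Python, with the helper name make_windows and the toy series as assumptions (this is not the Keras implementation):

```python
def make_windows(series, length):
    """Return (window, target) pairs: `length` consecutive values and the next one."""
    return [(series[i:i + length], series[i + length])
            for i in range(len(series) - length)]

# Hypothetical series; the article uses length=12 on the scaled training data.
series = [10, 20, 30, 40, 50]
pairs = make_windows(series, length=3)
for window, target in pairs:
    print(window, "->", target)
```

A series of n values yields n - length training samples, so with 12-month windows the first 12 observations serve only as inputs.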

# fit_generator is deprecated; fit accepts the generator directly
lstm_model.fit(generator, epochs=20)

losses_lstm = lstm_model.history.history['loss']
plt.figure(figsize=(12,4))
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.xticks(np.arange(0,21,1))
plt.plot(range(len(losses_lstm)),losses_lstm);

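lstm_predictions_scaled = list()
# Start from the last n_input scaled training values
batch = scaled_train_data[-n_input:]
current_batch = batch.reshape((1, n_input, n_features))
for i in range(len(test_data)):
    # Predict one step ahead from the current window
    lstm_pred = lstm_model.predict(current_batch)[0]
    lstm_predictions_scaled.append(lstm_pred)
    # Drop the oldest value and append the prediction to form the next window
    current_batch = np.append(current_batch[:,1:,:],[[lstm_pred]],axis=1)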
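The loop above is a recursive multi-step forecast: each prediction is fed back in as an input for the next step. The sketch below isolates that pattern with a stand-in predictor. The function recursive_forecast and the mean-of-window "model" are assumptions for illustration and replace the trained LSTM.

```python
def recursive_forecast(history, n_input, steps, predict_next):
    """Forecast `steps` values, feeding each prediction back into the window."""
    window = list(history[-n_input:])
    forecasts = []
    for _ in range(steps):
        next_value = predict_next(window)
        forecasts.append(next_value)
        window = window[1:] + [next_value]   # slide the window forward by one
    return forecasts

def mean_of_window(window):
    """Stand-in for a trained model (an assumption): predicts the window mean."""
    return sum(window) / len(window)

forecasts = recursive_forecast([2.0, 4.0, 6.0], n_input=2, steps=3,
                               predict_next=mean_of_window)
print(forecasts)
```

One consequence of this design is that errors can compound: later forecasts are built on earlier forecasts rather than on observed values.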

Because the data was scaled, the predictions must be inverse-transformed to get forecasts on the original scale.

lstm_predictions_scaled

lstm_predictions = scaler.inverse_transform(lstm_predictions_scaled)
lstm_predictions

test_data['LSTM_Predictions'] = lstm_predictions
test_data

test_data['Monthly beer production'].plot(figsize = (16,5), legend=True)
test_data['LSTM_Predictions'].plot(legend = True);

lstm_rmse_error = rmse(test_data['Monthly beer production'], test_data["LSTM_Predictions"])
lstm_mse_error = lstm_rmse_error**2
mean_value = df['Monthly beer production'].mean()
print(f'MSE Error: {lstm_mse_error}\nRMSE Error: {lstm_rmse_error}\nMean: {mean_value}')

Prophet Forecast

df.info()

# reset_index() returns a new DataFrame, so df itself is left unchanged
df_pr = df.reset_index()
df_pr.columns = ['ds','y'] # Prophet requires these exact column names
train_data_pr = df_pr.iloc[:len(df)-12]
test_data_pr = df_pr.iloc[len(df)-12:]
from prophet import Prophet # the package was named fbprophet before version 1.0
m = Prophet()
m.fit(train_data_pr)
future = m.make_future_dataframe(periods=12,freq='MS')
prophet_pred = m.predict(future)
prophet_pred.tail()

prophet_pred = pd.DataFrame({"Date" : prophet_pred[-12:]['ds'], "Pred" : prophet_pred[-12:]["yhat"]})
prophet_pred = prophet_pred.set_index("Date")
prophet_pred.index.freq = "MS"
prophet_pred

test_data["Prophet_Predictions"] = prophet_pred['Pred'].values
import seaborn as sns
plt.figure(figsize=(16,5))
ax = sns.lineplot(x= test_data.index, y=test_data["Monthly beer production"])
sns.lineplot(x=test_data.index, y = test_data["Prophet_Predictions"]);

prophet_rmse_error = rmse(test_data['Monthly beer production'], test_data["Prophet_Predictions"])
prophet_mse_error = prophet_rmse_error**2
mean_value = df['Monthly beer production'].mean()
print(f'MSE Error: {prophet_mse_error}\nRMSE Error: {prophet_rmse_error}\nMean: {mean_value}')

rmse_errors = [arima_rmse_error, lstm_rmse_error, prophet_rmse_error]
mse_errors = [arima_mse_error, lstm_mse_error, prophet_mse_error]
errors = pd.DataFrame({"Models" : ["ARIMA", "LSTM", "Prophet"],"RMSE Errors" : rmse_errors, "MSE Errors" : mse_errors})
plt.figure(figsize=(16,9))
# plt.plot replaces the deprecated plt.plot_date; labels give plt.legend() entries to show
plt.plot(test_data.index, test_data["Monthly beer production"], marker="o", linestyle="-", label="Actual")
plt.plot(test_data.index, test_data["ARIMA_Predictions"], marker="o", linestyle="-.", label="ARIMA")
plt.plot(test_data.index, test_data["LSTM_Predictions"], marker="o", linestyle="--", label="LSTM")
plt.plot(test_data.index, test_data["Prophet_Predictions"], marker="o", linestyle=":", label="Prophet")
plt.legend()
plt.show()

print(f"Mean: {test_data['Monthly beer production'].mean()}")
errors

test_data

These are only the most basic versions of each model. You can improve them by tuning them to your data and domain knowledge.
