Time Series Forecasting with a Seasonal ARIMA Model

2023-12-18 06:32:11

First, the model: SARIMA(p,d,q)x(P,D,Q)s, which statsmodels specifies as order=(p, d, q) and seasonal_order=(P, D, Q, s).

Notes on choosing the parameters:

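Here p, d and q are the non-seasonal AR order, order of differencing and MA order; P, D and Q are the seasonal AR (SAR) order, the order of seasonal differencing and the seasonal MA (SMA) order; and s is the seasonal period, i.e. the number of observations in one season (the frequency of the time series). If the series has a well-defined seasonal pattern, set D=1 for that period s.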
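For reference, the standard backshift form of the model (B y_t = y_{t-1}, no constant term) is shown below. φ_p and θ_q are the non-seasonal AR and MA polynomials, and Φ_P and Θ_Q are the seasonal ones in B^s. The second line instantiates it for the model fitted later in this post, order=(0, 0, 2) and seasonal_order=(0, 1, 2, 144), using the "+" sign convention for MA terms that statsmodels uses.

$$\Phi_P(B^s)\,\phi_p(B)\,(1-B)^d\,(1-B^s)^D\,y_t = \Theta_Q(B^s)\,\theta_q(B)\,\varepsilon_t$$

$$(1-B^{144})\,y_t = (1+\Theta_1 B^{144}+\Theta_2 B^{288})\,(1+\theta_1 B+\theta_2 B^{2})\,\varepsilon_t$$

Expanded, the right-hand side reaches back to ε_{t-290} (288 + 2 lags), which already hints at why such a long seasonal period makes fitting heavy.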

Choose the parameters so that D never exceeds 1 and the total differencing d + D never exceeds 2. If the model has seasonal components, try to keep only SAR terms or only SMA terms, not both.
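The rules of thumb above can be turned into a quick sanity check before an expensive fit. This is a minimal sketch with a hypothetical helper name (it is not part of statsmodels); it only flags the three conditions listed above and cannot judge whether the series really is seasonal.

```python
def check_sarima_orders(order, seasonal_order):
    """Return a list of rule-of-thumb warnings for SARIMA orders (hypothetical helper)."""
    p, d, q = order
    P, D, Q, _s = seasonal_order  # _s (the seasonal period) is not checked here
    warnings = []
    if D > 1:
        warnings.append("seasonal differencing D should not exceed 1")
    if d + D > 2:
        warnings.append("total differencing d + D should not exceed 2")
    if P > 0 and Q > 0:
        warnings.append("prefer only SAR or only SMA terms, not both")
    return warnings


# The model used in this post: only SMA terms, D = 1, d + D = 1
print(check_sarima_orders((0, 0, 2), (0, 1, 2, 144)))  # []
# A model that breaks two rules: d + D = 3 and both SAR and SMA terms
print(check_sarima_orders((1, 2, 0), (1, 1, 1, 12)))
```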

The code is as follows:

# Import the required packages
import matplotlib.pyplot as plt
import pandas as pd
# Read the data: the first column is parsed as a datetime index
time_series_table = pd.read_csv('new_merged.csv', index_col=0, parse_dates=True)
time_series_table = time_series_table.sort_index()
print(time_series_table)


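%%time
# Jupyter cell magic: %%time must be the first line of the cell
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Fit the seasonal model on all but the last 432 observations (three seasons of 144)
best_model = SARIMAX(time_series_table['33_1002'].iloc[:-432], order=(0, 0, 2), seasonal_order=(0, 1, 2, 144)).fit(disp=False)
print(best_model.summary())

# Check the model fit with residual diagnostics
best_model.plot_diagnostics(figsize=(15, 12))
plt.show()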


These diagnostics mainly check the residuals left after fitting the seasonal model.
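The diagnostics plot includes a correlogram of the residuals. A numeric counterpart is the Ljung-Box statistic Q = n(n+2) Σ_{k=1..h} ρ_k² / (n − k), where ρ_k is the residual autocorrelation at lag k; a large Q relative to a chi-square distribution with h degrees of freedom (minus the number of fitted ARMA parameters) suggests autocorrelation the model has not captured. Below is a minimal pure-Python sketch of the statistic with a hypothetical function name; in practice, statsmodels provides acorr_ljungbox in statsmodels.stats.diagnostic, which could be applied to best_model.resid and also reports p-values.

```python
def ljung_box_q(resid, h):
    """Ljung-Box Q statistic over lags 1..h (hypothetical helper, no p-value)."""
    n = len(resid)
    mean = sum(resid) / n
    centered = [x - mean for x in resid]
    denom = sum(x * x for x in centered)
    q = 0.0
    for k in range(1, h + 1):
        # Sample autocorrelation of the residuals at lag k
        rho_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q


# Strongly autocorrelated "residuals": a series that flips sign every step
alternating = [1.0, -1.0] * 50
print(round(ljung_box_q(alternating, 1), 2))  # 100.98
```

For h = 1 the 5% critical value of a chi-square(1) distribution is about 3.84, so a value like this would clearly indicate leftover autocorrelation; well-behaved residuals should give much smaller values.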
Next, check whether seasonal differencing is needed.

# Plot the original series against its first difference and its seasonal difference
# Note: this plots column 21_1002, while the model above is fit on 33_1002
data = time_series_table["21_1002"]

fig, axes = plt.subplots(2, 1, figsize=(10, 5), dpi=100, sharex=True)

# Usual (first-order) differencing: y_t - y_{t-1}
axes[0].plot(data, label='Original Series')
axes[0].plot(data.diff(1), label='Usual Differencing')
axes[0].set_title('Usual Differencing')
axes[0].legend(loc='upper left', fontsize=10)

# Seasonal differencing with period 144: y_t - y_{t-144}
axes[1].plot(data, label='Original Series')
axes[1].plot(data.diff(144), label='Seasonal Differencing', color='green')
axes[1].set_title('Seasonal Differencing')
axes[1].legend(loc='upper left', fontsize=10)
plt.suptitle('Series 21_1002: Usual vs. Seasonal Differencing', fontsize=16)
plt.show()

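After seasonal differencing, the green line looks fairly stationary, which supports setting D=1 as in the model above.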
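Why seasonal differencing removes a repeating pattern can be checked by hand: y_t − y_{t−s} cancels any component that repeats every s steps and turns a linear trend into a constant. The sketch below uses a hypothetical helper and a toy period of 3 instead of 144; unlike pandas' Series.diff(s), which keeps the first s positions as NaN, it simply drops them.

```python
def seasonal_difference(values, s):
    """Return y_t - y_{t-s}; the first s points have no lag and are dropped."""
    return [values[t] - values[t - s] for t in range(s, len(values))]


pattern = [5, 9, 2]
# Toy series: a period-3 pattern plus a linear trend t
y = [pattern[t % 3] + t for t in range(12)]
print(seasonal_difference(y, 3))  # [3, 3, 3, 3, 3, 3, 3, 3, 3]
# A purely periodic series differences to all zeros
print(seasonal_difference(pattern * 4, 3))  # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```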

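target = "33_1002"
# Define how the data is split into training and test sets
train_start_dt = '2023-05-14 00:00:00'
test_start_dt = '2023-05-19 00:00:00'
train = time_series_table[(time_series_table.index >= train_start_dt) & (time_series_table.index < test_start_dt)][[target]].copy()
test = time_series_table[time_series_table.index >= test_start_dt][[target]].copy()
# Forecast the test period. best_model was fit on time_series_table[target].iloc[:-432],
# so this assumes the held-out 432 points are exactly the test period; date-based
# start/end also assume the index has a regular frequency statsmodels can infer.
pred = best_model.predict(start=test.index[0], end=test.index[-1])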
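The date split and the iloc[:-432] slice used for fitting only line up if the held-out block is exactly the test period. 432 is 3 × 144; assuming (this is not stated in the source) that the data are sampled every 10 minutes, so that 144 points make one day, the held-out block would be three days. A stdlib sketch for counting points on such a grid, with a hypothetical helper and a hypothetical test end time:

```python
from datetime import datetime, timedelta


def count_steps(start, end, step):
    """Number of samples from start to end, inclusive, on a fixed sampling grid."""
    span = end - start
    if span % step != timedelta(0):
        raise ValueError("end is not on the sampling grid")
    return span // step + 1


ten_min = timedelta(minutes=10)  # assumed sampling interval
# One day of 10-minute samples: one seasonal period
print(count_steps(datetime(2023, 5, 19), datetime(2023, 5, 19, 23, 50), ten_min))  # 144
# Three days: the size of the held-out block
print(count_steps(datetime(2023, 5, 19), datetime(2023, 5, 21, 23, 50), ten_min))  # 432
```

If len(test) on the real data differs from 432, the model's forecast horizon and the test set do not cover the same period.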

Evaluate the forecast with the MAPE (mean absolute percentage error) metric.
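For reference, with forecasts ŷ_t and actual values y_t over n test points, the MAPE is defined below. The analysis function compares it with a naive forecast ŷ_t = y_{t−1}, and its last printed quantity is the relative change between the two (negative means the model beats the naive forecast). MAPE is undefined if any y_t is 0.

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\frac{\lvert \hat{y}_t - y_t \rvert}{\lvert y_t \rvert}$$

$$\text{relative change} = \frac{\mathrm{MAPE}_{\text{model}} - \mathrm{MAPE}_{\text{naive}}}{\mathrm{MAPE}_{\text{naive}}}$$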

import numpy as np

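# Report the MAPE of the forecast and of a naive "previous value" baseline
def analysis(predict, test):

    def mape(predictions, actuals):
        """Mean absolute percentage error (undefined if any actual value is 0)."""
        # ravel() flattens a one-column DataFrame to 1-D so it aligns with a 1-D
        # prediction Series instead of broadcasting to an n x n matrix
        predictions = np.asarray(predictions, dtype=float).ravel()
        actuals = np.asarray(actuals, dtype=float).ravel()
        return (np.abs(predictions - actuals) / np.abs(actuals)).mean()

    mape1 = mape(predict, test)
    # Naive baseline: predict each value with the previous observed value
    mape2 = mape(test.shift(1).dropna(), test[1:])

    print('predict-actual MAPE: ', mape1 * 100, '%')
    print('shifted1-actual MAPE: ', mape2 * 100, '%')
    # Relative change versus the naive baseline; negative means the model beats it
    print('relative MAPE change vs. naive:', (mape1 - mape2) / mape2)

analysis(pred, test)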
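A small hand-checkable example of the same computation, with made-up numbers chosen so the results are round: each actual value is 25% above the previous one, so the naive previous-value forecast is always 20% off, while the made-up model forecasts are always 5% off. The function mirrors the mape helper above in plain Python.

```python
def mape(predictions, actuals):
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    errors = [abs(p - a) / abs(a) for p, a in zip(predictions, actuals)]
    return sum(errors) / len(errors)


actuals = [64.0, 80.0, 100.0, 125.0]    # made-up values, each 25% above the previous
forecast = [67.2, 76.0, 105.0, 118.75]  # made-up forecasts, each 5% off

model_mape = mape(forecast, actuals)
# Naive baseline: predict y_t with y_{t-1}, scored on points 2..n
naive_mape = mape(actuals[:-1], actuals[1:])
relative_change = (model_mape - naive_mape) / naive_mape

print(round(model_mape, 4), round(naive_mape, 4), round(relative_change, 4))
# 0.05 0.2 -0.75
```

A relative change of −0.75 reads as "the model's MAPE is 75% lower than the naive forecast's".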

A good example on GitHub:
github.com/marcopeix/time-series-analysis/blob/master/Advanced%20modelling/SARIMA.ipynb


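Summary: seasonal SARIMAX is not well suited to very long seasonal periods. Here the period is set to 144, and fitting is very slow and uses a lot of memory.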

best_model = SARIMAX(time_series_table['33_1002'].iloc[:-432], order=(0, 0, 2), seasonal_order=(0, 1, 2, 144)).fit(disp=False)

Source: https://blog.csdn.net/weixin_37726222/article/details/135048514