Time Series Forecasting by using Seasonal Autoregressive Integrated Moving Average: Subset, Multiplicative or Additive Model

Problem statement: Most Seasonal Autoregressive Integrated Moving Average (SARIMA) models used for forecasting seasonal time series are multiplicative SARIMA models. These models assume that the parameter arising from the multiplication of the nonseasonal and seasonal parameters is significant, without verifying this with a statistical test. Moreover, popular statistical software such as MINITAB and SPSS only provides a facility to fit a multiplicative model. The aim of this research is to propose a new procedure for identifying the most appropriate order of a SARIMA model, whether it involves subset, multiplicative or additive order. In particular, the study examined whether a multiplicative parameter existed in the SARIMA model.
Approach: The theoretical Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) of subset, multiplicative and additive SARIMA models were derived first, and an R program was then used to plot these theoretical ACF and PACF. Two monthly datasets were then used as case studies, i.e., the international airline passenger data and the number of tourist arrivals to Bali, Indonesia. The model identification step, which determines the order of the ARIMA model, was carried out with MINITAB, and the model estimation step used SAS to test whether the model consisted of subset, multiplicative or additive order.
Results: The theoretical ACF and PACF showed that subset, multiplicative and additive SARIMA models have different patterns, especially at the lag that results from the multiplication of the non-seasonal and seasonal lags. Modeling of the airline data yielded a subset SARIMA model as the best model, whereas an additive SARIMA model was the best model for forecasting the number of tourist arrivals to Bali.
Conclusion: Both case studies showed that a multiplicative SARIMA model was not the best model for forecasting these data. The comparative evaluation showed that the subset and additive SARIMA models gave more accurate forecasts on the out-sample datasets than the multiplicative SARIMA model for the airline and tourist arrivals datasets, respectively. This study is a valuable contribution to the Box-Jenkins procedure, particularly at the model identification and estimation steps for SARIMA models. Further work involving multiple seasonal ARIMA models, such as short-term load forecasting in certain countries, may provide further insights regarding the subset, multiplicative or additive orders.


INTRODUCTION
ARIMA modeling was first introduced by Box and Jenkins (1976) and has since become the most popular approach for forecasting univariate time series data. The model originates from the Autoregressive (AR) model, the Moving Average (MA) model and their combination, the ARMA model. When seasonal components are included, the model is called the SARIMA model. The Box-Jenkins procedure, which contains three main stages for building an ARIMA model, i.e., model identification, model estimation and model checking, is usually used to determine the best ARIMA model for a given time series.
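The generalized form of the SARIMA model can be written as (Box et al., 2008; Cryer and Chan, 2008):

$$\phi_p(B)\,\Phi_P(B^S)\,(1-B)^d\,(1-B^S)^D Z_t = \theta_q(B)\,\Theta_Q(B^S)\,a_t \tag{1}$$

Where:
$\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p$
$\Phi_P(B^S) = 1 - \Phi_1 B^S - \dots - \Phi_P B^{PS}$
$\theta_q(B) = 1 - \theta_1 B - \dots - \theta_q B^q$
$\Theta_Q(B^S) = 1 - \Theta_1 B^S - \dots - \Theta_Q B^{QS}$
Where:
Z_t = The observed series and a_t = the white noise error
B = The backward shift operator
d and D = The non-seasonal and seasonal order of differences, respectively
The model is usually abbreviated as SARIMA(p,d,q)(P,D,Q)^S. When there is no seasonal effect, a SARIMA model reduces to a pure ARIMA(p,d,q) model, and when the time series is stationary, a pure ARIMA model reduces to ARMA(p,q).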
To date, the SARIMA model has been used in various fields of forecasting. For example, Haswell et al. (2003) applied this model to forecast the soil dryness index in the southwest of Western Australia; Hu et al. (2004) for prediction of Ross River virus disease in Brisbane et al. (2005), Modarres (2007), also Abebe and Foerch (2008) for drought forecasting; Ediger et al. (2006), also Ediger and Akar (2007) for forecasting production of fossil fuel sources in Turkey; Briet et al. (2008) Ibrahim et al. (2009) for air pollutants prediction in several areas of Malaysia. More recently, Pozza et al. (2010) applied SARIMA to the analysis of PM2.5 and PM10-2.5 mass concentration in the city of Carlos et al. (2010) used for forecasting of boron in Western Turkey and Wagner (2010) for forecasting daily demand in cash supply chains.
Ong et al. (2005) stated that although many previous papers have concentrated on model estimation, model identification is actually the most crucial stage in building ARIMA models, because a false model identification leads to a wrong model estimation stage and increases the cost of re-identification. For SARIMA models in particular, most previous papers directly used the multiplicative model without testing whether the multiplicative parameter was significant. In other words, multiplicative SARIMA models assume that the parameter resulting from the multiplication of the non-seasonal and seasonal parameters is significant. Moreover, popular statistical software such as MINITAB and SPSS only provides a facility to fit a multiplicative model. The purpose of this research is to propose a new procedure for identifying and then testing the most appropriate order of a SARIMA model, whether it involves subset, multiplicative or additive order. In particular, the study examines whether a multiplicative parameter exists in the SARIMA model. Additionally, the present study updates the Box-Jenkins procedure, particularly for seasonal models.

Data sources:
Two monthly datasets are used as case studies, i.e., the international airline passenger data and the number of tourist arrivals to Bali, Indonesia, from 1989-1997. The first series is well known as the Airline Data, listed in Box et al. (2008), and has been analyzed by many researchers; see, for example, Faraway and Chatfield (1998) and Suhartono and Subanar (2006). These data were also one of the two datasets used in the Neural Network Forecasting Competition in 2005 (see www.neural-forecasting.com). In all of these studies, the multiplicative SARIMA model was used as the best model without first testing whether the multiplicative parameter was significant.
The second monthly dataset, the number of tourist arrivals to Bali, was obtained from the Indonesia Central Bureau of Statistics (see www.bps.go.id). Bali is the main destination of international tourists visiting Indonesia, and these data also have a seasonal pattern. Ismail et al. (2009) analyzed these tourism data using intervention analysis. For both datasets, the last 12 observations are reserved for forecast evaluation and comparison (the out-sample or testing dataset).
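The holdout scheme described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the series is synthetic (not the airline or Bali data), the seasonal naive forecast is a placeholder for a fitted SARIMA forecast, and the helper names `split_holdout` and `mean_squared_error` are hypothetical.

```python
def split_holdout(series, horizon=12):
    """Split a series into in-sample and out-sample parts (last `horizon` points)."""
    if len(series) <= horizon:
        raise ValueError("series must be longer than the holdout horizon")
    return series[:-horizon], series[-horizon:]


def mean_squared_error(actual, forecast):
    """Average squared forecast error over paired observations."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical monthly series of 36 values with a trend and a seasonal bump.
series = [100 + 2 * t + (10 if t % 12 in (5, 6, 7) else 0) for t in range(36)]
train, test = split_holdout(series, horizon=12)

# Placeholder forecast: repeat the last 12 in-sample values (seasonal naive).
naive_forecast = train[-12:]
out_sample_mse = mean_squared_error(test, naive_forecast)
```

Any competing model is then compared on `out_sample_mse` computed from the same reserved 12 observations.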

Subset SARIMA:
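The generalized form of the ARIMA(0,0,[1,12,13]) model, known as subset SARIMA, can be written as:

$$Z_t = a_t - \theta_1 a_{t-1} - \theta_{12} a_{t-12} - \theta_{13} a_{t-13} \tag{2}$$

where θ1, θ12 and θ13 denote the parameters of the MA orders. It can be shown that the ACF of this model is as follows:

$$\rho_k = \begin{cases} \dfrac{-\theta_1 + \theta_{12}\theta_{13}}{1 + \theta_1^2 + \theta_{12}^2 + \theta_{13}^2}, & k = 1 \\[2ex] \dfrac{\theta_1\theta_{12}}{1 + \theta_1^2 + \theta_{12}^2 + \theta_{13}^2}, & k = 11 \\[2ex] \dfrac{-\theta_{12} + \theta_1\theta_{13}}{1 + \theta_1^2 + \theta_{12}^2 + \theta_{13}^2}, & k = 12 \\[2ex] \dfrac{-\theta_{13}}{1 + \theta_1^2 + \theta_{12}^2 + \theta_{13}^2}, & k = 13 \\[2ex] 0, & \text{other } k \ge 1 \end{cases} \tag{3}$$

Multiplicative SARIMA:
The generalized form of the ARIMA(0,0,1)(0,0,1)^12 model, known as multiplicative SARIMA, can be written as:

$$Z_t = a_t - \theta_1 a_{t-1} - \theta_{12} a_{t-12} + \theta_1\theta_{12} a_{t-13} \tag{4}$$

where θ1 and θ12 denote the parameters of the nonseasonal and seasonal MA orders, respectively. This model is identical to the subset SARIMA model in Eq. 2 when θ13 = -θ1θ12. Thus, the multiplicative model is a special case of the subset model. Substituting this restriction into the subset ACF gives:

$$\rho_k = \begin{cases} \dfrac{-\theta_1}{1 + \theta_1^2}, & k = 1 \\[2ex] \dfrac{\theta_1\theta_{12}}{(1 + \theta_1^2)(1 + \theta_{12}^2)}, & k = 11, 13 \\[2ex] \dfrac{-\theta_{12}}{1 + \theta_{12}^2}, & k = 12 \\[2ex] 0, & \text{other } k \ge 1 \end{cases} \tag{5}$$

Equation 5 shows that the ACF values at lags 11 and 13 are equal.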
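The relation θ13 = -θ1θ12 can be checked by multiplying the nonseasonal and seasonal MA polynomials directly. The following Python sketch does this for illustrative parameter values; the values and the helper name `multiply_ma_polynomials` are assumptions made for the example, not quantities from the paper.

```python
def multiply_ma_polynomials(nonseasonal, seasonal):
    """Multiply two lag polynomials given as {lag: coefficient} dicts."""
    product = {}
    for lag_a, coef_a in nonseasonal.items():
        for lag_b, coef_b in seasonal.items():
            lag = lag_a + lag_b
            product[lag] = product.get(lag, 0.0) + coef_a * coef_b
    return product


# Illustrative parameter values (assumptions, not estimates from the paper).
theta_1, theta_12 = 0.6, 0.5

# (1 - theta_1 B) and (1 - theta_12 B^12) in the minus-sign convention.
nonseasonal = {0: 1.0, 1: -theta_1}
seasonal = {0: 1.0, 12: -theta_12}

expanded = multiply_ma_polynomials(nonseasonal, seasonal)

# In the subset form the coefficient on B^13 is -theta_13,
# so theta_13 is minus the expanded coefficient at lag 13.
implied_theta_13 = -expanded[13]
```

The product has nonzero coefficients only at lags 0, 1, 12 and 13, which is why the multiplicative model is a restricted subset model.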
Additive SARIMA:
The generalized form of the ARIMA(0,0,[1,12]) model, known as additive SARIMA, can be written as:

$$Z_t = a_t - \theta_1 a_{t-1} - \theta_{12} a_{t-12} \tag{6}$$

where θ1 and θ12 denote the parameters of the nonseasonal and seasonal MA orders, respectively. This model is identical to the subset SARIMA model in Eq. 2 when θ13 = 0. Thus, the additive model is also a special case of the subset model. Moreover, the additive model in Eq. 6 can also be seen as a subset ARIMA model of lower order than the model in Eq. 2. It can be shown that the ACF of this model is as follows:

$$\rho_k = \begin{cases} \dfrac{-\theta_1}{1 + \theta_1^2 + \theta_{12}^2}, & k = 1 \\[2ex] \dfrac{\theta_1\theta_{12}}{1 + \theta_1^2 + \theta_{12}^2}, & k = 11 \\[2ex] \dfrac{-\theta_{12}}{1 + \theta_1^2 + \theta_{12}^2}, & k = 12 \\[2ex] 0, & \text{other } k \ge 1 \end{cases} \tag{7}$$

Equation 7 shows that the main difference between the additive model and the other models (subset or multiplicative) is that its ACF value at lag 13 is equal to zero.
R program to simulate the theoretical ACF and PACF:
To illustrate the difference between the theoretical ACF and PACF of the subset, multiplicative and additive models, we use R. A program in R generates the theoretical ACF and PACF of these three kinds of models.
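As a minimal sketch in Python, and not the authors' R program, the theoretical ACF of a pure MA model can be computed from its coefficients and the PACF obtained from the ACF with the Durbin-Levinson recursion. The parameter values and the function names `ma_acf` and `pacf_from_acf` are illustrative assumptions.

```python
def ma_acf(thetas, max_lag):
    """Theoretical ACF of Z_t = a_t - sum_k thetas[k] * a_{t-k}.

    `thetas` maps lag -> theta in the minus-sign (Box-Jenkins) convention.
    """
    order = max(thetas)
    psi = [1.0] + [0.0] * order
    for lag, theta in thetas.items():
        psi[lag] = -theta
    # Autocovariances (up to the noise variance); zero beyond the MA order.
    gamma = [
        sum(psi[j] * psi[j + k] for j in range(order + 1 - k))
        for k in range(max_lag + 1)
    ]
    return [g / gamma[0] for g in gamma]


def pacf_from_acf(rho):
    """Durbin-Levinson recursion: PACF at lags 0..len(rho)-1 from the ACF."""
    pacf = [1.0]
    phi_prev = []
    for k in range(1, len(rho)):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [
            phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)
        ] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Illustrative parameter values (assumptions, not estimates from the paper).
theta_1, theta_12 = 0.6, 0.5
models = {
    "subset": {1: theta_1, 12: theta_12, 13: 0.2},
    "multiplicative": {1: theta_1, 12: theta_12, 13: -theta_1 * theta_12},
    "additive": {1: theta_1, 12: theta_12},
}

acf = {name: ma_acf(thetas, max_lag=24) for name, thetas in models.items()}
pacf = {name: pacf_from_acf(rho) for name, rho in acf.items()}

for name in models:
    print(name, [round(acf[name][k], 4) for k in (1, 11, 12, 13)])
```

With these values, the multiplicative model has equal ACF at lags 11 and 13, the additive model has zero ACF at lag 13, and the subset model has unequal, nonzero values at both lags.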
The theoretical ACF and PACF for these three models are shown in Fig. 1-3. Most previous studies directly used a multiplicative SARIMA model when the ACF and PACF indicated that the data contained both nonseasonal and seasonal orders. In this research, we propose a more precise model identification step, focusing on the lags implied by the multiplicative orders. As an example, for monthly data whose ACF indicates MA orders at both the non-seasonal lag (lag 1) and the seasonal lag (lag 12), we must first check whether the ACF at lag 13 is equal to zero (indicating an additive model) or not (indicating a multiplicative model if it tends to equal the ACF at lag 11, or a subset model if it differs from the ACF at lag 11), as illustrated in Fig. 1-3.
Then, we validate the significance of the multiplicative parameter at the model estimation step. In this step, we suggest using SAS, which provides a facility to fit subset, multiplicative and additive SARIMA models. In particular, the new stages that we propose in the model estimation step are as follows:
• Fit the subset SARIMA model first and test whether the multiplicative parameter is significant
• If the multiplicative parameter is significant, continue to test whether this coefficient equals the product of the non-seasonal and seasonal coefficients (i.e., θ13 = -θ1θ12). If YES, the appropriate model is the multiplicative SARIMA. If NOT, the subset SARIMA is the appropriate model for the time series data
• If the multiplicative parameter is insignificant, the appropriate model is the additive SARIMA
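The three stages above can be sketched as a small decision function. This is a simplified illustration, not the SAS procedure: it assumes a normal-approximation confidence interval for θ13, ignores the sampling error of θ1 and θ12 when checking the product restriction, and uses hypothetical estimates. The function name `identify_sarima_order` is also hypothetical.

```python
def identify_sarima_order(theta_1, theta_12, theta_13, se_theta_13, z_crit=1.96):
    """Classify a fitted subset MA model as subset, multiplicative or additive.

    Estimates follow the minus-sign convention, so the multiplicative
    restriction is theta_13 = -theta_1 * theta_12.
    """
    lower = theta_13 - z_crit * se_theta_13
    upper = theta_13 + z_crit * se_theta_13

    # Stage 1: the lag-13 parameter is insignificant if the interval covers 0.
    if lower <= 0.0 <= upper:
        return "additive"

    # Stage 2: check whether theta_13 is compatible with -theta_1 * theta_12.
    implied = -theta_1 * theta_12
    if lower <= implied <= upper:
        return "multiplicative"
    return "subset"


# Hypothetical estimates, for illustration only.
case_a = identify_sarima_order(0.4, 0.6, -0.24, 0.05)
case_b = identify_sarima_order(0.4, 0.6, 0.02, 0.08)
case_c = identify_sarima_order(0.4, 0.6, -0.60, 0.05)
```

A formal test of the restriction would also account for the covariance between the three estimates; the interval check here is the simplest version of the idea.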

Model identification:
The time series plots of the two datasets are shown in Fig. 5. The plots show that both series have seasonal and trend patterns with increasing variance. This means that neither series satisfies the stationarity condition, in mean or in variance. MINITAB is used in this identification step. After a logarithmic transformation and both non-seasonal (d = 1) and seasonal (D = 1, S = 12) differencing, both series become stationary; their ACF and PACF are shown in Fig. 6-7. Based on the graphs in Fig. 6, although the estimated values are not significant, the ACF values at lags 11 and 13 tend to differ, i.e., the ACF at lag 13 looks larger than at lag 11. This indicates that there is no strong evidence for a multiplicative model. Furthermore, the graphs in Fig. 7 show that the estimated ACF at lag 11 is significant, whereas the ACF at lag 13 is not. Again, this indicates that there is no strong evidence for a multiplicative model as illustrated by the theoretical ACF and PACF in Fig. 2.
In general, the ACF and PACF of these two series suggest that both non-seasonal and seasonal MA orders exist in the tentative SARIMA model. The important question is then whether the models are subset, multiplicative or additive.
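The transformation and differencing used in this identification step can be sketched in Python as follows. The series is synthetic (a hypothetical stand-in for the airline or Bali data), and the bound 2/sqrt(n) is the usual approximate 95% limit for a sample autocorrelation, used here as an assumption in place of the MINITAB output.

```python
import math


def difference(series, lag=1):
    """Return series[t] - series[t - lag] for t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


# Synthetic monthly series with trend, seasonality and growing variance.
n_obs = 120
raw = [
    (100 + 2 * t)
    * (1 + 0.2 * math.sin(2 * math.pi * t / 12) + 0.05 * math.cos(1.7 * t))
    for t in range(n_obs)
]

logged = [math.log(x) for x in raw]                 # stabilise the variance
stationary = difference(difference(logged, 1), 12)  # d = 1, then D = 1, S = 12

acf = sample_acf(stationary, max_lag=24)
bound = 2 / math.sqrt(len(stationary))              # approximate 95% limit
significant_lags = [k for k in range(1, 25) if abs(acf[k]) > bound]

# Lags 11, 12 and 13 are the ones that separate the three model types.
print({k: round(acf[k], 3) for k in (1, 11, 12, 13)}, significant_lags)
```

Differencing removes 13 observations in total, so the ACF is computed from 107 values in this example.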
Model estimation:
Based on the proposed stages of the model estimation step, the subset ARIMA model is fitted first to determine whether the parameter of the multiplicative effect is significant. SAS is used in this step to estimate the subset, multiplicative and additive SARIMA models. The results for the subset ARIMA model are shown in Tables 1 and 2 for the airline and tourist arrivals data, respectively.
Based on the SAS output in Table 1 for the airline data, it can be concluded that the estimated parameter θ13 is significant, i.e., different from zero. The point estimate of θ13 does not satisfy θ13 = -θ1θ12 exactly; instead, θ13 < -θ1θ12. A 95% confidence interval for θ13 is:

$$-0.27973 \pm (1.96 \times 0.09417)$$

that is, the interval (-0.46430, -0.09516), and we could conclude that the restriction θ13 = -θ1θ12 cannot be ruled out. Hence, we could continue to fit the multiplicative model, and the results are presented in Table 3. In contrast, the estimation output in Table 2 for the tourist arrivals case shows that the estimated parameter θ13 is insignificant, i.e., not statistically different from zero. This means that there is no evidence to support a multiplicative SARIMA model for this case. The model estimation therefore continues by fitting the additive model, and the results are shown in Table 4.
Fig. 5: Monthly data on the number of international airline passengers (airline data) and the number of tourists to Bali
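The interval arithmetic can be reproduced with a few lines of Python, using only the estimate and standard error quoted in the text and the normal critical value 1.96.

```python
theta_13_hat = -0.27973   # estimate of theta_13 quoted in the text
std_error = 0.09417       # standard error quoted in the text
z_crit = 1.96             # normal critical value for a 95% interval

half_width = z_crit * std_error
lower = theta_13_hat - half_width
upper = theta_13_hat + half_width

# Zero outside the interval means theta_13 is significant at the 5% level.
zero_excluded = not (lower <= 0.0 <= upper)

print(round(lower, 5), round(upper, 5), zero_excluded)
```

The multiplicative restriction is then judged by whether -θ1θ12, computed from the Table 1 estimates, falls between `lower` and `upper`.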

DISCUSSION
The model identification step for SARIMA models showed that the ACF and PACF differ between subset, multiplicative and additive models, particularly at the lag that results from the multiplication of the non-seasonal and seasonal orders. The theoretical results illustrated that evaluating the ACF and PACF at this lag is the most important stage of the model identification step for a seasonal model.
Moreover, the performance evaluation shows that the multiplicative model yields poorer forecast accuracy than the subset and additive models for the airline data and the tourist arrivals data, respectively. In particular, the first case study on the airline data shows that the subset ARIMA model yields more accurate forecasts (lower MSE) than the multiplicative model on both the in-sample and out-sample datasets. It is not surprising if the AIC shows that the multiplicative model is better than the subset model. This is because the number of parameters counted is 2 for the multiplicative model and 3 for the subset model, even though in general the models have the same form (Eq. 2 and 4). Thus, the selection of the best SARIMA model should follow the cross-validation principle, which emphasizes the results on the out-sample dataset, rather than criteria such as AIC, which emphasize in-sample performance.
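The effect of the parameter count on AIC can be illustrated with a short Python sketch. The Gaussian form n ln(SSE/n) + 2k (AIC up to an additive constant) is an assumption about how the criterion is computed, and all numbers below are hypothetical rather than taken from the paper's tables.

```python
import math


def gaussian_aic(sse, n_obs, n_params):
    """AIC up to an additive constant: n * ln(SSE / n) + 2 * k."""
    return n_obs * math.log(sse / n_obs) + 2 * n_params


# Hypothetical in-sample fit statistics (not from the paper's tables).
n_obs = 119
sse_subset, k_subset = 0.1760, 3                    # theta_1, theta_12, theta_13
sse_multiplicative, k_multiplicative = 0.1775, 2    # theta_1, theta_12

aic_subset = gaussian_aic(sse_subset, n_obs, k_subset)
aic_multiplicative = gaussian_aic(sse_multiplicative, n_obs, k_multiplicative)

mse_subset = sse_subset / n_obs
mse_multiplicative = sse_multiplicative / n_obs

print(round(aic_subset - aic_multiplicative, 3))
```

In this example the subset model has the lower in-sample MSE but the higher AIC, because the small gain in fit does not offset the penalty of 2 for the extra parameter.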
In contrast, the performance evaluation of the second case study on the tourist arrivals data shows that the multiplicative model yields a better fit on the in-sample dataset, i.e., lower MSE and AIC, than the additive SARIMA model, even though the estimated parameter of the multiplicative effect was not significant. However, the evaluation on the out-sample dataset shows that the additive model produces more accurate forecasts than the multiplicative SARIMA model. Again, this empirical evidence shows that more careful identification is needed to determine the order of a SARIMA model, rather than directly choosing the multiplicative model.

CONCLUSION
This study has discussed three kinds of seasonal ARIMA models, namely subset, multiplicative and additive SARIMA models, including their theoretical ACF and PACF, how to simulate these values with an R program and how to test the models with SAS at the model estimation step of the Box-Jenkins procedure. Most previous studies directly used a multiplicative SARIMA model without closely examining the ACF or PACF at the lag implied by the multiplicative order and without testing whether the multiplicative parameter was significant.
In general, the two empirical results show that the determination of orders in a SARIMA model must consider subset, multiplicative and additive orders. Moreover, understanding the pattern of the theoretical ACF and PACF for these three kinds of model orders is very important for determining an appropriate tentative model for a given seasonal time series. In addition, the results illustrated that the multiplicative SARIMA model yielded less accurate forecasts than the subset or additive models for the airline data and the tourist arrivals data, respectively.
Hence, the proposed stages in the model estimation step for testing the significance of the multiplicative parameter should be used together with the identification of the ACF and PACF for seasonal time series data. This means that the Box-Jenkins procedure for seasonal ARIMA models must be revised, i.e., the multiplicative SARIMA model should not be used directly at the model identification and estimation steps. Forecasters are also advised to use a statistical program that provides a facility for testing the multiplicative parameter, such as SAS. Moreover, further research involving multiple seasonal ARIMA models, such as short-term load forecasting in certain countries, which has recently become one of the central topics in forecasting, may provide further insights regarding the subset, multiplicative or additive orders.