Seasonal Time Series Data Forecasting by Using Neural Networks Multiscale Autoregressive Model



INTRODUCTION
Recently, neural networks have been proposed in many studies across different kinds of statistical analysis. Many types of neural network have been applied to solve a wide range of problems: for example, the Feedforward Neural Network (FFNN) has been applied to electricity demand forecasting (Taylor et al., 2006), the General Regression Neural Network (GRNN) has been used for exchange-rate forecasting and the Recurrent Neural Network (RNN) has been applied to detecting changes in autocorrelated processes for quality monitoring. Unlike those previous studies, here the predictors, or inputs, are not the lags of the data variables but the coefficients from a wavelet transformation.
A new line of development concerns the application of wavelet transformation to time series analysis; an overview is given in Nason and von Sachs (1999). Initially, most wavelet research on time series analysis focused on periodogram or scalogram analysis for evaluating periodicities and cycles (Priestley, 1996; Morettin, 1997; Gao, 1997; Percival and Walden, 2000). Bjorn (1995), Soltani et al. (2000) and Renaud et al. (2003) were among the first research groups to discuss wavelets for time series prediction based on autoregressive models. In this setting, the wavelet transformation gives a good decomposition of a signal or time series, so that its structure can be evaluated by parametric or nonparametric models.
The Wavelet Neural Network (WNN) is a neural network in which wavelet functions are used in the transfer-function processing. In time series forecasting, the inputs to a WNN are the wavelet coefficients at a certain time and resolution. Recently, several articles have discussed WNN for time series forecasting and filtering, such as Bashir and El-Hawary (2000), Renaud et al. (2003), Murtagh et al. (2004) and Chen et al. (2006).
The wavelet transformation most commonly used for time series forecasting is the Maximal Overlap Discrete Wavelet Transform (MODWT). MODWT overcomes a limitation of the Discrete Wavelet Transform (DWT), which requires a data length N = 2^J, where J is a positive integer. In practice, time series data rarely have a length equal to a power of two.
Most present research on WNN for time series forecasting focuses on how to determine the best WNN model. The aim of this research is to develop an accurate procedure for WNN modeling of seasonal time series data and to compare its forecast accuracy with Multiscale Autoregressive (MAR) and ARIMA models.

Data:
The number of tourist arrivals to Bali through Ngurah Rai airport, from January 1986 until April 2008, is used as a case study. The first 216 observations form the in-sample dataset and the last 16 observations are used as the out-sample dataset. The analysis starts by applying MODWT decomposition to the data. Based on the pattern of the scale and wavelet coefficients, a WNN model-building procedure for time series forecasting is then developed. This procedure improves on the general FFNN model-building procedure for time series forecasting. In the new procedure, the inputs of the WNN model are determined from the wavelet coefficient lags and the boundary effects, while the best WNN model is selected by combining inferential statistics on the incremental contribution in a forward scheme, to select the optimum number of neurons in the hidden layer, with a Wald test in a backward scheme, to determine the optimum input units.
Wavelets and prediction: Wavelet means small wave; by contrast, sines and cosines are big waves (Percival and Walden, 2000). A function ψ(.) is defined as a wavelet if it satisfies:

∫_{−∞}^{∞} ψ(t) dt = 0 (1)

∫_{−∞}^{∞} ψ²(t) dt = 1 (2)

Commonly, wavelets are functions with the characteristics in Eq. 1 and 2: the integral over (−∞, ∞) is zero and the integral of the square of ψ(.) equals 1. There are two functions in the wavelet transform, i.e., the scale function (father wavelet) and the mother wavelet. These two functions generate a function family that can be used to reconstruct a signal. Some wavelet families are the Haar wavelet (the oldest and simplest), the Meyer wavelet, the Daubechies wavelet, the Mexican hat wavelet, the Coiflet wavelet and the least asymmetric wavelet (Daubechies, 1992).
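As an illustration, the two conditions in Eq. 1 and 2 can be checked numerically for the Haar mother wavelet (a minimal sketch; the integration grid and interval are arbitrary choices):

```python
import numpy as np

# Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere
def haar_psi(t):
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

t = np.linspace(-2.0, 3.0, 500001)
dt = t[1] - t[0]
integral = np.sum(haar_psi(t)) * dt       # Eq. 1: should be close to 0
energy = np.sum(haar_psi(t) ** 2) * dt    # Eq. 2: should be close to 1
```

Up to discretization error, the Riemann sums reproduce the zero-integral and unit-energy properties.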

Scale and wavelet equations:
The scale equation, or dilation equation, shows the scale function φ undergoing contraction and translation (Debnath, 2001), written as:

φ(t) = √2 Σ_{l=0}^{L−1} g_l φ(2t − l)

where φ(2t − l) is the scale function φ(t) contracted and translated along the time axis by l steps, with scale filter coefficients g_l. The wavelet function ψ is defined as:

ψ(t) = √2 Σ_{l=0}^{L−1} h_l φ(2t − l)

The coefficients g_l must satisfy the conditions:

Σ_l g_l = √2, Σ_l g_l² = 1 and Σ_l g_l g_{l+2m} = 0 for m ≠ 0

The relationship between the coefficients h_l and g_l is the quadrature-mirror relation h_l = (−1)^l g_{L−1−l}, or equivalently g_l = (−1)^{l+1} h_{L−1−l}.
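For example, the Daubechies D(4) scale filter satisfies these conditions, and its wavelet filter follows from the quadrature-mirror relation h_l = (−1)^l g_{L−1−l} (a numerical check, not part of the original derivation):

```python
import numpy as np

sqrt3 = np.sqrt(3.0)
# Daubechies D(4) scale (low-pass) filter, unit-energy normalization
g = np.array([1 + sqrt3, 3 + sqrt3, 3 - sqrt3, 1 - sqrt3]) / (4 * np.sqrt(2))
L = len(g)

# Quadrature-mirror relation: h_l = (-1)^l * g_{L-1-l}
h = np.array([(-1) ** l * g[L - 1 - l] for l in range(L)])

# Filter conditions: sum(g) = sqrt(2), sum(g^2) = 1, even-shift orthogonality,
# and the derived wavelet filter sums to 0 with unit energy.
checks = (np.sum(g), np.sum(g ** 2), g[0] * g[2] + g[1] * g[3],
          np.sum(h), np.sum(h ** 2))
```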

Maximal Overlap Discrete Wavelet Transform (MODWT): One modification of the Discrete Wavelet Transform (DWT) is the Maximal Overlap Discrete Wavelet Transform (MODWT). MODWT has been discussed in the wavelet literature under several names, such as undecimated DWT, shift-invariant DWT, wavelet frames, translation-invariant DWT and non-decimated DWT. Percival and Walden (2000) stated that these names essentially denote the same transform, with the connotation 'mod DWT', or modified DWT. For this reason, this research uses the term Maximal Overlap Discrete Wavelet Transform (MODWT).
DWT assumes the data length satisfies N = 2^J. In the real world, most time series do not have such a length. MODWT has the advantage of eliminating the halving of the data (down-sampling), so that MODWT yields N wavelet and N scale coefficients at each level (Percival and Walden, 2000).
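A minimal sketch of a MODWT decomposition with the Haar filter and circular (periodic) filtering, showing that every level keeps all N coefficients; the function name and the Haar choice are illustrative:

```python
import numpy as np

def modwt_haar(x, J):
    """MODWT of x to level J using the Haar filter (illustrative sketch).

    Returns J vectors of wavelet coefficients and the level-J scale
    coefficients, each of the same length N as the input (no down-sampling).
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    g = np.array([0.5, 0.5])    # MODWT Haar scale filter  (g_l / sqrt(2))
    h = np.array([0.5, -0.5])   # MODWT Haar wavelet filter (h_l / sqrt(2))
    v = x.copy()
    wavelets = []
    for j in range(1, J + 1):
        shift = 2 ** (j - 1)                 # filter is upsampled each level
        w_j = np.empty(N)
        v_j = np.empty(N)
        for t in range(N):
            idx = [(t - shift * l) % N for l in range(len(h))]  # circular
            w_j[t] = np.dot(h, v[idx])
            v_j[t] = np.dot(g, v[idx])
        wavelets.append(w_j)
        v = v_j
    return wavelets, v
```

One useful sanity check is energy preservation: the squared norm of the input equals the sum of squared norms of all wavelet coefficient vectors plus the final scale coefficient vector.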
Given time series data x of length N, the MODWT gives column vectors w_1, w_2, ..., w_J and v_J, each of length N. The vectors w_j contain the wavelet coefficients at level j and v_J contains the scale coefficients. As in the DWT, the MODWT can be computed efficiently by the pyramid algorithm. The smooth coefficients of a signal X are obtained iteratively by filtering X with the scale, or low-pass, filter g and the wavelet, or high-pass, filter h. To relate the DWT and the MODWT, the MODWT wavelet and scale filters are defined by rescaling the DWT filters, h̃_l = h_l/√2 and g̃_l = g_l/√2, and the scale filter must satisfy:

Σ_{l=0}^{L−1} g̃_l = 1, Σ_{l=0}^{L−1} g̃_l² = 1/2 and Σ_l g̃_l g̃_{l+2m} = 0 for m ≠ 0

Time series prediction by using wavelets: Generally, time series forecasting with wavelets is a forecasting method that preprocesses the data through a wavelet transform, especially the MODWT. A multiscale decomposition such as the wavelet decomposition automatically separates the data components, such as the trend component and the irregular component. Thereby, the method can be used to forecast stationary data (containing only irregular components) or non-stationary data (containing trend and irregular components). For example, suppose a stationary signal X = (X_1, X_2, ..., X_t) and assume that the value X_{t+1} is to be forecasted. The basic idea is to use the coefficients constructed from the decomposition (Renaud et al., 2003). The first question is how many, and which, wavelet coefficients should be used at each scale. Renaud et al. (2003) introduced a process to calculate the forecast at time (t+1) using a wavelet model, as illustrated in Fig. 1, which represents the common form of wavelet modeling with level J = 4, order A_j = 2 and N = 16. Figure 1 illustrates that if the 18th observation is to be forecasted, the input variables are the wavelet coefficients of level 1 at t = 17 and t = 15, level 2 at t = 17 and t = 13, level 3 at t = 17 and t = 9, level 4 at t = 17 and t = 1 and the smooth coefficients of level 4 at t = 17 and t = 1. Hence, the second input at each level j is at time t − 2^j.
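The lag scheme above can be sketched as a small helper that lists, for each level, the time indices of the coefficients used as predictors for X_{t+1}; `predictor_indices` is a hypothetical name and the k-th input at level j is placed at t − 2^j(k − 1):

```python
def predictor_indices(t, J, A):
    """Time indices of the wavelet/smooth coefficients used to forecast
    X_{t+1}, following the lag scheme of Renaud et al. (2003)."""
    idx = {j: [t - 2 ** j * (k - 1) for k in range(1, A + 1)]
           for j in range(1, J + 1)}
    # the smooth (scale) coefficients at the coarsest level J use the same lags
    idx["smooth"] = [t - 2 ** J * (k - 1) for k in range(1, A + 1)]
    return idx
```

With t = 17, J = 4 and A = 2 this reproduces the index pattern described for Fig. 1.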
The basic idea of multiscale decomposition is that the trend pattern influences the Low-frequency (L) components, which tend to be deterministic, whereas the High-frequency (H) components remain stochastic. The second point in wavelet modeling for forecasting is the function used to process the inputs, i.e., the wavelet coefficients, to forecast the (t+1)th period. Generally, two kinds of function can be used for this input-output processing: linear and nonlinear. Renaud et al. (2003) developed a linear wavelet model known as the Multiscale Autoregressive (MAR) model. They also introduced the possibility of using a nonlinear model in the input-output processing of the wavelet model, especially the Feed-Forward Neural Network (FFNN); this second model is known as the Wavelet Neural Network (WNN) model. Both approaches use the lags of the wavelet and scale coefficients as inputs, as in Fig. 1.
The MAR model can be written as:

X̂_{t+1} = Σ_{j=1}^{J} Σ_{k=1}^{A_j} a_{j,k} w_{j, t−2^j(k−1)} + Σ_{k=1}^{A_{J+1}} a_{J+1,k} v_{J, t−2^J(k−1)}

Where:
j = the level index (j = 1, 2, ..., J)
A_j = the order of the MAR model at level j (k = 1, 2, ..., A_j)
w_{j,t} = the wavelet coefficient value
v_{J,t} = the scale coefficient value
a_{j,k} = the MAR coefficient value

Wavelet neural network: Suppose a stationary signal X = (X_1, X_2, ..., X_t) and assume that X_{t+1} is to be predicted. The basic idea of the wavelet neural network model is that the coefficients calculated by the decomposition, as in Fig. 1, are used as inputs to a certain neural network architecture to obtain the prediction of X_{t+1}. Renaud et al. (2003) introduced the Multilayer Perceptron (MLP) neural network architecture, also known as the Feed-Forward Neural Network (FFNN), to process the wavelet coefficients. This FFNN consists of one hidden layer with P neurons and is written as:

X̂_{t+1} = Σ_{p=1}^{P} λ_p g( Σ_{j=1}^{J} Σ_{k=1}^{A_j} a_{j,k,p} w_{j, t−2^j(k−1)} + Σ_{k=1}^{A_{J+1}} a_{J+1,k,p} v_{J, t−2^J(k−1)} ) (11)

where g is the activation function in the hidden layer, usually the logistic sigmoid, and the activation function in the output layer is linear. The model in Eq. 11 is known as the Wavelet Neural Network (WNN) or Multiresolution Neural Network (MNN).
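One forward pass of such a WNN can be sketched with a logistic hidden layer and a linear output unit; the weight matrices here are illustrative placeholders, since in practice they are estimated from the training data:

```python
import numpy as np

def logistic(z):
    # sigmoid activation used in the hidden layer
    return 1.0 / (1.0 + np.exp(-z))

def wnn_forecast(inputs, W_hidden, b_hidden, w_out, b_out):
    """One-step WNN forecast: P logistic hidden neurons, linear output.

    inputs   : vector of scale/wavelet coefficient lags (length m)
    W_hidden : (P, m) hidden-layer weights; b_hidden : (P,) biases
    w_out    : (P,) output weights;         b_out   : scalar bias
    """
    hidden = logistic(W_hidden @ inputs + b_hidden)
    return w_out @ hidden + b_out
```

With all-zero inputs and biases, each hidden neuron outputs 0.5, so the forecast reduces to half the sum of the output weights plus the output bias.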

Procedures:
There are four proposed procedures for building a WNN model to forecast non-stationary (in mean) time series, i.e.:

• The inputs are the lags of scale and wavelet coefficients, as in Renaud et al. (2003)
• The inputs combine the lags of scale and wavelet coefficients proposed by Renaud et al. (2003) with additional lags identified by stepwise selection
• The inputs are the lags of scale and wavelet coefficients proposed by Renaud et al. (2003), computed from the differenced data
• The inputs combine the lags of scale and wavelet coefficients proposed by Renaud et al. (2003) with additional lags identified by stepwise selection, computed from the differenced data

In this research, the additional lags are seasonal lags, because of the data pattern. The first and second procedures are used for stationary data, whereas the third and fourth procedures are used for data that contain a trend. This study only illustrates the fourth procedure. The stepwise method is used to simplify the search for significant inputs. After building the WNN model, the results on the out-sample dataset are compared with those of the MAR and ARIMA models to find the best model for forecasting the number of tourist arrivals to Bali.
In the first proposed new procedure, the best WNN model is selected by first determining an appropriate number of neurons in the hidden layer. Before applying the proposed procedure, the number of levels J in the MODWT must be determined. In this case, all scale and wavelet coefficient lags from MAR(1) and the additional seasonal lags that are significant in the stepwise method are used as inputs. This differs from the linear wavelet model (MAR), whose modeling process was divided into two additive parts, namely modeling the trend using the wavelet coefficients and MAR modeling of the residual using the wavelet and scale coefficient lags. In the proposed procedure, the WNN modeling is done simultaneously using the scale and wavelet coefficient lags. This is based on the expectation that a WNN, as a nonlinear model, can capture the data characteristics simultaneously from the scale and wavelet coefficients of the MODWT. The first proposed procedure for building a WNN model for forecasting seasonal time series data is shown in Fig. 2.

Fig. 2: The procedure for WNN model building for forecasting seasonal time series data using a combination of R² incremental inference and the Wald test
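The forward step that judges the incremental contribution of an added hidden neuron can be sketched as a nested-model F statistic on the incremental R² (or, equivalently, on the drop in residual sum of squares); this is a generic sketch, and the paper's exact test statistic may differ:

```python
def incremental_F(sse_reduced, sse_full, q, n, p_full):
    """F statistic for the incremental contribution of q extra parameters
    (e.g. one more hidden neuron) in a nested-model comparison.

    sse_reduced, sse_full : residual sums of squares of the nested models
    q      : number of parameters added
    n      : sample size
    p_full : number of parameters in the larger model
    """
    return ((sse_reduced - sse_full) / q) / (sse_full / (n - p_full))
```

A large F relative to the F(q, n − p_full) critical value indicates that the added neuron contributes significantly, so the forward search continues; otherwise it stops.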

RESULTS AND DISCUSSION
The time series plot of the number of tourist arrivals to Bali through Ngurah Rai airport is shown in Fig. 3. The plot shows that the data have seasonal and trend patterns. These data have been analyzed using the MAR and ARIMA models and the results showed that MAR(J = 4; [12,36], [12,36], [36], [0], [0])-Haar yielded better forecasts than the ARIMA model.
As the starting step, the modeling focuses on determining an appropriate number of neurons in the hidden layer. In this study, the scale and wavelet coefficient lags are treated as the lag inputs in the nonlinearity test in the first step.
Every proposed procedure begins with nonlinearity tests, i.e., the White test and the Terasvirta test. Using the scale and wavelet coefficient lags proposed by Renaud et al. (2003) as inputs, the results show a nonlinear relationship between the inputs and the output. Hence, it is appropriate to use a nonlinear model such as the WNN to forecast the data. The next step of the fourth procedure is to determine an appropriate number of neurons in the hidden layer, starting from one neuron and adding neurons until an additional neuron no longer contributes significantly.
The results of selecting the appropriate number of neurons for the WNN model with the lag inputs proposed by Renaud et al. (2003) can be seen in Table 1 for the Daubechies(4), or D(4), wavelet family and in Table 2 for the Haar wavelet family. Moreover, the forecast accuracy comparison between the WNN and MAR models is given in Table 3.
Based on the results in Tables 1 and 2, the first proposed procedure shows that the best WNN model for forecasting the number of tourist arrivals to Bali consists of one neuron in the hidden layer for both the D(4) and Haar wavelets. In this architecture, the inputs are the lags of scale and wavelet coefficients of MAR(1) and the multiplicative seasonal lags that are statistically significant in the stepwise method. If the WNN model is instead selected by the cross-validation principle, the best model is the one that yields the minimum RMSE on the testing dataset, i.e., the WNN model with one neuron in the hidden layer, for both the D(4) and Haar wavelets, with RMSE 0.0973 and 0.0978 respectively. Hence, the WNN model with one neuron in the hidden layer using the D(4) wavelet is the best model.
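Model selection by minimum out-of-sample RMSE can be sketched as follows; the forecast and actual values below are illustrative placeholders, not the paper's data:

```python
import numpy as np

def rmse(actual, forecast):
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.sqrt(np.mean((actual - forecast) ** 2))

# Illustrative out-sample forecasts from two candidate models
forecasts = {
    "WNN-D4":   np.array([1.0, 2.1, 2.9]),
    "WNN-Haar": np.array([1.2, 2.4, 3.3]),
}
actual = np.array([1.0, 2.0, 3.0])

scores = {name: rmse(actual, f) for name, f in forecasts.items()}
best = min(scores, key=scores.get)   # model with the smallest RMSE
```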
In addition, the forecast accuracy comparison between the WNN and MAR models in Table 3 shows that the WNN model with one hidden neuron using the D(4) wavelet family yields more accurate forecasts than the other models.

CONCLUSION
Based on the results in the previous sections, it can be concluded that the scale and wavelet coefficients of the circular MODWT decomposition have different patterns. For non-stationary seasonal time series data, the scale coefficients have non-stationary and seasonal patterns, whereas the wavelet coefficients at each decomposition level tend to be stationary with values around zero. New procedures for building the NN-MAR model based on these properties of the scale and wavelet coefficients were then proposed. The empirical results using the data on the number of tourist arrivals to Bali show that the proposed procedure for building a WNN model works well for determining an appropriate model architecture. Moreover, the forecast accuracy comparison shows that the procedure that uses stepwise selection in the first step to determine the lag inputs yields a more parsimonious model and more accurate forecasts than the other procedures.