An Evolving Autoregressive Predictor for Time Series Forecasting

The autoregressive (AR) model is a common predictor that has been used extensively for time series forecasting. Many training methods can be used to update AR model parameters, for instance, the least square estimate and the maximum likelihood estimate; however, both techniques are sensitive to noisy samples and outliers. To deal with these problems, an evolving AR predictor, EAR, is developed in this study to enhance prediction accuracy and mitigate the effect of noisy samples and outliers. The model parameters of EAR are trained with an Adaptive Least Square Estimate (ALSE) method, which can learn sample characteristics more effectively. In each training epoch, the ALSE weights the samples by their fitting accuracy. Samples with larger fitting errors are given a larger penalty value in the cost function; however, the penalties of difficult-to-predict samples are adaptively reduced to enhance the prediction accuracy. The effectiveness of the developed EAR predictor is verified by simulation tests. Test results show that the proposed EAR predictor can capture the dynamics of the time series effectively and predict the future trend accurately.


Introduction
Time series forecasting is a process that extracts from the available data the features that govern its trend and forecasts future data based on the extracted features. It has numerous real-world applications, such as electric load prediction (Quan et al., 2013; Goude et al., 2014), financial index forecasting (Li et al., 2013a; Yu et al., 2009) and machine health condition monitoring (Wang, 2013).
Some commonly used prediction tools are Autoregressive (AR) models, Autoregressive-Moving-Average (ARMA) models (Brockwell and Davis, 2009), Neural Networks (NNs) (Li et al., 2014a; 2013b; 2014b) and particle filtering (Li et al., 2014c). NNs can capture data features through a training stage and conduct time series prediction based on the extracted features; however, their modeling mechanism is opaque. The AR model is more compact than ARMA and does not have the estimation errors that result from the moving average part of ARMA.
The boosting technique is an ensemble learning method that combines weak learners to improve the training accuracy, whereby each weak learner addresses one particular data property (e.g., the data distribution). Boosting techniques are mainly used in pattern classification (Cao et al., 2012; Schapire and Singer, 1999). In time series forecasting, boosting techniques have also been employed to improve prediction accuracy (Drucker, 1997). A boosting technique can also be used as a training method to optimize model parameters.
An Evolving AR (EAR) predictor is proposed in this study for time series forecasting. The EAR has the generic AR model structure; however, it uses an Adaptive Least Square Estimate (ALSE) method to train model parameters. Some samples are hard to learn because they are noisy samples or outliers. If more effort is put into learning these hard-to-learn samples correctly, the prediction accuracy of the already well-learnt samples will drop and the training process will suffer from the "overfitting" problem. In the proposed EAR, the hard-to-learn samples are detected and their penalties are reduced to improve generalized prediction performance. The effectiveness of the proposed EAR predictor is verified here by simulations.
The remainder of this paper is organized as follows: Section 2 presents the theoretical foundation of the proposed EAR predictor. In Section 3, the effectiveness of the proposed EAR predictor is examined by simulation tests. Some concluding remarks of the study are given in Section 4.

The Proposed Evolving AR Predictor
The EAR predictor assigns penalties to samples at each training epoch according to their fitting errors. The model parameters are updated adaptively with respect to the different sample penalties so as to improve prediction accuracy. Hard-to-learn samples may distort the training process, so their penalties are reduced in EAR. The proposed EAR technique is given as follows.

EAR Model Structure
Consider the training data set u(k), k = 1, 2, …, K, where K is the number of samples in the training data set. For s-step-ahead prediction, the training data set can be re-arranged into input vectors x(i) = [u(i), u(i+1), …, u(i+d-1)] and outputs y(i) = u(i+d+s-1), i = 1, 2, …, N, where N = K-d-s+1 and d is the dimension of the input vector x(i).
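The re-arrangement of the series into input vectors and outputs can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation; the function name make_samples is hypothetical, and the only adjustment to the definitions above is the shift from 1-based indices in the text to 0-based NumPy indices.

```python
import numpy as np

def make_samples(u, d, s):
    """Arrange a 1-D series into inputs x(i) and outputs y(i).

    Text (1-based): x(i) = [u(i), ..., u(i+d-1)], y(i) = u(i+d+s-1),
    i = 1..N with N = K-d-s+1. The code uses 0-based indices.
    """
    u = np.asarray(u, dtype=float)
    K = u.size
    N = K - d - s + 1
    if N < 1:
        raise ValueError("series too short for the chosen d and s")
    # Row i holds the d consecutive values starting at position i.
    X = np.stack([u[i:i + d] for i in range(N)])
    # The target lies s steps after the last element of each input window.
    y = u[d + s - 1:d + s - 1 + N]
    return X, y

# Example: K = 10 samples, input dimension d = 3, one-step-ahead (s = 1).
X, y = make_samples(np.arange(1, 11), d=3, s=1)
```

With these settings N = 7, the first input vector is [1, 2, 3] with output 4, and the last input vector is [7, 8, 9] with output 10.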
The EAR model has the generic AR form given in Equation 1, where Θ_i are linear parameters; i = 1, 2, …, r-1. Equation 1 can also be written in matrix form (Equations 2, 3 and 5), where N_t is the number of system states u.

Parameter Estimation Using ALSE
The linear parameter increment θ_t of the EAR predictor at the t-th training epoch can be derived using the weighted least square estimate (WLSE), given as Equation 6. The weight matrix W is represented by Equation 7, where N = N_t-r-s+1 and V_t(i) represents the penalty of sample i at training epoch t.
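For readers less familiar with weighted least squares, the following Python sketch shows the standard estimate, which minimizes the sum of v(i)·(y(i) − x(i)·θ)² and has the closed form θ = (XᵀWX)⁻¹XᵀWy. It assumes that the weight matrix W is diagonal with the sample penalties on its diagonal; that assumption, the function name weighted_least_squares and the toy data are illustrative only, and how the EAR accumulates the resulting increments over epochs is not covered here.

```python
import numpy as np

def weighted_least_squares(X, y, v):
    """Return theta minimizing sum_i v[i] * (y[i] - X[i] @ theta)**2.

    v holds non-negative per-sample weights (assumed here to play the
    role of the penalties V_t(i), i.e. W = diag(v)).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sw = np.sqrt(np.asarray(v, dtype=float))
    # Scaling each row by sqrt(v[i]) turns the weighted problem into an
    # ordinary least-squares problem, so no explicit inverse is needed.
    theta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return theta

# Toy data: y = 1 + 2*x, except that the last sample is an outlier.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 100.0])
theta_uniform = weighted_least_squares(X, y, [1, 1, 1, 1])
theta_downweighted = weighted_least_squares(X, y, [1, 1, 1, 0])
```

With the outlier given zero weight, the estimate recovers the intercept 1 and slope 2 of the clean samples, whereas uniform weights let the outlier pull the fit away from them. This is the mechanism by which reducing the penalty of a sample reduces its influence on the parameters.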

Formulation of Sample Penalties
In the proposed EAR, the penalty of sample i at training epoch one is set as 1/N, where N is the number of samples. Given the penalties V_t, the penalties at step t+1 are updated by Equation 9, where y_d are the desired values; p_t are the predicted values using Equation 1; β_t is the learning rate of the t-th parameter update epoch; and ω_t(i) is the weight regulator that reduces the penalties of hard-to-learn samples. The EAR model parameters in Equation 4 at step T are obtained from Equation 10, where θ_t is the linear parameter increment. The predicted values of EAR at step T are formulated as Equation 11, which uses the normalized learning rates.

Calculation of β_t
The upper bound of Z_t can be derived as Equation 12, and the learning rate at step t is given by Equation 13. Substituting Equation 13 into Equation 12, the minimum upper bound of Z_t can be derived as Equation 14.

Mean Absolute Training Error
To satisfy Equation 19, the upper bound of the training MAE can be given as Equation 25. Therefore, as more training epochs are used, the upper bound of the training MAE decreases.

Weight Regulator
Some samples may be noisy samples or outliers, which may mislead the training process and degrade the prediction accuracy. Equation 26 can be used to detect these irregular samples. By summing up the previous weighted errors, the samples can be ranked according to how difficult they are to learn. The weight regulator can then be computed from Equation 27. It can be shown that ω_t(i) ≥ 0. By applying the weight regulator to the sample penalty update as shown in Equation 8, the hard-to-learn samples can be identified and their penalties will be reduced to improve generalized prediction accuracy.

Implementation of the EAR Predictor
The proposed EAR predictor is implemented using the following steps:
• Normalize the data over a proper range (e.g., [0, 1]), so that constraints can be satisfied
• Initialize the penalties of the training data set
• Derive the parameter increment θ_t using Equation 6 with the penalties V_t
• Compute the sum of weighted absolute error … normalization factor
• Repeat steps 3 to 6 at t = 1, 2, …, T

Performance Evaluation
To verify the effectiveness of the proposed EAR predictor, simulation tests are conducted to examine its prediction performance. An AR model with the same structure as EAR but trained by the Kalman-filter-based Maximum Likelihood Estimate (MLE) (Hevia, 2008), denoted AR-MLE, is used for comparison. To satisfy the constraints, the data sets used in this section are normalized over the range [0, 1]. Test results, however, are shown in their original scales.
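The normalization to [0, 1] and the mapping of results back to the original scale can be illustrated with a simple min-max transform. The sketch below is one common way to do this and is an assumption, since the exact normalization used in the tests is not specified; the function names fit_minmax, normalize and denormalize are hypothetical.

```python
import numpy as np

def fit_minmax(train):
    """Return the minimum and maximum of the training data."""
    train = np.asarray(train, dtype=float)
    lo, hi = float(train.min()), float(train.max())
    if hi == lo:
        raise ValueError("constant series cannot be min-max normalized")
    return lo, hi

def normalize(x, lo, hi):
    """Map values from [lo, hi] to [0, 1]."""
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def denormalize(z, lo, hi):
    """Map normalized values back to the original scale."""
    return np.asarray(z, dtype=float) * (hi - lo) + lo

# The scaling constants come from the training data only, so the same
# transform can be applied to test data and to predictions.
lo, hi = fit_minmax([2.0, 4.0, 6.0])
z = normalize([2.0, 4.0, 6.0], lo, hi)   # [0.0, 0.5, 1.0]
x_back = denormalize(z, lo, hi)          # [2.0, 4.0, 6.0]
```

Fitting the constants on the training data alone is a design choice of this sketch; test values outside the training range would then fall slightly outside [0, 1].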
The Mackey-Glass data set (Farmer, 1982; Li and Wang, 2011; Wang et al., 2012) is commonly used in the field of time series forecasting to compare the performance of predictors, because it is chaotic, non-periodic and non-convergent. It is given by Equation 28. In this simulation test, the data set is obtained from Equation 28 with τ = 30, dt = 1 and the initial conditions x(0) = 1.2 and x(t) = 0 for t<0. About 500 samples are selected for training and 50 samples for testing. One-step-ahead forecasting is conducted in the Mackey-Glass data prediction tests. To test the noise tolerance of the two predictors, noisy samples are intentionally added to the Mackey-Glass training data at time steps 20, 100, 135, 215 and 345; these samples are circled in red in Fig. 1. γ is given the value of 2 and α is given the value of 1.
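As a minimal sketch of how such a series could be generated, the following Python code integrates the Mackey-Glass delay differential equation in its commonly cited form, dx/dt = a·x(t−τ) / (1 + x(t−τ)^c) − b·x(t), with a forward Euler step. The coefficients a = 0.2, b = 0.1 and c = 10 are the values usually quoted for this benchmark and are assumptions here, as is the Euler scheme; the function name mackey_glass and the train/test split at the end are hypothetical.

```python
import numpy as np

def mackey_glass(n, tau=30, x0=1.2, dt=1.0, a=0.2, b=0.1, c=10):
    """Generate n samples of the Mackey-Glass series by forward Euler.

    tau, x0 and dt follow the settings in the text; a, b and c are the
    commonly quoted benchmark coefficients (an assumption).
    """
    delay = int(round(tau / dt))      # delay expressed in time steps
    x = np.zeros(n)
    x[0] = x0
    for t in range(n - 1):
        # x(t) = 0 for t < 0, so the delayed term vanishes at first.
        x_tau = x[t - delay] if t >= delay else 0.0
        dx = a * x_tau / (1.0 + x_tau ** c) - b * x[t]
        x[t + 1] = x[t] + dt * dx
    return x

# About 500 samples for training and 50 for testing, as in the text.
series = mackey_glass(550)
train, test = series[:500], series[500:]
```

During the first 30 steps the delayed term is zero, so the series simply decays by a factor of 0.9 per step before the delayed feedback sets in.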
The convergence of the training MAEs of EAR(3), EAR(6) and EAR(9) is shown in Fig. 2. It is seen from Fig. 2 that the training MAEs decrease as more training epochs are used, which agrees with Equation 25. Figure 3a demonstrates the training data fitting and Fig. 3b shows the prediction performance. From Fig. 3b, it is seen that the EAR predictor outperforms AR-MLE, because the EAR predictor can detect noisy samples and reduce their misleading effect on training.
The training MAEs and test MAEs of the two predictors with respect to model orders of 3, 6 and 9 are listed in Table 1; their corresponding training RMSEs and test RMSEs are given in Table 2. A total of 100 training epochs are used in the EAR. From Tables 1 and 2, it is seen that the training errors of EAR decrease as the model order increases, because a larger model order means that more information is input to the predictor for processing and the predictions become more accurate.
From Table 2, the training RMSEs of the proposed EAR are larger than those of AR-MLE, because EAR reduces the penalties of noisy samples to improve the generalization capability of the predictor. Consequently, the training errors at the noisy samples are large, which leads to a larger EAR training RMSE.
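The two error measures used in the comparison can be computed as follows. This sketch uses the standard definitions of the mean absolute error and the root mean square error; the function names mae and rmse and the toy values are illustrative and are not taken from the reported results.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(err)))

def rmse(y_true, y_pred):
    """Root mean square error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))

# Three samples fitted exactly and one sample with a large error,
# as would occur at a noisy sample whose penalty has been reduced.
y_true = [0.0, 0.0, 0.0, 0.0]
y_pred = [0.0, 0.0, 0.0, 4.0]
print(mae(y_true, y_pred))   # 1.0
print(rmse(y_true, y_pred))  # 2.0
```

Because the errors are squared before averaging, a single large error raises the RMSE more than the MAE, which is consistent with a few poorly fitted noisy samples dominating the training RMSE.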

Conclusion
An evolving AR predictor, EAR, has been developed in this study for time series forecasting. The EAR can gradually learn the training data characteristics with more training epochs and accurately forecast the future states of a dynamic system. The noisy samples are addressed using a weight regulator to reduce their misleading effect in training. The effectiveness of the proposed EAR predictor is verified using Mackey-Glass simulation examples. Test results have shown that the EAR predictor can effectively capture the dynamic behaviour of a time series and predict its future states accurately.