Forecasting Air Temperatures Using Time Series Models and Neural-based Algorithms



INTRODUCTION
Neural network models have recently become popular and have proved useful for forecasting a wide variety of time series. Applications in climatology, however, have been less widespread than in other disciplines, even though climatic data often take the form of series measured over time. Global temperature variations have been forecast using regularization networks, multilayer perceptrons, linear autoregression and a local model known as the simplex projection method [1]. Those forecasting results are consistent with the hypothesis that the climate dynamics are characterized by low-dimensional chaos and may have changed at some point after 1965, which is also consistent with the recent idea of climate change.
Different neural-network prediction algorithms have been proposed in the literature and used in many disciplines. Lapedes and Farber [2] reported that simple neural networks can outperform conventional methods for identifying historical data. Anderson et al. [3] compared different recurrent training algorithms for the identification of time series data. Gómez-Ramírez et al. [4] implemented an algorithm that adapts the architecture of a Polynomial Artificial Neural Network (PANN) using a Genetic Algorithm (GA) to improve the learning process. The performance of this algorithm was compared to that of a multilayer perceptron network. Zhang and Qi [5] considered how to model time series with both seasonal and trend patterns effectively. They concluded that detrending and preprocessing of the data are essential for obtaining reliable predictions. Other applications of neural networks in this area can be found elsewhere [6][7][8].
In the Middle East, long-term variations in temperature data have not received enough attention, even though these countries suffer from serious environmental, agricultural and water resources problems. In this work, feed-forward neural-network (FFNN) and autoregression (AR) time series models were used to forecast the annual mean temperatures, annual mean minimum temperatures and annual mean maximum temperatures recorded during the period 1923-2003 at Amman Airport station, Jordan. The Amman station is a strategic and historical station in the Middle East because of its location, its reliability and the length of its record. The performance of the two predictors was compared using out-of-sample forecasts and the resulting mean square errors. Finally, forecasts of the annual mean temperature, mean minimum temperature and mean maximum temperature for the coming 10 years at Amman station are given using the two prediction models.

MATERIALS AND METHODS
The feed-forward neural-network (FFNN) algorithm and the autoregression (AR) time series model are described below.
Feed-forward neural networks (FFNN): When used as a black-box modeling tool, the FFNN structure approximates the behavior of a given mechanistic phenomenon by means of optimization search techniques. This is accomplished in two steps: the network is fed with input-output data from the actual process to be modeled, and these data are then used to train the network to emulate the process.
Neural networks are constructed in layers: an input layer, one or more hidden layers and an output layer. Each layer is composed of one or more neurons, which are the building blocks of these networks. A neuron has a scalar input p, which is passed through a connection that multiplies it by the scalar weight w to form the product wp. A scalar bias b, associated with a constant input of unity, is added to the product wp, and the sum is fed as the argument to a transfer function f, which produces the scalar output a. Thus, the neuron has two inputs, p and b, and one output a. This structure can be written mathematically as:

$$a = f(wp + b) \qquad (1)$$

The transfer function, sometimes called the activation function, may take different mathematical forms. Commonly used forms are the linear, log-sigmoid, tan-sigmoid and radial basis transfer functions. The proper choice of activation function depends on the application and on the network structure.
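As an illustration of the single-neuron relation a = f(wp + b), the following minimal Python sketch evaluates one neuron with each of the four transfer functions named above. The function names (`linear`, `logsig`, `tansig`, `radbas`, `neuron`) are illustrative choices and do not come from the paper; the radial basis form exp(-n^2) is one common definition and is assumed here.

```python
import math

# Transfer (activation) functions named in the text.
# The names below are illustrative; the paper does not give code.

def linear(n):
    """Linear transfer function: output equals the net input."""
    return n

def logsig(n):
    """Log-sigmoid: squashes the net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def tansig(n):
    """Tan-sigmoid (hyperbolic tangent): squashes the net input into (-1, 1)."""
    return math.tanh(n)

def radbas(n):
    """Radial basis (assumed form exp(-n^2)): peaks at 1 when n = 0."""
    return math.exp(-n * n)

def neuron(p, w, b, f):
    """Scalar neuron: a = f(w*p + b)."""
    return f(w * p + b)
```

For example, `neuron(1.0, 2.0, -2.0, tansig)` has a net input of zero and therefore returns 0.0, while the same net input passed through `logsig` gives 0.5.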
Neurons are stacked to form one layer as shown in Fig. 1. The input vector p, of length R, is fed to the first layer. These inputs are multiplied by an S×R weight matrix W, and a bias vector b of length S is added to the product to produce an output vector a. This can be expressed as:

$$a = f(Wp + b) \qquad (2)$$

For n layers connected in series, the output vector at a certain time step k can be expressed as:

$$y(k) = f^{n}\left(W^{n} f^{n-1}\left(W^{n-1} \cdots f^{1}\left(W^{1} p(k) + b^{1}\right) \cdots + b^{n-1}\right) + b^{n}\right) \qquad (3)$$

where the superscripts index the layers. Thus, for a 3-layer network, Eq. (3) becomes:

$$y(k) = f^{3}\left(W^{3} f^{2}\left(W^{2} f^{1}\left(W^{1} p(k) + b^{1}\right) + b^{2}\right) + b^{3}\right) \qquad (4)$$

where y represents the network output vector. As shown in Fig. 1, the input vector p to the FFNN architecture consists of the current input u(k) and the n past inputs u(k-1), …, u(k-n). The network is fed with a set of input-output pairs and trained to reproduce the outputs within a predefined tolerance. For time series data, the input is a vector of n+1 values and the output is the single value at time step k+1. Network training is done by adjusting the neuron weights with an optimization algorithm that minimizes the quadratic error between the observed data and the computed outputs.
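The layer relation a = f(Wp + b), its composition over layers connected in series, and the construction of lagged input vectors can be sketched in plain Python as follows. This is a minimal sketch, not the authors' implementation: the helper names `layer`, `forward` and `lagged_pairs` are hypothetical, matrices are represented as lists of rows, and each pair is assumed to take the form ([u(k), u(k-1), ..., u(k-n)], u(k+1)).

```python
import math

def layer(W, b, p, f):
    """One layer: a = f(W p + b), with W given as a list of S rows of length R."""
    return [f(sum(w_ij * p_j for w_ij, p_j in zip(row, p)) + b_i)
            for row, b_i in zip(W, b)]

def forward(layers, p):
    """Feed p through layers connected in series.

    `layers` is a list of (W, b, f) tuples; the output of one layer
    is the input of the next.
    """
    a = p
    for W, b, f in layers:
        a = layer(W, b, a, f)
    return a

def lagged_pairs(series, n):
    """Build (input, target) pairs from a time series.

    Each input is [u(k), u(k-1), ..., u(k-n)] (n + 1 values) and the
    target is the next value u(k+1).
    """
    pairs = []
    for k in range(n, len(series) - 1):
        inputs = [series[k - j] for j in range(n + 1)]
        pairs.append((inputs, series[k + 1]))
    return pairs
```

With `n = 3`, each input holds the current value and three past values, so a network fed with these pairs has four inputs and one output.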
Input-target training data are usually pretreated to improve the numerical conditioning of the optimization problem and the behavior of the training process. The input-target data are normally subdivided into three subsets: training, validation and testing. The training subset is used for network learning, that is, for fitting the network weights by minimizing an appropriate error function. Backpropagation is the training technique usually used for this purpose. It refers to the method for computing the gradient of the case-wise error function with respect to the weights of a feedforward network. The performance of the networks is then compared by evaluating the error function independently on the validation subset. Finally, the testing subset is used to measure the generalization of the network, i.e. how accurately the network predicts targets for inputs that are not in the training set. This is sometimes referred to as hold-out validation.
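A minimal sketch of the three-way subdivision and of a mean square error function is given below. The paper does not state whether the subsets were taken as contiguous blocks or drawn at random; contiguous blocks in the original order are assumed here. The default ratio of 4:1:1 is the one reported later in the paper, and the names `split_three` and `mse` are illustrative.

```python
def split_three(samples, ratios=(4, 1, 1)):
    """Split samples into training, validation and testing subsets.

    Assumption: the subsets are contiguous blocks in the original order.
    Any remainder from integer division goes to the testing subset.
    """
    total = sum(ratios)
    n = len(samples)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test

def mse(targets, outputs):
    """Mean square error between observed targets and computed outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)
```

The validation MSE is used to compare candidate networks, whereas the testing MSE is computed only once, on data that played no part in fitting or model selection.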
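Autoregressive (AR) models: An autoregressive model of order p, AR(p), is defined by

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + Z_t \qquad (5)$$

where $Z_t$ is a sequence of uncorrelated random variables with zero mean and variance $\sigma^2$, written as $Z_t \sim \mathrm{WN}(0, \sigma^2)$ (WN stands for white noise).
The model can be written in the following form:

$$\Phi(B) X_t = Z_t$$

where $\Phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ and B is the backward shift operator, $B^k X_t = X_{t-k}$.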
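To make the AR(p) recursion and the backward shift operator concrete, the sketch below simulates an AR(p) series driven by Gaussian white noise and then applies Φ(B) = 1 − φ_1 B − … − φ_p B^p to a series to recover the noise terms. The function names, the Gaussian choice for the white noise and the burn-in length are assumptions made for illustration only.

```python
import random

def simulate_ar(phi, n, sigma=1.0, seed=0, burn_in=200):
    """Simulate n values of X_t = phi_1 X_{t-1} + ... + phi_p X_{t-p} + Z_t.

    Z_t is drawn as Gaussian white noise with standard deviation sigma
    (an assumption; the model only requires uncorrelated noise).
    The first burn_in values are discarded to reduce start-up effects.
    """
    rng = random.Random(seed)
    p = len(phi)
    x = [0.0] * p  # zero initial conditions
    for _ in range(n + burn_in):
        z = rng.gauss(0.0, sigma)
        x.append(sum(phi[j] * x[-1 - j] for j in range(p)) + z)
    return x[p + burn_in:]

def apply_phi_B(phi, x):
    """Apply Phi(B) to x: returns X_t - sum_j phi_j X_{t-j} for t >= p."""
    p = len(phi)
    return [x[t] - sum(phi[j] * x[t - 1 - j] for j in range(p))
            for t in range(p, len(x))]
```

Applying `apply_phi_B` with the true coefficients to a simulated series returns the white-noise terms; with estimated coefficients it returns the one-step residuals of the fitted model.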
Important properties of AR(p) models, such as stationarity and causality, depend on the location of the roots of the characteristic polynomial Φ(B) [9]. The following four steps are required to build an AR model:
Model parameters can be estimated using different methods. These include Yule-Walker estimates (which are moment-type estimates), least squares estimates, conditional maximum likelihood estimates and maximum likelihood estimates [9,10]. The maximum likelihood estimator, one of the most popular estimation methods, is used in this work. For the AR(p) model given in (5), assuming independent, identically and normally distributed white noise {Z_t}, the likelihood function is given in Brockwell and Davis [9].
Before accepting the fitted model and developing forecasts of future values, the model must be validated by reconciling its predictions with the actual time series data.
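As a hedged illustration of two of the ideas above, the following sketch computes Yule-Walker estimates with the Levinson-Durbin recursion and tests causality of an AR(p) model. Note that the paper uses maximum likelihood, not Yule-Walker; the moment estimator is shown only because it is short and self-contained. The causality test does not compute the roots directly: it uses the step-down (inverse Levinson-Durbin) recursion, which is equivalent to requiring that all roots of Φ(z) lie outside the unit circle. All function names are illustrative.

```python
def sample_acvf(x, max_lag):
    """Sample autocovariances gamma(0..max_lag), using divisor n."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    return [sum(d[t] * d[t + h] for t in range(n - h)) / n
            for h in range(max_lag + 1)]

def yule_walker(x, p):
    """Yule-Walker estimates (phi, sigma2) via the Levinson-Durbin recursion.

    Assumes the series is not constant, so that gamma(0) > 0.
    """
    g = sample_acvf(x, p)
    phi = []
    v = g[0]  # one-step prediction error variance of the order-0 model
    for m in range(1, p + 1):
        # Reflection coefficient phi_{m,m} (partial autocorrelation at lag m).
        k = (g[m] - sum(phi[j] * g[m - 1 - j] for j in range(m - 1))) / v
        # Update phi_{m,j} = phi_{m-1,j} - k * phi_{m-1,m-j}, then append k.
        phi = [phi[j] - k * phi[m - 2 - j] for j in range(m - 1)] + [k]
        v *= (1.0 - k * k)
    return phi, v

def is_causal(phi):
    """Return True if the AR model with coefficients phi is causal.

    Step-down recursion: the model is causal exactly when every
    reflection coefficient recovered from phi has magnitude below 1.
    """
    a = list(phi)
    while a:
        k = a[-1]
        if abs(k) >= 1.0:
            return False
        m = len(a)
        a = [(a[j] + k * a[m - 2 - j]) / (1.0 - k * k) for j in range(m - 1)]
    return True
```

For an AR(2) model this test reduces to the familiar triangle conditions φ_1 + φ_2 < 1, φ_2 − φ_1 < 1 and |φ_2| < 1.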

RESULTS AND DISCUSSION
The fitted autoregression models of annual mean minimum, annual mean and annual mean maximum temperatures based on the minimum AICC criterion are given as follows: The fitted model of annual mean minimum temperature is

The FFNN modeling was performed using the following procedure: data preprocessing, network structure selection, network training, early stopping and out-of-sample testing. A series of exploratory experiments was performed to select the best NN structure. A three-layer network was finally selected, with 4 sigmoidal neurons in the input layer, 30 sigmoidal neurons in the hidden layer and 1 linear neuron in the output layer. Because of the time dependence of the data, the network output was fed back to the input as shown in Fig. 1. Three time delays were selected, giving a total of four inputs at any prediction instance.

To generalize well and to perform acceptably over a wide range of operating conditions, the network needs to be trained with data that cover the entire range of possible network inputs. For the process under consideration, the input-target range spans the entire data record (the period 1923-2003). The original historical data set was subdivided into three subsets for network training, validation and testing in a ratio of 4:1:1, respectively. The inputs and the targets were normalized to zero mean and unit standard deviation, which makes the neural network training more efficient. Network training was accomplished by adjusting the weights and biases to meet a given performance criterion, using an optimization algorithm that searches for the network parameters that minimize the performance index.
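The normalization step described above can be sketched as follows. This is a minimal sketch under stated assumptions: the population standard deviation is used (the paper does not say whether the population or sample form was applied), and the names `fit_scaler`, `normalize` and `denormalize` are illustrative.

```python
import statistics

def fit_scaler(values):
    """Return the mean and (population) standard deviation of the data."""
    return statistics.mean(values), statistics.pstdev(values)

def normalize(values, mean, std):
    """Scale values to zero mean and unit standard deviation."""
    return [(v - mean) / std for v in values]

def denormalize(values, mean, std):
    """Map normalized network outputs back to the original units."""
    return [v * std + mean for v in values]
```

Network outputs produced in normalized units are passed through `denormalize` with the target statistics before they are compared with observed temperatures.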
Network training was stopped before reaching the performance goal of $1 \times 10^{-3}$ because the validation error began to increase. The normalized mean square error (MSE) attained for each network is less than 0.5, but the search gradients indicate that the training goal was not fully achieved. This is because the time series data are complex and cannot be modeled as efficiently as in other function approximation applications, so a certain degree of model deviation must be accepted. The MSE values attained for the three series nevertheless indicate good prediction accuracy.
Out-of-sample forecasts were implemented with a leave-three-out technique (at least three years are needed for an AR(3) model). In this case the models are fitted (trained) on the total interval minus three years (81 - 3 = 78 years) and tested on the remaining three years. The test period was shifted through the whole available record. As demonstrated by the forecasting experiments, the FFNN models gave the best forecasts. The FFNN and AR predictors for the test period (78 years), along with the time series plots of annual mean temperatures, annual mean minimum temperatures and annual mean maximum temperatures, are shown in Fig. 2.
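The stopping behavior described above can be sketched as a simple rule applied to the per-epoch errors. This is an assumed formulation, not the authors' code: training stops either when the training MSE reaches the goal or when the validation MSE has failed to improve for `patience` consecutive epochs. The `patience` parameter and the function name `stop_epoch` are hypothetical.

```python
def stop_epoch(history, goal=1e-3, patience=5):
    """Decide where training stops.

    history: iterable of (train_mse, val_mse) tuples, one per epoch.
    Returns (best_epoch, reason), where best_epoch is the epoch with the
    lowest validation MSE seen so far (the weights one would keep).
    """
    best_val = float("inf")
    best_epoch = 0
    for epoch, (train_mse, val_mse) in enumerate(history):
        if val_mse < best_val:
            best_val, best_epoch = val_mse, epoch
        if train_mse <= goal:
            return best_epoch, "goal reached"
        if epoch - best_epoch >= patience:
            return best_epoch, "validation error increased"
    return best_epoch, "epochs exhausted"
```

When the rule fires on the validation error, the training MSE is still above the goal, which matches the situation reported here: training ends with the performance criterion unmet.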
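The leave-three-out scheme can be sketched as a generator of training and test index sets, with a helper that pools squared errors over all windows. Several details are assumptions: the paper does not state the shift between successive test windows (`step` below), and the mean forecaster in the usage note is only a placeholder, not the AR or FFNN model of the paper. Fitting on the concatenated remaining years ignores the gap left by the held-out block, which is a simplification.

```python
def leave_k_out_windows(n, k=3, step=1):
    """Yield (train_indices, test_indices) for each block of k consecutive years.

    The test block is shifted through the record in increments of `step`;
    all remaining years form the training set.
    """
    for start in range(0, n - k + 1, step):
        test = list(range(start, start + k))
        train = [i for i in range(n) if i < start or i >= start + k]
        yield train, test

def out_of_sample_mse(series, fit, forecast, k=3, step=3):
    """Pooled out-of-sample MSE over all leave-k-out windows.

    fit(train_values) returns a model; forecast(model, k) returns k predictions.
    Both are supplied by the caller.
    """
    squared_errors = []
    for train, test in leave_k_out_windows(len(series), k, step):
        model = fit([series[i] for i in train])
        preds = forecast(model, k)
        squared_errors.extend((series[i] - p) ** 2 for i, p in zip(test, preds))
    return sum(squared_errors) / len(squared_errors)
```

For an 81-year record with `k=3` and `step=3`, the generator yields 27 windows, each with 78 training years and 3 test years. A placeholder model such as `fit = lambda train: sum(train) / len(train)` with `forecast = lambda model, k: [model] * k` can be used to check the plumbing before a real predictor is plugged in.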
Since temperature memory decays, one expects that in an AR model the coefficients at higher lags should be smaller than those at lower lags. However, in the fitted models the third coefficient is larger than the second one for all three time series. It is also larger than the first coefficient for the mean temperature and slightly larger for the maximum temperature (0.2107 and 0.2187). A plausible reason for this unexpected ordering of the parameter values is the well-known abrupt climatic change in the time series behavior at this station. Major statistically significant change points in the mean minimum (night-time) and mean maximum (day-time) temperatures occurred in 1957 and 1967, respectively [11]. These abrupt changes may affect the natural structure of the time series, so a general conclusion about the relative values of the model parameters cannot be drawn.

The MSE for the two predictors are shown in Fig. 2. The MSE of the AR predictors are 0.3379, 0.3326 and 0.5122 for the annual mean, annual mean minimum and annual mean maximum temperatures, respectively, whereas the corresponding MSE of the FFNN predictors are 0.1772, 0.0943 and 0.1231. The MSE of the FFNN predictor is considerably smaller than that of the corresponding AR predictor in every case, from which we conclude that the FFNN models gave better forecasts than the AR predictors. The AR model forecasts showed steady temperature profiles, whereas the FFNN models were able to identify the dynamics of the temperature time series and gave more realistic forecasts.

Generally, the predicted annual temperatures show a steady rise from 1923 to 1950, then a decreasing (cooling) trend until the late 1970s, followed by a further clear rise in temperature from the late 1970s to the end of the record in 2003. These results agree generally with the trend patterns of the global annual mean temperatures reported by Smith [12].
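The size of the gap between the two predictors can be made explicit by computing the relative reduction in MSE from the values reported above. The short sketch below does only this arithmetic; the dictionary keys and the function name are illustrative.

```python
# MSE values as reported in the text for the two predictors.
ar_mse = {
    "annual mean": 0.3379,
    "annual mean minimum": 0.3326,
    "annual mean maximum": 0.5122,
}
ffnn_mse = {
    "annual mean": 0.1772,
    "annual mean minimum": 0.0943,
    "annual mean maximum": 0.1231,
}

def relative_reduction(baseline, candidate):
    """Fraction by which the candidate MSE is lower than the baseline MSE."""
    return (baseline - candidate) / baseline

reductions = {name: relative_reduction(ar_mse[name], ffnn_mse[name])
              for name in ar_mse}
```

The resulting reductions are roughly 48% for the annual mean, 72% for the annual mean minimum and 76% for the annual mean maximum series.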
The forecasts for the coming 10 years (2004-2014) are also shown in Fig. 2 for the three temperature data sets. Both the AR and FFNN models show a decreasing pattern in annual air temperatures for the coming 10 years, especially in the annual mean minimum temperature.

CONCLUSION
As demonstrated by the forecasting experiments, the FFNN models gave better forecasts than the AR models, with lower MSE for all three temperature series. The AR model forecasts showed steady temperature profiles, whereas the FFNN models were able to identify the dynamics of the temperature time series and gave more realistic forecasts. Finally, both predictors revealed a cooling trend in the annual air temperatures for the coming 10 years.