BAYESIAN MODEL AVERAGING WITH MARKOV CHAIN MONTE CARLO FOR CALIBRATING TEMPERATURE FORECAST FROM COMBINATION OF TIME SERIES MODELS

Global warming is an important issue related to the climate and weather forecast. It is shown by significantly increasing the atmospheric temperature level. Hence, improving the forecast accuracy of temperature is an important issue. The forecast is commonly done by performing a deterministic forecast meaning that the system will generate a point forecast without taking into account the uncertainty induced by model specification as well as the nature behavior. Ensemble forecast has been introduced to overcome this problem and it has been implemented in many Ensemble Prediction Systems (EPS) over the world. A problem arises in some developing countries that unable to develop such EPS due to the system restrictions. This paper discusses the performance of combined forecasts generated from a class of time series model as an alternative of EPS. The models are calibrated using Bayesian Model Averaging (BMA) where the parameters are estimated by Markov Chain Monte Carlo (MCMC). The results show that the proposed procedure is capable to increase the reliability of the forecast.


INTRODUCTION
Global Warming is an important issue related to climate and weather change. Global warming has increased the intensity of extreme events towards weather variables leading to climate change. Climate change is defined as the gradually change of air temperature, humidity, atmospheric air pressure, sun and rain intensity as well as wind speed within fifty to hundred years. Therefore, temperature as one of the climate change indicators is important to be forecasted.
There are two kinds of forecasts with regards to the output i.e., deterministic forecast and probabilistic forecast. By deterministic forecast, the system will generate a point forecast, while probabilistic forecast generates an interval forecast representing some degrees of uncertainty. It is well known that future projection of nature behavior such as temperature is highly affected by uncertainty and hence, implementing probabilistic forecast is expected to produce more reliable forecast than deterministic one. Some researches applying deterministic approach for temperature prediction are (Tektas, 2010;Hippert et al., 2000;Prybutok et al., 2000;Saima et al., 2011;Soe et al., 2012) among others. Indonesia is one of the countries that currently still implements deterministic forecast to some climate variables.
The probabilistic forecast is a result of using several model outputs for the forecast, that is known as ensemble forecast (Gneiting, 2008;Gneiting et al., 2005;Murphy, 1998;Ustaoglu et al., 2008;Sloughter et al., 2013). The idea of the ensemble forecast is to model the uncertainty induced by several factors such as model specification, nature behavior, initialization and the others. Zhu (2005) discusses the sources of uncertainty comprehensively. In fact, ensemble forecast has been applied in many Ensemble Prediction System (EPS) of developed countries such as National Centers for Environmental Prediction in US, the European Centre for Medium-

JMSS
Range Weather Forecasts (ECMWF), the Met-Office UK, Meteo-France, Environment Canada, the Japanese Meteorological Agency, the Bureau of Meteorology Australia, the Korea Meteorological Administration and many others. Forecasting by ensemble has been proven to be able to generate reliable forecast compared to deterministic forecast from a single model.
The model outputs used in the ensemble forecast of EPS involves of numerical processes with relatively high degree of complexity. This leads to a problem in implementation of EPS, in particular for developing countries due to unavailability of resources such as supercomputers. This paper proposes an alternative of numerical ensemble forecast using outputs of several time series models as the ensemble. The ensemble forecast has characteristic of being underdispersive  and hence it has to be calibrated to remove the bias and match the distribution of observation with forecasts. Several calibration methods have been introduced in many previous researches such as Bayesian Model Averaging (BMA) of Wilson et al., 2007;Sloughter et al., 2013), Dressing kernel of (Wang and Bishop, 2005), Model Output Statistic of (Gneiting and Raftery, 2007;Soe et al., 2012) and many others. The BMA method is the most widely used method in the EPS. There are two common procedures used to estimate the parameters of the BMA i.e., Expectation Maximization and Markov Chain Monte Carlo (MCMC), referred hereafter to as BMA_EM and BMA_MCMC respectively. The BMA_MCMC was firstly introduced by (Vrugt et al., 2008). The paper showed that the MCMC procedure works well in predicting the wind speed and offers some flexibility in the application. They also pointed out several advantages of using MCMC compared to EM algorithm.
In this study, the BMA_MCMC is applied to ensemble temperature forecast where the outputs are generated from several time series models instead of generated from numerical process. This approach can be considered as an extension of forecast combination proposed by (Granger, 1989), where the forecast is combined and calibrated in this study. In this case, we intend to model uncertainty induced by the model specification. The performance of the proposed methodology is compared to the results of BMA_EM documented in (Kuswanto, 2011).

Bayesian Model Averaging
Bayesian Model Averaging (BMA) is one of the statistical methods for calibration that combine some information from several model outputs. The BMA for dynamic ensemble forecast can be written as: where, w k is the positive posterior probability of k-th forecast with sum of one and K is the number of ensemble members. Equation (1) requires a specification about the prior distribution of the underlying climate variable (e.g., temperature) which is approximated by normal distribution such that: where, a k and b k are the bias correctors derived from simple linear regression of y to f k for every ensemble member. The mean of the BMA forecast is given by Equation (2): while the variance of the BMA at period t is:

Markov Chain Monte Carlo (MCMC)
The BMA parameters i.e., variance and weight are estimated using Sampling of MCMC. The MCMC simulation allow us to estimate parameters of a complex posterior distribution with high dimensionality. Granger (1989) proposes an algorithm called as Differential Evolution Adaptive Metropolis (DREAM) where N different Markov chains are run simultaneously in parallel. If the state of a single chain is given by a vector θ with dimension of d, thus each generation of N in DREAM defines a population Ω with dimension N×d. The jump of every chain is generated by randomly taking the difference among several other chains of Ω (without replacement) such as: where, δ represents the number of pairs used to produce the candidates. The ratio metropolis is used as the criteria to decide whether to accept or reject the candidate.

Continuous Ranked Probability Score (CRPS)
The CRPS measures how reliable the calibration result of probabilistic forecast. The formula for calculating the CRPS is given by: is the cdf of true observation and K is the number of ensemble member. Small value of CRPS indicates that the forecast is reliable (Gneiting and Raftery, 2007).

Data
This study analyzes daily mean temperature observed at climatological Station Juanda-Indonesia spanning from 2008-2009. The data is divided into two parts i.e., training and testing. The training data is used to generate build the time series model for generating the ensemble, while the testing data is used for evaluation of the forecast performance. We use different sizes of training windows to implement the BMA i.e., 10, 15, 20 and 25.

Steps of the Analysis
The steps of the analysis that is carried out in this study are described as follows:

RESULTS AND DISCUSSION
The mean daily temperature observed at meteorological station Juanda Indonesia is 27,63 degree with the maximum reached 32.6 degree and minimum of 24.2 degree. The distribution of the data is approximately normal shown by the skewness of 0.32. It validates the previous assumption that the temperature in this case is approximated by normal distribution as prior distribution. In this study four time series models are constructed and the forecasts generated from these models will be combined or calibrated using BMA. The models belong to the class of ARIMA. The selected models are ARMA(2,1), ARIMA(1,1,1), ARIMA(1,1,2), IMA(0,1,3) denoted hereafter to as M1, M2, M3 and M4 respectively. These models are the best models chosen among several candidate models by considering some rules in the ARIMA modeling. Figure 1 and 2 shows the time series plots of the 1 day and 7 day lead forecasts.
It is clear that the spread of the forecasts is underdispersive meaning that the have a low spread and tend to generate similar values for all models. The 7 day forecast yield on more bias than 1 day forecast. It is therefore necessary to calibrate the forecasts and we apply BMA_MCMC as the calibration method. The illustration of the bias correction as the result of regression between ensemble member and observation is given in Table 1. The mean in the last column shows the deterministic forecast for each member that will be combined with some degrees of model performace represented by a weight for each. Meanwhile Table 2 illustrates the bias correction for forecast on 26th December 2009.
Note that the observation on 29th December is 28 degree which means that bias correction on that date is capable to generate forecast with low bias. The parameters estimated by MCMC procedure using different training windows are shown in Table 2.
The BMA combines M1 to M4 with respects to its performance (represented by the weight) and yields on the following probabilistic forecast. We perform the corresponding CRPS for each training window.

JMSS
From Table 3, we know that the minimum CRPS is 0.2494 obtained using 10 days training window and this is the optimum setting. Figure 3 depicts the predictive distribution of the calibrated forecast using training window of 15 days.
The predictive distribution in Fig. 3 clearly shows the performance of each model in the forecast. Higher weight represents better the performance of the model in the system. We replicate the similar procedure to calibrate the 7 day forecast for the forecast on 6th December 2009. Table 4-6 show the parameters of the BMA i.e., bias correction, BMA parameters and the predictive distributions respectively.
We observe different result of the predictive distribution in term of the optimum training window. However, the bias as well as the interval of the forecast are nearly the same as 1 day forecast. The optimum setting is obtained using 20 day training window.
The performance of the calibrated ensemble forecast is evaluated from the system. It means that the choice of the optimum training window is based on the performance of each setting for the whole periods of validation. The criteria of the assessment is CRPS. The CRPS value reflects the reliability of the forecast. The analysis shows that 1 day forecast has the best performance under 25 days training window while for the best performance for 7 day forecast is shown by 15 training window. The summary of the value is given by Table 7. Indeed, CRPS measure the reliability with respect to the bias and compactness of the forecast.            In term of the proportion of observation captured by the produced interval, it can be seen in Fig. 4 and 5.

JMSS
From the figures, it can be seen that using 1 day forecast 91% of the observations can be covered by the produced interval, while 7 day forecast is able to capture the observations with the proportion of 92%. It is rational to have this as the widht of interval forecast becomes wider for longer lead time.
The table indicates that the calibration using BMA_MCMC is capable to generate more reliable forecast than the uncalibrated one. It is shown by the lower value of CRPS of calibrated forecast. It means that the calibration will generate more reliable probabilistic forecast.
Having compared the performance of the BMA_MCMC with the unclaibrated forecast, we will compare the performance of BMA_MCMC with BMA_EM. As previously mentioned, the parameters of the BMA can be esimated using those two algorithms. We directly compare the BMA_MCMC with the performance of BAM_EM documented in (Kuswanto, 2011). The study discusses the performance of BMA_EM using the same dataset and ARIMA models as used in this study.
For the optimum training length, both BMA algorithms yield on the same suggestion i.e., using 25 day training window for 1 day ahead forecast. However, different results are observed for lead time of 7 day forecast. BMA_MCMC suggests to use 15 training window while BMA_EM found 10 day as th eoptimum one. Nevertheless, the performance of both algorithms is the same in particular short term forecast. For long lead time forecast, we suggest to use 10 day training window to generate more reliable forecast.

CONCLUSION
Calibration of the temperature forecast is done by generating four simple time series models which is treated as ensemble members as commonly used in any Ensemble Prediction System. The forecasts are under dispersive and hence it has to be calibrated. By calibration, the bias of the mean forecast is removed and it is expected that the generated probabilistic forecast yields on reliable forecast. Applying BMA_MCMC to calibrate the ensemble forecasts generated from four ARIMA models is capable to produce more reliable forecasts than the uncalibrated forecast. It has been shown also that using the proposed simple procedure works well for short and modest lead time forecasts. Different setting of training windows has been investigated during the implementation of the procedure in order to find the optimum one. Compared to the performance of another BMA algorithm, the BMA_MCMC is unable to outperform BMA_EM. However, both have similar performance and there is gain in using these approaches. Further investigation is necessarily to do in order to study the performance of the BMA applied to models from different classes of time series models such as Transfer function, ANFIS. Moreover, applying BMA to other climate variables generated by similar procedure as applied in this study is worthy to be carried out. Overall, we have shown that using combination of time series models can be a proxy of ensemble forecasts generated from numerical ensemble prediction system. It is very simple and easy to be implemented.