Weather Forecasting Using Merged Long Short-Term Memory Model (LSTM) and Autoregressive Integrated Moving Average (ARIMA) Model



Introduction
Weather forecasting attracts great attention from researchers in various research communities because of its effect on human life worldwide. The wide availability of massive weather observation data has motivated researchers to explore patterns hidden in large datasets and to make weather predictions. Advances in information and computer technology over the last decade have also supported weather forecasting research in obtaining more accurate results.
Weather forecasting is an interesting research problem in the area of flight navigation (Maunder et al., 2000). One of the most important weather variables in aviation is visibility. Visibility is an important factor in all phases of flight, especially when the aircraft is maneuvering on or close to the ground, i.e., during taxi-out, take-off and initial climb, approach and landing, and taxi-in.
Aircraft departure and arrival are limited by the visibility (or runway visual range, RVR) to an extent that depends on the sophistication of the ground equipment, the technical equipment fitted to the aircraft and the qualification of the flight crew. Many aerodromes and aircraft are fitted with equipment that makes a landing possible in very low visibility conditions. However, in very low visibility, it may prove impossible for the pilot to navigate the aircraft along the runway and taxiways to the aircraft stand.
The widely adopted Autoregressive Moving Average (ARMA) model was proposed by Box and Jenkins (1976). This statistical model consists of two polynomial parts, one for the autoregression and the other for the moving average. Many extensions of this model have been proposed in the literature, such as the Autoregressive Integrated Moving Average (ARIMA) and Autoregressive Conditional Heteroscedasticity (ARCH) models.
The ARIMA method (Box and Jenkins, 1976) assumes that the time series under study is generated by a linear process in order to make a successful prediction.
The linear model assumption has three advantages: it is easy to understand, it can be analyzed in great detail, and it is easy to explain and implement.
Many significant efforts to solve the weather forecasting problem use statistical modeling, including machine learning techniques, with successful results. Afshin proposed a neural network fuzzy wavelet model for long-term rainfall forecasting (Afshin et al., 2011), and Belayneh proposed standard precipitation index drought forecasting using wavelet neural networks and support vector regression (Belayneh and Adamowski, 2012). Salman proposed a Recurrent Neural Network (RNN) with a heuristic optimization method for rainfall prediction based on a weather dataset comprising ENSO data (Salman et al., 2015). ConvLSTM and the Trajectory GRU (TrajGRU) model have been used to predict future rainfall intensity in a local region over a relatively short period of time. TrajGRU can actively learn the location-variant structure for recurrent connections and is more efficient in capturing spatiotemporal correlations than ConvGRU (Shi et al., 2015). ConvLSTM is a variant of LSTM (Long Short-Term Memory) that contains a convolution operation inside the LSTM cell. Kim proposed a model that predicts the amount of rainfall from weather radar data using convolutional LSTM (ConvLSTM). This model showed that a two-stacked ConvLSTM reduced the RMSE by 23.0% relative to linear regression (Kim et al., 2017).
A recurrent convolutional neural network has been used to forecast meteorological attributes such as temperature, air pressure and wind speed, supported by a visualization system. This system helped the user to quickly assess, adjust and improve the network design (Roesch and Günther, 2017). A hybrid model has also been proposed that combines trained predictive models with a deep neural network that models the joint statistics of a set of weather-related variables. The results show how the base model can be enhanced by spatial interpolation that uses learned long-range spatial dependencies (Grover et al., 2015).
Although many models have been proposed over the past ten years, no single model predicts weather variables with high accuracy. In addition, most prominent weather forecasting models use only predictor variables as input. The novelty of the proposed method for forecasting a weather variable is the use of moderating variables and the merged Long Short-Term Memory (LSTM) model. This research explores several weather variables as moderating variables for forecasting the visibility variable. Therefore, the purposes of this research are to develop a merged-LSTM model and an ARIMA model to predict visibility, to analyze the influence of the intermediate variables on the visibility prediction, and to compare the Root Mean Square Error (RMSE) of the two models.

Dataset and Data Preprocessing
The dataset for this research was obtained from Weather Underground (https://www.wunderground.com/), which collects weather data including temperature, dew point, humidity and visibility at Hang Nadim Airport in Indonesia. The data cover the years 2012 to 2016 and consist of 40,026 time series observations.
The main data preprocessing steps applied in this research are normalizing (2.1), rescaling into the range [0,1] (2.2) and smoothing using a Moving Average (MA) with lag = 9 (2.3). Consider weather time series data over a time interval T, where:
$x_t$ = observation at t
$x'_t$ = normalized data at t
$x''_t$ = result of data smoothing using the moving average at t
The correlation between two weather variables is measured using the correlation coefficient r (Equation 2.4), where -1 ≤ r ≤ 1 and $s_x$ and $s_y$ are the standard deviations of variables X and Y, respectively.
After being preprocessed by the moving average method, the data distribution looked somewhat smoother. Finally, for the purposes of model cross-validation, the data were divided randomly into 27,915 (70%) training and 11,906 (30%) testing samples.
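The preprocessing steps described above can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' script: all function names are hypothetical, min-max scaling is assumed for the rescaling step, a trailing window is assumed for the moving average, and the sample standard deviation (n - 1 denominator) is assumed in the correlation coefficient.

```python
import math
import random


def rescale_01(xs):
    """Min-max rescale a sequence into the range [0, 1] (assumed form of the rescaling step)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0 for _ in xs]
    return [(x - lo) / (hi - lo) for x in xs]


def moving_average(xs, window=9):
    """Trailing moving average; the first window - 1 points have no full window and are dropped."""
    return [sum(xs[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(xs))]


def pearson_r(xs, ys):
    """Correlation coefficient r between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (s_x * s_y)


def random_split(xs, train_fraction=0.7, seed=0):
    """Randomly divide samples into training and testing subsets (seed is an assumption)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(xs)))
    return [xs[i] for i in idx[:cut]], [xs[i] for i in idx[cut:]]
```

A typical use would rescale each weather variable, smooth it with `moving_average(values, window=9)`, and then call `random_split` on the resulting samples.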

ARIMA Model
The acronym ARIMA stands for Auto-Regressive Integrated Moving Average. Lags of the stationarized series in the forecasting equation are called "autoregressive" terms, and lags of the forecast errors are called "moving average" terms.
The popularity of the Autoregressive Integrated Moving Average (ARIMA) model is due to its statistical properties and the well-known Box-Jenkins methodology (Box and Jenkins, 1976). A nonseasonal ARIMA model is classified as an "ARIMA(p,d,q)" model, where:
• p is the number of autoregressive terms
• d is the number of nonseasonal differences needed for stationarity
• q is the number of lagged forecast errors in the prediction equation
An ARIMA model can be viewed as a "filter" that tries to separate the signal from the noise; the signal is then extrapolated into the future to obtain forecasts.
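For concreteness, the forecasting equation of an ARIMA(p,d,q) model can be written as in the sketch below. The notation is assumed here and does not come from the source: $Y_t$ is the original series, $B$ the backshift operator, $y_t$ the series after d differences, $\mu$ a constant, $\phi_k$ and $\theta_k$ the autoregressive and moving average coefficients, and $e_t$ the forecast errors. The minus sign on the moving average terms follows the Box-Jenkins convention; some software uses plus signs instead.

```latex
y_t = (1 - B)^{d}\, Y_t , \qquad
\hat{y}_t = \mu + \sum_{k=1}^{p} \phi_k \, y_{t-k} - \sum_{k=1}^{q} \theta_k \, e_{t-k}
```

With d = 0 and q = 0 this reduces to a pure autoregressive model, and with p = 0 and d = 1 it becomes a moving average model fitted to the first differences.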
Tseng proposed a hybrid forecasting model, known as SARIMABP, that combines the seasonal time series ARIMA (SARIMA) model and the neural network Back Propagation (BP) model. The SARIMABP model was also able to forecast certain significant turning points of the test time series (Tseng et al., 2002).
Tektaş presented a comparative study of statistical and neuro-fuzzy network models for forecasting the weather of Göztepe, İstanbul, Turkey using Adaptive Network Based Fuzzy Inference System (ANFIS) and Auto Regressive Integrated Moving Average (ARIMA) models (Tektaş, 2010). Mallick and Jain (2017) tried to predict weather for a particular period using Autoregressive Integrated Moving Average (ARIMA) and Exponential Smoothing (ETS). Naseem et al. (2017) proposed ARIMA to predict air quality in the Islamabad urban area using meteorological variables as predictors.

LSTM Model
This study uses the LSTM model, a deep learning model proposed by Hochreiter and Schmidhuber (1997). The model has been successfully used in many research fields, such as large-scale image classification (Real et al., 2017), video classification (Yoo, 2017), natural language processing (Elkaref and Bohnet, 2017) and anomaly detection (Luo et al., 2017; Lee et al., 2018). LSTM was used as the foundation for the weather forecasting model for two reasons: (1) its ability to capture long-lag relationships in time series data and (2) its ability to address the vanishing gradient problem that occurs when training deep neural networks (Gers et al., 2000).
LSTM was first introduced by Hochreiter and Schmidhuber (1997) as a specific Recurrent Neural Network (RNN) architecture designed to model temporal sequences. Unlike a conventional RNN, LSTM is able to handle the error backflow problem: it retains the error feedback that contributes to a more accurate prediction and discards feedback that does not. This capability comes from special units called memory blocks in the recurrent hidden layer. LSTM thereby overcomes a weakness of conventional RNNs, in which error signals flowing backward in time under backpropagation tend to either explode or vanish, because the temporal evolution of the backpropagated error depends exponentially on the size of the weights.
From the diagram in Fig. 3, it can be seen that each LSTM block receives the following signals: the input signal (x), the input gate signal (i), the recurrent signal (h) and the forget gate signal (f); it produces the output gate signal (o).
The memory blocks contain memory cells with self-connections that store the temporal state of the network, in addition to special multiplicative units called gates that control the flow of information. Each memory block contains three types of gates:
• Input gate: controls the flow of input activations into the memory cell
• Output gate: controls the output flow of cell activations into the rest of the network
• Forget gate: scales the internal state of the cell before adding it as input to the cell through the self-recurrent connection of the cell, therefore adaptively forgetting or resetting the cell's memory
In addition, the modern LSTM architecture contains peephole connections from its internal cells to the gates in the same cell to learn precise timing of the outputs.
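The role of the three gates can be illustrated with a single time step of a scalar LSTM cell. The sketch below assumes the common formulation without peephole connections, with tanh as the squashing function for the candidate and the cell output; the function name `lstm_step` and the `params` layout are hypothetical and chosen only for this example.

```python
import math


def sigmoid(z):
    """Logistic activation used by the gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a scalar LSTM cell (no peephole connections).

    params maps each of 'f', 'i', 'o', 'c' to a tuple (W, U, b):
    input weight, recurrent weight and bias (layout assumed for this sketch).
    """
    def pre_activation(name):
        W, U, b = params[name]
        return W * x_t + U * h_prev + b

    f_t = sigmoid(pre_activation('f'))        # forget gate: scales the old cell state
    i_t = sigmoid(pre_activation('i'))        # input gate: scales the new candidate
    o_t = sigmoid(pre_activation('o'))        # output gate: scales what leaves the cell
    c_tilde = math.tanh(pre_activation('c'))  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # updated memory cell
    h_t = o_t * math.tanh(c_t)                # cell output passed to the network
    return h_t, c_t
```

Calling `lstm_step` repeatedly, feeding each returned `(h_t, c_t)` into the next call, corresponds to unfolding the cell over the time dimension.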
To facilitate analysis, the LSTM architecture is often unfolded over the time dimension t, which can be represented by the following diagram.
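The output of each LSTM cell (Fig. 4), $h_t$, is computed using the standard LSTM update:
$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$
$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$c_t = f_t \circ c_{t-1} + i_t \circ \tanh(W_c x_t + U_c h_{t-1} + b_c)$$
$$h_t = o_t \circ \tanh(c_t)$$
where $f_t$ is the forget gate's activation vector; $i_t$ is the input gate's activation vector; $o_t$ is the output gate's activation vector; $c_t$ is the cell state; $W_f$, $W_i$, $W_o$, $W_c$, $U_f$, $U_i$, $U_o$, $U_c$ are weight matrices and $b_f$, $b_i$, $b_o$, $b_c$ are bias vectors to be learned during model training; $\sigma$ is the activation function; and $\circ$ denotes element-wise multiplication.
Given a weather variable as the predictor variable and another weather variable as the moderating variable, the general structure of the merged-LSTM model can be illustrated as below.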
The detailed structure of the merged-LSTM model is shown in Fig. 5. The symbols used in (2.13) are defined as follows:
$x_t$ = input signal to the Fully Connected (FC) part of the merged-LSTM, taken as the average of the predicted values from each LSTM
$b$ = bias
$\hat{y}_t$ = predicted value
$y_t$ = actual value
$N$ = total number of training samples
$\sigma$ = activation function
$h(P_t)$ = output of LSTM-1, whose input is the predictor variable
$h(I_t)$ = output of LSTM-2, whose input is the moderating variable(s)
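Under the definitions above, the output stage of the merged-LSTM could be written as in the following sketch. Only the averaging of the two LSTM outputs is stated in the text; the FC weight $w$ and the mean squared error form of the objective are assumptions introduced here for illustration.

```latex
x_t = \frac{h(P_t) + h(I_t)}{2}, \qquad
\hat{y}_t = \sigma\!\left(w \, x_t + b\right), \qquad
\mathcal{L} = \frac{1}{N} \sum_{t=1}^{N} \left(y_t - \hat{y}_t\right)^{2}
```

In this reading, the moderating variable influences the forecast only through $h(I_t)$, so dropping LSTM-2 and setting $x_t = h(P_t)$ would give the variant without an intermediate variable.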

Root Mean Square Error (RMSE)
Root Mean Square Error (RMSE) is a frequently used measure of the difference between values predicted by a model and the values actually observed from the environment that is being modelled. Root mean square error is commonly used in climatology, forecasting and regression analysis to verify experimental results.
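The formula is:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_{obs,i} - X_{model,i}\right)^{2}}$$
where $X_{obs,i}$ is the observed value and $X_{model,i}$ is the modelled value at time/place i, and n is the number of samples.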
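The RMSE computation can be sketched in a few lines of plain Python. The function name `rmse` and the argument names are hypothetical; the inputs are assumed to be equally long sequences of observed and modelled values.

```python
import math


def rmse(observed, modelled):
    """Root mean square error between observed and modelled values."""
    if len(observed) != len(modelled) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - m) ** 2 for o, m in zip(observed, modelled)]
    return math.sqrt(sum(squared_errors) / len(observed))
```

Because the errors are squared before averaging, a few large deviations raise the RMSE more than many small ones, which is worth keeping in mind when comparing models on series with occasional sharp drops in visibility.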

Model Training and Cross-Validation
In this study, the LSTM and merged-LSTM models were trained in a supervised manner using the Adam algorithm to obtain model parameters that optimize a predetermined objective function. In this training process, model cross-validation used the leave-one-out technique with a 70:30 proportion of training and testing datasets. The proportions of the training and testing datasets were set purposively. Model performance was measured using the Root Mean Square Error (RMSE) metric as formulated in Equation 2.15.
The ARIMA model uses 17,000 data points. In this model training process, model cross-validation uses the leave-one-out technique with a 70:30 proportion of training and testing datasets.

Results
The performance of each model in forecasting the visibility variable is summarized in Table 1.
As described in Table 1, the merged-LSTM that uses visibility as the predicted variable without an intermediate variable achieves a lower RMSE than the ARIMA model. The RMSE of the LSTM is 0.00009, which is lower than the ARIMA RMSE of 0.948.
The LSTM model that uses visibility as the predicted variable and dew point as the moderating variable also achieves a lower RMSE than ARIMA, which uses only visibility as the input time series. The RMSE of this model was 0.00007, while the ARIMA model achieved 0.948. This value of 0.00007 is the lowest RMSE.
The lowest RMSE is also achieved by the LSTM models that use the combination of temperature and dew point and the combination of temperature and humidity as intermediate variables. Table 1 shows that the merged LSTM, with or without an intermediate variable, has a lower RMSE than the ARIMA model; that is, the merged LSTM performs better than ARIMA. Figure 6 shows the prediction result of the LSTM model with the combination of temperature and dew point compared to the test (actual) time series. As shown in Fig. 6, the prediction deviates only slightly from the actual series.
The comparison of the prediction result of the ARIMA model and the test (actual) time series is shown in Fig. 7. The figure shows that the deviation between the predicted and actual test data is wide. The most important findings of this research are the combination of weather variables that has the most impact on the accuracy of weather forecasting and the research artifacts (scripts and dataset). These artifacts will be available to other researchers in the same domain.