Evaluation of Seasonal Autoregressive Integrated Moving Average Models for River Flow Forecasting

Corresponding Author: Kassahun Birhanu Tadesse Civil Engineering Science, University of Johannesburg, South Africa and Debre Markos University, Ethiopia Email: kbirhan@gmail.com Abstract: Reservoir operation policies cannot be functional in instant decision making without forecasting the future reservoir inflows. For forecasting inflows into reservoirs with only hydrological data is available like Koga irrigation dam, multivariate forecasting models cannot be used to generate accurate river flow information. As a result, an evaluation of univariate Seasonal Autoregressive Integrated Moving Average (SARIMA) models was done for forecasting monthly Koga River flow with Gnu Regression, Econometrics and Time-series Library (GRETL) software. The stationarity of historical river flow sequence was checked by Augmented Dickey-Fuller (ADF) unit root analysis. Then, seasonality was removed from the river flow time series by seasonal differencing. Using seasonally differenced correlogram characteristics various SARIMA models were identified and evaluated, their parameters were optimized and diagnostic checks of forecasts were performed using residual correlograms and LjungBox tests. Finally, based on minimum Akaike Information criteria, SARIMA (1, 0, 1) (3, 1, 3)12 model was selected for Koga River flow forecasting. The stationarity test of the forecasted values of this model has proved the similarity of forecast values and patterns with those of the historical ones. Thus, irrigation managers could use this model and forecast information for optimal irrigation planning and development of reservoir operation strategies in order to protect farmers and downstream environment from water shortages. Moreover, the use of stationarity test of forecast flow patterns is useful and applicable in selecting best forecast model during forecasting of any river flows.


Introduction
For reservoir based irrigation schemes for which cropping pattern varies year to year due to many factors, adaptive reservoir operation is vital in order to balance reservoir water supply and crop water demands. The management is adaptive when the judgments are based on both present and forecasted inflows (Labadie et al., 1981). Hence, reservoir operation policies cannot be functional in instant decision making without forecasting the future reservoir inflows (Karamouz et al., 2003). Forecasting is a planning instrument which aids decision makers to anticipate the upcoming improbability based on the characteristics of historical and current observations (Box et al., 2008).
A time series modeling is one of the many techniques used for forecasting. The fundamental theory in modeling time series is that the future is a manifestation of the precedent and any statistical relation that could be originated in the precedent statistics can be utilized to the future (Vahdat et al., 2011). For areas where only the hydrological data is available, the stochastic univariate time series that use probability and statistics are the superior alternatives for forecasting (Momani and Nail, 2009;Adhikary et al., 2012). Among these stochastic univariate models is Autoregressive Integrated Moving Average (ARIMA) model (Brockwell and Davis, 2002;Wang et al., 2008). An ARIMA method is relatively systematic, flexible and grasps more original time series information (Murthy et al., 2017).
For ARIMA time series that has a tendency of showing a periodic behavior after a certain time interval, an extension of ARIMA called multiplicative SARIMA modeling approach is used (Box et al., 2008). Many scientists have used ARIMA/ SARIMA models for forecasting of stream flow, rainfall and evapotranspirations. An ARIMA model performs better than Fiering, Artificial Neural Network and Wavelet ANN models in forecasting monthly inflow to Iffezheim reservoir (Germany) (Zhang et al., 2017). A study conducted to forecast residual water consumption in Tunisia indicates that the SARIMA model out performs neural networks in terms of forecasting accuracy (Sebri, 2013). SARIMA model was also used by Papalaskaris et al. (2016) to perform short-term forecasts of monthly rainfall in Kavala city, Greece, which was aimed at identifying the potential patterns of flood and drought cycles occurring in this area.
For each monthly stream flow and water temperature series, seasonal differencing in ARIMA models is the best stationarisation method in terms of periodic effect elimination and model forecasting accuracy as compared to seasonal standardization and spectral analysis (Moeeni et al., 2017). According to Teymouri and Fathzadeh (2015), the SARIMA model performed best among Thomas-Fiering and Spectral Analysis types of stochastic mathematical models in forecasting flows of five rivers in the Atrak basin, north eastern Iran. Mohamed and Ibrahim (2016) used a SARIMA model for forecasting monthly rainfalls in Nyala station (Sudan).
SARIMA model is more capable of forecasting monthly reservoir inflow, especially for low values and in short-term forecasting than the hybrid Artificial Neural Network-Genetic Algorithm model (Moeeni et al., 2017). SARIMA model also outperformed the ANN and hybrid SARIMA-ANN models in predicting monthly base flow to Jamishan reservoir in Iran (Moeeni and Bonakdari, 2016). SARIMA model is capable of forecasting monthly reference evapotranspiration (Mossad and Alazba, 2016).
Forecasting of river flows has great use for optimal planning and operation of the irrigation reservoirs like Koga dam in Ethiopia. This irrigation scheme was designed to irrigate 7000 ha command areas, though the potential irrigable area is 7572 ha (MacDonald, 2006). However, the actual/current irrigated area is less than the design command area. For example, the maximum actual irrigated area was 5123 ha in 2011/12 and 5144.36 ha in 2012/13 (Birhanu et al., 2015). This is 73% of the design command areas.
Moreover, in 2014/15, the amount of water stored in the reservoir was about 60% of storage capacity due to drought. As a result, farmers were enforced to reduce their irrigable area by half from what they were used to irrigate. Because of unavailability of forecast information of water availability, many farmers wasted their time, energy and money for extra unirrigated land preparation. From this, it can be concluded that there are water scarcity and uncertainty of water availability due to global and local climate change in the Koga irrigation project. Therefore, planning, operation, and management of the irrigation scheme should be supported by forecast information of the reservoir water availability.
The only data available for the Koga River catchment is stream flow data. Multivariate forecasting models cannot be used for stream flow forecasting of data scarce watercourses such as the Koga River. In such conditions, a univariate SARIMA models are more appropriate for forecasting of river flow. Thus, this research aims to evaluate SARIMA models for monthly river flow forecasting into the reservoir. The developed model will help irrigation managers in developing optimal irrigation planning and adaptive reservoir operation policies for Koga Dam.

Area Description
Koga Irrigation scheme is found in the Upper Blue Nile River Basin, near Bahir Dar City (Ethiopia) (Fig. 1). The Koga catchment lies in the Tana Basin between 11°10' and 11°22' North Latitude and 37°02' and 37°17' East Longitude. It covers an area of 22,000 hectares at dam site (37°08' E and 11°20' N). The maximum, minimum and average monthly temperatures for the study area are 26.8, 11.6 and 19.2°C, respectively. Its average annual rainfall is 1578 mm. The rainfall has a uni-modal characteristic that extends from May to October. The highest concentration of rainfall occurs in July. The average daily and the annual reference Evapotranspirations (ETo) are 4.24 mm and 1548 mm, respectively.

River Flow Data
Koga River flow data (1960 to 2012) was collected from hydrology department of Ministry of Water Irrigation and Energy. The first 516 river flow observations (1960 to 2002) were used for model training. The rest 144 data (1996 to 2012) were used for model validation "in-sample" forecast. Finally, all flow data (1960 to 2012) were used for "out of sample" forecasting (2013 to 2018). Gnu Regression, Econometrics and Timeseries Library (GRETL) computer program was used for river flow modeling and forecasting.

SARIMA Modeling
SARIMA (p,d,q)(P,D,Q) S model (Box et al., 2008;Wang et al., 2008;Tsay, 2010) is shown in Equation 1. Equation 2 is the expanded form of this model: Where: Ф and φ = Autoregressive (AR) parameters of seasonal and nonseasonal components, respectively Θ and θ = Moving average (MA) parameters of seasonal and nonseasonal components, respectively = random variable P and p = AR orders Q and q = MA orders. D and d are differencing terms In SARIMA modeling and forecasting, the following four sequential but iterative steps were followed (Box et al., 2008).

Model Identification
As stream flow data are not normally distributed, log transformation was performed before modeling and forecasting. Then, the stationarity of the log transformed time series was checked using graphical methods: sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF) (Stoffer and Dhumway, 2010;Singh et al., 2012;Box et al., 2015;Mirzavand and Ghazavi, 2015). Moreover, an Augmented Dickey-Fuller (ADF) unit root test (Enders, 1995;Elliott et al., 1996) and KPSS (Kwiatkowski et al., 1992) test were performed for stationarity analysis. Then, initial SARIMA models were identified from the ACF and PACF plots. As model identification involves a great deal of examination (Nazuha et al., 2010), max values for AR and MA components and integration terms (d and D) were set based on (Brockwell and Davis, 2002) and Cryer and Chan (2008) principles. For example, for a seasonal component of SARIMA models, the number of parameters is less than three for AR and MA terms and rarely larger than one for seasonal integration term (D) (Brockwell and Davis, 2002).

Parameter Estimation
Model parameters were determined by the method of conditional maximum likelihood (Stoffer and Dhumway, 2010) during model calibration and validation.

Diagnostic Checking
Residual correlogram observation and Ljung-Box Q Tests (Ljung and Box, 1978) were performed for confirming the white noise (randomness) of forecast residuals.

Forecasting
A final step is the application of the identified model in forecasting future time steps ahead. Model performance measures (Equations 3 to 5): The Akaike Information (AI) (Akaike, 1974), Schwarz Bayes (SB) (Schwarz, 1978) and Hannan-Quinn (HQ) (Hannan and Quinn, 1979) were used to test model accuracy for best model selection: where, L likelihood function of the model, k and n are respectively the numbers of parameters and residuals. A model having the lowest indices is best (Marco et al., 2012). Model selection was also supported by visual inspection and stationarity test of trend and seasonality patterns of forecast values using ADF (Enders, 1995;Elliott et al., 1996) and KPSS (Kwiatkowski et al., 1992) tests.

Model Identification
The monthly time series plot for log transformed Koga River flow (1960 to 2012) is shown in Fig. 2. The trend is not noticed in Fig.2. Moreover, Augmented Dickey-Fuller (ADF) unit root test statistic (observed) value -4.73 is smaller than test critical (estimated) value -0.27 for α = 5%. This rejects the null hypothesis (i.e., there is unit root) at their level. Besides, KPSS test proves the stationarity of the time series as the computed p-value of 0.07 is greater than the significance level (α = 0.05). Thus, based on time series plot and ADF and KPSS tests, the historical time series is found to be stationary.
However, as significant spikes are observed every 12 months interval from ACF plots and at 12th lags of PACF plots (Fig. 3), Koga River flow series is seasonally nonstationary. To make it stationary, single seasonal differencing was performed on the log transformed river flow data. The correlogram of the seasonally differenced river is shown in Fig. 4. Based on this Figure, SARIMA (p, 0, q) (P, 1, Q) 12 models were selected for further investigation and their lag orders were estimated as follows.

Graphical Methods for white Noise Test
Residual correlogram up to lag 36 is shown in Fig.  5. All of the ACF and PACF of the residual values at various lags except for lag order 25 were settled within tolerance interval at 95% confidence limits. This shows the existence of no significant correlation between residuals. Hence, the residuals are random or white noise. Therefore, the selected model can be considered as an appropriate model.

Ljung-Box Test for white Noise
Another test employed to check the residuals independence was Ljung-Box-Pierce Chi-Square statistics. From residuals autocorrelation test up to order 36, the Ljung-Box Q' statistics is 29.60 with p-value 0.38 which is larger than α = 0.05. Hence, the autocorrelations of residuals were considered white noise. Based on the above diagnostic checking, the selected SARIMA (1, 0, 1) (3, 1, 3) 12 model is satisfactory to signify the Koga River flow series. Therefore, the model is fit for future KogaRiver flow forecasting.

SARIMA Forecasting
A historical and model forecast graph by SARIMA (1, 0, 1) (3, 1, 3) 12 is shown in Fig. 6. The mathematical expression of this model shown by Equations 6 was developed by substituting significant parameters from Table 2 into Equation 2. As the SARIMA forecast values are log-transformed, the actual river flow forecast model is expressed by Equation 7. Therefore, the river flow time series of the actual forecast values is shown in Fig.  7. In addition to its best performance, the pattern (trend and seasonality) of this forecast graph from 2012 to 2018 is similar with that of historical ones. This is supported by ADF and KPSS unit root tests for the trend of forecast values. As the computed p-values of ADF test is greater than the significance level (a = 0.05), the time series of Koga River flow forecasts have no trend.
In addition, as the computed p-value from KPSS test is greater than one at a = 0.05, the forecasted time series is stationary. The additional use of the stationary test of trend and seasonality patterns of forecast values using ADF and KPSS tests is solely new in time series study. In another way, this method was not used in any one of similar published papers. Its use increases researcher's accuracy in selecting best models in addition to using performance indices like Akaike Information (AI) criterion. There are models with minimum AI values but with constant forecast flow pattern (lacking stochastic component). This technique avoids selection of such models. Thus, based on time series plot observation and ADF and KPSS tests, the forecasted time series is stationary. As the model is good in capturing Koga River flows, the forecast information would be of high importance for water resources managers for protecting the irrigators from water scarcity.

Conclusions and Recommendations
This study provides reliable river flow information for efficient planning and operation of reservoir and irrigation scheme. Out of several SARIMA models evaluated, the most appropriate river flow forecasting model for Koga River is SARIMA (1, 0, 1) (3, 1, 3) 12 . This model can provide future monthly stream flow information into the reservoir which can help reservoir manager to estimate storage levels for the months ahead. Based on forecast information reliable water supply can be determined and river flows into or out of the dam can be regulated. Moreover, forecast information is useful in developing cropping pattern optimization models to select optimum combination of crops that maximize the benefit/yield of an irrigation project. In this case, crop types can be selected and the area under each crop can be determined. This in turn is used to determine farm water demand on which reservoir operation is based. A decision on whether to allow second round cropping or not could also be made based on the actual amount of water stored in the reservoir and future water availability.
Another important application of forecast information is for development of optimal reservoir water release policy or schedule. This in turn avoids critical water shortages throughout irrigation period and water excess at the end of operation period. Moreover, the selected model provides accurate river flow information for developing adaptive real time reservoir operation policies for Koga irrigation scheme. In this case, forecast values are updated or replaced with the actual (measured) values. At the times when the flows are not properly forecasted, planners can add or subtract 95% confidence intervals on the forecast values. Therefore, in conclusion, the developed forecasting model and forecast information could serve as a practical tool for planning, operation and sustainable utilization of scarce water resources for food security and poverty reduction of farmers in the irrigation project and downstream environmental protection. It also plays a great role in controlling famine that sometimes occurs in the other parts of the country. Hence, SARIMA modeling and the followed methodology (the use of stationarity test for forecast flow patterns) could be utilized for forecasting of any other rivers of clear seasonal flow patterns, which are located in rural catchments as the river flow is a cumulative effect of river, catchment and rainfall characteristics. However, as SARIMA modeling is time-consuming and dependent of human judgment, automatic method of best model identification and selection is recommended in the future study.