A Generalized Discrete-Time Long Memory Volatility Model for Financial Stock Exchange

We propose a simple way to combine several long memory models for modelling financial market volatility using daily, range and high frequency data. The model can fit the return, the range of the daily return or the realized volatility under a parametric heavy-tailed distribution. It is flexible enough to include additional volatility information as contemporaneous variables. Empirical results show that the proposed model provides a substantial improvement in model fitting and specification and, most importantly, better out-of-sample forecasting in the Malaysian stock market.


INTRODUCTION
The time-varying conditional volatility ARCH-GARCH models [1][2] are widely used in financial time series analysis, for example in risk management, portfolio analysis and derivative pricing. The presence of long persistence volatility in asset returns [3][4] has further improved the model specification in volatility modelling. Both the ARCH effect and long persistence volatility have enriched the definitions of the efficient market hypothesis [5]. A series of studies [6][7] investigate market efficiency by analyzing the Hurst parameter [8] for different financial markets. With the evidence of the long persistence property, some extended definitions of the market efficiency hypothesis have been proposed, such as the fractal market hypothesis [9] and the heterogeneous market hypothesis [10]. The most prominent application of volatility models is the measurement of value-at-risk (VaR) in risk and portfolio analysis. VaR measurement is directly related to the expected volatility over the relevant time horizon, such as daily or weekly. The commercial application of VaR by Morgan [11], with the risk management model RiskMetrics™, has successfully applied VaR to portfolio investments. Hence, a correctly specified volatility model is very important in market efficiency and risk analysis studies.
Besides the model specification issue, the appropriate data handling approach also contributes to the development of volatility modelling. The most common approach is based on closing prices, from which daily returns are calculated. Because of its availability and simplicity, most of the empirical finance literature uses daily returns (squared returns or residuals) as the measurement of latent volatility, as in GARCH and stochastic volatility models. However, as indicated in [12], this estimate of volatility can be very noisy. An alternative measurement of volatility is the range, the logarithmic difference between the highest and lowest prices. The literature on range-based volatility proxies includes the earliest work by Parkinson [13] and Garman and Klass [14]. However, the latent volatility is still not directly measurable from daily information. Hence, it is hard to identify the outperforming models in terms of model fitting performance and specification adequacy. The rapid development of information and communication technology (ICT), together with the availability of high frequency information on financial assets such as stocks, stock indexes and foreign exchange, has encouraged the use of high frequency data as an observable proxy for latent volatility, which facilitates more accurate estimation and forecasting. The study of observable (realized) volatility using high frequency asset prices is discussed in [15]. That study has motivated the use of realized volatility in both FX and equity markets because of its remarkably good out-of-sample forecasting.
In this paper, we select the Kuala Lumpur Stock Exchange (KLSE) index transaction prices during the recovery period from 1 January 2003 to 15 January 2006 (745 and 266710 observations for the low and high frequency data respectively). The Malaysian stock market had experienced a massive slide, mainly due to the drastic depreciation of the Malaysian Ringgit (RM): from RM2.50 in the first half of 1997, the RM depreciated to its weakest recorded rate against the USD, RM4.88, on 9 January 1998. The Malaysian government pegged the currency at RM3.80 per USD on 1 September 1998 to stabilize the RM against currency speculation. After the implementation, the Malaysian economy showed significant recovery. During the recovery period in our empirical study, the Malaysian stock market was affected by the removal of the RM-USD peg (implemented in mid-2005, when the RM was expected to be undervalued by approximately 6.5%), the merger of MESDAQ (http://www.klse.com.my) into the KLSE alongside the Main Board and Second Board, which began in 2002, the fluctuation of petrol prices, etc. We intend to study the long persistence volatility, the presence of a risk premium and how market participants react to good and bad events using the low and high frequency data.
To account for the stylized empirical facts of volatility, we propose a generalized model that combines the ARFIMAX, HAR and ARCH-type models. The model can flexibly include additional volatility information (such as weekly and monthly volatility) as contemporaneous variables. A battery of statistical tests is employed to diagnose the model specifications. The proposed model shows a substantial improvement in in-sample estimation compared with the other models. In the forecasting performance evaluation, the proposed model with the additional contemporaneous variable outperforms the other models.

MATERIALS AND METHODS
Data Source: In our empirical study, the KLSE index transaction prices are obtained from Datastream for 745 daily closing prices. The high frequency data are available from the data vendor and consist of 266710 data points. KLSE high frequency (minute-by-minute) data are not as widely available as those of mature markets such as the S&P 500, NIKKEI 250, CAC40, etc., for which minute-by-minute data can be obtained for up to 15 years starting from 1991. The original dataset includes prices for every trade at 1-minute intervals. We further extract the 5-minute and 20-minute intervals in order to study the statistical behaviour of the emerging market. As a comparison with mature markets such as the S&P 500, we study whether the most common 5-minute interval is also suitable for the Malaysian emerging stock market. The interday return series, $r_t$, is defined from the close-to-close prices on consecutive trading days and is expressed in percentage terms.
The proposed HARFIMA-GARCH model: In this paper, we propose a flexible volatility model that can take the ARCH-type, range-based and realized volatility model specifications. In addition, the model can include past realized volatility at different frequencies, such as weekly or monthly information, as contemporaneous variables. In short, the proposed model integrates the ARFIMAX, HAR and FIGARCH models. The maximum likelihood estimation uses an iterative optimization algorithm that evaluates the second derivatives (Hessian matrix), under the standardized t-distribution with υ > 2 degrees of freedom.
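As a rough illustration of the likelihood that such an estimation would maximize, the sketch below evaluates the log-likelihood of a residual series under a Student-t density rescaled to unit variance. The function name `std_t_loglik` and its arguments are hypothetical, and the density is the common textbook form for a standardized t-distribution with υ > 2; the original paper's exact parameterization is not shown in the text, so this is an assumption.

```python
import math

def std_t_loglik(residuals, variances, nu):
    """Log-likelihood of residuals a_t given conditional variances sigma_t^2,
    under a Student-t density rescaled to unit variance (requires nu > 2).
    Hypothetical helper: the paper's own parameterization is not shown."""
    if nu <= 2:
        raise ValueError("nu must exceed 2 so that the variance exists")
    # Normalising constant of the unit-variance t density (same for every t).
    const = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * (nu - 2)))
    total = 0.0
    for a, s2 in zip(residuals, variances):
        # Scale term plus the heavy-tailed kernel for observation t.
        total += (const - 0.5 * math.log(s2)
                  - 0.5 * (nu + 1) * math.log1p(a * a / (s2 * (nu - 2))))
    return total
```

In practice this value would be passed (negated) to a numerical optimizer together with the recursion that generates the conditional variances; that recursion is model-specific and is not sketched here.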
In the HARFIMA($f_k$, $p_I$, $d_I$, $q_I$)-FIGARCH($p_{II}$, $d_{II}$, $q_{II}$) specification:
• $y_{I,t}$ and $y_{II,t}$ represent either the return, the range-based volatility, the return's volatility or the volatility of the realized volatility series, according to the specific model:
- For the ARCH-type model, $y_{I,t}$ and $y_{II,t}$ represent $r_t$ and $\sigma_t^2$ respectively;
- For the range-based volatility model, the volatility estimator is adopted from the Parkinson [13] and Garman and Klass [14] approaches under the assumption that the expected return is zero; the mean return is not statistically different from zero at the 5% level under the t-test (t-statistic 1.7521);
- For the realized volatility model, Martens [15] suggested using a scaled sum of squared intraday returns to represent the scaled realized volatility;
• The news impact term suggests that financial stock market volatility tends to rise in response to good (bad) news and to respond in reverse to bad (good) news;
• $c_1$ and $c_2$ denote the risk-return tradeoff. If the return-volatility relationship is positive, we assume that rational market participants require a greater risk premium for more volatile (riskier) securities. If the relationship is negative, it implies that market participants prefer saving;
• $\alpha_I(B)$, $\beta_I(B)$, $\alpha_{II}(B)$ and $\beta_{II}(B)$ denote the stationary finite polynomials in the backshift (lag) operator, with $B^k y_t = y_{t-k}$;
• $d_I$ and $d_{II}$ represent the long memory parameters, with constraints that vary according to their respective models.
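The volatility proxies named in the list above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names are hypothetical, percentage returns are assumed to be log returns, the range proxy uses the standard Parkinson form $(\ln H - \ln L)^2 / (4 \ln 2)$, and the scale factor $(1+c)$ for the realized variance is taken as an input rather than estimated.

```python
import numpy as np

def pct_log_returns(prices):
    """Percentage log returns, 100 * (ln P_t - ln P_{t-1}).
    Assumption: the paper's percentage returns are log returns."""
    return 100.0 * np.diff(np.log(np.asarray(prices, dtype=float)))

def parkinson_variance(high, low):
    """Parkinson range-based variance proxy per day: (ln(H/L))^2 / (4 ln 2)."""
    high = np.asarray(high, dtype=float)
    low = np.asarray(low, dtype=float)
    return np.log(high / low) ** 2 / (4.0 * np.log(2.0))

def scaled_realized_variance(intraday_returns, scale=1.0):
    """Sum of squared intraday returns for one day, multiplied by a
    scale factor (1 + c) that is supplied by the caller."""
    r = np.asarray(intraday_returns, dtype=float)
    return scale * np.sum(r ** 2)
```

For example, `scaled_realized_variance(r_day, scale=1.111)` would apply a scaling value of 1.111 to one day's 20-minute returns held in a hypothetical array `r_day`.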
The operator $(1-B)^d$ can be expressed as an infinite binomial expansion for non-integer powers:
$$(1-B)^d = \sum_{k=0}^{\infty} \frac{\Gamma(k-d)}{\Gamma(k+1)\,\Gamma(-d)} B^k = 1 - dB - \frac{d(1-d)}{2!}B^2 - \frac{d(1-d)(2-d)}{3!}B^3 - \cdots$$
The HARFIMA($f_k$, $p_I$, $d_I$, $q_I$)-FIGARCH($p_{II}$, $d_{II}$, $q_{II}$) model includes other models as special cases:
• The ARFIMAX of Granger (1980) when $c_{h,k} = 0$ ($k$ = weekly and monthly) with no conditional heteroscedastic volatility;
• The Heterogeneous AutoRegressive model with time-varying volatility [16], in which the contemporaneous realized volatility enters with the coefficient $c_{h,k}$, which indicates the significant influence of past realized volatility at different frequencies, with $k$ = lagged daily, weekly and monthly and the associated $n$ = 1, 5 and 20 respectively;
• The joint ARMA-GARCH [2] and ARMA-FIGARCH [3] respectively when $c_{h,k} = 0$ ($k$ = daily, weekly and monthly) and $d_I = c_1 = c_2 = 0$, with $0 < d_{II} < 0.5$ to ensure a stationary ARMA-FIGARCH. The shock term, $a_t$, follows a conditional time-varying variance and $\varepsilon_t \sim$ i.i.d. with a specific parametric distribution;
• The ARFIMA-FIGARCH when $c_{h,k} = c_1 = c_2 = 0$.
Davidson [17] argued that $d_I$ in ARFIMA is structurally different from $d_{II}$: persistence increases as $d_I$ approaches 0.5, whereas it increases as $d_{II}$ approaches 0. This reverse behaviour may arise because the parameter acts directly on the squared errors rather than on the conditional variance.
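Two building blocks of this section lend themselves to a short numerical sketch: the coefficients of the fractional difference operator $(1-B)^d$, and the daily, weekly and monthly realized volatility aggregates with $n$ = 1, 5 and 20. The function names are hypothetical. The weights follow the standard recurrence of the binomial series, and the aggregates are assumed here to be simple trailing averages, which is the usual HAR convention but is not spelled out in the text.

```python
import numpy as np

def frac_diff_weights(d, n_terms):
    """First n_terms (>= 1) coefficients of (1 - B)^d; w[k] multiplies B^k."""
    w = np.empty(n_terms)
    w[0] = 1.0
    for k in range(1, n_terms):
        # Recurrence from the binomial series: w_k = w_{k-1} * (k - 1 - d) / k
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w

def har_components(rv, windows=(1, 5, 20)):
    """Trailing averages of rv over each window length.
    Assumption: aggregates are plain n-day means; entries are NaN
    until a full window of observations is available."""
    rv = np.asarray(rv, dtype=float)
    out = np.full((len(rv), len(windows)), np.nan)
    for j, n in enumerate(windows):
        if n <= len(rv):
            # 'valid' convolution with equal weights gives the n-day moving average.
            out[n - 1:, j] = np.convolve(rv, np.ones(n) / n, mode="valid")
    return out
```

With $d = 0.4$ the first weights are 1, -0.4 and -0.12, and they decay slowly rather than geometrically, which is what gives the fractionally integrated model its long memory. To use the aggregates as lagged regressors, the rows of `har_components` would be shifted by one day before estimation.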

One-day-ahead Forecasting evaluation:
The out-of-sample forecasting is evaluated using standard statistical measurements: the mean squared error (MSE), mean error (ME), mean absolute error (MAE) and the Mincer-Zarnowitz regression [18]. For the Mincer-Zarnowitz regression, $\sigma_t^2$ is the proxy of the actual volatility (realized volatility) for time period $t$ and $\hat{\sigma}_t^2$ is the forecasted conditional variance for time $t$.
The simple linear regression model is as follows:
$$\sigma_t^2 = a + b\,\hat{\sigma}_t^2 + e_t \qquad (13)$$
where $e_t$ is the regression error. Conditional on the forecast, the forecast is unbiased and optimal only if $a = 0$ and $b = 1$. The coefficient of determination, $R^2$, indicates the predictive power of the selected models.
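The evaluation measures described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `evaluate_forecasts` is hypothetical, the forecast error is taken as actual minus forecast (the sign convention for ME is not given in the text), and the Mincer-Zarnowitz regression of the volatility proxy on a constant and the forecast is fitted by ordinary least squares.

```python
import numpy as np

def evaluate_forecasts(actual, forecast):
    """MSE, ME, MAE and Mincer-Zarnowitz regression of actual on forecast.
    Assumption: error = actual - forecast (sign convention not in the text)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = actual - forecast

    # Regress the volatility proxy on a constant and the forecast.
    X = np.column_stack([np.ones_like(forecast), forecast])
    coef, *_ = np.linalg.lstsq(X, actual, rcond=None)
    fitted = X @ coef
    ss_res = np.sum((actual - fitted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)

    return {
        "mse": float(np.mean(err ** 2)),
        "me": float(np.mean(err)),
        "mae": float(np.mean(np.abs(err))),
        "a": float(coef[0]),          # intercept; 0 for an unbiased forecast
        "b": float(coef[1]),          # slope; 1 for an unbiased forecast
        "r2": float(1.0 - ss_res / ss_tot),
    }
```

The sketch returns point estimates only. Judging whether $a$ and $b$ differ significantly from 0 and 1 would additionally require their standard errors, which are omitted here.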

RESULTS AND DISCUSSION
The results in Table 1 show that the standard deviations of the realized volatility $RV_{t,I}$ are slightly greater than those of $RV_{t,II}$ for all the selected frequencies, which indicates that the realized volatility becomes noisier in the presence of lunch-break and overnight effects. For this reason, we use the $RV_{t,II}$ results for further analysis. The scaling value $(1+c)$ is approximately 1.111, which is lower than the 1.205 reported by Martens [15] for the S&P 500 futures index series.
We select three properties of realized volatility to study in this section: normality, long persistence behaviour and the risk premium. In Table 2, the logarithmic 20-minute interval is the closest to a Gaussian distribution, with kurtosis close to 3 (3.0387, 3.3209, 3.5384 respectively) and skewness nearly zero (0.3829, 0.2827, 0.2578 accordingly). We further investigate the Q-Q plots for the squared returns and the 20-minute interval, illustrated in Fig. 1. The $\ln RV_t$ plot is approximately linear, whereas the squared returns are skewed to the right. Our results differ from the work of Andersen and Bollerslev [12], which suggested a 5-minute interval for FX. In our study, the different outcome might be caused by low trading activity in the emerging market. In addition, [19] showed that 25-minute and 15-minute returns provided the optimal sampling frequency in their studies.
Notes: The observable volatility is represented by the realized volatility with the 20-minute interval. a, b and c denote the 10%, 5% and 1% levels of significance. The values in parentheses are p-values.
Next, we use the variance-time plot and rescaled-range analysis to examine the Hurst parameter. In Table 3, the $\ln RV_t$ series shows the strongest persistence with a value of 0.725, followed by (0.675) and finally the squared returns (0.560). The long persistence behaviour encourages us to model the realized volatility using a fractionally integrated model.
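The variance-time approach to the Hurst parameter can be sketched as below. This is an illustration of the general technique rather than the authors' procedure: the function name and the block sizes are hypothetical choices. The series is averaged over non-overlapping blocks of size $m$, the variance of the block means is assumed to scale as $m^{2H-2}$, and $H$ is recovered from the slope of a log-log regression.

```python
import numpy as np

def hurst_variance_time(x, block_sizes=(2, 4, 8, 16, 32, 64, 128)):
    """Variance-time (aggregated variance) estimate of the Hurst parameter.
    Block sizes are an illustrative choice, not taken from the paper."""
    x = np.asarray(x, dtype=float)
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(x) // m
        if n_blocks < 2:
            continue  # too few blocks to estimate a variance
        # Means over non-overlapping blocks of length m.
        block_means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        log_m.append(np.log(m))
        log_var.append(np.log(block_means.var(ddof=1)))
    # Var(block mean) ~ m^(2H - 2), so the log-log slope equals 2H - 2.
    slope = np.polyfit(log_m, log_var, 1)[0]
    return 1.0 + slope / 2.0
```

For an uncorrelated series the estimate should lie near 0.5, while values well above 0.5, such as those reported for $\ln RV_t$, point to long persistence.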
Finally, the ARCH-Mean effect and news impact are examined by investigating scatter plots, with regression lines, of the realized volatility against the current and lagged returns respectively. In Fig. 2, the scatter plots display an ordinary least squares (OLS) regression line with a constant and $\ln RV_t$. In the plot of the 20-minute realized volatility against the current return, the $R^2$ shows an insignificant linear relation between $\ln RV_t$ and the current return, which suggests the absence of an ARCH-Mean effect. However, for $\ln RV_t$ and the lagged return, both positive and negative lagged returns are associated with significant occurrences of high volatility, as indicated in Fig. 2. This implies that the news impact may be significant in model estimation. The formal tests for the risk premium and the relation between news and realized volatility are carried out in the next section.
Table 4 reports the estimation results of the realized volatility, range-based and ARCH-type models with Student-t distributed innovations, $\varepsilon_t$. For all the realized volatility and range-based models, only the coefficient $c_2$ is significantly different from zero, which indicates that only the lagged negative returns yield high volatility. This leverage effect implies that a downward movement (shock) in the stock market is followed by greater volatility than an upward movement of the same magnitude. In addition, the negative value of $c_2$ implies that market participants are mostly cautious, or only observe the market movement instead of taking part in the stock market. In other words, market participants prefer to save their money rather than take the risk. This may be due to bad experience or a lack of confidence in the volatile market.
Notes to Table 4: The table reports the descriptive statistics for all the residual and standardized residual series. All the series are leptokurtic, with kurtosis around 3.500, which deviates from the normal distribution. Due to this, we calculated the p-values for the BDS test statistics under both the normality assumption and the bootstrap. Both p-values show similar results, which is acceptable because our series do not differ much from the normal distribution. $p$ and $p_b$ denote the p-value for the BDS test statistic under the assumption of an asymptotic normal distribution and the bootstrapped p-value respectively. (7): regression-based news impact test of Engle and Ng [21].

Maximum likelihood estimation results:
All the models fit well under a heavy-tailed Student-t distribution with degrees of freedom (υ) of approximately 12, except for the range-based model. This stylized fact is commonly seen in almost all financial assets. The four long memory models yield long memory parameters, $d$, of 0.2674, 0.1293 and 0.2648 (for the three realized volatility models) and 0.2842 for the range-based model. For the proposed HARFIMAX-GARCH model, the reduction in long persistence is due to the inclusion of weekly volatility information, as compared to the original HAR-GARCH with news impact model. This implies that the contemporaneous variables explain the initial volatility model specification to some extent, but not fully.
Overall, the highest log-likelihood value is -104.78 (HARFIMAX-GARCH), with the smallest AIC (0.3491) compared with the others. However, HARFIMAX-GARCH has a greater BIC value (0.4299) than the others due to the additional parameters. For the ARCH-type models, we select the AR-GARCH and ARFIMA-FIGARCH models to compare the in-sample estimations as a whole. Both models show almost similar results in the log-likelihood function and estimation performance evaluations. The ARCH-type models show better in-sample estimation than the range-based model but are inferior to the realized volatility models.

Diagnostic results: In Table 4, the ARFIMAX exhibits possible conditional heteroscedasticity in its squared residuals, in the LM-ARCH test and in the BDS test [20] as well. This implies that the most parsimonious ARFIMAX model is not adequately specified given the existence of the conditional heteroscedastic effect. On the other hand, the HARX-GARCH suffers from misspecification in the standardized residuals as well as in the sign bias test, at the 5% and 10% significance levels respectively. Except for the above-mentioned models, Table 3 shows that all the volatility models are not significant at the 1% level for the Ljung-Box test, serial correlation and ARCH effect test respectively. Further discussion of the BDS test is given in the footnote of Table 4. On the other hand, the sign-bias and negative/positive size-bias tests for the squared standardized residuals of the studied models show no evidence of unexplained non-linearity, or of sign or size bias on the negative/positive side, at the 1% significance level. As a result, the asymmetric models with news effect adequately estimate the conditional standard deviations.
One-step-ahead forecasting evaluations: Good in-sample estimation does not always guarantee superior forecasting results. To verify this, we conducted 50 one-step-ahead daily forecasts for the KLSE stock index. The one-day horizon forecast comparisons are based on MSE, ME, MAE and the Mincer-Zarnowitz regression. As indicated in Table 5, the ARCH-type models show the smallest MSE, ME and MAE results, followed by the realized volatility models and finally the range-based models. However, the Mincer-Zarnowitz regression analysis indicates that the estimated parameter $b$ for the ARCH-type models is not significantly different from zero. This implies that the ARCH-type models could not be used to predict future volatility. On the other hand, the realized volatility and range-based models strongly reject a zero $b$ parameter, and their coefficients of determination, $R^2$, can be ranked by closeness to unity.
The HARFIMAX-GARCH shows the highest $b$ and $R^2$ (0.8690 and 0.0984) compared with the other models. This implies that the out-of-sample forecasting of HARFIMAX-GARCH is superior as compared to the ran

CONCLUSION
In this paper, we constructed a generalized model that accounts for time-varying heteroskedasticity, long persistence and leverage effects in the specific case of Malaysian stock market volatility for low and high frequency data. Our empirical results show that the contemporaneous realized volatilities at different time horizons substantially reduce, but do not fully eliminate, the volatility persistence, which may be caused by other unknown sources of volatility. In future work, the value-at-risk (VaR) for long and short trading positions can be determined based on the estimated volatility model.