FORECASTING RETURNS FOR THE STOCK EXCHANGE OF THAILAND INDEX USING MULTIPLE REGRESSION BASED ON PRINCIPAL COMPONENT ANALYSIS

The aim of this study was to forecast the returns for the Stock Exchange of Thailand (SET) Index by adding some explanatory variables and a stationary Autoregressive Moving-Average process of order p and q (ARMA (p, q)) to the mean equation of returns. In addition, we used Principal Component Analysis (PCA) to remove possible complications caused by multicollinearity. Afterwards, we forecast the volatility of the returns for the SET Index. Results showed that the ARMA (1,1), which includes multiple regression based on PCA, has the best performance. In forecasting the volatility of returns, the GARCH model performs best for one day ahead, and the EGARCH model performs best for five days, ten days and twenty-two days ahead.


INTRODUCTION
In order to forecast the return r_t for specific purposes, many researchers have made different assumptions about µ_t as it appears in Equation (2). For example, Sopipan et al. (2012) assume µ_t to be a constant, and Sattayatham et al. (2012) assume µ_t to be an ARMA process with a one-week delay. Caporale et al. (2011) model the returns using both fractional and non-fractional models.
The financial returns r_t, where r_t = 100·ln(P_t/P_{t-1}) for t = 1, 2, …, T-1 and P_t denotes the financial price at time t, depend concurrently and dynamically on many economic and financial variables. Since the returns themselves have a statistically significant autocorrelation, lagged returns might be useful in predicting future returns. In order to model these financial returns, Tsay (2010) assumes that r_t follows a simple time series model such as a stationary ARMA (p, q) model with some explanatory variables X_it. In other words, r_t satisfies Equation 1, in which P_it denotes the price of financial asset i for i = 1, 2, …, n at time t; r_{t-s}, s = 1, 2, …, p, is the return at the s-th lag; ε_t represents the errors, assumed to be a white noise series that is i.i.d. with mean zero and constant variance σ²_ε; µ_0, β_i, φ_s and θ_m are constants; and n, p and q are positive integers.
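Based on the symbols defined above, the mean specification can be sketched as follows. This is a reconstruction under stated assumptions rather than the authors' typeset equation: the decomposition r_t = µ_t + ε_t is taken from the forecasting section of this paper, the explanatory variables X_it are assumed to be percentage log returns of the prices P_it, and the plus sign on the moving-average terms is a convention choice (Tsay (2010) writes these terms with a minus sign).

```latex
\begin{aligned}
r_t    &= \mu_t + \varepsilon_t, \\
\mu_t  &= \mu_0 + \sum_{i=1}^{n} \beta_i X_{it}
          + \sum_{s=1}^{p} \phi_s r_{t-s}
          + \sum_{m=1}^{q} \theta_m \varepsilon_{t-m}, \\
X_{it} &= 100 \cdot \ln\left( P_{it} / P_{i,t-1} \right).
\end{aligned}
```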
Generally, the variance of the errors ε_t in model (1) is assumed to be constant; some authors use this assumption in the modeling of gold prices (Ismail et al., 2009). In this study, however, we consider the case where the variance of ε_t is not constant. That is, we introduce heteroskedasticity models to forecast the volatility of returns, using GARCH, EGARCH, GJR-GARCH and Markov Regime Switching GARCH (MRS-GARCH) with normal, Student-t and Generalized Error Distribution (GED) innovations.
The objective of this study is to forecast returns for the Stock Exchange of Thailand (SET) Index by using model (1). We consider four different specifications of the process µ_t and compare their performance. Moreover, we forecast the volatility of returns with heteroskedasticity models.
In the next section, we present the basics of principal component analysis, which we use to remove possible complications caused by multicollinearity among the explanatory variables, together with the volatility models. The empirical study and formulae for model estimation are given in section 3. The methodology and results are presented in section 4 and the conclusions are presented in section 5.

Principal Component Analysis
Consider an n-dimensional random variable X_t = (X_1t, X_2t, …, X_nt)' with covariance matrix Σ_x, where (·)' denotes the transpose. Principal Component Analysis (PCA) is concerned with using a few linear combinations of the X_it to explain the structure of Σ_x. If X_it denotes the returns as they appear in (2) for i = 1, 2, …, n, then PCA can be used to study the sources of variation of these n returns.
In order to cope with the problem of multicollinearity, we transform the explanatory variables in model (1) into principal components. The new model for forecasting r_t is then Equation 4, in which Z_it, i = 1, 2, …, n, is the i-th principal component of the explanatory variables at time t. We follow Tsay (2010) in assuming that the asset return series r_t is a weakly stationary process.
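The transformation described above can be sketched as a principal component regression. The following is a minimal illustration, assuming NumPy is available, that PCA is applied to the correlation matrix of standardized regressors, and that only the leading components are retained; the function name `pca_regression`, the simulated data and the number of components are hypothetical and are not taken from the study.

```python
import numpy as np


def pca_regression(X, r, n_components):
    """Regress r on the leading principal components of X.

    X has shape (T, n) and holds the explanatory variables;
    r has shape (T,) and holds the dependent returns.
    """
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    # Standardize each column so that PCA acts on the correlation matrix
    # (an assumption; PCA on the covariance matrix is also common).
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Xs, rowvar=False)
    # eigh returns eigenvalues in ascending order, so reverse them.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Principal component scores Z_it; their columns are uncorrelated.
    Z = Xs @ eigvecs[:, :n_components]
    # Ordinary least squares of r on a constant and the scores.
    design = np.column_stack([np.ones(len(r)), Z])
    coef, *_ = np.linalg.lstsq(design, r, rcond=None)
    return coef, Z, eigvals


# Simulated, strongly collinear regressors (hypothetical data).
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
X = common + 0.1 * rng.normal(size=(500, 4))
r = 0.5 * common[:, 0] + 0.1 * rng.normal(size=500)

coef, Z, eigvals = pca_regression(X, r, n_components=2)
explained = eigvals / eigvals.sum()  # share of variance per component
```

Because the component scores are mutually uncorrelated, the regression on Z avoids the multicollinearity present among the original columns of X.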

Volatility Models
Since we aim to find a suitable model for the volatility of r_t, we give a brief review of the known volatility models of interest to us. These models are GARCH, EGARCH, GJR-GARCH and MRS-GARCH.
The GARCH (1,1) model is as follows: $\varepsilon_t = \eta_t \sqrt{h_t}$, $h_t = a_0 + a_1 \varepsilon_{t-1}^2 + \beta_1 h_{t-1}$, where η_t is i.i.d. with zero mean and unit variance, and a_0, a_1 > 0 and β_1 > 0 ensure a positive conditional variance. The inequality a_1 + β_1 < 1 must be satisfied for the returns process to be covariance stationary.
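The GARCH (1,1) variance recursion can be sketched in a few lines of Python. This sketch assumes the standard form h_t = a0 + a1·ε²_{t-1} + b1·h_{t-1}, starts the recursion at the unconditional variance a0/(1 - a1 - b1) (one common choice, not necessarily the one used in this study), and uses hypothetical parameter values and residuals.

```python
def garch11_variance(residuals, a0, a1, b1):
    """Return the conditional variances h_t implied by a GARCH(1,1) model."""
    # Positivity conditions stated in the text.
    if not (a0 > 0 and a1 > 0 and b1 > 0):
        raise ValueError("a0, a1 and b1 must be positive")
    # Covariance stationarity condition stated in the text.
    if a1 + b1 >= 1:
        raise ValueError("a1 + b1 must be smaller than one")
    # Start at the unconditional variance (an assumption of this sketch).
    h = [a0 / (1.0 - a1 - b1)]
    for eps in residuals[:-1]:
        h.append(a0 + a1 * eps ** 2 + b1 * h[-1])
    return h


# Hypothetical residuals and parameters, for illustration only.
eps = [0.5, -1.2, 0.3, 2.0, -0.4]
h = garch11_variance(eps, a0=0.05, a1=0.10, b1=0.85)
```

Each h[t] depends only on information up to time t-1, which is what makes the series usable as a one-step-ahead volatility forecast.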
The Exponential GARCH (EGARCH) model was developed to cope with the skewness often encountered in financial returns. In the EGARCH (1, 1) model, ξ is the asymmetry parameter that captures the leverage effect.
The GJR-GARCH model accounts for the leverage effect; it allows positive and negative shocks to have different impacts on volatility. In the GJR-GARCH (1,1) model, the indicator I{ε_{t-1}>0} is equal to one when ε_{t-1} is greater than zero and equal to zero otherwise. The conditions a_0 > 0, (a_1+ξ)/2 > 0 and β_1 > 0 must be satisfied to ensure positive conditional variance. The Markov Regime Switching GARCH (MRS-GARCH) model has two regimes, indexed by S_t = 1 or 2; h_{S_t} is the conditional variance in regime S_t, a measurable function of F_{t-τ} for τ ≤ T-1. To ensure the positivity of the conditional variance, we impose the restrictions a_{0,S_t} > 0, a_{1,S_t} ≥ 0 and β_{1,S_t} ≥ 0. The sum a_{1,S_t} + β_{1,S_t} measures the persistence of a shock to the conditional variance.
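For reference, the asymmetric and regime-switching specifications reviewed above are commonly written as follows. These forms are a sketch using the parameter names of this section (a_0, a_1, β_1, ξ), not the authors' typeset equations: the EGARCH and GJR-GARCH parameterizations are assumptions chosen to agree with the indicator function and the condition (a_1+ξ)/2 > 0 stated in the text, and the lagged variance in the MRS-GARCH line is assumed to be the regime-integrated variance from the previous period.

```latex
\begin{aligned}
\text{EGARCH(1,1):}\quad
  & \ln h_t = a_0
    + a_1 \left| \frac{\varepsilon_{t-1}}{\sqrt{h_{t-1}}} \right|
    + \xi \, \frac{\varepsilon_{t-1}}{\sqrt{h_{t-1}}}
    + \beta_1 \ln h_{t-1}, \\
\text{GJR-GARCH(1,1):}\quad
  & h_t = a_0
    + a_1 \varepsilon_{t-1}^2 \left( 1 - I_{\{\varepsilon_{t-1} > 0\}} \right)
    + \xi \, \varepsilon_{t-1}^2 \, I_{\{\varepsilon_{t-1} > 0\}}
    + \beta_1 h_{t-1}, \\
\text{MRS-GARCH(1,1):}\quad
  & h_{t,S_t} = a_{0,S_t} + a_{1,S_t} \varepsilon_{t-1}^2
    + \beta_{1,S_t} h_{t-1},
    \qquad S_t \in \{1, 2\}.
\end{aligned}
```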

EMPIRICAL STUDIES AND METHODOLOGY
Naturally, the Thai stock market has unique characteristics, so the factors influencing the price of stocks traded in this market are different from the factors influencing other stock markets (Chaigusin et al., 2008). Examples of factors that influence the Thai stock market and the statistics used by researchers who have studied these factors in forecasting the SET Index are shown in Table 1.

Data
The data sets used in this study are the daily returns on closing prices for the SET Index at time t (the dependent variable) and the daily returns on closing prices for twelve factors (the explanatory variables). These twelve factors are the following: The actual closing prices for these twelve factors were obtained from http://www.efinancethai.com. We used data from April 5, 2000, to July 5, 2012, and divided them into two disjoint sets. The first set, from April 5, 2000, to December 30, 2011, was used as the in-sample set (2,873 observations). The second set, from January 3, 2012, to July 5, 2012, was used as the out-of-sample set (125 observations). The plot of the SET Index closing prices and returns is given in Fig. 1.
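The data preparation described above can be sketched as follows: closing prices are converted to percentage log returns, r_t = 100·ln(P_t/P_{t-1}), and the return series is split by date into in-sample and out-of-sample sets. The prices and dates below are hypothetical placeholders, and the helper names `pct_log_returns` and `split_by_date` are illustrative only.

```python
import math
from datetime import date


def pct_log_returns(prices):
    """Percentage log returns: 100 * ln(P_t / P_{t-1})."""
    return [100.0 * math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def split_by_date(dates, values, first_out_of_sample):
    """Split values into in-sample and out-of-sample parts by date."""
    in_sample = [v for d, v in zip(dates, values) if d < first_out_of_sample]
    out_sample = [v for d, v in zip(dates, values) if d >= first_out_of_sample]
    return in_sample, out_sample


# Hypothetical closing prices around the split date used in the study.
dates = [date(2011, 12, 28), date(2011, 12, 29),
         date(2011, 12, 30), date(2012, 1, 3)]
prices = [100.0, 101.0, 99.5, 102.0]

returns = pct_log_returns(prices)
# The first price has no return, so returns align with dates[1:].
in_sample, out_sample = split_by_date(dates[1:], returns, date(2012, 1, 1))
```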
Descriptive statistics and the correlation matrix are given in Tables 2 and 3. As can be seen from Table 3, there are highly significant correlations (p<0.01) between the dependent variable and the explanatory variables. Therefore, these explanatory variables were used to predict the SET Index. There are also highly significant correlations (p<0.01) among the explanatory variables. These correlations measure the linear relation between two variables and indicate the existence of multicollinearity among the explanatory variables. Moreover, multiple regression analysis based on this dataset shows a multicollinearity problem, with Variance Inflation Factors (VIF) of 5.0 or more, as shown in Table 2. One approach to avoid this problem is PCA. Hence, we used the twelve explanatory variables to find the principal components; the principal components and the overall descriptive statistics for the selected Principal Components (PCs) are shown in Tables 4 and 5, respectively.
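The multicollinearity check mentioned above can be illustrated with a small VIF computation. This sketch assumes the usual definition VIF_j = 1/(1 - R²_j), where R²_j comes from regressing the j-th explanatory variable on all the others with an intercept; the simulated data and the function name are hypothetical, and the threshold of 5.0 is the one quoted in the text.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (shape (T, n))."""
    X = np.asarray(X, dtype=float)
    T, n = X.shape
    vifs = []
    for j in range(n):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on a constant and all other columns.
        design = np.column_stack([np.ones(T), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r_squared = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


# Two nearly collinear variables and one independent variable (hypothetical).
rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + 0.1 * rng.normal(size=500)
x3 = rng.normal(size=500)
X = np.column_stack([x1, x2, x3])

vifs = variance_inflation_factors(X)
flagged = vifs >= 5.0  # columns that would be flagged as collinear
```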

Results of Principal Component Analysis
Bartlett's sphericity test, which tests the null hypothesis that the correlation matrix is an identity matrix, was used to verify the applicability of PCA. The value of Bartlett's sphericity test for the SET Index was 18,167.07, which implies that PCA is applicable to our datasets (Table 2). Moreover, Kaiser's measure of sampling adequacy was computed as 0.788, which indicates that the sample size was sufficient for us to apply PCA.
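As an illustration of the sphericity test, the statistic can be computed from the determinant of the sample correlation matrix R. The sketch below assumes the usual approximation χ² = -(T - 1 - (2n + 5)/6)·ln|R| with n(n - 1)/2 degrees of freedom, where T is the number of observations and n the number of variables; the data are simulated and the function name is hypothetical.

```python
import numpy as np


def bartlett_sphericity(X):
    """Return Bartlett's sphericity statistic and its degrees of freedom."""
    X = np.asarray(X, dtype=float)
    T, n = X.shape
    R = np.corrcoef(X, rowvar=False)
    # slogdet is numerically safer than log(det(R)) for near-singular R.
    _, log_det = np.linalg.slogdet(R)
    chi2 = -(T - 1 - (2 * n + 5) / 6.0) * log_det
    dof = n * (n - 1) // 2
    return chi2, dof


# Strongly correlated columns versus independent columns (hypothetical).
rng = np.random.default_rng(0)
common = rng.normal(size=(500, 1))
correlated = common + 0.5 * rng.normal(size=(500, 4))
independent = rng.normal(size=(500, 4))

chi2_corr, dof = bartlett_sphericity(correlated)
chi2_indep, _ = bartlett_sphericity(independent)
```

A large statistic relative to the chi-square distribution with `dof` degrees of freedom rejects the identity-matrix hypothesis, which is the situation in which PCA is worthwhile.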

Forecasting the Returns for the SET Index by Mean Equations
We forecast the returns for the SET Index (r_t = µ_t + ε_t) using four mean equations (µ_t): a constant, ARMA (1, 1), multiple regression based on PCA, and ARMA (1,1) that includes multiple regression based on PCA. Afterwards, we compare the forecast errors using two loss functions: Mean Square Error (MSE) and Mean Absolute Error (MAE).
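The two loss functions can be written directly from their definitions. The sketch below uses hypothetical actual and forecast returns; the numbers are not results from this study.

```python
def mean_square_error(actual, forecast):
    """MSE: average of squared forecast errors."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mean_absolute_error(actual, forecast):
    """MAE: average of absolute forecast errors."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical out-of-sample returns and forecasts.
actual = [0.5, -1.0, 0.2, 0.8]
forecast = [0.3, -0.6, 0.0, 1.0]

mse = mean_square_error(actual, forecast)
mae = mean_absolute_error(actual, forecast)
```

MSE penalizes large errors more heavily than MAE, so the two criteria can rank competing mean equations differently.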

Forecasting the Volatility of Returns for the SET Index
We applied the Ljung-Box test to check for serial correlation in the returns (r_t) and in the squared mean-adjusted returns (r_t - µ_t)², where µ_t is the mean equation in Table 5 (No. 4).
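The Ljung-Box statistic used in this test can be sketched as follows, assuming the usual definition Q(m) = T(T + 2)·Σ_{k=1..m} ρ̂_k²/(T - k), where ρ̂_k is the sample autocorrelation at lag k. The alternating series below is a hypothetical example chosen so that the result is easy to check by hand; for the squared mean-adjusted returns the same function would be applied to the values (r_t - µ_t)².

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic using lags 1..max_lag."""
    n = len(x)
    total = sum(autocorrelation(x, k) ** 2 / (n - k)
                for k in range(1, max_lag + 1))
    return n * (n + 2) * total


# A strongly negatively autocorrelated series (hypothetical).
series = [1.0, -1.0] * 10
rho1 = autocorrelation(series, 1)
q1 = ljung_box_q(series, 1)
```

Under the null hypothesis of no serial correlation, Q(m) is compared with a chi-square distribution with m degrees of freedom.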
We used lags one through ten and the twenty-second lag, as reported in Table 6. The returns series is confirmed to be stationary because the Autocorrelation Function (ACF) values decrease very quickly as the lag increases; this is also confirmed by the Augmented Dickey-Fuller test (-52.76**), as in Table 1. We analyzed the significance of autocorrelation in the squared mean-adjusted returns series with the Ljung-Box Q-test and used Engle's ARCH test to test for ARCH effects. Therefore, the squared mean-adjusted return series is non-stationary, which suggests conditional heteroskedasticity.

Table 9. Loss functions for out-of-sample volatility forecasts. Panel A: one day ahead and five days ahead (short term)


EMPIRICAL METHODOLOGY
This empirical section adopts the GARCH type and the MRS-GARCH (1, 1) models to estimate the volatility of the returns on the SET Index. The GARCH type models considered are the GARCH (1, 1), EGARCH (1,1) and GJR-GARCH (1,1). In order to account for the fat-tailed feature of financial returns, we considered three different distributions for the innovations: Normal (N), Student-t (t) and Generalized Error Distributions (GED).

GARCH Type Models
Panel A of Table 7 presents the estimation results for the GARCH type models. It is clear from the table that almost all parameter estimates in the GARCH type models are highly significant at the 1% level. Moreover, the asymmetry term ξ in the EGARCH models is significantly different from zero, which indicates that unexpected negative returns imply higher conditional variance than positive returns of the same size. All models display strong persistence in volatility, ranging from 0.8895 to 0.9572; that is, volatility is likely to remain high over several periods once it increases.

Markov Regime Switching GARCH Models
The estimated results and summary statistics for the MRS-GARCH models are presented in Panel B of Table 7. Most parameter estimates in the MRS-GARCH models are significantly different from zero at the 95% confidence level or higher. However, a_1 and β_1 are not significantly different from zero in some cases. All models display strong persistence in volatility, ranging from 0.6650 to 0.9892; that is, volatility is likely to remain high over several periods once it increases.

In-Sample Evaluation
We used various goodness-of-fit statistics to compare the volatility models. These statistics are the Akaike Information Criterion (AIC), the Schwarz Bayesian Information Criterion (BIC) and the Log-Likelihood (LOGL) values. Table 8 presents the goodness-of-fit statistics and loss functions for all volatility models. According to the BIC, the MSE2 and the QLIKE, the GJR model performs best in modeling SET Index volatility. In contrast, the AIC, the LOGL, the MSE1, the R2LOG, the MAD2 and the MAD1 suggest that the MRS-GARCH performs best.
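The two information criteria can be computed from a model's maximized log-likelihood. The sketch below assumes the textbook definitions AIC = -2·LOGL + 2k and BIC = -2·LOGL + k·ln(T), with k parameters and T observations; the values reported in Table 8 may use a different scaling (for example, division by T), and the log-likelihoods and parameter counts here are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Schwarz Bayesian Information Criterion (smaller is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


# Hypothetical fits: a small model and a richer model on 2,873 observations.
n_obs = 2873
small = {"logl": -4500.0, "k": 4}
rich = {"logl": -4490.0, "k": 10}

aic_small, aic_rich = aic(small["logl"], small["k"]), aic(rich["logl"], rich["k"])
bic_small = bic(small["logl"], small["k"], n_obs)
bic_rich = bic(rich["logl"], rich["k"], n_obs)
```

Because the BIC penalty per parameter, ln(T), exceeds the AIC penalty of 2 for any sample of this size, the BIC tends to favor more parsimonious models, which is one reason the two criteria can disagree.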

Forecasting Volatility in Out-of-Samples
We investigate the ability of the GARCH, EGARCH, GJR-GARCH and MRS-GARCH models to forecast volatility for the SET Index over the out-of-sample set. In Table 9, we present the loss functions for the out-of-sample volatility forecasts for one day ahead and five days ahead (short term) and for ten days ahead and twenty-two days ahead (long term). We found that the GARCH model performs best for one day ahead, and the EGARCH model performs best for five days, ten days and twenty-two days ahead.
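Multi-step volatility forecasts for the horizons above can be illustrated with the GARCH (1,1) model. This sketch assumes the standard recursion: the one-step forecast is a0 + a1·ε_t² + b1·h_t, and each further step is a0 + (a1 + b1) times the previous forecast, so forecasts revert to the unconditional variance a0/(1 - a1 - b1). The parameter values and the last residual are hypothetical, and how the study aggregates forecasts over a horizon is not specified here.

```python
def garch11_forecast(last_eps, last_h, a0, a1, b1, horizon):
    """Return GARCH(1,1) variance forecasts for steps 1..horizon."""
    if horizon < 1:
        raise ValueError("horizon must be at least one")
    # One step ahead uses the last observed residual and variance.
    forecasts = [a0 + a1 * last_eps ** 2 + b1 * last_h]
    # Further steps replace the unknown squared residual by its expectation.
    for _ in range(horizon - 1):
        forecasts.append(a0 + (a1 + b1) * forecasts[-1])
    return forecasts


# Hypothetical parameters; the horizons are those considered in the text.
params = {"a0": 0.05, "a1": 0.10, "b1": 0.85}
path = garch11_forecast(last_eps=2.0, last_h=1.0, horizon=22, **params)
by_horizon = {k: path[k - 1] for k in (1, 5, 10, 22)}
```

With these values the forecast starts above the unconditional variance of 1.0 after a large shock and decays toward it geometrically at rate a1 + b1.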

CONCLUSION
We considered the problem of forecasting returns for the SET Index by using a stationary Autoregressive Moving-Average process of order p and q (ARMA (p, q)) with some explanatory variables. After considering four types of mean equations, we found that ARMA (1, 1), which includes multiple regression based on PCA, has the best performance (MSE = 0.5393, MAE = 0.5947). In forecasting the volatility of the returns for the SET Index, GARCH type models such as GARCH (1, 1), EGARCH (1, 1), GJR-GARCH (1, 1) and MRS-GARCH (1, 1) were considered. We found that the GARCH (1, 1) model performs best for one day ahead and the EGARCH (1, 1) model performs best for five days, ten days and twenty-two days ahead.