FORECASTING FINANCIAL RETURNS USING MULTIPLE REGRESSION BASED ON PRINCIPAL COMPONENT ANALYSIS

The aim of this study was to forecast the returns for the Stock Exchange of Thailand (SET) Index by adding some explanatory variables and a stationary autoregressive process of order p (AR(p)) to the mean equation of returns. In addition, we used Principal Component Analysis (PCA) to remove possible complications caused by multicollinearity. The results showed that multiple regression based on PCA has the best performance.


INTRODUCTION
To forecast the return $r_t$ for specific purposes, researchers have made different assumptions about the mean $\mu_t$ in the mean equation $r_t = \mu_t + \varepsilon_t$ (Equation 2). Kyimaz and Berument (2001) assume $\mu_t$ to be a regression model with a one-week delay; Supoj (2003) assumes $\mu_t$ to be an autoregressive process; Ozturk (2008) assumes $\mu_t$ to be a constant; and Sattayatham et al. (2012) assume $\mu_t$ to be an ARMA process with a one-week delay.
The financial returns $r_t = 100 \times \ln(P_t / P_{t-1})$, for $t = 1, 2, \dots, T-1$, where $P_t$ denotes the financial price at time $t$, depend concurrently and dynamically on many economic and financial variables. Since returns often show statistically significant autocorrelation, lagged returns may be useful in predicting future returns. To model these returns, we assume that $r_t$ follows a simple time series model, namely a stationary AR(p) model with some explanatory variables $X_{it}$. In other words, $r_t$ satisfies Equation 1:

$$ r_t = \mu_0 + \sum_{i=1}^{n} \alpha_i X_{it} + \sum_{j=1}^{p} \beta_j r_{t-j} + \varepsilon_t \qquad (1) $$

Here $X_{it}$ is the return of asset $i$, $i = 1, 2, \dots, n$, at time $t$, where $P_{it}$ denotes the financial price of asset $i$ at time $t$; $r_{t-j}$, $j = 1, 2, \dots, p$, is the return at lag $j$; $\varepsilon_t$ is an i.i.d. white-noise error with mean zero and constant variance $\sigma_\varepsilon^2$; $\mu_0$, $\alpha_i$ and $\beta_j$ are constants; and $n$, $p$ are positive integers.
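Model (1) is linear in its parameters, so one simple way to estimate it is ordinary least squares on a design matrix that places the explanatory returns next to the lagged index returns. The Python sketch below (NumPy assumed) illustrates this; OLS as the estimator and the helper names `log_returns`, `build_arx_design` and `fit_ols` are assumptions made for illustration, not the estimation procedure reported in the study.

```python
import numpy as np


def log_returns(prices):
    """Percentage log returns r_t = 100 * ln(P_t / P_{t-1})."""
    prices = np.asarray(prices, dtype=float)
    return 100.0 * np.diff(np.log(prices))


def build_arx_design(r, X, p):
    """Response vector and design matrix for model (1).

    r : (T,) returns of the target index
    X : (T, n) explanatory returns X_it, aligned in time with r
    p : number of autoregressive lags

    Row k of the design matrix corresponds to t = p + k and holds
    [1, X_1t, ..., X_nt, r_{t-1}, ..., r_{t-p}].
    """
    r = np.asarray(r, dtype=float)
    X = np.asarray(X, dtype=float)
    T = len(r)
    # Column j-1 of `lags` holds r_{t-j} for t = p, ..., T-1.
    lags = np.column_stack([r[p - j:T - j] for j in range(1, p + 1)])
    design = np.column_stack([np.ones(T - p), X[p:], lags])
    return r[p:], design


def fit_ols(y, design):
    """Assumed estimator for this sketch: ordinary least squares.

    Returns [mu_0, alpha_1, ..., alpha_n, beta_1, ..., beta_p].
    """
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef
```

For example, with the factor return series stacked as the columns of `X` (rows aligned in time with `r`) and `p = 2`, `fit_ols(*build_arx_design(r, X, 2))` returns the estimates in the order $\mu_0, \alpha_1, \dots, \alpha_n, \beta_1, \beta_2$.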
Note that the variance of the errors $\varepsilon_t$ in model (2) is assumed to be constant; the same assumption has been used, for example, in modeling ground-level ozone (Agirre-Basurko et al., 2006; Pires et al., 2008).
The objective of this study is to forecast returns for the SET Index using model (1). We consider four different specifications of the mean process $\mu_t$ and compare their performance.
In the next section, we present the basics of principal component analysis, which we use to remove possible complications caused by multicollinearity among the explanatory variables.
The empirical study and methodology are discussed in Section 3. Forecasting of the returns is described in Section 4, and the conclusions are presented in Section 5.

PRINCIPAL COMPONENT ANALYSIS
An important topic in multivariate time series analysis is the study of the covariance (or correlation) structure of the series. For example, the covariance structure of a vector return series plays an important role in portfolio selection. In what follows, we discuss some statistical methods useful in studying the covariance structure of a vector time series.
Given an $m$-dimensional random vector $R_t = (X_{1t}, X_{2t}, \dots, X_{nt}, r_{t-1}, \dots, r_{t-p})'$, so that $m = n + p$, with covariance matrix $\Sigma_R$, Principal Component Analysis (PCA) is concerned with using a few linear combinations of $R_t$ to explain the structure of $\Sigma_R$. If $R_t$ denotes the monthly log returns of $m$ assets, then PCA can be used to study the sources of variation of these $m$ asset returns. The key word here is "few", so that simplification can be achieved in the multivariate analysis. PCA applies to either the covariance matrix $\Sigma_R$ or the correlation matrix $\rho_R$ of $R_t$. Since the correlation matrix is the covariance matrix of the standardized random vector $R_t^* = S^{-1} R_t$, where $S$ is the diagonal matrix of standard deviations of the components of $R_t$, we use the covariance matrix in our theoretical discussion. Let $\delta_i = (\delta_{i1}, \dots, \delta_{im})'$ be an $m$-dimensional vector, where $i = 1, \dots, m$.
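Then

$$ Z_{it} = \delta_i' R_t = \sum_{j=1}^{m} \delta_{ij} R_{jt} $$

is a linear combination of the random vector $R_t$. If $R_t$ consists of the simple returns of $m$ stocks, then $Z_{it}$ is the return of a portfolio that assigns weight $\delta_{ij}$ to the $j$th stock. Since multiplying $\delta_i$ by a constant does not affect the proportion of allocation assigned to the $j$th stock, we standardize the vector $\delta_i$ so that $\delta_i'\delta_i = \sum_{j=1}^{m} \delta_{ij}^2 = 1$. Using properties of linear combinations of random variables, we have

$$ \operatorname{Var}(Z_{it}) = \delta_i' \Sigma_R \delta_i, \qquad \operatorname{Cov}(Z_{it}, Z_{jt}) = \delta_i' \Sigma_R \delta_j, \quad i, j = 1, \dots, m. $$

The idea of PCA is to find linear combinations $\delta_i$ such that $Z_{it}$ and $Z_{jt}$ are uncorrelated for $i \neq j$ and the variances of $Z_{it}$ are as large as possible. More specifically:

• The first principal component of $R_t$ is the linear combination $Z_{1t} = \delta_1' R_t$ that maximizes $\operatorname{Var}(Z_{1t})$ subject to the constraint $\delta_1'\delta_1 = 1$.
• The second principal component of $R_t$ is the linear combination $Z_{2t} = \delta_2' R_t$ that maximizes $\operatorname{Var}(Z_{2t})$ subject to $\delta_2'\delta_2 = 1$ and $\operatorname{Cov}(Z_{2t}, Z_{1t}) = 0$.
• The $i$th principal component of $R_t$ is the linear combination $Z_{it} = \delta_i' R_t$ that maximizes $\operatorname{Var}(Z_{it})$ subject to $\delta_i'\delta_i = 1$ and $\operatorname{Cov}(Z_{it}, Z_{jt}) = 0$ for $j = 1, \dots, i-1$.

Since the covariance matrix $\Sigma_R$ is non-negative definite, it has a spectral decomposition. Let $(\lambda_1, e_1), \dots, (\lambda_m, e_m)$ be the eigenvalue-eigenvector pairs of $\Sigma_R$, where $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m \ge 0$ and each $e_i$ has unit length. Then the $i$th principal component of $R_t$ is $Z_{it} = e_i' R_t$, $i = 1, \dots, m$, and

$$ \operatorname{Var}(Z_{it}) = e_i' \Sigma_R e_i = \lambda_i, \qquad \operatorname{Cov}(Z_{it}, Z_{jt}) = e_i' \Sigma_R e_j = 0, \quad i \neq j. $$

If some eigenvalues $\lambda_i$ are equal, the choices of the corresponding eigenvectors $e_i$ and hence $Z_{it}$ are not unique.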
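The eigen-decomposition result can be checked numerically. The sketch below (NumPy assumed; `principal_components` is a hypothetical helper) replaces the population covariance matrix by the sample covariance matrix, sorts the eigenpairs in descending order of eigenvalue, and forms the component scores. The sample covariance of the scores then comes out diagonal with the eigenvalues on the diagonal, matching the property that each principal component's variance equals its eigenvalue and distinct components are uncorrelated. Centering the data first is a choice of this sketch; it shifts each score by a constant and leaves all variances and covariances unchanged.

```python
import numpy as np


def principal_components(R):
    """PCA of the sample covariance matrix of R.

    R : (T, m) array; row t is the observation R_t' (e.g. X_1t..X_nt, r_{t-1}..r_{t-p}).
    Returns (lam, E, Z): eigenvalues sorted so lam[0] >= lam[1] >= ...,
    unit-length eigenvectors e_i as the columns of E, and scores Z[:, i] = e_i'(R_t - mean).
    """
    R = np.asarray(R, dtype=float)
    centered = R - R.mean(axis=0)             # sketch choice: remove each column's sample mean
    sigma = np.cov(centered, rowvar=False)    # sample estimate of Sigma_R
    lam, E = np.linalg.eigh(sigma)            # symmetric eigendecomposition, ascending order
    order = np.argsort(lam)[::-1]             # reorder to descending eigenvalues
    lam, E = lam[order], E[:, order]
    Z = centered @ E                          # principal component scores
    return lam, E, Z
```

Applied to the (T, m) matrix whose rows are $R_t'$, the columns of `Z` are the sample principal components, and `np.cov(Z, rowvar=False)` is diagonal up to rounding.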
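In addition, we have

$$ \sum_{i=1}^{m} \operatorname{Var}(R_{it}) = \operatorname{tr}(\Sigma_R) = \sum_{i=1}^{m} \lambda_i = \sum_{i=1}^{m} \operatorname{Var}(Z_{it}). $$

This result says that

$$ \frac{\operatorname{Var}(Z_{it})}{\sum_{j=1}^{m} \operatorname{Var}(R_{jt})} = \frac{\lambda_i}{\lambda_1 + \dots + \lambda_m}. $$

Consequently, the proportion of total variance in $R_t$ explained by the $i$th principal component is simply the ratio of the $i$th eigenvalue to the sum of all eigenvalues of $\Sigma_R$. One can also compute the cumulative proportion of total variance explained by the first $i$ principal components, i.e., $\sum_{j=1}^{i} \lambda_j / \sum_{j=1}^{m} \lambda_j$. In practice, one selects a small $i$ such that this cumulative proportion is large. To cope with the problem of multicollinearity, we transform the explanatory variables in model (1) into principal components. The new model for forecasting $r_t$ is then Equation 3:

$$ r_t = \gamma_0 + \sum_{i=1}^{m} \gamma_i Z_{it} + \varepsilon_t \qquad (3) $$

where $Z_{it}$, $i = 1, 2, \dots, m$, is the $i$th principal component of the explanatory variables at time $t$, and $\gamma_0, \gamma_1, \dots, \gamma_m$ are constants.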
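The cumulative-proportion rule and the regression of $r_t$ on principal components can be sketched together. In the Python sketch below (NumPy assumed), `explained_variance`, `choose_k`, `fit_pc_regression` and `predict_pc_regression` are hypothetical helpers; the 90% threshold for deciding how many components to keep and OLS as the estimator are assumptions of this example, since the text does not state the selection rule. The loadings and means estimated in sample are stored and reused for new observations, so forecasts are built from the same linear combinations.

```python
import numpy as np


def explained_variance(lam):
    """Proportion lambda_i / sum(lambda) and the cumulative proportions."""
    lam = np.asarray(lam, dtype=float)
    prop = lam / lam.sum()
    return prop, np.cumsum(prop)


def choose_k(cum_prop, threshold=0.90):
    """Smallest k whose cumulative proportion reaches `threshold` (assumed rule)."""
    k = int(np.searchsorted(cum_prop, threshold)) + 1
    return min(k, len(cum_prop))  # guard against rounding when threshold is 1.0


def fit_pc_regression(y, R, threshold=0.90):
    """OLS (assumed estimator) of y on an intercept and the first k PCs of R."""
    R = np.asarray(R, dtype=float)
    mean = R.mean(axis=0)
    lam, E = np.linalg.eigh(np.cov(R - mean, rowvar=False))
    order = np.argsort(lam)[::-1]             # descending eigenvalues
    lam, E = lam[order], E[:, order]
    k = choose_k(explained_variance(lam)[1], threshold)
    Z = (R - mean) @ E[:, :k]                 # retained components Z_1t..Z_kt
    design = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef, E[:, :k], mean


def predict_pc_regression(coef, E_k, mean, R_new):
    """Forecast y for new rows of R with the stored loadings and in-sample mean."""
    Z_new = (np.asarray(R_new, dtype=float) - mean) @ E_k
    return coef[0] + Z_new @ coef[1:]
```

With `threshold=1.0` every component is kept and the fitted values coincide with an OLS fit on the original variables, so any gain in stability comes from dropping the components with small eigenvalues.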

JMSS
We follow Tsay (2005) in assuming that the asset return series $r_t$ is a weakly stationary process.
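Under weak stationarity the lag-$\ell$ autocorrelation of $r_t$ depends only on $\ell$, so it can be estimated from a single sample path, which is what makes autocorrelation a meaningful guide to which lagged returns to include. The Python sketch below (NumPy assumed; helper names hypothetical) computes the sample autocorrelation function and flags lags whose estimate exceeds the approximate 5% bound $1.96/\sqrt{T}$, a standard large-sample rule for i.i.d. series that is assumed here rather than taken from the study.

```python
import numpy as np


def sample_acf(r, max_lag):
    """Sample autocorrelations rho_hat_l for l = 1, ..., max_lag."""
    r = np.asarray(r, dtype=float)
    d = r - r.mean()                  # deviations from the sample mean
    denom = np.sum(d * d)
    return np.array([np.sum(d[l:] * d[:-l]) / denom for l in range(1, max_lag + 1)])


def significant_lags(r, max_lag, z=1.96):
    """Lags whose |rho_hat_l| exceeds the assumed approximate bound z / sqrt(T)."""
    acf = sample_acf(r, max_lag)
    bound = z / np.sqrt(len(r))
    return [lag for lag, a in enumerate(acf, start=1) if abs(a) > bound]
```

Applied to in-sample returns, `significant_lags(r, 10)` would list the lags worth considering for the AR part of the mean equation.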

EMPIRICAL STUDIES AND METHODOLOGY
The Thai stock market has unique characteristics, so the factors influencing the prices of stocks traded in this market differ from those influencing other stock markets (Chaigusin et al., 2008). Examples of factors that influence the Thai stock market, and the statistics used by researchers who have studied these factors in forecasting the SET Index, are shown in Table 1.

Data
The data used in this study are the daily returns computed from the closing prices of the SET Index at time t (the dependent variable) and of twelve factors (the explanatory variables). These twelve factors are the following:
The actual closing prices for these twelve factors were obtained from http://www.efinancethai.com. We used data from April 5, 2000, to July 5, 2012, divided into two disjoint sets. The first set, from April 5, 2000, to December 30, 2011, was used as the in-sample set (2,873 observations). The second set, from January 3, 2012, to July 5, 2012, was used as the out-of-sample set (125 observations). The SET Index closing prices and returns are plotted in Fig. 1.
Descriptive statistics and the correlation matrix are given in Tables 2 and 3, respectively. As Table 3 shows, there are highly significant correlations (p < 0.01) between the dependent variable and the explanatory variables; these explanatory variables were therefore used to predict the SET Index. There are also highly significant correlations (p < 0.01) among the explanatory variables themselves. Table 4 shows significant correlations between the SET returns and their first and second lags. These correlations measure the linear relations between pairs of variables, and those among the explanatory variables indicate the existence of multicollinearity. Moreover, multiple regression analysis on this dataset confirms a multicollinearity problem, with variance inflation factors VIF ≥ 5.0, as shown in Table 2. One approach to avoiding this problem is PCA. Hence, we used the twelve explanatory variables to find the principal components; the principal components and the overall descriptive statistics for the selected principal components (PCs) are shown in Tables 5 and 6, respectively.
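The variance inflation factor of the $j$th explanatory variable is $\mathrm{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ is the coefficient of determination from regressing that variable on all the others. A minimal NumPy sketch follows; the helper names `vif` and `flag_collinear` and the use of an intercept in each auxiliary regression are assumptions of this example, while the cut-off of 5.0 is the one quoted above.

```python
import numpy as np


def vif(X):
    """VIF_j = 1 / (1 - R_j^2) for each column j of X.

    R_j^2 comes from an OLS regression (with intercept, an assumption of this
    sketch) of column j on all the other columns.
    """
    X = np.asarray(X, dtype=float)
    T, n = X.shape
    out = np.empty(n)
    for j in range(n):
        y = X[:, j]
        others = np.column_stack([np.ones(T), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out


def flag_collinear(X, threshold=5.0):
    """Indices of columns whose VIF is at least `threshold` (5.0 as in the text)."""
    return [int(j) for j in np.flatnonzero(vif(X) >= threshold)]
```

Independent columns give VIFs close to 1, while a column that is nearly a linear combination of the others pushes the VIFs of all columns involved well above 5.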

Table 1. Impact factors on the Stock Exchange of Thailand Index (SET Index)
Factors | Researchers

FORECASTING THE RETURNS OF THE SET INDEX BY MEAN EQUATIONS
In this section, we forecast the returns for the SET Index ($r_t := \mu_t + \varepsilon_t$) using three mean equations $\mu_t$: a constant, AR(2) and multiple regression based on PCA. We then compare the forecast errors using two loss functions, Mean Square Error (MSE) and Mean Absolute Error (MAE). The parameters of the mean equations for forecasting the SET Index and the values of the loss functions are shown in Table 6. We found that the ARMA(1,1) mean equation that includes multiple regression based on PCs (Table 6) has the best performance (MSE = 0.8886, MAE = 0.7463). We therefore use this mean equation for forecasting the returns for the SET Index.
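The comparison by loss functions can be sketched as follows: fit each mean equation on the in-sample period only, forecast the out-of-sample period, and compute MSE and MAE on the out-of-sample errors. The Python sketch below (NumPy assumed) covers the constant and AR(2) mean equations; the helper names and the one-step-ahead scheme, in which each forecast uses the actually observed lagged returns, are assumptions, since the text does not say how the forecasts were produced.

```python
import numpy as np


def mse(actual, forecast):
    """Mean Square Error."""
    err = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.mean(err ** 2))


def mae(actual, forecast):
    """Mean Absolute Error."""
    err = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(err)))


def constant_mean_forecasts(r_in, n_out):
    """Constant mean equation: every forecast equals the in-sample mean."""
    return np.full(n_out, float(np.mean(r_in)))


def ar_forecasts(r_in, r_out, p=2):
    """AR(p) fitted by OLS on r_in, then one-step-ahead forecasts over r_out.

    Assumed scheme: each forecast uses the observed returns r_{t-1}, ..., r_{t-p},
    so out-of-sample values enter only as lags, never as the value being predicted.
    """
    r_in = np.asarray(r_in, dtype=float)
    r_out = np.asarray(r_out, dtype=float)
    T = len(r_in)
    design = np.column_stack([np.ones(T - p)] + [r_in[p - j:T - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(design, r_in[p:], rcond=None)
    full = np.concatenate([r_in, r_out])
    # full[t - p:t][::-1] is [r_{t-1}, ..., r_{t-p}], matching the column order of `design`.
    preds = [coef[0] + coef[1:] @ full[t - p:t][::-1] for t in range(T, T + len(r_out))]
    return np.array(preds)
```

With the split used in the study, `r_in` would hold the 2,873 in-sample returns and `r_out` the 125 out-of-sample returns; the mean equation with the smaller MSE and MAE is preferred.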

CONCLUSION
We considered the problem of forecasting returns for the SET Index using a stationary autoregressive model of order p (AR(p)) with some explanatory variables. After considering four types of mean equations, we transformed the AR terms and the explanatory variables into principal components. We found that multiple regression based on PCA has the best performance (MSE = 0.8886, MAE = 0.7463).