Estimation under Multicollinearity: A Comparative Approach Using Monte Carlo Methods

Problem statement: Six estimation techniques were compared experimentally for a just-identified simultaneous three-equation econometric model with three multicollinear exogenous variables. Approach: The aim was to explore in depth the effects of multicollinearity and to examine the sensitivity of the findings to increasing sample sizes and increasing numbers of replications, using the mean and total absolute bias statistics. Results: The estimates were virtually identical for three estimators, LIML, 2SLS and ILS (jointly referred to as L2ILS), while the performances of the other categories were not uniformly affected by the three levels of multicollinearity considered. While the frequency distribution of average parameter estimates was rather symmetric under OLS, those of the other estimators were either negatively or positively skewed with no clear pattern. Conclusion: The study established that L2ILS estimators are best for estimating parameters of data plagued by the lower open interval negative level of multicollinearity, while FIML and OLS rank highest for data characterized by the closed interval and upper open interval positive levels of multicollinearity, respectively.


INTRODUCTION
One of the most frequently suggested solutions to the problem of multicollinearity in single-equation estimation is the use of a simultaneous-equation econometric model. In the simultaneous model, however, multicollinearity may still exist in the individual equations. Moreover, if the simultaneous-equation solution is adopted, there may be an intolerable rise in the size of the model, with a consequent depletion of the number of exogenous variables, which are usually required for policy simulation.
Since some measure of multicollinearity has to be tolerated in simultaneous econometric models, this study investigates the comparative performance of six estimation techniques, namely Ordinary Least Squares (OLS), Limited Information Maximum Likelihood (LIML), Two-Stage Least Squares (2SLS), Indirect Least Squares (ILS), Three-Stage Least Squares (3SLS) and Full Information Maximum Likelihood (FIML), under three different levels of multicollinearity among the exogenous variables. The performances of the estimators are evaluated using the average (mean) values of the parameter estimates and the total absolute bias of the parameter estimates. The aim is to explore in depth the effects of multicollinearity and to examine the sensitivity of the findings to increasing sample sizes and increasing numbers of replications (Goodnight and Wallace, 1969; Hoerl and Kennard, 1970; Goodnight and Wallace, 1972).
Studies on estimation under multicollinearity in simultaneous models have revealed that a high degree of multicollinearity among the explanatory variables has a disastrous effect on the estimation of the coefficients β by OLS (Fisher, 1966). This method was regarded by Hendry (1976), Ray (1970) and Pleli and Tankovic (2005) as a naive approach because the estimators are biased and inconsistent. They categorized the other methods as limited-information approaches (2SLS, ILS) and full-information approaches (3SLS, FIML). Adenomon and Fesojaiye (2008) and Agunbiade and Osilagun (2008) merely compared the Seemingly Unrelated Regression (SUR) technique with OLS and confirmed the superiority of the SUR estimator over the OLS estimator. Ayinde (2007), comparing OLS with some GLS estimators, observed that with increasing replications the OLS estimator is preferred for estimating all the model parameters at all levels of correlation. This contradicts Pleli and Tankovic (2005), who advised econometricians to avoid the naive approach (OLS) when estimating the parameters of a system of simultaneous equations.
Framework of the model: In this study, a Monte Carlo approach is employed for the following just-identified econometric model with three structural equations:
$$
\begin{aligned}
y_{1t} &= \beta_{13} y_{3t} + \gamma_{11} x_{1t} + \gamma_{12} x_{2t} + u_{1t} \\
y_{2t} &= \beta_{21} y_{1t} + \gamma_{21} x_{1t} + \gamma_{23} x_{3t} + u_{2t} \\
y_{3t} &= \beta_{32} y_{2t} + \gamma_{32} x_{2t} + \gamma_{33} x_{3t} + u_{3t}
\end{aligned} \tag{1}
$$
Where:
$y_{1t}$, $y_{2t}$ and $y_{3t}$ = Endogenous or jointly dependent variables
$x_{1t}$, $x_{2t}$ and $x_{3t}$ = Predetermined (exogenous or lagged endogenous) variables
The $u_{1t}$, $u_{2t}$ and $u_{3t}$ denote stochastic disturbance terms, which are assumed to be independently and identically normally distributed with zero means and a finite covariance matrix. The $\beta_{13}$, $\beta_{21}$ and $\beta_{32}$ are the coefficients of the endogenous variables, while $\gamma_{11}$, $\gamma_{12}$, $\gamma_{21}$, $\gamma_{23}$, $\gamma_{32}$ and $\gamma_{33}$ are the coefficients of the predetermined variables, making nine structural parameters for the model.
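The claim that each equation is just identified can be checked with the order condition, using only the coefficient subscripts listed above (β 13 places y 3 in the first equation, γ 23 places x 3 in the second, and so on). The sketch below is illustrative: the names `EQUATIONS` and `order_condition` are hypothetical, and the order condition is necessary but not sufficient for identification.

```python
# Variables entering each structural equation, read off the coefficient subscripts.
ENDOGENOUS = ["y1", "y2", "y3"]
PREDETERMINED = ["x1", "x2", "x3"]
EQUATIONS = {
    "y1": {"y3", "x1", "x2"},  # beta13, gamma11, gamma12
    "y2": {"y1", "x1", "x3"},  # beta21, gamma21, gamma23
    "y3": {"y2", "x2", "x3"},  # beta32, gamma32, gamma33
}


def order_condition(dependent, included):
    """Compare the number of excluded variables with G - 1 (G = number of equations)."""
    g = len(ENDOGENOUS)
    all_variables = set(ENDOGENOUS) | set(PREDETERMINED)
    excluded = all_variables - set(included) - {dependent}
    if len(excluded) == g - 1:
        return "just-identified"
    return "over-identified" if len(excluded) > g - 1 else "under-identified"


status = {dep: order_condition(dep, inc) for dep, inc in EQUATIONS.items()}
print(status)
```

Each equation excludes exactly two of the six variables, which equals G - 1 for a three-equation system.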
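Expressing model (1) in matrix form yields:
$$
y_t = B y_t + \Gamma x_t + u_t, \qquad
B = \begin{pmatrix} 0 & 0 & \beta_{13} \\ \beta_{21} & 0 & 0 \\ 0 & \beta_{32} & 0 \end{pmatrix}, \qquad
\Gamma = \begin{pmatrix} \gamma_{11} & \gamma_{12} & 0 \\ \gamma_{21} & 0 & \gamma_{23} \\ 0 & \gamma_{32} & \gamma_{33} \end{pmatrix} \tag{2}
$$
where $y_t = (y_{1t}, y_{2t}, y_{3t})'$, $x_t = (x_{1t}, x_{2t}, x_{3t})'$ and $u_t = (u_{1t}, u_{2t}, u_{3t})'$.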

MATERIALS AND METHODS
The methodology employed in this study is the Monte Carlo Approach (MCA). The Monte Carlo method is the nearest thing to a controlled laboratory-type experiment in econometrics (Intrilligator et al., 1996; Johnston, 1984; Agunbiade, 2007; Carlin et al., 1992; Kmenta and Joseph, 1963; Parker, 1972; Wagnar, 1958; Olayemi and Olayide, 1981; Koutsoyiannis, 2008). The MCA has been applied not only to the effect of multicollinearity but also to the choice among alternative estimators and to determining the impact of heteroscedasticity, serial correlation and other violations of basic econometric assumptions on the performance of different estimators. It is also used to solve problems in both the pure and social sciences (Belsley et al., 1992; Farrar and Glauber, 1967; Feldstein, 1973; Mishra, 2004). In order to assemble data that conform to the specified model, our data series are generated as follows:
1. We set the sample sizes N at 100, 200 and 300 and the replication numbers R at 200, 400 and 600. These values are arbitrary, although they compare favorably with sample sizes in other similar studies.
2. The following numerical values are arbitrarily assigned to the structural parameters of the model:
$$
\begin{aligned}
\beta_{13} &= 1.8, & \gamma_{11} &= 0.2, & \gamma_{12} &= 1.2 \\
\beta_{21} &= 1.5, & \gamma_{21} &= 2.5, & \gamma_{23} &= 2.1 \\
\beta_{32} &= 0.9, & \gamma_{32} &= 0.4, & \gamma_{33} &= 3.3
\end{aligned} \tag{3}
$$
3. Values are assigned to the elements of the variance-covariance matrix of the disturbance terms of the model at any given sample point:
$$
\Sigma = \begin{pmatrix} 7.0 & 5.0 & 4.0 \\ 5.0 & 4.5 & 3.5 \\ 4.0 & 3.5 & 3.0 \end{pmatrix}
$$
4. Values of the predetermined variables $x_{1t}$, $x_{2t}$ and $x_{3t}$ are generated from a pool of uniformly (0,1) distributed random numbers (Kmenta, 1971) using the Microsoft Excel package, such that the correlation coefficients $\rho(x_1, x_2)$, $\rho(x_2, x_3)$ and $\rho(x_1, x_3)$ fall in the following ranges for the three levels of multicollinearity considered:
• Relatively highly negatively correlated ($\rho_{x_i,x_j} < -0.05$), referred to as Lower Open Interval Negative (LON)
• Feebly negatively or positively correlated ($-0.05 \le \rho_{x_i,x_j} \le +0.05$), referred to as Closed Interval Negative or Positive (CNP)
• Relatively highly positively correlated ($\rho_{x_i,x_j} > +0.05$), termed Upper Open Interval Positive (UOP)
Consequently, there are three sets of X's in each category of the multicollinearity group. We compute the correlation matrices to ascertain the usefulness of each data set.
5. Values of the disturbance terms $u_{1t}$, $u_{2t}$ and $u_{3t}$ are specified for each sample point. A two-stage process is employed to generate these values:
• Three sets of random normal series are generated and standardized to obtain independent series $\varepsilon_t$ of random normal deviates
• The generated series are transformed into three series of random disturbances so as to obtain the covariance matrix predetermined in step (3) above for the model
The method presented by Nagar (1969) for transforming independent series of standard random deviates into series of random deviates with zero mean and a specified variance-covariance matrix is used for this purpose. According to Nagar (1969), since $\Sigma$ is a positive definite matrix, we can decompose it by a non-singular upper triangular matrix P such that:
$$
\Sigma = P P', \qquad
P = \begin{pmatrix} s_{11} & s_{12} & s_{13} \\ 0 & s_{22} & s_{23} \\ 0 & 0 & s_{33} \end{pmatrix}
$$
Writing
$$
\Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_{22} & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_{33} \end{pmatrix}
$$
where $\sigma_{ij} = \sigma_{ji}$, $i \ne j$, it can be shown that:
$$
\begin{aligned}
s_{33} &= \sqrt{\sigma_{33}}, & s_{23} &= \sigma_{23} / s_{33}, & s_{13} &= \sigma_{13} / s_{33} \\
s_{22} &= \sqrt{\sigma_{22} - s_{23}^2}, & s_{12} &= (\sigma_{12} - s_{13} s_{23}) / s_{22}, & s_{11} &= \sqrt{\sigma_{11} - s_{12}^2 - s_{13}^2}
\end{aligned}
$$
The three random disturbance series are thus formed using $u_t = P \varepsilon_t$, that is:
$$
\begin{aligned}
u_{1t} &= s_{11} \varepsilon_{1t} + s_{12} \varepsilon_{2t} + s_{13} \varepsilon_{3t} \\
u_{2t} &= s_{22} \varepsilon_{2t} + s_{23} \varepsilon_{3t} \\
u_{3t} &= s_{33} \varepsilon_{3t}
\end{aligned}
$$
6. The endogenous variables are then generated from the values already obtained for the X's (step 4) and the U's (step 5) and the values assigned to the structural parameters (step 2). This is most conveniently done using the reduced form of the model, derived as follows. Our three-equation model can be written as:
$$
\begin{pmatrix} 1 & 0 & -\beta_{13} \\ -\beta_{21} & 1 & 0 \\ 0 & -\beta_{32} & 1 \end{pmatrix}
\begin{pmatrix} y_{1t} \\ y_{2t} \\ y_{3t} \end{pmatrix}
=
\begin{pmatrix} \gamma_{11} & \gamma_{12} & 0 \\ \gamma_{21} & 0 & \gamma_{23} \\ 0 & \gamma_{32} & \gamma_{33} \end{pmatrix}
\begin{pmatrix} x_{1t} \\ x_{2t} \\ x_{3t} \end{pmatrix}
+
\begin{pmatrix} u_{1t} \\ u_{2t} \\ u_{3t} \end{pmatrix} \tag{10}
$$
or, compactly, $(I - B) y_t = \Gamma x_t + u_t$. Rewriting Eq. 10 to make $y_t$ the subject of the relation, we have:
$$
y_t = (I - B)^{-1} \Gamma x_t + (I - B)^{-1} u_t \tag{11}
$$
Where:
$$
(I - B)^{-1} = \frac{1}{1 - \beta_{13} \beta_{32} \beta_{21}}
\begin{pmatrix} 1 & \beta_{13} \beta_{32} & \beta_{13} \\ \beta_{21} & 1 & \beta_{21} \beta_{13} \\ \beta_{32} \beta_{21} & \beta_{32} & 1 \end{pmatrix}
$$
Alternatively, Eq. 11 can be written in terms of its reduced-form parameters:
$$
y_t = \Pi x_t + v_t \tag{14}
$$
where $\Pi = (I - B)^{-1} \Gamma$ is the matrix of reduced-form parameters and $v_t = (I - B)^{-1} u_t$ is the vector of reduced-form disturbances. The reduced form in Eq. 14 is used to determine the values of the endogenous variables at each sample point.
7. The final stage of this experiment is the estimation of the structural parameters with the aid of the generated data sets $y_{1t}$, $y_{2t}$, $y_{3t}$, $x_{1t}$, $x_{2t}$ and $x_{3t}$. The following estimation methods are employed:
• Ordinary least squares method
• Two-stage least squares method
• Limited information maximum likelihood method
• Indirect least squares method
• Three-stage least squares method
• Full-information maximum likelihood method
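Steps 2 to 6 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the procedure actually used in the study: the predetermined variables are drawn with NumPy rather than Microsoft Excel, no screening for a particular correlation category is applied, and the names `collinearity_level`, `upper_factor` and `generate_sample` are hypothetical. The upper triangular factor is obtained by reversing the row and column order around NumPy's lower Cholesky factor, which gives an upper triangular P with positive diagonal satisfying Σ = PP'.

```python
import numpy as np

# Parameter values from step 2 and the covariance matrix from step 3.
B = np.array([[0.0, 0.0, 1.8],    # beta13
              [1.5, 0.0, 0.0],    # beta21
              [0.0, 0.9, 0.0]])   # beta32
G = np.array([[0.2, 1.2, 0.0],    # gamma11, gamma12
              [2.5, 0.0, 2.1],    # gamma21, gamma23
              [0.0, 0.4, 3.3]])   # gamma32, gamma33
SIGMA = np.array([[7.0, 5.0, 4.0],
                  [5.0, 4.5, 3.5],
                  [4.0, 3.5, 3.0]])


def collinearity_level(x):
    """Classify a data set by its three pairwise correlations (step 4)."""
    r = np.corrcoef(x, rowvar=False)[np.triu_indices(3, k=1)]
    if np.all(r < -0.05):
        return "LON"
    if np.all(np.abs(r) <= 0.05):
        return "CNP"
    if np.all(r > 0.05):
        return "UOP"
    return None  # the pairs fall in different ranges


def upper_factor(sigma):
    """Return upper triangular P with P @ P.T equal to sigma (step 5)."""
    lower = np.linalg.cholesky(sigma[::-1, ::-1])
    return lower[::-1, ::-1]


def generate_sample(n, rng):
    """Generate one sample; rows are observations, columns are variables."""
    x = rng.uniform(0.0, 1.0, size=(n, 3))            # step 4, unscreened
    eps = rng.standard_normal(size=(n, 3))
    eps = (eps - eps.mean(axis=0)) / eps.std(axis=0)  # standardized deviates
    u = eps @ upper_factor(SIGMA).T                   # u_t = P eps_t
    a_inv = np.linalg.inv(np.eye(3) - B)              # (I - B)^{-1}
    y = (x @ G.T + u) @ a_inv.T                       # reduced form, step 6
    return x, u, y


x, u, y = generate_sample(100, np.random.default_rng(0))
print(collinearity_level(x))
```

To reproduce the design of the study, the draw of `x` would be repeated, or the columns combined, until `collinearity_level` returns the desired category.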

RESULTS AND DISCUSSION
In theory, and as confirmed by Johnston (1984), when an equation is just identified, the parameter estimates obtained by 2SLS, LIML, ILS and 3SLS should be identical. However, the results obtained from the six estimation techniques used in this study revealed that the estimates are virtually identical only for three estimators: LIML, 2SLS and ILS (jointly referred to as L2ILS). The performance of the four categories of estimators (OLS, L2ILS, 3SLS and FIML) is not uniformly affected by the three levels of multicollinearity. For the three cases of multicollinear exogenous variables, the frequency distribution of average parameter estimates under FIML, 3SLS and L2ILS was either negatively or positively skewed with no clear pattern, while the distribution was rather symmetric under OLS (Fig. 1 and 2). However, the performance of these estimators improved as the sample size increased.
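The coincidence of 2SLS and ILS for a just-identified equation can be illustrated numerically for the first structural equation. The sketch below makes several assumptions: it uses a single simulated sample with an arbitrary seed, draws the disturbances with NumPy's `multivariate_normal` instead of the Nagar transformation, and covers only OLS, 2SLS and ILS (LIML, 3SLS and FIML are not implemented). All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Structural parameters and disturbance covariance as assigned in the experiment.
B = np.array([[0.0, 0.0, 1.8], [1.5, 0.0, 0.0], [0.0, 0.9, 0.0]])
G = np.array([[0.2, 1.2, 0.0], [2.5, 0.0, 2.1], [0.0, 0.4, 3.3]])
SIGMA = np.array([[7.0, 5.0, 4.0], [5.0, 4.5, 3.5], [4.0, 3.5, 3.0]])

# One simulated sample generated through the reduced form.
x = rng.uniform(0.0, 1.0, size=(n, 3))
u = rng.multivariate_normal(np.zeros(3), SIGMA, size=n)
y = (x @ G.T + u) @ np.linalg.inv(np.eye(3) - B).T

# Equation 1: y1 = beta13*y3 + gamma11*x1 + gamma12*x2 + u1
y1 = y[:, 0]
W = np.column_stack([y[:, 2], x[:, 0], x[:, 1]])

# OLS: regress y1 directly on the right-hand-side variables.
ols = np.linalg.lstsq(W, y1, rcond=None)[0]

# 2SLS: project the regressors on all predetermined variables, then regress.
W_hat = x @ np.linalg.lstsq(x, W, rcond=None)[0]
tsls = np.linalg.lstsq(W_hat, y1, rcond=None)[0]

# ILS: estimate the reduced form by OLS and solve back for the structure.
Pi = np.linalg.lstsq(x, y, rcond=None)[0].T      # row i holds the coefficients of y_i
b13 = Pi[0, 2] / Pi[2, 2]                        # x3 is excluded from equation 1
ils = np.array([b13, Pi[0, 0] - b13 * Pi[2, 0], Pi[0, 1] - b13 * Pi[2, 1]])

print("OLS ", ols)
print("2SLS", tsls)
print("ILS ", ils)
```

The 2SLS and ILS vectors agree up to floating-point error because the equation has exactly as many excluded predetermined variables as included endogenous regressors, so both methods solve the same set of moment equations.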
A comparative performance evaluation of the four categories of estimators using the average of parameter estimates revealed that:
• are peaked at the middle closed interval where exogenous variables are feebly correlated. The reverse is noted for the parameters of Eq. 2 (trend type "v"), where most estimators attain their minimum when the multicollinearity level is the feeble, closed negative or positive interval
• No remarkable asymptotic pattern is noticed in the performance of the parameter estimates of any estimator
• The performance of the estimators is not affected by changes in the replication numbers; that is, there is no evidence that the distribution of the estimators is sensitive to the number of replications, which appears to attest to the stability of the results obtained in this study
The following are the main findings based on the use of the Total Absolute Biases (TAB) of the parameter estimates:
• The estimates of the absolute biases for the estimators are relatively small when compared with some earlier work (Oduntan, 2004), which studied two just-identified equations
• The Model Total Absolute Bias (MTAB) of the OLS and 3SLS estimators increased asymptotically, while the estimates of MTAB do not reveal any such asymptotic behavior for L2ILS and FIML (though the sample size N = 200 appears to be the turning point)
• Model total absolute bias, as expected, did not reveal any sensitivity to changes in replication
• The ranking of estimators based on the Average Model Total Absolute Bias (AMTAB) and the Coefficient of Variation (CV) revealed that the four estimators rank uniformly in the following order: OLS, 3SLS, FIML and L2ILS
• As the correlation level changes from LON through CNP to UOP, MTAB falls consistently (\) for OLS, rises consistently (/) for L2ILS and has its minimum at the middle level (\/) for FIML. The behavior is inconclusive for 3SLS
• Expectedly, the trends ranked as follows in decreasing order of frequency: the concave "\/" type, the downward sloping '/' and the capital lambda '/\'
• The asymptotic distribution of 'best' estimators revealed that L2ILS estimators are best for estimating the parameters under LON, FIML is best under CNP, while OLS consistently remained the best estimator for positively multicollinear exogenous variables (UOP). The findings are similar when the average of estimates was used
Tables 1-9 in the appendices are part of the tables generated in the course of the analysis and were used to arrive at our conclusions. The charts are also attached to support our claim about the nature of the distributions.
Table 3: Performance of estimators using average, N = 100, R = 600
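The text does not spell out how the bias statistics are computed, so the following sketch rests on an assumed definition: the absolute bias of a parameter is the absolute difference between its average estimate over the R replications and its true value, and the total absolute bias is the sum of these absolute biases over the parameters considered. The function names and the example numbers are hypothetical.

```python
import numpy as np


def absolute_bias(estimates, true_values):
    """Absolute bias per parameter: |mean over replications - true value|."""
    estimates = np.asarray(estimates, dtype=float)      # shape (R, k)
    true_values = np.asarray(true_values, dtype=float)  # shape (k,)
    return np.abs(estimates.mean(axis=0) - true_values)


def total_absolute_bias(estimates, true_values):
    """Sum of the absolute biases over all k parameters (assumed definition)."""
    return float(absolute_bias(estimates, true_values).sum())


# Hypothetical estimates of (beta13, gamma11, gamma12) from R = 2 replications.
example_estimates = np.array([[2.0, 0.3, 1.0],
                              [2.2, 0.3, 1.2]])
example_truth = np.array([1.8, 0.2, 1.2])

print(absolute_bias(example_estimates, example_truth))
print(total_absolute_bias(example_estimates, example_truth))
```

Under this definition a model-level figure would be obtained by passing the estimates of all nine structural parameters at once.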

CONCLUSION
This study has established that L2ILS estimators are best for estimating parameters of data plagued by the Lower Open Interval Negative level of multicollinearity, while OLS performed poorly for this category. In the closed interval, which yielded the estimates closest to the true parameter values, FIML ranks highest, while OLS gave a clear lead in estimating parameters of data characterized by the UOP level. When compared with some earlier research, this study exhibits smaller biases and suggests that a higher number of equations and parameters is likely to reduce the adverse effects of multicollinearity. We further recommend that only CNP estimates be used when faced with multicollinearity problems.
Finally, since a Monte Carlo simulation technique was used, the scope of generalization is unavoidably limited by the assumptions made in generating the data sets.

Areas for further research:
The following areas may be explored by other researchers for further contribution to knowledge:
• The effect of inclusion of structural equations with different status of identification (just or over identified)
• The effect of multicollinearity of exogenous variables under more than three levels of correlation coefficients
• The effect of multicollinearity of exogenous variables for models with three structural equations for both upper and lower triangular