SELECTING THE COVARIANCE STRUCTURE IN MIXED MODEL USING STATISTICAL METHODS CALIBRATION

In this article, we consider the analysis of repeated measures designs, which are often used in different fields of study. To analyze a repeated measures experiment efficiently, we need to select a suitable covariance structure, which requires considerable effort. In this paper, an approach is used to guide the selection of the covariance structure for the analysis of repeated measures designs with a high rate of success. Five well-known model selection criteria are used in the approach. A simulation study is used to evaluate the approach in terms of the percentage of times that it identifies the right covariance structure. Overall, the simulation study showed excellent performance for the approach in all the study cases. Our main result is a recommendation to consider the approach as a standard way to select the right covariance structure.


INTRODUCTION
Analyzing a study correctly, according to the experimental design used, is a very important factor in the success of any study. An inaccurate analysis can produce misleading results. Repeated measures experimental designs require special attention because, in practice, the observations within each subject are likely to be correlated, with different possible covariance structures, which makes their analysis different from that of other factorial experiments (Bellavance et al., 1996; Gill, 1992; McCulloch, 2003). Choosing the right covariance structure for the observations within each subject is an important aspect of the analysis of a repeated measures experiment; this is where the dependency due to the repeated measures is taken into account.
The MIXED procedure of the SAS System is used to analyze data from repeated measures experiments, since it can fit the data with different covariance structures within the linear mixed model setup (Littell et al., 1999). In the earlier history of the linear mixed model, much attention was given to modeling the covariance structure adequately (Chi and Reinsel, 1989; Diggle, 1988; Goldstein et al., 1994; Keselman et al., 1998; 1999a; Núñez-Antón and Zimmerman, 2000). Therefore, the first step in the statistical analysis of data from a repeated measures experiment is deciding which covariance structure suits the data. Researchers often use information criteria such as AIC (Akaike, 1974), BIC (Schwarz, 1978), CAIC (Bozdogan, 1987), HQIC (Hannan and Quinn, 1979) and AICC (Hurvich and Tsai, 1989) to decide which covariance structure is correct based on the observed data (Keselman et al., 1999b; Littell et al., 2000; Singer, 1998). Many studies have investigated the performance of these information criteria in selecting the covariance structure (Yanosky, 2007; Ferron et al., 2002; Gomez et al., 2005; Guerin and Stroup, 2000; Keselman et al., 1999b). Unfortunately, these criteria do not always select the correct covariance structure, so the possible impact of misspecifying the covariance structure on the statistical properties of the inferences must be taken into account (AL-Marshadi, 2008; Yanosky, 2007; Ferron et al., 2002; Gomez et al., 2005; Guerin and Stroup, 2000; Keselman et al., 1999a). Ferron et al. (2002) found that the AIC selected the correct covariance structure about 79% of the time on average, whereas the SBC (another name for the BIC) selected it less frequently, on average 66% of the time. In contrast, Keselman et al. (1998) found that the AIC and SBC selected the correct covariance structure only 47% and 35% of the time, respectively.
Recently, a Monte Carlo simulation study investigated the impact of misspecifying the covariance matrix in the linear mixed model (Brandon, 2013).
Our research objective is to evaluate the approach for selecting the right covariance structure, where the evaluation is in terms of its ability to identify the right covariance structure.

METHODOLOGY
The design of the simulated experiment is quite similar to the setup used by AL-Marshadi (2008) and is described below.
The treatments were arranged in a basic form of repeated measures design, which consists of a completely randomized experimental design with data collected at a sequence of equally spaced points in time. The design of the simulated experiment consists of:
• t = 3 treatments
• r = 7 or 10 subjects per treatment level
• a = 7 repeated measures on each subject
In the MIXED procedure, five model selection criteria are available that can be used to select an appropriate covariance structure:
• Akaike Information Criterion (AIC) (Akaike, 1974)
• Schwarz Bayesian Information Criterion (BIC) (Schwarz, 1978)
• Bozdogan Corrected Akaike Information Criterion (CAIC) (Bozdogan, 1987)
• Hannan and Quinn Information Criterion (HQIC) (Hannan and Quinn, 1979)
• Hurvich and Tsai corrected Akaike Information Criterion (AICC) (Hurvich and Tsai, 1989)
In this study, these five information criteria were used with the approach, and the approach was evaluated in terms of its ability to identify the right covariance structure.
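For orientation, the five criteria can be written in their common smaller-is-better forms as penalized versions of minus twice the maximized log-likelihood. The sketch below is illustrative only: the function name, the use of the number of covariance parameters d as the penalty count, and the choice of effective sample size n are assumptions, and statistical packages differ in how they define n under REML.

```python
import math

def information_criteria(loglik, d, n):
    """Return the five criteria in smaller-is-better form.

    loglik : maximized (restricted) log-likelihood of the fitted model
    d      : number of estimated covariance parameters (assumed penalty count)
    n      : effective sample size (assumption; conventions differ by software)
    Requires n > d + 1 (for AICC) and n > 1 (for the log-log term in HQIC).
    """
    m2ll = -2.0 * loglik
    return {
        "AIC": m2ll + 2.0 * d,
        "AICC": m2ll + 2.0 * d * n / (n - d - 1),
        "HQIC": m2ll + 2.0 * d * math.log(math.log(n)),
        "BIC": m2ll + d * math.log(n),
        "CAIC": m2ll + d * (math.log(n) + 1.0),
    }

# Hypothetical fit: log-likelihood -100, 2 covariance parameters, 21 subjects.
ic = information_criteria(-100.0, d=2, n=21)
```

All five criteria share the same fit term and differ only in how strongly they penalize the number of parameters, which is why they are strongly correlated across candidate models.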
The algorithm of the approach uses the bootstrap technique (Efron, 1983; 1986) and hierarchical clustering with single linkage (Khattree and Naik, 2000) as tools to calibrate the five information criteria in selecting the right covariance structure. The idea of using the bootstrap to improve the performance of a model selection rule was introduced by Efron (1983; 1986) and is extensively discussed by Efron and Tibshirani (1993).
In the context of the mixed model, the algorithm for using the approach can be outlined as follows.
Let the vector y_ij be defined as y_ij = (y_ij1, y_ij2, …, y_ija)′, where i = 1, 2, …, t; j = 1, 2, …, r; and k = 1, 2, …, a (AL-Marshadi, 2008):
1. Generate the bootstrap sample on a case-by-case basis using the observed data (original sample), i.e., by resampling from (y_i1, y_i2, …, y_ir) for the ith group independently of the other groups. The bootstrap sample size is taken to be the same as the size of the observed sample (i.e., r). The properties of the bootstrap when the bootstrap sample size equals the original sample size are discussed by Efron and Tibshirani (1993).
2. Fit the models with the candidate covariance structures to the bootstrap sample and record the five information criteria for each model.
3. Repeat steps (1) and (2) W times.
4. Researchers often use the previous collection of information criteria to select the right model, for example by choosing the model with the smallest value of the information criterion (Keselman et al., 1999a; Littell et al., 2000; Singer, 1998). In the approach, the five information criteria of each model are instead averaged over the W bootstrap samples. To justify this briefly, consider each model separately: each average of an information criterion approximately follows a normal distribution by the central limit theorem. Therefore, the averages of the information criteria of each model can be treated as a random vector that follows a 5-dimensional multivariate normal distribution. At this stage, the clustering method plays the main role in the algorithm: it groups the models of the candidate covariance structures into two clusters according to the five correlated variables (the averages of the five information criteria). One of the two clusters is called the cluster of the best set of covariance structure models; it is the cluster that contains the model with the general Unstructured (UN) covariance structure. The best covariance structure is then the simplest covariance structure in the cluster of the best set of models.
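The resampling and clustering steps of the algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SAS implementation: the function names are hypothetical, the model fitting that produces the criteria is not shown, "simplest" is taken to mean the fewest covariance parameters, and ties are broken by list order.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def bootstrap_subjects(y, rng):
    """Case-resample subjects within each treatment group.

    y has shape (t, r, a): treatments x subjects x repeated measures.
    Each group is resampled independently, with replacement, keeping size r.
    """
    t, r, _ = y.shape
    idx = rng.integers(0, r, size=(t, r))
    return np.stack([y[i, idx[i]] for i in range(t)])

def select_structure(mean_ic, names, n_params, general="UN"):
    """Pick the simplest model in the cluster that contains `general`.

    mean_ic  : (models x 5) array of criteria averaged over W bootstrap samples
    names    : list of model labels, one per row of mean_ic
    n_params : covariance parameter counts (assumed measure of simplicity)
    """
    tree = linkage(mean_ic, method="single")            # single-linkage tree
    labels = fcluster(tree, t=2, criterion="maxclust")  # cut into two clusters
    best = labels[names.index(general)]
    members = [k for k in range(len(names)) if labels[k] == best]
    return names[min(members, key=lambda k: n_params[k])]

# Hypothetical averaged criteria for six candidate structures (made-up values).
names = ["CS", "AR1", "TOEP", "HCS", "ARH1", "UN"]
n_params = [2, 2, 7, 8, 8, 28]
base = np.array([520.0, 400.0, 402.0, 518.0, 401.0, 405.0])
mean_ic = base[:, None] + np.array([0.0, 1.0, 2.0, 3.0, 4.0])
chosen = select_structure(mean_ic, names, n_params)
```

In this made-up example, AR1, TOEP, ARH1 and UN fall in one cluster and CS and HCS in the other, so the rule returns the structure with the fewest parameters among the four that cluster with UN.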
We refer to our approach as the Approach of Collaboration of Statistical Methods in Selecting the Correct Covariance Structure (ACSMSCCS).

THE SIMULATION STUDY
A simulation study of mixed model analysis of repeated measures data was conducted to evaluate the ACSMSCCS approach in terms of the percentage of times that it identifies the right covariance structure. The Kenward and Roger (1997) method was used to compute the denominator degrees of freedom for the tests of fixed effects in all the analyses in this study, where data were simulated under the null hypothesis. In addition, the percentage of times that REML failed to converge was reported for the default situation, in which PROC MIXED used REML without any intervention.
Correlated multivariate normal data were generated according to the described experiment (AL-Marshadi, 2008). There were 12 scenarios for generating data, involving six covariance structures and two sample sizes (r = 7 and 10 subjects per treatment). For each scenario, 5000 datasets were simulated. SAS PROC IML code was written to generate the datasets according to the described design (AL-Marshadi, 2008). The algorithm of the ACSMSCCS approach was applied to each of the 5000 generated datasets. The percentage of times that the ACSMSCCS approach selects the right covariance structure was reported.
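The data generation step can be illustrated with a short sketch. The study used SAS PROC IML; the Python version below is only an analogue under stated assumptions: the function name is hypothetical, all cell means are set to zero to represent the null hypothesis, and the AR(1) parameter values in the usage lines are made up rather than taken from Table 1.

```python
import numpy as np

def simulate_dataset(sigma, t=3, r=7, rng=None):
    """Draw one dataset with zero means and within-subject covariance sigma.

    sigma : (a x a) covariance matrix of the a repeated measures
    Returns an array of shape (t, r, a): treatments x subjects x measures.
    Subjects are independent; measures within a subject are correlated.
    """
    rng = np.random.default_rng() if rng is None else rng
    a = sigma.shape[0]
    return rng.multivariate_normal(np.zeros(a), sigma, size=(t, r))

# Hypothetical AR(1) covariance with variance 1 and correlation 0.5, a = 7.
k = np.arange(7)
sigma = 0.5 ** np.abs(k[:, None] - k[None, :])
y = simulate_dataset(sigma, t=3, r=10, rng=np.random.default_rng(1))
```

Repeating the call 5000 times per scenario, with a fresh draw each time, would mirror the number of datasets described above.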
Six common covariance matrix structures were used to simulate correlated error models for the simulated experiment. The six settings of the covariance matrix are given in Table 1. Setting No. 1 represents the Compound Symmetry (CS) covariance structure. Setting No. 2 represents the first-order Autoregressive, AR(1), covariance structure. Setting No. 3 represents the Toeplitz (TOEP) covariance structure. Setting No. 4 represents the Heterogeneous Compound Symmetry (HCS) covariance structure. Setting No. 5 represents the Heterogeneous first-order Autoregressive, ARH(1), covariance structure. Setting No. 6 represents the Unstructured (UN) covariance structure. Table 2 summarizes the percentage of times that the ACSMSCCS approach selects the right covariance structure from the six covariance structures when W = 10 and r = 7. Table 3 summarizes the same percentage when W = 10 and r = 10. The comparison of the results in Tables 2 and 3 suggests that the performance of the approach improves as the sample size increases.
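To make the six structures concrete, the sketch below builds each structured matrix from its parameters and lists the usual number of covariance parameters for a repeated measures. The function names and the parameter values in the usage lines are illustrative assumptions and are not the settings of Table 1; UN has no builder because any symmetric positive definite matrix qualifies.

```python
import numpy as np

def _lags(a):
    """Matrix of absolute time lags |k - l| for a equally spaced occasions."""
    k = np.arange(a)
    return np.abs(k[:, None] - k[None, :])

def cs(a, sigma2, rho):
    """Compound symmetry: common variance, common correlation."""
    return sigma2 * ((1 - rho) * np.eye(a) + rho * np.ones((a, a)))

def ar1(a, sigma2, rho):
    """AR(1): common variance, correlation rho**lag."""
    return sigma2 * rho ** _lags(a)

def toep(bands):
    """Toeplitz: bands[h] is the covariance at lag h (bands[0] is the variance)."""
    bands = np.asarray(bands, dtype=float)
    return bands[_lags(len(bands))]

def hcs(sds, rho):
    """Heterogeneous CS: occasion-specific standard deviations, common correlation."""
    sds = np.asarray(sds, dtype=float)
    return np.outer(sds, sds) * cs(len(sds), 1.0, rho)

def arh1(sds, rho):
    """Heterogeneous AR(1): occasion-specific standard deviations, correlation rho**lag."""
    sds = np.asarray(sds, dtype=float)
    return np.outer(sds, sds) * rho ** _lags(len(sds))

def n_cov_params(a):
    """Number of covariance parameters of each structure for a occasions."""
    return {"CS": 2, "AR(1)": 2, "TOEP": a, "HCS": a + 1,
            "ARH(1)": a + 1, "UN": a * (a + 1) // 2}

counts = n_cov_params(7)          # UN needs 28 parameters when a = 7
example = ar1(3, 2.0, 0.5)        # hypothetical variance 2, correlation 0.5
```

The parameter counts show why UN is the most general candidate and why simpler structures are preferred when they fit comparably well.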

RESULTS
Finally, Table 4 shows the percentage of times that PROC MIXED failed to converge when it used REML without any intervention, for all the investigated settings of the covariance matrix, with W = 10 and r = 7. Table 5 shows the same percentage with W = 10 and r = 10. In general, the comparison of the results in Tables 4 and 5 suggests that the convergence problem could be overcome by increasing the sample size.

CONCLUSION
In our simulation, we considered a repeated measures design and examined the performance of the ACSMSCCS approach in selecting the right covariance structure under different settings of the covariance structure. Overall, the ACSMSCCS approach provided an excellent tool for selecting the right covariance structure under the investigated covariance structures. In future studies, it would be interesting to investigate the performance of the approach with other experimental designs, such as repeated measures designs with two levels of repeated measures.
In addition, future studies should consider more covariance structures and other clustering algorithms.