A COMPARISON BETWEEN CLASSICAL AND ROBUST METHOD IN A FACTORIAL DESIGN IN THE PRESENCE OF OUTLIER

Analysis of Variance (ANOVA) techniques based on the classical Least Squares (LS) method require several assumptions, such as normality, constant variance and independence. These assumptions can be violated for several reasons, such as the presence of an outlying observation. There is ample evidence in the literature that LS estimates are easily affected by outliers. To remedy this problem, a robust procedure is put forward that provides estimation, inference and testing that are not unduly influenced by outlying observations. A well-known approach to handling datasets with outliers is M-estimation. In this study, both classical and robust procedures are applied to data from a factorial experiment. The results indicate that using classical least squares estimates instead of robust methods leads to misleading conclusions in the analysis of factorial designs.


INTRODUCTION
In statistics, conducting an experiment is one way to obtain data. An important consideration for the data obtained is the possible presence of one or more outliers. This problem has been studied in great detail for linear regression but has received less attention in the context of experimental design. The decision to retain or discard outliers depends on the purpose of the study. Many studies have considered the case in which outliers are retained in the data. Gentleman and Wilk (1975) and John and Draper (1978) studied outliers in two-way ANOVA designs through residual analysis. John (1978) built on this earlier work to discuss the problems that arise in detecting the presence of one or two outliers in factorial experiments.
Outliers, especially in experimental data, can lead to misinterpretation: the analysis may indicate that the results contain no abnormalities when in fact they do. The consequences of outliers are well known. Nelder (1971) noted that a gross error rate of 1% in such an experiment can result in a false inference, while gross error rates of 1 to 10% are the rule rather than the exception in practice. Bhar and Gupta (2001) pointed out that even a single outlier may alter the inferences drawn from an experiment.
Our goal in this study is to show that outliers affect the analysis of factorial designs and may give misleading results. We then put forward a robust technique to deal with outliers in designed experiments and compare the performance of the robust M estimator with that of the classical Least Squares method, using an empirical dataset.

Outliers in Design of Experiments
Much of the literature discusses outliers, including how to identify them and how to deal with their presence. Cook (1977) introduced a statistic that measures the influence of an observation on a particular model. In the context of experimental design, Daniel (1960) discussed how to locate outliers in an experimental design. He defined an outlier in a factorial experiment as an observation whose value does not follow the pattern of values produced by the rest of the data. A year later, Bross (1961) gave a strategic appraisal of the problem of outliers in patterned experiments. More recently, Seheult and Tukey (2001) introduced a method of outlier detection and robust analysis for factorial experimental designs, and Bhar and Gupta (2001) proposed a new criterion for detecting outliers in experimental designs based on the average Cook statistic. Zhou and Julie (2003) noted that, in practice, experiments may yield unusual observations (outliers). In the presence of outliers, estimation methods such as ANOVA, truncated ANOVA, Maximum Likelihood (ML) and modified ML do not perform well, since these estimates are strongly influenced by outliers. Zhou and Julie (2003) showed that with robust designs one can obtain efficient and reliable estimates of variance components regardless of any outliers that occur in an experiment. Goupy (2006) later described how to detect an outlier and estimate its true value. The method is based on a dynamic variable and the "small effects" of Daniel's diagram.

Linear Model of a Factorial Experiment
The usual general linear model of an experimental design is written as Equation (1):

$$Y_{n\times 1} = X_{n\times p}\,\theta_{p\times 1} + \varepsilon_{n\times 1} \qquad (1)$$

where $Y$ is the $n\times 1$ vector of responses, $X$ is the $n\times p$ design matrix of nonstochastic constants, $\theta$ is the $p\times 1$ vector of parameters to be estimated and $\varepsilon$ is the $n\times 1$ vector of errors with zero expectation, $E(\varepsilon) = 0$, and covariance matrix $V(\varepsilon) = \sigma^2 I$. In standard ANOVA, the underlying estimator is the least squares estimator, whose parameters are chosen to minimize the residual (error) sum of squares $(Y - X\theta)'(Y - X\theta)$.
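A minimal sketch of Equation (1) for an additive two-way layout, written in Python with NumPy. The 3 x 4 table is hypothetical (it is not Daniel's data), and the treatment (reference-level) coding of the two factors is our own choice; any full-rank coding gives the same fitted values. The least squares estimate is the $\theta$ that minimizes the residual sum of squares, obtained here with `numpy.linalg.lstsq`.

```python
import numpy as np

def two_way_design(n_rows, n_cols):
    """Design matrix for y_ij = mu + alpha_i + beta_j + e_ij (treatment coding).

    Cells are ordered row by row; the first level of each factor is the baseline.
    """
    rows, cols = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    rows, cols = rows.ravel(), cols.ravel()
    columns = [np.ones(n_rows * n_cols)]                              # intercept mu
    columns += [(rows == i).astype(float) for i in range(1, n_rows)]  # factor A dummies
    columns += [(cols == j).astype(float) for j in range(1, n_cols)]  # factor B dummies
    return np.column_stack(columns)

def ls_fit(X, y):
    """Least squares estimate: theta minimizing (y - X theta)'(y - X theta)."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta, y - X @ theta

# Hypothetical 3 x 4 table: rows are levels of factor A, columns are levels of factor B.
table = np.array([[10.0, 12.0, 15.0, 11.0],
                  [13.0, 15.5, 18.0, 14.0],
                  [ 9.0, 11.0, 14.5, 10.0]])
y = table.ravel()
X = two_way_design(*table.shape)        # n = 12 observations, p = 6 parameters
theta, resid = ls_fit(X, y)
r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
print("theta =", theta.round(3), " R^2 =", round(r2, 4))
```

At the LS solution the residuals are orthogonal to every column of $X$ (the normal equations $X'(Y - X\hat\theta) = 0$), which is a convenient check on any implementation.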

The use of Cook's Distance
Many articles in the literature discuss outlier detection. In this article, we use Cook's distance, developed by Cook (1977), which is one of the most important statistics for identifying outlying or influential observations and is used for assessing influence in regression models. Cook's distance, usually denoted by $D_i$, identifies cases with unusual values that have considerable influence on a numerical analysis. The Cook's distance of the i-th observation is based on the differences between the predicted responses from the model fitted to all of the data and the predicted responses when the i-th observation is deleted. Fox (1997) suggested a cutoff value of $4/(n-k-1)$ for detecting influential cases, where $n$ is the number of observations and $k$ is the number of predictors (factors).
In a linear regression model, Cook's distance $D_i$ is defined as:

$$D_i = \frac{\sum_{j=1}^{n}\left(\hat{y}_j - \hat{y}_{j(i)}\right)^2}{p\,s^2} \qquad (2)$$

where $\hat{y}_{j(i)}$ is the fitted value of the j-th observation when the i-th observation is deleted from the fit and $s^2$ is the residual mean square of the full fit. Because our design model is a linear model, Equation (2) can be simplified to:

$$D_i = \frac{r_i^2}{p}\cdot\frac{h_i}{1-h_i}, \qquad r_i = \frac{e_i}{s\sqrt{1-h_i}} \qquad (3)$$

where $H = X(X'X)^{-1}X'$, $h_i = x_i'(X'X)^{-1}x_i$ is the i-th diagonal element of $H$ (the leverage), $e_i$ is the i-th residual, $r_i$ is the standardized residual and $p$ is the number of predictors in the model plus one (the number of parameters). Equation (3) shows that $D_i$ is calculated from the leverage values and the standardized residuals; it measures whether an observation is influential with respect to all fitted values.
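As a check that the deletion form and the leverage form of Cook's distance agree, the Python sketch below computes $D_i$ both ways and then applies Fox's $4/(n-k-1)$ cutoff. The 3 x 4 table is hypothetical, with 10 added to the cell in the third row and third column so that it acts as an outlier. Counting $k$ as the number of non-intercept columns of $X$ (so $k = p - 1$) is our assumption about how to count predictors for dummy-coded factors.

```python
import numpy as np

def two_way_design(n_rows, n_cols):
    """Additive two-way design matrix (treatment coding, cells ordered row by row)."""
    rows, cols = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    rows, cols = rows.ravel(), cols.ravel()
    return np.column_stack([np.ones(rows.size)]
                           + [(rows == i).astype(float) for i in range(1, n_rows)]
                           + [(cols == j).astype(float) for j in range(1, n_cols)])

def cooks_distance(X, y):
    """Leverage form: D_i = r_i^2 / p * h_i / (1 - h_i)."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix H = X (X'X)^{-1} X'
    h = np.diag(H)                            # leverages h_i
    e = y - H @ y                             # residuals e_i
    s2 = e @ e / (n - p)                      # residual mean square s^2
    r = e / np.sqrt(s2 * (1 - h))             # standardized residuals r_i
    return r**2 / p * h / (1 - h)

def cooks_distance_by_deletion(X, y):
    """Deletion form: refit without case i and compare all n fitted values."""
    n, p = X.shape
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ theta
    s2 = np.sum((y - y_hat) ** 2) / (n - p)
    D = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        theta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        D[i] = np.sum((y_hat - X @ theta_i) ** 2) / (p * s2)
    return D

table = np.array([[10.0, 12.0, 15.0, 11.0],
                  [13.0, 15.5, 18.0, 14.0],
                  [ 9.0, 11.0, 14.5, 10.0]])
table[2, 2] += 10.0                       # hypothetical outlier: third row, third column
y = table.ravel()
X = two_way_design(*table.shape)
n, p = X.shape
D = cooks_distance(X, y)
cutoff = 4 / (n - (p - 1) - 1)            # Fox's 4/(n - k - 1), assuming k = p - 1
flagged = np.flatnonzero(D > cutoff) + 1  # 1-based case numbers
print(D.round(3), "cutoff =", round(cutoff, 3), "flagged cases:", flagged)
```

In this balanced layout every cell has the same leverage ($h_i = p/n = 0.5$), so the ranking of the $D_i$ is driven entirely by the standardized residuals, and only the planted cell exceeds the cutoff.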

Robust M Approach
Robust linear models are useful for fitting linear relationships when the random variation in the data is not normal or when the data contain significant outliers. The main purpose of robust regression is to provide resistant (stable) results in the presence of outliers.
Science Publications JMSS

Many robust methods have been developed to address the problem of outliers. In this study we employ M estimators and incorporate this method in the linear model for two-way experimental designs. The least squares method optimizes the fit of the model by minimizing the sum of squared deviations between the actual and predicted Y values, $\sum (y-\hat{y})^2$. M-estimation generalizes this criterion by replacing the squared deviation with a function $\rho$ of the scaled residuals, as in Equation (4):

$$\hat{\theta}_M = \arg\min_{\theta}\sum_{i=1}^{n}\rho\!\left(\frac{y_i - x_i'\theta}{\sigma}\right) \qquad (4)$$

where $\sigma$ is a scale parameter and $\rho$ is a symmetric function with a unique minimum at zero; least squares corresponds to $\rho(u) = u^2$. In general, a sensible $\rho$-function should have the following properties: (i) $\rho(u) \ge 0$; (ii) $\rho(0) = 0$; (iii) $\rho(u) = \rho(-u)$; and (iv) $\rho(u_1) \ge \rho(u_2)$ whenever $|u_1| > |u_2|$.

Two procedures are commonly used to solve the nonlinear normal equations for the M-estimates: the Newton-Raphson method and Iteratively Re-weighted Least Squares (IRLS). In practice, IRLS is the most widely used. In IRLS, an initial fit is calculated, a new set of weights is computed from the residuals of that fit, and the model is refitted by weighted least squares; the iterations continue until a convergence criterion is met.
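The IRLS steps described above can be sketched in Python as follows. This is an illustration under our own assumptions, not the ROBUSTREG implementation: it uses Huber's $\psi$ with tuning constant $c = 1.345$, rescales the residuals at each iteration by the normalized median absolute deviation (MAD), and reuses a hypothetical 3 x 4 table with a planted outlier. ROBUSTREG offers several weight functions and scale estimates, so its numbers would differ.

```python
import numpy as np

def two_way_design(n_rows, n_cols):
    """Additive two-way design matrix (treatment coding, cells ordered row by row)."""
    rows, cols = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    rows, cols = rows.ravel(), cols.ravel()
    return np.column_stack([np.ones(rows.size)]
                           + [(rows == i).astype(float) for i in range(1, n_rows)]
                           + [(cols == j).astype(float) for j in range(1, n_cols)])

def huber_weight(u, c=1.345):
    """IRLS weight w(u) = psi(u)/u for Huber's psi: 1 if |u| <= c, else c/|u|."""
    a = np.abs(u)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def m_estimate_irls(X, y, c=1.345, tol=1e-8, max_iter=200):
    """Huber M-estimate of theta by iteratively re-weighted least squares."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)        # step 0: initial LS fit
    w = np.ones(len(y))
    for _ in range(max_iter):
        e = y - X @ theta
        mad = np.median(np.abs(e - np.median(e)))
        scale = max(mad / 0.6745, 1e-12)                 # MAD scaled to estimate sigma under normality
        w = huber_weight(e / scale, c)                   # new weights from the current fit
        sw = np.sqrt(w)
        theta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)  # weighted LS
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    return theta, w

clean = np.array([[10.0, 12.0, 15.0, 11.0],
                  [13.0, 15.5, 18.0, 14.0],
                  [ 9.0, 11.0, 14.5, 10.0]])
dirty = clean.copy()
dirty[2, 2] += 10.0                                      # hypothetical outlier (case 11)
X = two_way_design(*clean.shape)
theta_clean, *_ = np.linalg.lstsq(X, clean.ravel(), rcond=None)
theta_ls, *_ = np.linalg.lstsq(X, dirty.ravel(), rcond=None)
theta_m, w = m_estimate_irls(X, dirty.ravel())
for name, t in [("clean LS", theta_clean), ("contaminated LS", theta_ls),
                ("contaminated M", theta_m)]:
    print(f"{name:16s} fitted value for case 11: {(X @ t)[10]:.3f}")
print("final weights:", w.round(3))
```

The outlier receives the smallest weight, so the M fit stays much closer to the clean-data fit than the contaminated LS fit does. Huber's $\psi$ was chosen here because its $\rho$ is convex, which keeps the sketch simple; redescending choices such as the bisquare down-weight gross outliers even more but can have several local minima.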
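The ROBUSTREG procedure in SAS provides two linear tests to assess a particular effect. The first is a robust version of the F test, referred to as the $\rho$ (rho) test (SASI, 2008). Its statistic, $S_n^2$, is based on the difference between the minimized values of the objective function in Equation (4) for the reduced and full models. Under $H_0$, $S_n^2$ is asymptotically distributed as $\lambda\chi^2_q$, where $q$ is the number of parameters being tested and $\lambda$ is the standardization factor, which is equal to:

$$\lambda = \frac{\int \psi^2(s)\,d\Phi(s)}{\int \psi'(s)\,d\Phi(s)}$$

where $\psi = \rho'$ and $\Phi$ is the standard normal cumulative distribution function (SASI, 2008). The second is a robust version of the Wald test, referred to as the $R_n^2$ test. In a designed experiment, the null hypothesis of both the $\rho$ and $R_n^2$ tests specifies that a particular effect makes no significant contribution to the response variable. When the null hypothesis of no effect is correct, $R_n^2$ asymptotically has a chi-square distribution with $q$ degrees of freedom, $\chi^2_q$.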
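To make the standardization factor concrete, the sketch below evaluates $\lambda = E[\psi^2(Z)]/E[\psi'(Z)]$ with $Z \sim N(0,1)$, which is the form we assume for this test, for Huber's $\psi(s) = \max(-c, \min(c, s))$, both in closed form and by numerical integration. Huber's $\psi$ and the values of $c$ are illustrative choices, not necessarily those used by ROBUSTREG in this study's analysis. For least squares, $\psi(s) = s$ gives $\lambda = 1$, and the Huber value approaches 1 as $c$ grows.

```python
import math

def phi(s):
    """Standard normal density."""
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)

def Phi(s):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))

def huber_lambda(c):
    """Closed form of E[psi^2] / E[psi'] for Huber's psi with tuning constant c."""
    e_dpsi = 2.0 * Phi(c) - 1.0                    # E[psi'(Z)] = P(|Z| <= c)
    # E[psi^2] = E[Z^2; |Z| <= c] + c^2 P(|Z| > c)
    e_psi2 = e_dpsi - 2.0 * c * phi(c) + 2.0 * c * c * (1.0 - Phi(c))
    return e_psi2 / e_dpsi

def huber_lambda_numeric(c, lo=-10.0, hi=10.0, n=100_000):
    """The same ratio by midpoint-rule integration against dPhi(s) = phi(s) ds."""
    h = (hi - lo) / n
    num = den = 0.0
    for k in range(n):
        s = lo + (k + 0.5) * h
        weight = phi(s) * h
        psi = max(-c, min(c, s))
        num += psi * psi * weight                  # integrand psi^2(s)
        den += weight if abs(s) <= c else 0.0      # integrand psi'(s)
    return num / den

for c in (1.0, 1.345, 2.0, 5.0):
    print(f"c = {c:5.3f}  lambda = {huber_lambda(c):.4f}  "
          f"(numeric {huber_lambda_numeric(c):.4f})")
```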

Empirical Results
To illustrate the comparison between the classical and robust approaches to outliers in factorial experiments, we provide an empirical example based on a well-known dataset discussed by Daniel (1960) (Table 1). The analysis was conducted in SAS Release 9.2. The data without outliers (clean data) are analyzed with PROC GLM, while the contaminated data are analyzed with PROC ROBUSTREG.
We first apply the classical Least Squares (LS) approach, which performs well when the observations are clean. Tables 2 and 3 show that a single outlier renders the main effect of chemical B on the response variable non-significant. The outlier also reduces the usual goodness-of-fit measure, R². When there is no outlier in the data, chemicals A and B together account for about 88.61% of the variability of the response variable, but only 71.14% when the outlier is present.

Table 1. Hypothetical two-way experimental data as mentioned in Daniel (1960)

To verify that the observation in the third row and third column of the modified data is an outlier, we use Cook's distance. The results are displayed in Table 4. The single outlier inflates the Cook's distance from 0.183 for the clean data to 0.668, indicating that case 11 is an influential observation. Because this outlier makes the effect of chemical B non-significant, it can substantially change the conclusions drawn in applied work, especially in industry.

To address this problem, we used PROC ROBUSTREG in SAS Release 9.2 to fit the model by robust M-estimation. Compared with classical LS, the M estimator is much less affected by the outlier.
With the M estimator, Tables 5 and 6 show that both chemicals A and B contribute significantly to the response variable, with p-values of 0.0092 and 0.0037, respectively. These results suggest that the robust M estimator reduces the effect of the outlier on the analysis and recovers the significant effect of chemical B on the response.

CONCLUSION
This study highlights the importance of using a robust method in experimental designs, especially factorial experiments, to reduce the effects of outliers on the analysis. The numerical example indicates that even a single outlier can have a large effect on the LS procedure, whereas the M procedure is much less affected. Robust M-estimation can therefore improve the analysis by limiting the influence of the outlier. In this example, the robust approach correctly identifies the significant factors despite the presence of the outlier.