Robust Linear Discriminant Analysis

Abstract: Linear Discriminant Analysis (LDA) is one of the most commonly employed methods for classification. The method constructs a linear discriminant function that yields an optimal classification rule between two or more groups under the assumptions of normality and homoscedasticity (equal covariance matrices). However, parametric LDA relies heavily on the sample mean vectors and the pooled sample covariance matrix, which are sensitive to non-normality. To overcome this sensitivity to non-normality as well as to heteroscedasticity, this study proposes two new robust LDA models. In these models, an automatic trimmed mean and its corresponding winsorized mean replace the mean vector of parametric LDA. For the covariance matrix, this study introduces two robust approaches: winsorization, and the multiplication of Spearman's rho by the corresponding robust scale estimator used in the trimming process. Simulated and real financial data are used to assess the performance of the proposed methods in terms of misclassification rate. The numerical results show that the new methods perform better than parametric LDA and the robust LDA with S-estimators. Thus, these new models can be recommended as alternatives to parametric LDA when non-normality and heteroscedasticity (unequal covariance matrices) exist.


Introduction
Linear Discriminant Analysis (LDA) is one of the most widely used statistical approaches for analyzing attribute variables in supervised classification (Elizabeth and Andres, 2012). The purpose of LDA is to determine which variables discriminate between two or more classes and to construct a classification model for predicting the group membership of new observations. In short, LDA aims at reliable group allocation of new observations based on a discriminant rule developed from a training data set with known group memberships. LDA is known to perform optimally when the assumptions of normality and homoscedasticity (equal covariance matrices) are met (Croux et al., 2008). However, the heavy dependence of its calculation on the sample mean vectors and the pooled sample covariance matrix may increase the misclassification rate in the presence of outliers (Sajobi et al., 2012). It is a known fact that the mean, which has a zero breakdown point, is very sensitive to outliers. To overcome this sensitivity in LDA, researchers have sought alternatives in Robust Linear Discriminant Analysis (RLDA). By substituting robust estimators for the classical ones, such as M-estimators, the Minimum Covariance Determinant (MCD) (Hubert and Driessen, 2004; Alrawashdeh et al., 2012), the Minimum Volume Ellipsoid (MVE) (Chorl and Rousseeuw, 1992) and S-estimators (He and Fung, 2000; Croux and Dehon, 2001; Lim et al., 2014), a robust discriminant model with minimal classification error rate can be developed (Croux et al., 2008).
In this study, two approaches, namely trimming and winsorizing, are proposed in the construction of new RLDA models to create discriminant rules that are robust to deviations from the assumptions. Coordinate-wise estimators are applied in this research with the aim of producing at least one successful RLDA model for solving classification problems. Unlike the usual trimming and winsorizing processes, the trimming employed in our work takes the shape of the data distribution into consideration.
Through this trimming approach, only outliers are trimmed away, leaving just the good data. A simulation study and real-life financial data are used to investigate the performance of the proposed RLDA models. For the real-life financial problem, we are interested in classifying "distress" and "non-distress" banks in Malaysia; our work is therefore scoped to two populations, in line with the nature of this problem. The proposed RLDA models are then compared to classical LDA and also to the well-known robust LDA with S-estimators. The performance of the discriminant rules is evaluated by the misclassification rates obtained in the simulation and the real-life study.
The rest of this paper is structured as follows. Section 2 describes the discriminant rules for classical LDA and the proposed RLDA. The results and discussion based on the simulation study and the real-life application are delivered in Section 3. Lastly, concluding remarks are provided in Section 4.

Discriminant Rules
Suppose that we have one group of p-dimensional feature data, x1, from population π1 with distribution H1, mean µ1 and covariance matrix Σ1, and another group of data, x2, from population π2 with distribution H2, mean µ2 and covariance matrix Σ2. A discriminant rule can be constructed to assign a new observation x0 to π1 or π2. One familiar model for this problem is classical LDA, which is derived under the assumption that all populations have identical covariance matrices, such that Σ1 = Σ2 = Σ. The classical discriminant rule is defined as Equation 1 (Johnson and Wichern, 2002): allocate x0 to π1 if

(µ1 − µ2)' Σ⁻¹ x0 − (1/2)(µ1 − µ2)' Σ⁻¹ (µ1 + µ2) ≥ ln(p2/p1), (1)

and to π2 otherwise, where p1 and p2 are the prior probabilities that an individual comes from populations π1 and π2, respectively. The overall misclassification probability is minimized by this classical discriminant rule. Since the population parameters µ and Σ are usually unknown, they must be estimated from sample data. However, the performance of the classical discriminant rule deteriorates when non-normality and/or heteroscedasticity (unequal covariance matrices) occur (GlèlèKakaï et al., 2010). The classical discriminant rule is therefore non-robust, owing to the sensitivity of the classical estimates.
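As an illustration, the allocation rule of Equation 1 can be sketched with estimates plugged in for µ1, µ2 and Σ. The function name and the equal-prior defaults below are our own choices, not from the source.

```python
import numpy as np

def lda_rule(x0, mean1, mean2, pooled_cov, p1=0.5, p2=0.5):
    """Classical linear discriminant rule (Equation 1): allocate x0 to
    population 1 when the linear score reaches the ln(p2/p1) threshold."""
    d = mean1 - mean2
    w = np.linalg.solve(pooled_cov, d)           # Sigma^{-1} (mu1 - mu2)
    score = w @ x0 - 0.5 * w @ (mean1 + mean2)   # linear discriminant score
    return 1 if score >= np.log(p2 / p1) else 2
```

In practice the sample mean vectors and the pooled sample covariance matrix (or their robust counterparts) are substituted for `mean1`, `mean2` and `pooled_cov`.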
By plugging robust estimators in for the location µ and the scatter Σ, a robust discriminant rule can be developed.
In this study, we introduce two robust estimators, namely the automatic trimmed mean, also known as the modified one-step M-estimator (MOM), and its winsorized version, referred to as the winsorized modified one-step M-estimator (WMOM), to construct the RLDA models. Trimming and winsorizing are among the standard strategies for dealing with extreme values. The MOM estimate of location, introduced by Wilcox and Keselman (2003), is a modification of the one-step M-estimator. Based on the concept of trimmed means, the MOM estimator is computed from the data left after empirically determined trimming. Briefly, MOM is a highly robust location estimator possessing the highest breakdown point and, for variable j, is defined as Equation 2:

µ̂_MOMj = ( Σ_{i = i1+1}^{n−i2} X_(i)j ) / (n − i1 − i2), (2)

where X_(i)j are the order statistics of variable j, i1 is the number of observations X_ij with (X_ij − M_j) < −2.24 MADn_j, i2 is the number of observations with (X_ij − M_j) > 2.24 MADn_j, M_j is the sample median and MADn_j is the rescaled median absolute deviation of variable j.

Another strategy for dealing with extreme values is winsorization. Winsorization follows the trimming process but, instead of discarding the trimmed values, replaces them with the most extreme remaining values at each end. The winsorized MOM (WMOM) follows the same trimming process as MOM before replacing the trimmed values with the lowest and highest retained values of the data (Haddad et al., 2013). Unlike MOM, WMOM retains the original sample size, which reduces the loss of information caused by trimming. The WMOM estimate of location is defined as

µ̂_WMOMj = (1/n) Σ_{i=1}^{n} W_ij,

where W_ij is the winsorized sample: W_ij equals X_(i1+1)j if X_ij ≤ X_(i1+1)j, equals X_ij if X_(i1+1)j < X_ij < X_(n−i2)j, and equals X_(n−i2)j if X_ij ≥ X_(n−i2)j. The winsorized covariance matrix is then computed from the winsorized observations W_ij in the usual way.

Meanwhile, the covariance matrix is estimated using two approaches: the winsorized covariance, and the product of the Spearman correlation coefficient and the rescaled Median Absolute Deviation (MADn). These covariance matrices are paired with the corresponding WMOM and MOM location estimates to form the robust discriminant rules denoted RLDA_WM and RLDA_M, respectively.
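A minimal coordinate-wise sketch of these estimators follows, assuming the trimming criterion |X_ij − M_j| > 2.24 MADn_j associated with MOM (Wilcox and Keselman, 2003). All function names are illustrative, and the Spearman correlation is computed from simple ranks (tied values would need average ranks, which this sketch does not handle).

```python
import numpy as np

K = 2.24  # trimming constant used with MADn (assumed, per Wilcox & Keselman)

def madn(x):
    # rescaled Median Absolute Deviation, consistent at the normal model
    return 1.4826 * np.median(np.abs(x - np.median(x)))

def mom_mean(x):
    """MOM: mean of the values left after trimming every point whose
    distance from the median exceeds K * MADn."""
    m, s = np.median(x), madn(x)
    return x[np.abs(x - m) <= K * s].mean()

def wmom_values(x):
    """Winsorized sample: trimmed observations are pulled back to the
    most extreme retained values instead of being discarded."""
    m, s = np.median(x), madn(x)
    keep = x[np.abs(x - m) <= K * s]
    return np.clip(x, keep.min(), keep.max())

def wmom_mean(x):
    return wmom_values(x).mean()

def spearman_rho(a, b):
    # rank correlation via simple (untied) ranks
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean(); rb -= rb.mean()
    return (ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb))

def robust_scatter(X):
    """Robust covariance: Spearman's rho times the MADn scales of the
    two coordinates, as in the RLDA_M construction."""
    p = X.shape[1]
    scales = np.array([madn(X[:, j]) for j in range(p)])
    S = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            r = 1.0 if i == j else spearman_rho(X[:, i], X[:, j])
            S[i, j] = r * scales[i] * scales[j]
    return S
```

For example, on the sample [1, 2, 3, 4, 5, 100] the value 100 is trimmed, so the MOM mean is 3 while the winsorized sample becomes [1, 2, 3, 4, 5, 5].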

Results and Discussions
In this section, a simulation study and a real data application are used to evaluate the performance of the two proposed RLDA models. These models are compared against the classical LDA model and the existing RLDA model with S-estimators.
Condition A was generated from uncontaminated populations, while conditions B, C, D, E and F were generated from contaminated populations. The procedure started by generating a training data set under each condition to develop the corresponding discriminant rule. Next, another data set of size 2000 for each group was generated from uncontaminated populations to validate the corresponding discriminant rules. The experiment was replicated 2000 times for each condition.
In this study, the contamination percentage and the dimension of variables were fixed at 20% and 3, respectively, for conditions A, B, C, D and F. A shift in location with equal and unequal sample sizes was considered in conditions B and C, respectively. For condition D, a shift in shape was paired with equal sample sizes. Unequal sample sizes and heteroscedasticity were considered in condition E, with a contamination percentage of almost 17%. Lastly, extreme contamination was considered in condition F, with shifts in both location and shape. Table 1 presents the misclassification rates for classical LDA and the RLDA models.
From Table 1 we notice that all the models perform equally well when there is no contamination. Theoretically, under the ideal condition, that is, when all the assumptions are fulfilled, classical LDA should perform optimally, and the results for condition A concur with the theory. Nevertheless, none of the RLDA models performs much worse than classical LDA. In contrast, when there is contamination, the results show that the misclassification rate of classical LDA inflates above that of all the other models (RLDA). In conditions B, C and E, RLDA_M and RLDA_WM perform better than the others; they also perform as well as RLDA_S in the remaining conditions (D and F). Furthermore, the proposed models (RLDA_M and RLDA_WM) are computationally more efficient. Referring to Table 1, no single model is best across all conditions, but taking into account the consistency of the means and standard deviations of the misclassification rates (which are always small), RLDA_M is the better choice. It is comparable to classical LDA under the ideal condition (no contamination) and consistently produces small misclassification rates even under contamination of the data. The existing RLDA with S-estimators performs poorly in a few conditions, namely B, C and E.

Real Data Application
Besides the simulation study, all the models were also tested on real data, specifically to classify financially distressed and non-distressed banking institutions in Malaysia. The bank data were extracted from selected balance sheet items in the annual reports of 27 commercial banks from 1988 to 1999. Two independent variables were used to capture variation in the financial crisis: the ratio of total shareholders' funds to total assets (CA) and the ratio of total shareholders' funds to total equity (EQ). Table 2 shows the results of the Lilliefors normality test for both variables in each group.
Normality checking of the financial data showed a violation of the normality assumption. The performance of each model was assessed via its Apparent Error Rate (AER) and the error rate estimated by Cross-Validation (CV). The results of the real data analysis are presented in Table 3.
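The two error-rate estimates can be sketched generically as follows. The nearest-centroid fitter is a hypothetical stand-in for the discriminant rules compared in this study, since the source gives no code; `classify_fit` is assumed to return a function mapping a point to a group label.

```python
import numpy as np

def aer(classify_fit, X, y):
    """Apparent Error Rate: fit on the full sample, then count the
    fraction of the same training points the rule misclassifies."""
    rule = classify_fit(X, y)
    return float(np.mean([rule(x) != yi for x, yi in zip(X, y)]))

def loo_cv_error(classify_fit, X, y):
    """Leave-one-out cross-validation estimate of the error rate:
    each point is classified by a rule fitted without it."""
    n = len(y)
    errs = 0
    for i in range(n):
        mask = np.arange(n) != i
        rule = classify_fit(X[mask], y[mask])
        errs += rule(X[i]) != y[i]
    return errs / n

def nearest_centroid_fit(X, y):
    """Illustrative two-group classifier: assign to the nearer group mean."""
    c1 = X[y == 1].mean(axis=0)
    c2 = X[y == 2].mean(axis=0)
    return lambda x: 1 if np.linalg.norm(x - c1) <= np.linalg.norm(x - c2) else 2
```

AER tends to be optimistic because the same data are used to build and to judge the rule; the leave-one-out CV estimate is the less biased of the two.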
The real data results reveal that all the RLDA models are resistant to outliers and produce smaller error rates than classical LDA. Moreover, among the RLDA models, the two proposed models produce the smallest error rates compared with the existing RLDA_S. Both proposed models are found to be equally good, as they produce equal error rates under both AER and CV. The simulation and real-life results show that the proposed RLDA models perform comparably to, or better than, the existing approaches.

Conclusion
In this study, we present two robust estimators, namely the modified one-step M-estimator (MOM) and the winsorized modified one-step M-estimator (WMOM), to address the classification problem. These two robust estimators use trimming and winsorizing, respectively, to curb the influence of outliers before forming the robust discriminant rule. Substituting them for the classical estimators in the Linear Discriminant Analysis (LDA) model substantially improves the misclassification rates. Even when compared with the existing robust LDA using S-estimators, the simulation and real data analyses show that the two proposed models are comparable or better. The proposed models produce the lowest error rates among all the investigated models. In general, the MOM and WMOM estimators should be considered for classification problems, especially when non-normality and/or heteroscedasticity are suspected. Thus, these new robust models are good alternatives to parametric LDA, especially when its assumptions are violated.