Performance Analysis of LDA, AdaBoost and Ensemble Bag Classifiers for Automatically Recognizing Nine Common Facial Expressions

Facial expression recognition plays a prominent role in today's digital world as Human-Computer Interaction becomes part of day-to-day life. Successful identification of facial expressions requires the extraction of descriptive attributes from the active facial patches and accurate classification. This paper compares three multi-class classifiers, namely Linear Discriminant Analysis (LDA), AdaBoost and Ensemble Bag. Nine common facial expressions (happy, sad, anger, fear, disgust, surprise, confused, neutral and sleepy) are recognized and classified. The feature descriptors are formed by combining Local Binary Patterns (LBP) and the Grey Level Co-occurrence Matrix (GLCM). The LBP operator provides robustness to illumination variation in the image, and GLCM derives second-order textural information, which makes the combination a good feature descriptor. Twenty-one active facial patches are extracted around the facial landmarks: eyebrows, iris, nose, sides of the nose and lip corners. Feature vectors are generated only for these twenty-one facial patches, which considerably reduces the dimension of the feature vectors used for classification. Each classifier is trained using a training set consisting of the feature vector and the corresponding expression of each training image. After training, testing was carried out and the recognition accuracy was analyzed. The experiments were done on the facial expression databases JAFFE and Yale. The proposed method obtained an accuracy of 98.03% for recognizing nine expressions.


Introduction
The study and development of systems and devices that can recognize, interpret, process and simulate human affect is called affective computing. A computing device with this capacity should be able to gather cues to user emotion from various sources. Facial expression plays a vital role in affective computing because human thoughts are well communicated through facial expressions. Analysing and identifying a facial expression demands the measurement of face movements and their grouping into the corresponding expression Tian et al. (2017).
The framework of automatic facial expression recognition has four main components: pre-processing, face detection, feature descriptor generation and classification. Pre-processing methods applied to the input image can enhance the image features by suppressing unwanted distortions present in the image. Common pre-processing methods applied for face detection include cropping, resizing, filtering, normalization, Histogram Equalization (HE), Histogram Specification (HS), Logarithm Transformation, Gamma Intensity Correction (GIC) and Self-Quotient Image (SQI). To handle illumination variation in images, including shadow problems, pre-processing based on Gamma Correction and Variance Normalization can be used Rahman et al. (2016).
Face detection can be performed with feature-based methods or image-based methods. Of the two, feature-based methods are used most often because they offer lower computational complexity, which is well suited to real-time face detection Hjelm (2001). The presence of facial features such as mouth, cheeks, nose, eyes, pupils and eyebrows is used to locate the face in the given image Bakshi and Singhal (2014). Face detection algorithms can be built using genetic algorithms, machine learning, the Eigen-face technique and Haar features Lajevardi and Hussain (2012).
The third component in expression recognition is the selection and extraction of variables from the face image. The variables or attributes selected must be able to describe the different expressions and differentiate them from one another effectively. These image attributes are extracted from image characteristics such as colour, texture, edges and shapes. The textural information in the image can be used as a feature descriptor for classification. Principal Component Analysis (PCA), LBP and GLCM are some of the textural descriptors used.
In the classification process, the feature descriptors extracted from the images are used for training the classifier Ou (2012). Training allows the classifier to create decision boundaries for the classes. When a new training set is given to the classifier, it uses the sample data for learning by adjusting the decision boundaries generated earlier. Popular classifiers in facial expression recognition are the Support Vector Machine (SVM), Hidden Markov Model (HMM), Ensemble classifiers and Neural Networks (NN).

Applications
Facial expression recognition can be useful in many applications where emotional arousal and the elicitation of expressions are taken into account. Different emotions are elicited by touching different physical objects, by seeing different videos and photos, and by exposure to different sounds and environments. Expression recognition and analysis can therefore be applied to dishonesty detection in crime scene analysis, to the healthcare sector, to marketing and advertising, and to the design of User Interfaces (UI) for websites.

Related Work
Many articles have been published on automatic facial expression recognition in recent years, using different techniques for feature vector generation and different classifiers. Pre-processing methods can improve the overall accuracy of an expression recognition system Barnouti (2016), because pre-processing the face image can enhance the textural feature information. In image resizing, the image is re-sampled between pixels of a fixed range. After resizing, the image has fewer or more pixels than the original image. Image resizing can be done either through traditional methods or through content-aware methods. Cropping and scaling are traditional methods, while warping, multi-operator and seam carving are content-aware resizing techniques Rubinstein et al. (2009). Image intensities can be adjusted to enhance the contrast of the image using the pre-processing technique called histogram equalization Rahman et al. (2016).
Face detection algorithms have been analysed extensively by researchers. The major challenges in face detection are the presence of structural components such as spectacles, moustache or beard, pose variations that affect the localization of facial features, and occlusion Su and Guo (2015). The Eigen-face based algorithm Muller et al. (2004), Turk et al. (1991), the Viola-Jones face detection algorithm Viola and Jones (2001), Deshpande and Ravishankar (2017), the AdaBoost algorithm and Neural Network based algorithms are some of the popular face detection algorithms used for facial expression recognition.
Feature extraction is meant to generate a vector that represents the textural details of the image, which is then given as the input to the classifier Medjahed, (2015). The features extracted are mainly colour features, textural features and shape-related features, and they are used for classification and regression Zhang et al. (2017). Zhang et al. (2012) used a combination of Local Binary Patterns (LBP) and Local Fisher Discriminant Analysis (LFDA) for generating the feature vector and an SVM classifier for the classification, Zhao and Pietikainen (2007). Principal Component Analysis (PCA) is another feature extraction method, in which every image is treated as a one-dimensional array of pixel values. Correlations within this one-dimensional array, if any, are found and their coefficients are extracted to form a signature of the image. The major disadvantage of PCA is that it does not give good results on images with pose variation and illumination variation Clader et al. (2001), Sahoolizadeh et al. (2008). LBP alone has been used by many researchers for creating the feature vectors Happy and Routray (2014).
Michel et al. used Support Vector Machines (SVM) as the classifier Michel and El Kaliouby (2015), Bajpai and Chanda (2010). Working with the Cohn-Kanade AU-coded facial expression database, they achieved a total accuracy of 87.9%. Jun Ou implemented facial expression recognition by generating the feature vector using Principal Component Analysis (PCA) and classifying with the K-Nearest Neighbour (KNN) algorithm Happy and Routray (2014). The Hidden Markov Model (HMM) can also be used effectively for classification Ramkumar and Logashanmugam (2016).

Key Findings
The proposed work was successful in recognizing nine facial expressions: the six basic expressions plus three more, namely confused, sleepy and neutral. A combination of the LBP and GLCM feature descriptors was used for the generation of feature vectors and was found to be effective in the classification. A comparison of the classification accuracies showed that the Ensemble Bag classifier delivers the best performance.

Objectives of the Work
The main objectives of this work are to recognize nine facial expressions using a new feature vector formed by combining two feature descriptors, namely LBP and GLCM; to classify the nine classes of expressions using the supervised classifiers LDA, AdaBoost and Ensemble Bag; and to compare and analyse the performance of these three classifiers on the facial databases JAFFE and Yale.
LDA uses supervised learning and trains on the data together with the target class. LDA reduces the dimensionality of the given feature vector to at most C − 1 discriminant components, where C is the number of classes to be assigned. For the nine expression classes recognized here, the feature vector is therefore mapped to a subspace of at most eight dimensions. When dimensionality reduction is applied, there is a chance of information loss from the original representation of the features. Therefore, classification using LDA helps to find out how dimensionality reduction affects the accuracy of classification.
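For reference, the standard multi-class Fisher formulation behind LDA can be written as below. The symbols $S_W$ (within-class scatter), $S_B$ (between-class scatter), $\mu_c$ (mean of class $c$), $\mu$ (overall mean), $N_c$ (number of samples in class $c$), $\mathcal{D}_c$ (samples of class $c$) and $W$ (projection matrix) are conventional notation assumed here; they do not appear in the original text.

```latex
S_W = \sum_{c=1}^{C} \sum_{x \in \mathcal{D}_c} (x - \mu_c)(x - \mu_c)^{\top},
\qquad
S_B = \sum_{c=1}^{C} N_c \, (\mu_c - \mu)(\mu_c - \mu)^{\top},
\qquad
W^{*} = \arg\max_{W} \frac{\lvert W^{\top} S_B W \rvert}{\lvert W^{\top} S_W W \rvert}
```

Because the weighted class-mean deviations satisfy $\sum_{c} N_c (\mu_c - \mu) = 0$, the matrix $S_B$ has rank at most $C - 1$, which bounds the number of useful discriminant directions in this standard formulation.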
AdaBoost and Ensemble Bag are machine learning methods for classification that create their decision boundaries from hyperplanes in a way that does not demand linear separability of the variables. Ensemble Bag uses a group of classifiers, and training is done on resampled feature vectors. Comparing these three classifiers gives insight into classification performance with linear and non-linear classification methods.

Figure 1 illustrates the overview of the proposed framework for the automatic recognition of facial expressions. Nine facial expressions (happy, sad, anger, surprise, disgust, fear, confused, neutral and sleepy) are considered in this study. First, the input image is pre-processed to enhance its quality and the face is located in it. The next step is to detect and extract the landmarks of the face, such as eye corners, iris, eyes, nose and lip corners. From the landmarks, twenty-one active facial patches are extracted for the generation of the feature descriptor. A combination of LBP and GLCM is proposed in this study as the feature descriptor. These feature descriptors are given as input to the classifiers for learning. Three classifiers, LDA, AdaBoost and Ensemble Bag, are used for classification. The performance of the three classifiers is compared by computing the accuracy of the classification.

Pre-Processing and Face Detection
Gaussian filtering is applied to the input image to reduce the noise present in it. This also helps the edge detection performed later for landmark localization. The Viola-Jones algorithm is used for face detection. The Haar features used by this algorithm exploit properties of the human face, for example that the eye region is darker and the nose is brighter. The algorithm has a high true-positive rate, which makes it robust, and it performs few computations on the image, which makes it suitable for real-time environments.
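The idea behind a Haar feature can be illustrated with a minimal sketch. The code below is not the Viola-Jones detector itself (there is no cascade and no training); it only shows, under assumed function names (`integral_image`, `rect_sum`, `two_rect_haar`) and a made-up 4×4 toy image, how a summed-area table makes a "dark band above a bright band" feature cheap to evaluate.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) summed-area table for a 2-D list of intensities."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table look-ups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_haar(ii, top, left, height, width):
    """Two-rectangle feature: lower half minus upper half of the window.

    A large positive value means the upper band (for example the eyes)
    is darker than the band below it (for example cheeks and nose).
    """
    half = height // 2
    upper = rect_sum(ii, top, left, half, width)
    lower = rect_sum(ii, top + half, left, half, width)
    return lower - upper


# Hypothetical toy window: two dark rows above two bright rows.
toy = [
    [10, 10, 10, 10],
    [10, 10, 10, 10],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
ii = integral_image(toy)
feature = two_rect_haar(ii, 0, 0, 4, 4)
print(feature)
```

Each rectangle sum costs four look-ups regardless of the rectangle size, which is one reason the detector needs few computations per window.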

Landmark Detection
After the face is detected, the next step is the detection of facial landmarks such as lip corners, nose, eyes and eyebrows.

Extraction of Active Facial Patches
During the elicitation of facial expressions the facial muscles move accordingly; some expressions share the same muscle movements, whereas others involve different muscle movements. Only some facial areas need to be evaluated for studying the elicitation of facial expressions. Building on the 19 active facial patches used in earlier work, 21 facial patches are identified and utilized in this study for the classification of facial expressions. A facial patch is said to be discriminative if it can classify two expressions accurately Zhong et al. (2012). From the image processing point of view, it is advantageous to reduce the amount of pixel information to be processed, and texture analysis needs to be performed on these 21 patches only.

Figure 2 shows the 21 facial patches identified, from P1 to P21. Patches P1 and P4 are derived directly from the lip corners; P9 and P11 are obtained from the patches immediately below P1 and P4. P5, P13 and P12 are derived from one side of the nose, whereas P2, P7 and P8 are derived from the other side of the nose. P2 and P5 lie at the sides of the nose, and P3 and P6 are derived from the area midway between the nose and the eyes. P14 and P15 are derived from the region below the eyes. P16 is derived from the centre region between both eyes, and P17 is the patch immediately above P16. Patches P18 and P19 are derived from the inner eyebrows. The newly introduced patches P20 and P21 are derived directly from the two iris regions.

Figure 2 shows the different intermediate images obtained during the run time of the proposed work. When an image with a human face is given, the face is detected first, followed by the detection of the eyes, nose, eyebrows and lip corners. This Region of Interest (ROI) extraction is essential for extracting the 21 facial patches from the face image. Figure 2i shows the 21 facial patches identified in the input image.

Feature Descriptor Generation
LBP is a histogram-based feature vector generator that has been shown to work effectively even on images with illumination variation. A combination of the LBP operator and GLCM is proposed in this study for obtaining the feature vector, which improves on the accuracy of presently available automatic facial expression recognition methods. The literature review shows that different facial expressions can be differentiated by studying the textural differences in the nineteen facial patches located at various face components such as eyebrows, eyes, nose and lips. After the 21 facial patches are extracted, the LBP operator and GLCM are applied to each patch to generate a robust feature vector.

LBP
LBP, being robust to illumination variation, is used in this study for generating the feature vector. Applying the LBP operator yields a binary number, which is generated by comparing the intensity values of the neighboring pixels with that of the centre pixel. The LBP operator is defined as:

$$LBP(x, y) = \sum_{n=0}^{7} 2^{n} \, s(i_n - i_c), \qquad s(u) = \begin{cases} 1, & u \ge 0 \\ 0, & u < 0 \end{cases}$$

where $i_c$ is the pixel value at the coordinate (x, y) and $i_n$ are the pixel values at the coordinates in the neighborhood of (x, y). Figure 3 illustrates the generation of an LBP value from a 3×3 grid of pixels. A histogram is formed from the LBP values generated and is in turn used as the feature descriptor, given by:

$$H_i = \sum_{x, y} I\{LBP(x, y) = i\}, \qquad i = 0, \dots, n - 1$$

where n is the number of labels produced by the LBP operator and $I\{\cdot\}$ equals 1 when its argument is true and 0 otherwise.
The LBP operator applied to each patch produces a 1×1 matrix, thus producing 21 matrices in this study.
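The thresholding step can be made concrete with a small sketch. The neighbour ordering (clockwise from the top-left corner), the `>=` comparison and the function names `lbp_code` and `lbp_histogram` are assumptions chosen for illustration; the paper does not state which ordering its implementation uses, and a different ordering only permutes the bit positions.

```python
def lbp_code(grid):
    """LBP code of the centre pixel of a 3x3 grid given as three rows."""
    centre = grid[1][1]
    # Assumed ordering: clockwise, starting at the top-left neighbour.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for n, (r, c) in enumerate(order):
        if grid[r][c] >= centre:  # s(i_n - i_c) = 1 when i_n >= i_c
            code += 1 << n        # weight 2**n
    return code


def lbp_histogram(img, bins=256):
    """Histogram of LBP codes over all interior pixels of a 2-D image."""
    hist = [0] * bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            grid = [row[x - 1:x + 2] for row in img[y - 1:y + 2]]
            hist[lbp_code(grid)] += 1
    return hist


# Hypothetical 3x3 neighbourhood with centre value 6.
example = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(example)
hist = lbp_histogram(example)
print(code)
```

For this grid the neighbours that are at least as bright as the centre occupy bit positions 0, 4, 5, 6 and 7, so the code is 1 + 16 + 32 + 64 + 128 = 241.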

GLCM
The human visual system uses the second-order distribution of grey levels as a discriminator in identifying textures Shijin Kumar and Dharun (2016). The grey level co-occurrence probabilities can be used to generate texture features by analyzing the relative positions and intensities of the neighboring pixels in an image Haralick et al. (1973). GLCM is a second-order method that estimates the conditional joint probabilities of all pairwise combinations of grey levels in the spatial window of interest, given two parameters: inter-pixel distance (δ) and orientation (θ) Clausi (2002). When descriptors are modelled with GLCM, the radius and the angle play a crucial role Haralick et al. (1973):

$$C_{ij} = \frac{P_{ij}}{\sum_{i=1}^{G} \sum_{j=1}^{G} P_{ij}}$$

where $C_{ij}$ is the co-occurrence probability between the grey levels i and j, $P_{ij}$ represents the number of occurrences of grey levels i and j within a specific window defined using the pair (δ, θ), and G is the number of quantization levels.
This work proposes G = 8, δ = 4 and θ = 0°. GLCM values for 10 properties, namely auto correlation, contrast features, dissimilarity, energy, entropy, homogeneity, variance, difference variance, difference entropy feature and correlation, were obtained as a [1×2] matrix for each of the facial patches. Table 1 shows the mathematical modeling of each of the 10 properties used in the work.
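A minimal sketch of the co-occurrence computation for θ = 0° follows. The function names, the non-symmetric counting of pixel pairs and the three feature definitions (contrast, energy, homogeneity in their commonly used Haralick-style forms) are assumptions for illustration and may differ from the definitions in Table 1. The toy image uses 4 grey levels and distance 1 so that the result can be checked by hand; the paper's settings would correspond to `levels=8` and `distance=4`.

```python
def quantize(img, levels, max_value=255):
    """Map intensities in [0, max_value] onto integer grey levels 0..levels-1."""
    return [[p * levels // (max_value + 1) for p in row] for row in img]


def glcm_theta0(q, levels, distance):
    """Normalised, non-symmetric co-occurrence matrix for orientation 0 degrees."""
    counts = [[0] * levels for _ in range(levels)]
    for row in q:
        for x in range(len(row) - distance):
            counts[row[x]][row[x + distance]] += 1  # P_ij
    total = sum(sum(r) for r in counts)
    if total == 0:
        raise ValueError("patch is narrower than the chosen distance")
    return [[c / total for c in r] for r in counts]  # C_ij = P_ij / sum(P)


def contrast(c):
    return sum((i - j) ** 2 * v for i, r in enumerate(c) for j, v in enumerate(r))


def energy(c):
    return sum(v * v for r in c for v in r)


def homogeneity(c):
    return sum(v / (1 + (i - j) ** 2) for i, r in enumerate(c) for j, v in enumerate(r))


# Hypothetical, already quantised 4x4 patch with grey levels 0..3.
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
C = glcm_theta0(toy, levels=4, distance=1)
toy_contrast = contrast(C)
toy_energy = energy(C)
toy_homogeneity = homogeneity(C)
```

In this toy patch there are 12 horizontal pixel pairs spread evenly over six (i, j) cells, so every non-zero entry of C equals 1/6; only the two off-diagonal cells contribute to the contrast.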

Optimization of Feature Descriptor
From the 21 facial patches identified, a feature matrix of order 21×11 is formed for each image, which is then used for training the classifiers and for assigning an expression to the image. Every facial image is therefore represented by a 21×11 feature matrix in this study. The values obtained in each feature vector vary according to the variations in the textural content of each of the 21 patches. Facial images with the same facial expression have feature vector values that are close to each other; that is, the standard deviation of the values in the feature vectors of the same class (facial expression) is small, while that across different classes is larger. When the classifier is trained with these feature vectors, it performs a statistical analysis and formulates a set of lower-bound and upper-bound values for each expression. When a new image with a new feature vector is tested, these stored statistics are used to assign one of the class labels by comparing the deviation of the input feature vector against the lower-bound and upper-bound values of the stored feature vectors.

Classification
When the feature vector of an image is submitted to the classifier, the feature space is partitioned into decision regions whose separating planes or hyperplanes are determined by the sample patterns used for training. Each region of separation is dominated by a particular facial expression. This work compares the performance and efficiency of the LDA, Ensemble AdaBoost and Ensemble Bag classifiers for the recognition of nine facial expressions.

LDA
Linear Discriminant Analysis (LDA) is based on supervised learning and finds the linear combinations of the available features that separate the classes from one another. LDA uses the statistics of the training set, computing the mean and variance of the variables for each class. The main objective of LDA here is to reduce the dimension to at most C − 1 discriminant features, where C is the number of classes to be recognized. This dimensionality reduction lowers the computational cost of classification.

Table 1 notation: µ is the mean of g_ij; µ_x, µ_y, σ_x and σ_y are the means and standard deviations of g_x and g_y.

Fig. 4: Illustration of resampling of feature vector on imaginary data Opitz and Maclin (1999)

AdaBoost
AdaBoost, or Adaptive Boosting, is a machine learning technique for classification. AdaBoost obtains its output by computing the weighted sum of the outputs of all the weak classifiers. The mathematical representation of learning is shown in Equation 6:

$$F_T(x) = \sum_{t=1}^{T} f_t(x) \qquad (6)$$

where x is the input, $f_t$ is the weak classifier at round t and T is the number of weak classifiers.
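The weighted sum of weak classifiers can be sketched as follows. This is a simplified binary (+1 / −1) version of discrete AdaBoost with threshold stumps on one-dimensional inputs, not the multi-class ensemble used in the experiments; the function names, the stump form and the toy data are assumptions. Each term f_t(x) is realised here as `alpha * stump(x)`.

```python
import math


def train_adaboost(xs, ys, rounds):
    """Discrete AdaBoost with threshold stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    thresholds = sorted(set(xs))
    for _ in range(rounds):
        best = None
        for thr in thresholds:
            for polarity in (1, -1):
                preds = [polarity if x >= thr else -polarity for x in xs]
                err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
                if best is None or err < best[0]:
                    best = (err, thr, polarity, preds)
        err, thr, polarity, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, polarity))
        # Increase the weight of misclassified samples, then renormalise.
        weights = [w * math.exp(-alpha * y * p) for w, y, p in zip(weights, ys, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the weighted sum F_T(x) = sum_t alpha_t * h_t(x)."""
    score = sum(alpha * (pol if x >= thr else -pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data that no single threshold stump can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = train_adaboost(xs, ys, rounds=3)
predictions = [predict(model, x) for x in xs]
```

On this toy set the best single stump misclassifies two of the six points, whereas the weighted sum of three stumps reproduces all six labels, which illustrates why the combined classifier does not require linear separability.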

Ensemble Bag
A group of classifiers is used for training and classification. Each classifier in the set is given a resampled version of the feature set for training. If the length of the feature set is N, each classifier is given N feature variables drawn at random, with replacement, from the original set. In the newly generated feature set some variables may therefore be repeated, while others may be left out. When testing is done after training, an individual classifier may produce a high test error, but when the classifiers are combined as a bag the test error is much lower than that of any individual classifier Opitz and Maclin (1999). The random redistribution of the feature set variables is shown in Fig. 4.
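The resampling and voting steps can be sketched as below. The base learner (a nearest-class-mean classifier), the function names, the fixed seed and the toy two-class data are assumptions made for illustration; the sketch resamples training examples with replacement, as in standard bagging, and combines the learners by majority vote.

```python
import random
from collections import Counter


def bootstrap_indices(n, rng):
    """Draw n indices with replacement; some repeat, some are left out."""
    return [rng.randrange(n) for _ in range(n)]


def fit_centroids(samples, labels):
    """Base learner: store the mean feature vector of each class."""
    sums, counts = {}, Counter(labels)
    for vec, lab in zip(samples, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for k, v in enumerate(vec):
            acc[k] += v
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict_centroid(centroids, vec):
    """Return the label whose class mean is closest to vec."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(c, vec))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def fit_bag(samples, labels, n_learners, seed=0):
    """Train each base learner on its own bootstrap resample."""
    rng = random.Random(seed)
    bag = []
    for _ in range(n_learners):
        idx = bootstrap_indices(len(samples), rng)
        bag.append(fit_centroids([samples[i] for i in idx], [labels[i] for i in idx]))
    return bag


def predict_bag(bag, vec):
    """Majority vote over all base learners."""
    votes = Counter(predict_centroid(c, vec) for c in bag)
    return votes.most_common(1)[0][0]


# Hypothetical, well-separated two-class data.
samples = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.0], [0.2, 0.2],
           [5.0, 5.1], [5.2, 5.0], [5.1, 5.2], [5.0, 5.0], [5.2, 5.2]]
labels = ["happy"] * 5 + ["sad"] * 5
bag = fit_bag(samples, labels, n_learners=25)
```

A single learner whose resample happens to miss one class would always predict the other class, but the vote over many resamples absorbs such failures, which is the effect described above.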

Algorithm
Step 1: Start
Step 2: Input image
Step 3: Pre-processing and face detection
Step 4: Landmark extraction and active facial patch extraction
Step 5: Feature descriptor generation
Step 6: Classification and training
Step 7: Testing
Step 8: Stop

Experimental Setup
Data Set Used
JAFFE and Yale were the databases used for experiments and analysis.

Fig. 5: Sample images from JAFFE and Yale facial expression dataset
The JAFFE database includes a total of 213 images (256×256 pixels) with seven facial expressions, the six basic expressions plus neutral, captured from 10 Japanese female subjects. The Yale dataset was used for testing face images with illumination variation. It provides images of 15 persons, including images with the sleepy expression and images with spectacles. The size of each image is 320×243 pixels. Figure 5 shows samples taken from the JAFFE and Yale datasets.

Tools
The implementation was done in MATLAB R2017a on an Intel® Core™ i3-500U CPU with a 2.00 GHz processor frequency and 4 GB of RAM, running the 64-bit Windows 10 operating system.

Results and Discussion
The classifiers LDA, AdaBoost and Ensemble Bag were trained using the training set. Each training sample consists of the feature vector generated for an image along with the facial expression of that image. After training on all nine expressions, testing was performed to verify the results produced. The classification accuracy was derived from the testing results by comparing the ground truth with the result obtained.
The classifiers were trained and tested with 102 images. The results of the classifiers were evaluated by analyzing the true positives and false positives that each classifier delivers as output. LDA produced 88 true positives and 14 false positives. Ensemble AdaBoost produced 94 true positives and 8 false positives. Ensemble Bag produced 100 true positives and 2 false positives. The performance of a supervised learning algorithm can be visualized using an error matrix or confusion matrix. The automatic recognition analysis of the nine expressions Happy, Surprise, Anger, Sad, Disgust, Fear, Confused, Neutral and Sleepy is presented as tables. Tables 2-4 show the analysis and confusion matrix for the classifiers LDA, Ensemble AdaBoost and Ensemble Bag respectively. Table 5 shows the performance analysis, comparing the accuracy obtained for each expression using the LDA, AdaBoost and Ensemble Bag classifiers. Figure 6 shows the graphical representation of the accuracy obtained when the LDA, Ensemble AdaBoost and Ensemble Bag classifiers were used for classifying the same test images. Table 6 shows the comparison of related work with the proposed work on the JAFFE database. Figure 7 shows the extracted twenty-one active facial patches in sample images.
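As a quick arithmetic check, the pooled rate of correct classification over the 102 test images can be computed directly from the counts quoted above. The helper name `accuracy_percent` is hypothetical, and this pooled ratio is an assumption about how an overall figure might be derived; it is not necessarily the same quantity as per-class precision values averaged over the nine expressions.

```python
def accuracy_percent(correct, total):
    """Share of correctly classified test images, in percent."""
    return 100.0 * correct / total


# Correct classifications (reported as true positives) out of 102 test images.
counts = {"LDA": 88, "Ensemble AdaBoost": 94, "Ensemble Bag": 100}
total_images = 102

overall = {name: round(accuracy_percent(c, total_images), 2) for name, c in counts.items()}
for name, value in overall.items():
    print(f"{name}: {value:.2f}%")
```

With these counts the pooled rates are about 86.27% for LDA, 92.16% for Ensemble AdaBoost and 98.04% for Ensemble Bag.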

Conclusion
This paper presents an effective method for automatic facial expression recognition that recognizes nine facial expressions, and it compares the performance of three classifiers: LDA, Ensemble AdaBoost and Ensemble Bag.
In the experiments carried out on different facial expression datasets, 9 expressions were automatically identified by considering 21 facial patches from the face image. Feature vectors for these 21 facial patches were generated by concatenating the LBP value of each patch with the GLCM vector generated for the same patch. The combination of LBP and GLCM proved to be a strong feature vector set for the classifiers used in the performance analysis. The precision rate was found by measuring the number of True Positives and False Positives obtained for each of the classes tested. This is a multi-class classification problem in which 9 labels are to be assigned after analyzing the feature set. Learning was accomplished by assigning the corresponding labels to the input images. The LDA classifier attained a precision of 89.67%, AdaBoost 92.85% and the Ensemble Bag classifier 98.03%. Of the three classifiers analysed, Ensemble Bag delivered the best results, with the highest precision rate. A fused dataset was used so that all 9 expressions considered in this study were available. The use of different datasets supports the robustness of the selected feature vectors and of the classifiers.

Future Scope
Facial expression recognition on face images with pose variation, and on face images with structural components such as moustache and beard, may be considered as an extension of the proposed work.

Author's Contributions
Sumithra, M.D.: Contributed in identifying methods, implementation, results analysis and manuscript preparation.
M. Abdul Rahiman: Contributed in the proposed methodology and analysis of the results and article revision.

Ethics
There is no ethical issue involved in this article, as it is an original contribution of the authors.