An Efficient Implementation of Weighted Fuzzy Fisherface Algorithm for Face Recognition using Wavelet Transform

Problem statement: This paper addresses the face recognition problem by proposing a Weighted Fuzzy Fisherface (WFF) technique that uses a biorthogonal wavelet transformation. The weighted fuzzy fisherface technique extends the Fisherface technique by introducing a fuzzy class membership for each training sample when calculating the scatter matrices. Approach: In the weighted fuzzy fisherface method, a weight emphasizes classes that are close together and deemphasizes classes that are far away from each other. Results: The proposed method was more advantageous for the classification task and improved recognition accuracy. The performance measures False Acceptance Rate (FAR), False Rejection Rate (FRR) and Equal Error Rate (EER) were also calculated. Conclusion: The weighted fuzzy fisherface algorithm using the wavelet transform can be used effectively and efficiently for face recognition, with improved accuracy.


INTRODUCTION
Face recognition involves two major steps. In the first step, some features of the image are extracted. In the second step, on the basis of the extracted features the classification is performed. There can be various features that can be extracted from the facial images.
In the field of pattern recognition, feature extraction for dimensionality reduction is an important research topic, because in many practical applications high dimensionality is a major limitation. A large number of features also degrades classifier performance when the training set is small compared to the number of features. Over the past several decades, many face recognition methods have been proposed, of which the best known are Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). The Fisherface method outperforms the eigenface method (Belhumeur et al., 1997; Zhuang and Dai, 2007; Yuen et al., 2009; Rizon et al., 2006; Abusham et al., 2008; Ma et al., 2006; Chan et al., 2010) under large variations in lighting conditions, face pose and facial expression. The basic idea of the Fisher Linear Discriminant (FLD) is to calculate the Fisher optimal discriminant vectors so that the ratio of the between-class scatter to the within-class scatter is maximized. Fisherface combines Principal Component Analysis with linear discriminant analysis.
Recently, Fuzzy Fisherface was proposed for face recognition (Shieh et al., 2010). Fuzzy Fisherface computes a fuzzy within-class scatter matrix and a fuzzy between-class scatter matrix by incorporating class membership. Although it has proved effective, Fuzzy Fisherface does not completely incorporate the class membership into the definitions of the between-class and within-class scatter matrices. A major drawback of the Fuzzy Fisherface method is that it fails to consider the different contributions of the classes to the discrimination (Kwak and Pedrycz, 2005). To overcome these drawbacks, a weighted form of the between-class scatter matrix is introduced, in which a weight is added to the well-known K-nearest neighbour classifier.
Feature extraction: Wavelets are very flexible: several bases exist, so one can choose the basis best suited to a given application, and they provide a spatial and a frequency decomposition of the image at the same time. Wavelet transforms have advantages over the traditional Fourier transform for representing functions that have discontinuities and sharp peaks, and for accurately decomposing and reconstructing finite, non-periodic and/or non-stationary signals (Rizon, 2010). The original image is resized to 128×128 and passed through the Discrete Wavelet Transform (DWT), which decomposes it into four frequency bands: one low-frequency band (LL) and three high-frequency bands (LH, HL, HH). If the low-frequency band is transformed again, the next-level sub-band information is obtained, as shown in Fig. 1.
For the proposed Weighted Fuzzy Fisherface technique, a biorthogonal wavelet transformation is used, which supports both the continuous wavelet transform and the discrete wavelet transform. A special feature of this transform is its small number of vanishing moments, so it removes fewer details and produces little distortion. For simplicity, the biorthogonal wavelet 1.1 is used here for feature extraction and dimensionality reduction.
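As a rough illustration of the decomposition described above, the following Python sketch applies a two-level 2-D transform to a 128×128 array and keeps only the LL band as the feature vector. It assumes NumPy and implements an orthonormal Haar step by hand; this relies on the biorthogonal 1.1 wavelet using the same filters as the Haar wavelet, which should be checked against the wavelet library actually used. The function names and the random stand-in image are illustrative and are not taken from the paper, and the LH/HL naming convention varies between libraries.

```python
import numpy as np

def haar_dwt2(image):
    """One level of a 2-D orthonormal Haar transform.

    Returns the four sub-bands (LL, LH, HL, HH), each half the input
    size in both directions. Assumes even height and width.
    """
    x = np.asarray(image, dtype=float)
    # Horizontal filtering: pairwise sums (low-pass) and differences (high-pass).
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2.0)
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2.0)
    # Vertical filtering of both intermediate results.
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2.0)
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2.0)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2.0)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2.0)
    return ll, lh, hl, hh

def wavelet_features(image, levels=2):
    """Decompose the LL band `levels` times and return it flattened."""
    ll = np.asarray(image, dtype=float)
    for _level in range(levels):
        ll = haar_dwt2(ll)[0]
    return ll.ravel()

rng = np.random.default_rng(0)
face = rng.random((128, 128))      # stand-in for a resized face image
features = wavelet_features(face)  # 32 x 32 LL band, flattened
```

Under these assumptions, two levels reduce each 128×128 image from 16384 pixels to 1024 coefficients before any scatter matrix is computed.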

Fisherface method:
The eigenface algorithm takes advantage of the fact that the variation within a class lies in a linear subspace of the image space. Hence, the classes are convex and, therefore, linearly separable. In the face recognition problem, when one seeks insensitivity to lighting conditions, linear methods for dimensionality reduction are chosen.
The Fisherface method performs dimensionality reduction using a linear projection and still preserves linear separability. Fisher's linear discriminant is a class-specific method, as it tries to shape the scatter in order to make it more reliable for classification. It maximizes the ratio of the between-class scatter to the within-class scatter. Fisher's LDA (Belhumeur et al., 1997) looks for a linear subspace within which the projections of the different classes are best separated, as defined by maximizing the discriminant criterion.
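Let $\{x_j \mid j = 1, 2, \ldots, n\}$ be a set of n samples in an N-dimensional space and let c be the number of classes. Denote by $X_i$ the set of samples in the ith class, by $n_i$ the number of samples in that class and by n the total number of samples. Then the between-class, the within-class and the total scatter matrices are defined in Eq. 1-3 respectively:

$$S_B = \sum_{i=1}^{c} n_i (\mu_i - \mu)(\mu_i - \mu)^T \qquad (1)$$

$$S_W = \sum_{i=1}^{c} \sum_{x_j \in X_i} (x_j - \mu_i)(x_j - \mu_i)^T \qquad (2)$$

$$S_T = \sum_{j=1}^{n} (x_j - \mu)(x_j - \mu)^T \qquad (3)$$

where $\mu_i$ is the mean of the ith class and $\mu = \frac{1}{n}\sum_{j=1}^{n} x_j$ is the global mean of all samples.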
The null space of the between-class scatter matrix $S_B$ contains no useful information for recognition, hence it is discarded by diagonalization. The within-class scatter matrix $S_W$ is then projected into the linear subspace of $S_B$ and factorized using eigen analysis to obtain the solution. The solution of the linear discriminant analysis method contains c-1 eigenvectors with non-zero eigenvalues.
If either $S_B$ or $S_W$ is singular, the diagonalization must start from the non-singular matrix. Since the scatter matrix $S_B$ has a maximal rank of c-1, it is often singular. For a singular within-class scatter matrix $S_W$, Fisher's LDA is underconstrained.
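The scatter matrices and the rank bound discussed above can be checked numerically. The sketch below assumes NumPy and the standard Fisherface definitions of the three matrices; the function name `scatter_matrices` and the synthetic data are illustrative only.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (S_B, S_W, S_T) for samples stored in the rows of X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)                  # global mean
    dim = X.shape[1]
    S_B = np.zeros((dim, dim))
    S_W = np.zeros((dim, dim))
    for label in np.unique(y):
        X_i = X[y == label]              # samples of one class
        mu_i = X_i.mean(axis=0)          # class mean
        d = (mu_i - mu)[:, None]
        S_B += X_i.shape[0] * (d @ d.T)  # n_i (mu_i - mu)(mu_i - mu)^T
        centered = X_i - mu_i
        S_W += centered.T @ centered     # scatter around the class mean
    centered_all = X - mu
    S_T = centered_all.T @ centered_all  # scatter around the global mean
    return S_B, S_W, S_T

rng = np.random.default_rng(0)
c, per_class, dim = 3, 5, 4
y = np.repeat(np.arange(c), per_class)
X = rng.normal(size=(c * per_class, dim)) + 3.0 * y[:, None]
S_B, S_W, S_T = scatter_matrices(X, y)
rank_S_B = np.linalg.matrix_rank(S_B)
```

With these definitions the total scatter decomposes as the sum of the between-class and within-class scatter, and the rank of the between-class matrix cannot exceed c-1 because the weighted deviations of the class means from the global mean sum to zero.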

Nonsingular within-class scatter matrix:
If $S_W$ is nonsingular, the optimal projection $W_{opt}$ in Eq. 4 is chosen as the matrix with orthonormal columns that maximizes the ratio of the determinant of the between-class scatter matrix of the projected samples to the determinant of the within-class scatter matrix of the projected samples, i.e.:

$$W_{opt} = \arg\max_{W} \frac{|W^T S_B W|}{|W^T S_W W|} = [w_1 \; w_2 \; \cdots \; w_m] \qquad (4)$$

where $\{w_i \mid i = 1, 2, \ldots, m\}$ is the set of generalized eigenvectors of $S_B$ and $S_W$ corresponding to the m largest generalized eigenvalues $\{\lambda_i \mid i = 1, 2, \ldots, m\}$ arranged in descending order, i.e., Eq. 5:

$$S_B w_i = \lambda_i S_W w_i, \quad i = 1, 2, \ldots, m \qquad (5)$$

Since there are at most c-1 nonzero eigenvalues, an upper bound on m is c-1, where c is the number of classes.
Singular within-class scatter matrix: If $S_W$ is singular, PCA is first used for dimensionality reduction so that $S_W$ becomes non-singular in the lower-dimensional space. This is implemented by solving for the principal eigenvectors of the total scatter matrix $S_T$. PCA reduces the dimension of the feature space to n-c, and FLD then reduces it to c-1. Let $W_{pca}$ be the PCA transform matrix and $W_{fld}$ be the Fisher LDA transform matrix.
The optimal transform matrix $W_{opt}$ in the case of singular $S_W$ is given by Eq. 6-8:

$$W_{opt}^T = W_{fld}^T W_{pca}^T \qquad (6)$$

$$W_{pca} = \arg\max_{W} |W^T S_T W| \qquad (7)$$

$$W_{fld} = \arg\max_{W} \frac{|W^T W_{pca}^T S_B W_{pca} W|}{|W^T W_{pca}^T S_W W_{pca} W|} \qquad (8)$$

The optimization for $W_{pca}$ is performed over N × (n-c) matrices with orthonormal columns, while the optimization for $W_{fld}$ is performed over (n-c) × m matrices with orthonormal columns. Only the smallest c-1 principal components are eliminated while computing $W_{pca}$. There are certainly other ways of reducing the within-class scatter while preserving the between-class scatter. The method used here chooses W to maximize the between-class scatter of the projected samples after first reducing the within-class scatter.
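The two-stage projection for the singular case can be sketched as follows. This is a minimal illustration assuming NumPy, with the PCA step done through a singular value decomposition and the FLD step through an ordinary eigen decomposition of $S_W^{-1} S_B$ in the reduced space; the function name `fisherface` and the synthetic data are assumptions made for the example, not the authors' implementation. Note that the eigenvectors returned this way are not orthonormalized.

```python
import numpy as np

def fisherface(X, y):
    """PCA to n - c dimensions followed by FLD to c - 1 dimensions.

    X holds one sample per row (n x N); returns W_opt of shape N x (c - 1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    classes = np.unique(y)
    c = classes.size
    Xc = X - X.mean(axis=0)

    # PCA step: the right singular vectors of the centered data are the
    # eigenvectors of S_T; keep the n - c leading ones.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[: n - c].T                          # N x (n - c)

    # Scatter matrices in the PCA subspace (the global mean there is zero).
    Z = Xc @ W_pca
    k = Z.shape[1]
    S_B = np.zeros((k, k))
    S_W = np.zeros((k, k))
    for label in classes:
        Z_i = Z[y == label]
        m_i = Z_i.mean(axis=0)
        S_B += Z_i.shape[0] * np.outer(m_i, m_i)
        S_W += (Z_i - m_i).T @ (Z_i - m_i)

    # FLD step: solve S_B w = lambda S_W w and keep the c - 1 largest.
    evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(evals.real)[::-1][: c - 1]
    W_fld = evecs[:, order].real                   # (n - c) x (c - 1)
    return W_pca @ W_fld

rng = np.random.default_rng(0)
c, per_class, N = 3, 4, 50
y = np.repeat(np.arange(c), per_class)
X = rng.normal(size=(c * per_class, N))
X[np.arange(y.size), y] += 6.0     # shift class i along coordinate i
W_opt = fisherface(X, y)
projected = (X - X.mean(axis=0)) @ W_opt
```

Here N = 50 exceeds n = 12, so $S_W$ is singular in the original space, which is the situation the PCA step is meant to handle.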
Fuzzy fisherface method: In the fisherface method above, the scatter matrices $S_B$ and $S_W$ are computed under the assumption that each sample is fully assigned to a given class. In face recognition, however, faces may be affected by large environmental variations (including illumination, pose and expression), so it is advantageous to assign a class membership grade to each sample rather than merely use a binary class assignment.
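In this study, a Fuzzy K-Nearest Neighbor (FKNN) algorithm is used, which makes use of the distribution of samples and considers the discriminative information in the null space of the fuzzy within-class scatter matrix. The sample distribution information of every class is represented by fuzzy membership degrees. Kwak and Pedrycz proposed to use the following fuzzy scatter matrices $\tilde{S}_B$ and $\tilde{S}_W$ to replace $S_B$ and $S_W$ (Eq. 9 and 10):

$$\tilde{S}_B = \sum_{i=1}^{c} n_i (\tilde{\mu}_i - \mu)(\tilde{\mu}_i - \mu)^T \qquad (9)$$

$$\tilde{S}_W = \sum_{i=1}^{c} \sum_{x_j \in X_i} (x_j - \tilde{\mu}_i)(x_j - \tilde{\mu}_i)^T \qquad (10)$$

where $\tilde{\mu}_i = \frac{\sum_{j=1}^{n} u_{ij} x_j}{\sum_{j=1}^{n} u_{ij}}$ is the fuzzy mean of the ith class, $X_i$ is the set of samples in the ith class and $u_{ij}$ is the class membership grade of the jth sample $x_j$ in the ith class.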

FKNN algorithm:
The class membership grades can be computed using the following steps (Kwak and Pedrycz, 2005):
Step 1: Compute the Euclidean distance matrix between pairs of training samples.
Step 2: Set the diagonal elements of this matrix to infinity.
Step 3: Sort the distance matrix in ascending order and collect the class labels of the patterns located in the closest neighborhood of the pattern under consideration.
Step 4: Compute the membership grade $u_{ij}$ as in Eq. 11-14, using one expression if i equals the label of the jth sample and another otherwise, where $n_{ij}$ stands for the number of the neighbors of the jth sample that belong to the ith class. As usual, $u_{ij}$ satisfies two obvious properties: $\sum_{i=1}^{c} u_{ij} = 1$ and $0 < \sum_{j=1}^{n} u_{ij} < n$, where $u_{ij} \in [0, 1]$.
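The four steps can be sketched in Python as below. The sketch assumes NumPy and assumes that the membership expressions take the commonly used Keller-style form associated with Kwak and Pedrycz (2005), namely 0.51 + 0.49 (n_ij / k) when i is the label of the jth sample and 0.49 (n_ij / k) otherwise; since the equations themselves are not reproduced in this text, those constants should be treated as an assumption. The function name and the toy data are illustrative.

```python
import numpy as np

def fknn_memberships(X, y, k):
    """Return the c x n matrix U, where U[i, j] is the grade of sample j
    in class i (classes ordered as in np.unique(y))."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n = X.shape[0]

    # Step 1: pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Step 2: a sample must not count as its own neighbour.
    np.fill_diagonal(dist, np.inf)
    # Step 3: labels of the k nearest neighbours of every sample.
    nearest = np.argsort(dist, axis=1)[:, :k]
    neighbour_labels = y[nearest]                  # n x k

    # Step 4: membership grades (assumed Keller-style constants).
    U = np.zeros((classes.size, n))
    for i, label in enumerate(classes):
        n_ij = (neighbour_labels == label).sum(axis=1)
        U[i] = 0.49 * n_ij / k
        U[i, y == label] += 0.51
    return U

# Two tight clusters of four points each.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
U_k3 = fknn_memberships(X, y, k=3)   # all neighbours share the sample's class
U_k5 = fknn_memberships(X, y, k=5)   # two neighbours come from the other class
```

With k = 3 every sample belongs fully to its own class, while with k = 5 part of the membership leaks to the other class; in both cases the grades of each sample sum to one.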

MATERIALS AND METHODS
The Fuzzy Fisherface method has some drawbacks:
• The sample distribution information is not completely used in the definitions of the fuzzy between-class and within-class scatter matrices
• In the PCA-transformed space, the fuzzy within-class scatter matrix might still be singular
• The null space of the fuzzy within-class scatter matrix contains discriminative information for classification

Weighted fuzzy fisherface:
Hence, the fuzzy Fisher criterion is modified in the proposed method to achieve a better recognition rate. More specifically, if two class means are far away from each other, they are already well separated and their contribution to the discrimination task is minor. However, if two class means are close together, they are not well separated, and finding discriminant vectors that separate them better improves the discriminant performance. The fuzzy fisherface method introduces a class membership for each training sample in order to enhance the discriminant ability. Since the global mean $\mu$ is common to all the classes and is irrelevant to the class membership of the samples, the fuzzy between-class scatter matrix may not take full advantage of the class membership. The between-class scatter matrix $S_B$ is therefore reformulated (Zhou et al., 2009) as given in Eq. 18. Each class mean $\mu_i$ is then replaced with the fuzzy class mean $\tilde{\mu}_i$ to obtain the fuzzy between-class scatter matrix $\tilde{S}_B$ as in Eq. 19. To control the contribution of the difference between the class means $\tilde{\mu}_i$ and $\tilde{\mu}_j$ to the between-class scatter matrix $\tilde{S}_B$, the Weighted Fuzzy Fisherface algorithm is proposed, in which a weight denoted by $\Delta_{ij}$ is introduced. If $\tilde{\mu}_i$ and $\tilde{\mu}_j$ are far from each other, $\Delta_{ij}$ is given a small value; otherwise $\Delta_{ij}$ is given a larger value. The weight $\Delta_{ij}$ is defined in Eq. 20, where a is a parameter to be chosen. The weighted fuzzy between-class matrix is then defined by Eq. 21. Accordingly, the optimal transform matrix of weighted fuzzy fisherface, denoted by $W_{WFF}$, can be found by solving the corresponding eigenvector problem (Eq. 24). Figure 2 shows the block diagram of the general workflow of the weighted fuzzy fisherface technique.
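The idea of rewriting the between-class scatter in terms of pairs of class means, and then weighting each pair, can be illustrated numerically. The sketch below assumes NumPy and uses the identity that the global-mean form of $S_B$ equals the sum over pairs i < j of $(n_i n_j / n)(\mu_i - \mu_j)(\mu_i - \mu_j)^T$; the normalization used in the paper's Eq. 18 may differ. The weight function `example_weight` is purely hypothetical: it only reproduces the stated behaviour (small for distant means, larger for close means) and is not the paper's Eq. 20. In the proposed method the fuzzy class means would be used in place of the plain means passed in here.

```python
import numpy as np

def between_scatter_global(means, counts):
    """S_B from deviations of the class means from the global mean."""
    means = np.asarray(means, dtype=float)
    counts = np.asarray(counts, dtype=float)
    mu = (counts[:, None] * means).sum(axis=0) / counts.sum()
    d = means - mu
    return (counts[:, None] * d).T @ d

def between_scatter_pairwise(means, counts, weight=None):
    """Pairwise form: sum over i < j of (n_i n_j / n) w_ij d_ij d_ij^T.

    With weight=None every pair has weight 1, which reproduces the
    global-mean form. `weight` maps a distance to a non-negative factor.
    """
    means = np.asarray(means, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    dim = means.shape[1]
    S = np.zeros((dim, dim))
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            d = means[i] - means[j]
            w = 1.0 if weight is None else weight(np.linalg.norm(d))
            S += (counts[i] * counts[j] / n) * w * np.outer(d, d)
    return S

def example_weight(distance, a=425.0):
    # Hypothetical decreasing weight, NOT the paper's Eq. 20.
    return 1.0 / (1.0 + distance ** 2 / a)

means = np.array([[0.0, 0.0], [1.0, 0.0], [30.0, 40.0]])
counts = np.array([4, 5, 6])
S_global = between_scatter_global(means, counts)
S_pairs = between_scatter_pairwise(means, counts)
S_weighted = between_scatter_pairwise(means, counts, weight=example_weight)
```

In this toy example the third class is far from the other two, so its pairs dominate the unweighted matrix; the decreasing weight reduces their influence and lets the two nearby classes contribute relatively more.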

RESULTS
To evaluate the performance of the proposed method, the ORL face database (http://www.cl.cam.ac.uk) and the Yale face database (http://cvc.yale.edu/projects/yalefaces/yalefaces.html) are used. Both databases are widely used in face recognition. The ORL database contains 40 distinct subjects, each with 10 images taken in different poses and under varying lighting conditions. The original face images are all sized 112×92 pixels. Figure 3 shows ten face images of one subject in the ORL database.
The Yale face database contains 165 face images of 15 subjects (11 images per subject) that include variations in both facial expression and lighting condition. The original face images are sized 243×320 pixels. Figure 4 shows ten face images of one subject in the Yale database.
First, the image is resized to 128×128. A two-level biorthogonal discrete wavelet transform is then applied to extract features and reduce the dimensionality. Experiments are performed with various values of k (the number of nearest neighbours) and a. A better recognition rate with less computation is achieved with k = 5 and a = 425. Table 1 shows that the weighted fuzzy fisherface method achieves better recognition results than the fuzzy fisherface method on the ORL and Yale databases. Figure 6 shows the comparison of recognition rates for different subjects on the ORL database for fuzzy and weighted fuzzy fisherface with DWT. Figure 7 shows the plot of the various performance measures.

DISCUSSION
The accuracy of a face recognition system is characterized by two parameters: the False Acceptance Rate (FAR) and the False Rejection Rate (FRR).
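A minimal sketch of how FAR, FRR and the Equal Error Rate can be computed from matching scores is given below. It assumes NumPy, assumes that scores are distances (a comparison is accepted when its distance does not exceed the threshold) and uses made-up scores; the function names are illustrative and the paper's own scoring procedure may differ.

```python
import numpy as np

def far_frr(genuine, impostor, threshold):
    """Accept a comparison when its distance is <= threshold."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    far = np.mean(impostor <= threshold)   # impostors wrongly accepted
    frr = np.mean(genuine > threshold)     # genuine users wrongly rejected
    return far, frr

def equal_error_rate(genuine, impostor):
    """Scan the observed scores as thresholds and return (eer, threshold)
    at the point where |FAR - FRR| is smallest."""
    candidates = np.unique(np.concatenate([genuine, impostor]))
    best = None
    for t in candidates:
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]

# Made-up distances: genuine pairs should be small, impostor pairs large.
genuine = np.array([0.1, 0.2, 0.3, 0.4, 0.6])
impostor = np.array([0.35, 0.5, 0.7, 0.8, 0.9])
eer, threshold = equal_error_rate(genuine, impostor)
```

Raising the threshold lowers FRR and raises FAR, so the two curves cross at a single operating point; for these made-up scores that point is a threshold of 0.4 with both rates equal to 0.2.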

CONCLUSION
The proposed weighted fuzzy fisherface method using the biorthogonal wavelet transform for face recognition is superior to the fuzzy fisherface method because it takes full advantage of dimensionality reduction, fuzzy membership and the different contributions of the class means to the discrimination. Unlike the fisherface method, it also takes into account classification errors occurring between pairs of classes. Several experiments were carried out to confirm the effectiveness of the proposed method, "Weighted Fuzzy Fisherface for face recognition using Wavelet Transform", which achieved less computation time and a better recognition rate. The performance measures were computed and the accuracy was calculated. The Equal Error Rate (EER) is found to be 0.133, and the threshold is fixed at 0.14, where the False Acceptance Rate (FAR) and the False Rejection Rate (FRR) intersect. A lower FAR indicates a more effective system.