Improving the Performance of Machine Learning Based Multi Attribute Face Recognition Algorithm Using Wavelet Based Image Decomposition Technique



INTRODUCTION
Face recognition is an important part of today's emerging biometrics and video surveillance market. It can benefit areas such as law enforcement, airport security, access control, driver's licenses and passports, homeland defense, customs and immigration and scene analysis. Face recognition has been a research area for almost 30 years, with significantly increased research activity since 1990. This has resulted in the development of successful algorithms and the introduction of commercial products. Nevertheless, the field is still maturing: although several commercial systems are currently available, research organizations continue to work on the development of more accurate and reliable systems. With present technology it is not possible to completely model the human recognition system or to reach its performance and accuracy.
However, the human brain has its shortcomings in some aspects. The benefit of a computer system is its capacity to handle large amounts of data and its ability to do a job in a predefined and repeatable manner. The observations and findings about the human face recognition system are a good starting point for automatic face attribute analysis.
Literature review: Face recognition has gained much attention in the last two decades due to increasing demand in security and law enforcement applications. Face recognition methods can be divided into two major categories: appearance-based methods and feature-based methods. Appearance-based methods are more popular and have achieved great success (Sakthivel and Lakshmipathi, 2010). Appearance-based methods use no a priori knowledge of the data present in the image. Instead, they use statistical analysis of the available dataset (either an image database or an image characteristics database) to extract the different modes of variation of the database and provide a set of subclasses that represent them best.
Appearance-based methods use the holistic features of a 2-D image. Face images are generally captured at very high dimensionality, normally more than 1000 pixels. It is very difficult to perform face recognition on the original face image without reducing the dimensionality by extracting the important features. Kirby and Sirovich first used Principal Component Analysis (PCA) to extract features from face images and used them to represent the human face image (Sakthivel and Rajaram, 2011). PCA seeks a set of projection vectors that project the image data into a subspace based on the variation in energy. In 1991, Turk and Pentland introduced the well-known eigenface method, which incorporates PCA and showed promising results. PCA, also known as "Eigenfaces", is one of the best-known global face recognition algorithms (Darwish et al., 2009). Another well-known method is Fisherface, which incorporates Linear Discriminant Analysis (LDA) to extract the most discriminant features and to reduce the dimensionality. In general, LDA-based methods outperform PCA-based methods (Chan et al., 2010). Recently, there has been a lot of interest in geometrically motivated approaches to data analysis in high dimensional spaces. Consider the case where data is drawn by sampling a probability distribution that has support on or near a submanifold of Euclidean space.
Suppose a collection of data points of n-dimensional real vectors is drawn from an unknown probability distribution. In increasingly many cases of interest in machine learning and data mining, one is confronted with the situation in which n is very large. However, there might be reason to suspect that the "intrinsic dimensionality" of the data is much lower. This leads one to consider methods of dimensionality reduction that allow one to represent the data in a lower dimensional space (Sakthivel and Rajaram, 2011). A great number of dimensionality reduction techniques exist in the literature. In a practical situation, where n is prohibitively large, one is often forced to use linear (or even sublinear) techniques. Consequently, projective maps have been the subject of considerable investigation. Three classical, yet popular, linear techniques are Principal Component Analysis (PCA), Multi-Dimensional Scaling (MDS) and Linear Discriminant Analysis (LDA). Each of these is an eigenvector method designed to model linear variability in high-dimensional data. More recently, frequency domain analysis methods (Sakthivel and Rajaram, 2011; Sakthivel and Lakshmipathi, 2010) such as the Discrete Fourier Transform (DFT), Discrete Wavelet Transform (DWT) and Discrete Cosine Transform (DCT) have been widely adopted in face recognition. Frequency domain analysis methods transform the image signals from the spatial domain to the frequency domain and analyze the features in the frequency domain. Only a limited number of low-frequency components, which contain high energy, are selected to represent the image.
Unlike PCA, frequency domain analysis methods are data independent. They analyze each image independently and do not require training images. Furthermore, fast algorithms are available, which eases implementation and gives high computational efficiency.
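As an illustration of selecting only the low-frequency components, the following minimal sketch computes a naive orthonormal 2-D DCT-II of a small square image and keeps the top-left block of coefficients. The function names (`dct2`, `low_frequency_features`), the `keep` parameter and the toy image are assumptions made for this example; a practical system would use a fast DCT routine rather than this direct summation.

```python
import math

def dct2(image):
    """Naive orthonormal 2-D DCT-II of a square image given as a list of rows."""
    n = len(image)

    def alpha(k):
        # Normalization factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (image[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs

def low_frequency_features(image, keep):
    """Flatten the top-left keep x keep block, which holds the lowest frequencies."""
    coeffs = dct2(image)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]

# Toy input: a constant 4x4 image has all of its energy in the DC coefficient.
flat = [[10.0] * 4 for _ in range(4)]
features = low_frequency_features(flat, keep=2)
```

No training images are involved at any point, which is the sense in which such transforms are data independent.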
New parallel models for face recognition have also been presented. Feature fusion is one of the easy and effective ways to improve performance. Feature fusion is performed by integrating multiple feature sets at different levels. However, feature fusion does not guarantee a better result. One major issue is feature selection.
Feature selection plays a very important role in avoiding overlapping features and information redundancy. We propose a new parallel model for face recognition utilizing information from the frequency and spatial domains. Both sets of features are processed in parallel. It is well known that an image can be analyzed in the spatial and frequency domains, and the two domains describe the image in very different ways. The frequency domain features are extracted using techniques such as the DCT, DFT and DWT. By utilizing two or more of these different features, better performance can be expected.
Feature fusion suffers from the problem of high dimensionality because of the combined features. The combined feature set may also contain redundant and noisy data. To solve this problem, PCA is applied to the features from the frequency and spatial domains to reduce the dimensionality and extract the most discriminant information. The Short Time Fourier Transform (STFT) represents a sort of compromise between the time- and frequency-based views of a signal. It provides some information about both when and at what frequencies a signal event occurs. However, one can only obtain this information with limited precision, and that precision is determined by the size of the window (Birgale and Kokare, 2010).

Related work:
In the previous work (Sakthivel and Rajaram, 2011), face recognition was achieved with different kinds of facial features, which were used separately or in a combined manner. Currently, feature fusion methods and parallel methods are used, performed by integrating multiple feature sets at different levels. However, this integration and the combinational methods do not guarantee better results. Hence, to achieve better results, the feature fusion model with multiple weighted facial attribute sets has been selected. For this feature model, face images were taken from the predefined Olivetti Research Laboratory (ORL) data set and processed with different methods: the Principal Component Analysis (PCA) based Eigen feature extraction technique, the Discrete Cosine Transformation (DCT) based feature extraction technique, the histogram based feature extraction technique and the simple intensity based feature technique. The feature sets obtained from these methods were compared and tested for accuracy. In this study, a model has been developed by using the above set of feature extraction techniques with different levels of weights to attain better accuracy. The results show that selecting the optimum weight for a particular feature leads to an improvement in the recognition rate. However, its performance is low compared with that of the wavelet decomposition technique.
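The text does not spell out how the weighted attribute sets are combined. One plausible reading, sketched below, is a weighted concatenation: each feature set is normalized to unit length, scaled by its weight and appended to a single fused vector. The function names, the normalization step and the toy values are assumptions made for illustration, not the authors' exact procedure.

```python
def l2_normalize(vector):
    """Scale a feature vector to unit Euclidean length (left unchanged if all zero)."""
    norm = sum(v * v for v in vector) ** 0.5
    return [v / norm for v in vector] if norm > 0 else list(vector)

def fuse_features(feature_sets, weights):
    """Concatenate normalized feature sets, each scaled by its own weight."""
    if len(feature_sets) != len(weights):
        raise ValueError("one weight is required per feature set")
    fused = []
    for features, weight in zip(feature_sets, weights):
        fused.extend(weight * v for v in l2_normalize(features))
    return fused

# Hypothetical toy feature sets and weights, chosen only to show the mechanics.
eigen_features = [3.0, 4.0]
histogram_features = [0.0, 2.0, 0.0]
fused = fuse_features([eigen_features, histogram_features], [1.0, 0.5])
```

Normalizing before weighting keeps a feature set with large raw values, such as histogram counts, from dominating the fused vector regardless of its weight.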
Proposed work: In the previous work (Sakthivel and Rajaram, 2011), it was shown that the performance of the face recognition system can be improved using multiple attributes. In this study, it is shown that the performance of the overall system can be improved by using single-level decomposed images instead of the original images for training and testing.

MATERIALS AND METHODS
Wavelet decomposition: Wavelets are mathematical functions that convert data into different frequency components and study each component with a resolution matched to its scale. They have advantages over traditional Fourier methods in analyzing physical situations where the signal contains discontinuities and sharp spikes. Wavelets were developed independently in the fields of mathematics (Suhartono et al., 2010; Sabeenian, 2010), quantum physics, electrical engineering and seismic geology. Interchanges between these fields in the last ten years have led to many new wavelet applications such as image compression, turbulence, human vision, radar and earthquake prediction. There are two functions in the wavelet transform, i.e., the scaling function (father wavelet) and the mother wavelet. Equation 1 shows the mother wavelet.
Decomposition for face recognition: Wavelet analysis is a windowing technique with variable-sized regions (Suhartono et al., 2010; Sabeenian, 2010). Wavelet analysis allows the use of long time intervals where we need more precise low-frequency information and shorter regions where we need high-frequency information. Figure 1 shows one-level wavelet decomposition.
Here, LL1 represents the approximate or smooth information, LH1 the horizontal edge information, HL1 the vertical edge information and HH1 the diagonal edge information. Level 2 wavelet decomposition is shown in Fig. 2. The LH1, HL1 and HH1 subbands carry the same horizontal, vertical and diagonal edge information as in level 1, while LL1 is further divided into the LL2, LH2, HL2 and HH2 subbands.

Multi resolution analysis:
The most popular multiresolution analysis technique is the wavelet transform (Suhartono et al., 2010; Sabeenian, 2010). Therefore, in this study, the 2D discrete wavelet transform has been used in order to extract multiple subband face images. These subband images contain coarse approximations of the face as well as horizontal, vertical and diagonal details of faces at various scales. Subsequently, PCA or ICA features have been extracted from these subbands. These multiple channels have been exploited by fusing their information for improved recognition. A level one decomposed image has been used as the input image for training and testing.
Multiresolution methods (Suhartono et al., 2010; Sabeenian, 2010) provide powerful signal analysis tools, which are widely used in feature extraction, image compression and denoising applications. Wavelet decomposition is the most widely used multiresolution technique in image processing. Images typically have locally varying statistics that result from different combinations of abrupt features like edges, of textured regions and of relatively low-contrast homogeneous regions. While such variability and spatial nonstationarity defy any single statistical characterization, the multiresolution components are more easily handled.
The wavelet transform can be performed for every scale and translation, resulting in the continuous wavelet transform (CWT), or only at multiples of scale and translation intervals, resulting in the discrete wavelet transform (DWT). Since the CWT provides redundant information and requires a lot of computation, the DWT is generally preferred. The two-dimensional wavelet transform is performed by consecutively applying the one-dimensional wavelet transform to the rows and columns of the two-dimensional data. The wavelet transform works well in the context of discrete emotion (Rizon, 2010).
In the final stage of the decomposition there are four resolution subband images: A1, the scaling component containing global low-pass information, and three wavelet components, H1, V1 and D1, corresponding respectively to the horizontal, vertical and diagonal details.
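The row-then-column procedure can be sketched as follows. The paper does not state which wavelet family was used, so this example assumes the Haar wavelet, the simplest case; the function names and the toy 4×4 image are likewise assumptions. The image is first filtered along each row, then along each column, yielding the approximation and the horizontal, vertical and diagonal detail subbands.

```python
import math

def haar_1d(signal):
    """One level of the orthonormal Haar transform: (low-pass, high-pass) halves."""
    s = math.sqrt(2.0)
    low = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return low, high

def transpose(matrix):
    return [list(row) for row in zip(*matrix)]

def haar_2d(image):
    """One-level 2-D Haar DWT of an even-sized image; returns (A, H, V, D)."""
    # Step 1: filter every row into a low-pass half and a high-pass half.
    row_low, row_high = [], []
    for row in image:
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)

    # Step 2: filter every column of each half.
    def filter_columns(half):
        lows, highs = [], []
        for column in transpose(half):
            low, high = haar_1d(column)
            lows.append(low)
            highs.append(high)
        return transpose(lows), transpose(highs)

    approx, horizontal = filter_columns(row_low)    # A: smooth, H: horizontal edges
    vertical, diagonal = filter_columns(row_high)   # V: vertical edges, D: diagonal
    return approx, horizontal, vertical, diagonal

# Toy image with alternating rows, i.e. horizontal edges only.
image = [[1.0] * 4, [3.0] * 4, [1.0] * 4, [3.0] * 4]
a1, h1, v1, d1 = haar_2d(image)   # level 1: four 2x2 subbands
a2, h2, v2, d2 = haar_2d(a1)      # level 2: decompose the approximation again
```

For this toy image only the approximation and horizontal-detail subbands are non-zero, because the intensity changes from row to row but never within a row.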
Extracting Eigen features F1: The eigenfaces approach for face recognition involves the following initialization operations:
• Acquiring a set of training images
For an input image, four cases are possible, depending on its distance from the face space and from the known face classes. In the first case, an individual is recognized and identified. In the second case, an unknown individual is present. The last two cases indicate that the image is not a face image. Case three typically shows up as a false positive in most recognition systems. In this framework, however, the false recognition may be detected because of the significant distance between the image and the subspace of expected face images (Toure and Beiji, 2010).
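The case analysis above can be sketched in code, following the usual eigenface formulation in which an image is compared both with the face space and with the known face classes. The function names, the two thresholds and the toy vectors are hypothetical; the eigenfaces are assumed to be orthonormal vectors of the same length as the flattened image.

```python
def project(image, mean_face, eigenfaces):
    """Return the mean-centered image and its weights on each eigenface."""
    centered = [p - m for p, m in zip(image, mean_face)]
    weights = [sum(c * e for c, e in zip(centered, face)) for face in eigenfaces]
    return centered, weights

def distance_to_face_space(image, mean_face, eigenfaces):
    """Euclidean distance between the centered image and its reconstruction."""
    centered, weights = project(image, mean_face, eigenfaces)
    reconstruction = [sum(w * face[i] for w, face in zip(weights, eigenfaces))
                      for i in range(len(centered))]
    return sum((c - r) ** 2 for c, r in zip(centered, reconstruction)) ** 0.5

def classify_case(dist_face_space, dist_nearest_class,
                  face_space_threshold, class_threshold):
    """Map the two distances onto the four cases described in the text."""
    near_space = dist_face_space < face_space_threshold
    near_class = dist_nearest_class < class_threshold
    if near_space and near_class:
        return "case 1: known individual"
    if near_space:
        return "case 2: unknown individual"
    if near_class:
        return "case 3: not a face (possible false positive)"
    return "case 4: not a face"

# Toy 3-pixel example with a single eigenface along the first axis.
mean_face = [0.0, 0.0, 0.0]
eigenfaces = [[1.0, 0.0, 0.0]]
inside = distance_to_face_space([2.0, 0.0, 0.0], mean_face, eigenfaces)
outside = distance_to_face_space([0.0, 3.0, 4.0], mean_face, eigenfaces)
```

Checking the distance to the face space first is what allows case three to be rejected even though the weight pattern happens to lie close to a known class.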
The histogram feature vector F2: The distribution of gray levels occurring in an image is called the gray level histogram. It is a graph showing the frequency of occurrence of each gray level in the image versus the gray level itself. The plot of this function provides a global description of the appearance of the image. The histogram of a digital image with gray levels in the range $[0, L-1]$ is a discrete function:
$$P(r_k) = \frac{n_k}{n} \qquad (2)$$
Where:
$r_k$ = The $k$th gray level
$n_k$ = Number of pixels in the image with that gray level
$n$ = The total number of pixels in the image
$k$ = 0, 1, 2, ..., $L-1$ and $L = 256$
In Equation 2, $P(r_k)$ gives an estimate of the probability of occurrence of gray level $r_k$. If a smaller value of $L$ is used, each $n_k$ counts a range of neighboring gray levels, giving $L$ bins. To construct the histogram based feature, the set of $n_k$ values and the mid values of the corresponding bins were combined.
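A minimal sketch of this feature follows. It assumes that "combined" means concatenation and that the bin counts are normalized as in Equation 2; the function name, the default bin count and the toy pixel values are assumptions made for the example.

```python
def histogram_feature(pixels, bins=16, levels=256):
    """Return P(r_k) = n_k / n for each bin, followed by the bin mid values."""
    width = levels / bins                  # number of gray levels per bin
    counts = [0] * bins
    for p in pixels:
        index = min(int(p // width), bins - 1)
        counts[index] += 1
    total = len(pixels)
    probabilities = [c / total for c in counts]
    mids = [(k + 0.5) * width for k in range(bins)]
    return probabilities + mids

# Six toy gray levels placed into four bins of width 64.
pixels = [0, 10, 100, 200, 255, 255]
feature = histogram_feature(pixels, bins=4)
```

The first half of the vector sums to one, and the second half depends only on the chosen binning, not on the image.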

Support Vector Machines (SVMs):
In this study, support vector machines have been used to recognize faces based on the two sets of features (Benhaddouche and Benyettou, 2010). Support vector machines are a set of related supervised learning methods used for classification and recognition (Izabatene et al., 2010; Kim et al., 2010). Viewing input data as two sets of vectors in an n-dimensional space, an SVM constructs a separating hyperplane in that space, one which maximizes the margin between the two data sets.
To calculate the margin, two parallel hyperplanes are constructed, one on each side of the separating hyperplane, which are "pushed up against" the two data sets. Intuitively, a good separation is achieved by the hyperplane that has the largest distance to the neighboring data points of both classes, since in general the larger the margin, the better the generalization error of the classifier (Suhartono et al., 2010). The aim is to minimize the overall risk, where P(x, y) is the joint distribution function of x and y and the SVM is selected as the classifying function.
One distinctive advantage of this type of classifier over a traditional neural network is that SVMs can achieve better generalization performance. The support vector machine is a pattern classification algorithm developed by Vapnik. It is a binary classification method that finds the optimal linear decision surface based on the concept of structural risk minimization. As shown by Vapnik, this maximal margin decision boundary can achieve optimal worst-case generalization performance. It can be noted that SVMs were originally designed to solve problems where data can be separated by a linear decision boundary. By using kernel functions (Sakthivel and Rajaram, 2011), SVMs can be used effectively to deal with problems that are not linearly separable in the original space. Some of the commonly used kernels include Gaussian Radial Basis Functions (RBFs) (Sakthivel and Lakshmipathi, 2010), polynomial functions and sigmoid functions, whose decision surfaces are known to have good approximation properties. Because the training data set is not linearly separable, a Gaussian Radial Basis Function (RBF) kernel is selected in this study. The RBF kernel usually performs better because it has a better boundary response, as it allows for extrapolation.
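For reference, the Gaussian RBF kernel can be written in a few lines. The parameterization with `gamma` (so that the kernel is exp(-gamma * ||x - y||^2)) is a common convention assumed here, since the text does not give the kernel width; the function names and sample vectors are illustrative only.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def gram_matrix(samples, gamma):
    """Kernel value for every pair of samples, as used when training an SVM."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]

# Three toy feature vectors; gamma = 0.5 is an arbitrary illustrative value.
samples = [[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]]
kernel_values = gram_matrix(samples, gamma=0.5)
```

Each entry lies in (0, 1], equals 1 on the diagonal and decays with distance, so nearby feature vectors are treated as similar and distant ones as nearly unrelated.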
Support vector machines project the data into a higher dimensional space and maximize the margins between classes or minimize the error margin for regression (Shahrabi et al., 2009; Yogameena et al., 2010). A complexity parameter permits the adjustment of the number of errors versus the model complexity, and different kernels, such as the Radial Basis Function (RBF) kernel, can be used to permit non-linear mapping into the higher dimensional space. In the proposed face recognition system, instead of using the face directly, a feature vector representing the face is constructed by using the DWT. Thus the modified approach for face recognition involves the following initialization operations:
• Acquiring a set of training images
In this implementation, we use the L1 decomposed image for training and testing to improve the recognition accuracy. Fig. 3 shows the proposed model of the overall face recognition system. The following steps are used to recognize new face images:
• Given an image to be recognized, calculating a set of weights of the M eigenfeatures by projecting the DWT feature vector onto each of the eigenfaces
• Determining whether the image is a face at all by checking to see if the DWT feature vector is sufficiently close to the face space
• If it is a face, classifying the weight pattern as either known or unknown
The support vector machine produces a correct classification rate similar to that of a neuro-fuzzy system (Ali et al., 2011). However, the support vector machine achieves a much better true positive rate and performs much faster than the neuro-fuzzy system for both training and testing on several datasets.

Steps involved in training:
• Load a set of 'n' ORL face images for training and resize them to 96×96 if necessary
• Find the L1 decomposed images using the wavelet transformation technique. After L1 decomposition, the output images will be 48×48 pixels in size
• Reshape the images into 1×2304 vectors and prepare an n×2304 feature matrix representing the training data set
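The training steps above can be sketched as follows, under two assumptions that the text does not confirm: the wavelet is taken to be Haar, and only the approximation subband is kept as the L1 decomposed image. The images are assumed to be already resized to 96×96, and synthetic arrays stand in for the ORL faces; all function names are hypothetical.

```python
def l1_approximation(image):
    """Approximation subband of a one-level orthonormal Haar decomposition."""
    rows, cols = len(image), len(image[0])
    # Each output pixel is the scaled sum of one 2x2 block of the input.
    return [[(image[r][c] + image[r][c + 1]
              + image[r + 1][c] + image[r + 1][c + 1]) / 2.0
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

def build_feature_matrix(images):
    """Decompose each 96x96 image and flatten the 48x48 result into one row."""
    matrix = []
    for image in images:
        approx = l1_approximation(image)
        matrix.append([value for row in approx for value in row])
    return matrix

# Three synthetic 96x96 images standing in for real training faces.
images = [[[float((r + c + k) % 256) for c in range(96)] for r in range(96)]
          for k in range(3)]
features = build_feature_matrix(images)   # shape: 3 rows x 2304 columns
```

Halving each side reduces the number of values per image from 9216 to 2304, so the classifier is trained on a quarter of the original dimensionality.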

RESULTS AND DISCUSSION
In the previous study (Sakthivel and Lakshmipathi, 2010), different dimensionality reduction techniques were selected and applied in order to reduce the loss of classification performance due to changes in facial appearance. The main purpose of using dimensionality reduction in that work was to obtain significant feature vectors of the face and to search for those components that were less sensitive to intrinsic deformations due to expression or to extrinsic factors such as illumination. For training and testing, a Support Vector Machine (SVM) was selected as the classifying function. The recognition performance with PCA and with LDA for dimensionality reduction appears to be equal in terms of accuracy. However, it was observed that LDA requires a very long processing time as the number of face images grows, even for small databases. For the LPP and NPE methods, the recognition rate is much lower than that of the PCA and KPCA methods as the number of face images increases.
On the other hand, it is observed that recognition of faces subject to illumination changes is a more sensitive task. Utilizing the proposed methods provides considerable improvement in the case of illumination variations. Finally, it is concluded that the two methods PCA and Kernel PCA are the best performers. Hence, for the face recognition system these two methods are preferable to the other three techniques. In addition, in (Sakthivel and Rajaram, 2011), feature fusion methods and parallel methods were used, performed by integrating multiple feature sets at different levels. However, this integration and the combinational methods do not guarantee better results.
Hence, to achieve better results, the proposed work suggests a novel feature fusion model with multiple weighted facial attribute sets. For this feature model, face images were taken from the predefined Olivetti Research Laboratory (ORL) data set and processed with different methods: the Principal Component Analysis (PCA) based Eigen feature extraction technique, the Discrete Cosine Transformation (DCT) based feature extraction technique, the histogram based feature extraction technique and simple intensity based features. The feature sets obtained from these methods were compared and tested for accuracy. In this study we have developed a model which uses the above set of feature extraction techniques with different levels of weights to attain better accuracy. The proposed training based on multiple weighted feature attribute sets provided a significant improvement in the accuracy of the face recognition system. The weights of the feature sets were decided by trial and error. This approach is applicable to systems with predefined data sets.
In the current work, the performance of the proposed face recognition model was tested with the standard set of images called the "ORL Face Database". The ORL Database of Faces contains a set of face images used in the context of a face recognition project carried out in collaboration with the Speech, Vision and Robotics Group of the Cambridge University Engineering Department. There are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open/closed eyes, smiling/not smiling) and facial details (glasses/no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).
Thus, the accuracy of recognition with and without wavelet decomposition has been evaluated. Table 1 shows the overall results of these two techniques for different numbers of input face images.
The data given in Table 1 and the graph given in Fig. 4 show that the recognition performance improved considerably when wavelet decomposition was applied as a preprocessing step for image feature enhancement, with a significant gain in the accuracy of the result.
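A comparison of this kind reduces to computing the recognition accuracy of each configuration on the same test images and taking the difference. The sketch below uses made-up subject labels purely to show the arithmetic; the values are not taken from Table 1, and the function name is an assumption.

```python
def recognition_accuracy(predicted, actual):
    """Percentage of test images whose predicted subject matches the true one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)

# Hypothetical subject labels for eight test images (not the paper's data).
actual = [1, 1, 2, 2, 3, 3, 4, 4]
without_wavelet = [1, 2, 2, 2, 3, 1, 4, 4]
with_wavelet = [1, 1, 2, 2, 3, 1, 4, 4]

baseline = recognition_accuracy(without_wavelet, actual)
enhanced = recognition_accuracy(with_wavelet, actual)
improvement = enhanced - baseline   # difference in percentage points
```

Reporting the gain in percentage points, as here, avoids ambiguity between an absolute and a relative improvement.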

CONCLUSION
The enhanced machine learning based multi attribute face recognition algorithm provides a significant improvement in the accuracy of the face recognition system. With the ORL data set, a significant 8.54 percent performance improvement was observed during various tests. In this study, during evaluation of the performance of the algorithms, unit weights were used on the feature sets for simplicity of evaluation. If the weights are changed, as in the previously proposed model (Sakthivel and Lakshmipathi, 2010), the recognition accuracy will change. With unit or fixed weights, this idea is applicable to systems with predefined data sets. For a dynamically changing database, this approach will not be suitable. Hence, future work may address methods for automatic estimation of the weights of the feature sets for better recognition accuracy. Face recognition has been and will continue to be a very challenging and difficult problem. In spite of the great work done in the last 30 years, one can be sure that the face recognition research community will have to work for at least the next 30 years to completely solve the problem.