Automatic Mouth Localization Using Edge Projection
© 2010 Science Publications

Problem statement: This study presents an algorithm to detect the mouth in color and intensity images. Approach: First, the algorithm detects the face region in the image and extracts intensity valleys from that region. Next, it extracts iris candidates from the valleys and computes a cost for each pair of iris candidates. A pair of iris candidates is then selected as the irises using the computed costs. Finally, a projection-based method detects the mouth relative to the iris locations. Results: In the experiments, the proposed algorithm detected the full mouth region in 90% of the images of the South East Asian database and in 74% of the images of the European database. Conclusion: The algorithm is considered successful at detecting the mouth under variation of pose, illumination and orientation. For future improvement, more preprocessing steps might be needed to reduce the effect of beard, moustache and illumination.


INTRODUCTION
Over the past decade, face recognition has attracted substantial attention from various disciplines and seen tremendous growth in research. Automated face recognition has been approached in two ways: holistic and feature-based. The holistic approach (Brunelli and Poggio, 1993; Beymer, 1993) treats a face as a 2D pattern of intensity variation.
Generally, the eyes and mouth are important in understanding the information and feelings conveyed by a person. The feature-based approach (Kawaguchi et al., 2000; Kawaguchi and Rizon, 2003; Brunelli and Poggio, 1993) recognizes a face using geometrical measurements taken among facial features such as the eyes and mouth. Many researchers have proposed methods to find the eye regions (Feng and Yuen, 1998; Zhou and Geng, 2004) and mouth regions (Sobottka and Pitas, 1996; Li et al., 2004) or to locate the face region in an image. These methods show the popularity of techniques and cues such as template matching, geometrical features and intensity features. Brunelli and Poggio (1993) and Beymer (1993) located eyes using template matching. In this method, an eye template of a person is moved across the input image, and the image patch that best matches the template is selected as the eye region. However, template matching and the eigenspace method require the face to be normalized for variation in size and orientation, and template matching needs a large number of templates to accommodate varying pose.
For facial feature extraction, one of the most popular methods in the geometric-feature-based approach is the use of vertical and horizontal projections. The projections can be computed after taking the first derivative of the image (Brunelli and Poggio, 1993) or directly on the intensity values (Sobottka and Pitas, 1996). Projection-based methods (Baskan et al., 2002; Gourier et al., 2004; Ryu and Oh, 2001; Pantic et al., 2001) have been used particularly to find the coarse position of the facial features. For eye and mouth localization, valley regions are detected using morphological filtering and component analysis. Only a few studies give qualitative results on performance. Most of the sample images used in previous studies (Baskan et al., 2002; Gourier et al., 2004; Ryu and Oh, 2001; Pantic et al., 2001) did not contain thick beards, because the intensity level of a thick beard might reduce the mouth detection rate of a projection-based method. A large number of South East Asians have darker and thicker beards or moustaches than Europeans. The combination of methods proposed in our algorithm to detect the eyes and mouth has not been applied in previous studies.
In this study, the algorithm first detects the face region in the image and extracts intensity valleys from the region. Then, iris candidates are extracted from the valleys using the feature template of Lin and Wu (1999) and the separability filter of Fukui and Yamaguchi (1997). Next, the costs from template matching, the separability filter and the Hough transform (Kawaguchi et al., 2000) are computed for iris candidate pairs to determine the irises of both eyes. Finally, horizontal and vertical integral projections are applied to the lower part of the face for mouth detection. The positions of both irises are used to determine the horizontal width of the mouth region. This study begins by describing each of these phases. It then presents the experimental results and finally discusses the limitations and future improvement of the proposed method.

MATERIALS AND METHODS
In our experiment, we use two databases to evaluate the performance of our algorithm. One is a European database with color images and the other is a South East Asian database with intensity images. We assume the input image is a head-and-shoulder image with a plain background, and that the head rotation about the y-axis is approximately within the interval (-30°, 30°).

Extraction of the face regions from intensity images:
The proposed algorithm extracts the face region from an intensity image using a method similar to that of Brunelli and Poggio (1993). First, we apply the Sobel edge detector to the original intensity image $I(x, y)$. Let $E(x, y)$ denote the resulting edge image, where $E(x, y) = 1$ if $(x, y)$ is an edge pixel and $E(x, y) = 0$ otherwise. Next, for each column $x$ and each row $y$, $V(x)$ and $H(y)$ are computed by: The x-positions $x_L$ and $x_R$ of the left and right boundaries of the head are given by the smallest and largest values of $x$ such that $V(x) \geq V(x_0)/3$, where $x_0$ denotes the column $x$ with the largest $V(x)$. The y-position $y_{min}$ of the upper boundary of the head is given by the smallest $y$ such that $H(y) \geq 0.05(x_R - x_L)$. Finally, we give the y-position $y_{max}$ of the lower boundary of the head by
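The boundary rules above can be illustrated with a short sketch. Because the defining equations of the projections are not reproduced in this text, the sketch assumes that V(x) and H(y) are the column and row sums of the binary edge image; the function name `head_bounds` and the list-of-rows image layout are illustrative choices, and the Sobel step is taken as already done.

```python
def head_bounds(edges):
    """Return (x_left, x_right, y_min) from a binary edge image.

    edges is a list of rows; edges[y][x] is 1 for an edge pixel, else 0.
    """
    height, width = len(edges), len(edges[0])
    # Assumed form of V(x): number of edge pixels in column x.
    V = [sum(edges[y][x] for y in range(height)) for x in range(width)]
    # Assumed form of H(y): number of edge pixels in row y.
    H = [sum(row) for row in edges]

    # x0 is the column with the largest V(x).
    x0 = max(range(width), key=lambda x: V[x])
    # Left/right boundaries: smallest and largest x with V(x) >= V(x0) / 3.
    cols = [x for x in range(width) if V[x] >= V[x0] / 3]
    x_left, x_right = cols[0], cols[-1]

    # Upper boundary: smallest y with H(y) >= 0.05 * (x_right - x_left).
    rows = [y for y in range(height) if H[y] >= 0.05 * (x_right - x_left)]
    y_min = rows[0] if rows else None
    return x_left, x_right, y_min


# Toy edge image: a rectangular outline standing in for a head contour.
toy_edges = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 1, 0, 0],
]
print(head_bounds(toy_edges))
```

The lower boundary is left out of the sketch because its defining expression is not available in this text.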

Extraction of the face regions from color images:
The proposed algorithm first creates a skin-color model for face region detection in color images. Let $(R, G, B)$ denote a color vector in RGB color space. Then, the normalized color vector $(r, g)$ is given by:
$r = \frac{R}{R+G+B}, \quad g = \frac{G}{R+G+B}$
Using the rg color space, we create a skin-color model by (Kawaguchi and Rizon, 2003):
1. Select skin-color pixels $(x, y)$ whose color value $v = (r, g)$ satisfies $g(v) \geq \varepsilon$
2. Estimate the mean $\mu = (\mu_r, \mu_g)$ and covariance matrix $\Sigma$ of the color distribution of the pixels in the selected regions
3. Compute the Gaussian distribution model, where the probability density function is given by:
$g(v) = \frac{1}{2\pi |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(v-\mu)^T \Sigma^{-1} (v-\mu)\right)$
where $v$ denotes a random color vector of a pixel in an image
4. Apply a closing and an opening of mathematical morphology to the regions of skin-color pixels
5. Find the connected component of skin-color pixels with the largest area
6. If $(x_i, y_i)$ denote the coordinates of the pixels obtained in Step 5, let $X_1$ and $X_2$ be the smallest and largest of $x_i$ and $Y_1$ be the smallest of $y_i$. Then, the face region produced by two vertical lines
The proposed algorithm applies grayscale closing (Sternberg, 1986) to the face region to extract valleys. For each pixel, $V(x, y) = G(x, y) - I(x, y)$, where $I(x, y)$ denotes the intensity value and $G(x, y)$ the value obtained by applying grayscale closing. Then, regions consisting of pixels $(x, y)$ such that $V(x, y)$ is greater than or equal to a threshold value are determined to be valleys. Histogram equalization and light spot deletion are performed to enhance image quality and reduce illumination effects in this algorithm.
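As a rough illustration of the valley extraction step, the sketch below computes a grayscale closing (a local maximum followed by a local minimum) and thresholds the difference between the closed image and the original intensity. The square 3×3 window, the threshold of 50 and the names `valley_mask` and `_window_filter` are assumptions made for the example, not values taken from the study.

```python
def _window_filter(image, size, pick):
    """Apply pick (max or min) over a size x size window, clipped at the borders."""
    h, w, r = len(image), len(image[0]), size // 2
    return [
        [
            pick(
                image[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def valley_mask(image, size=3, threshold=50):
    """Mark pixels whose (closing - intensity) value reaches the threshold."""
    # Grayscale closing = dilation (local max) followed by erosion (local min).
    closed = _window_filter(_window_filter(image, size, max), size, min)
    # Valley strength: closed value minus original intensity (never negative).
    strength = [
        [c - p for c, p in zip(closed_row, image_row)]
        for closed_row, image_row in zip(closed, image)
    ]
    return [[1 if v >= threshold else 0 for v in row] for row in strength]


# Toy image: bright background with one dark pixel acting as a valley.
toy_image = [[200] * 5 for _ in range(5)]
toy_image[2][2] = 50
for row in valley_mask(toy_image):
    print(row)
```

Small dark structures such as the single pixel here are filled by the closing, so they stand out in the difference image, while the uniform background yields zero.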

Detection and selection of iris candidates:
The proposed algorithm follows a method similar to that of Kawaguchi and Rizon (2003):
• Computes costs $C(x, y)$ for all pixels in the valleys: Then, the algorithm selects, in non-increasing order of cost, the m pixels that give the local maxima of $C(x, y)$.
• Places the template of Fig. 1 at each candidate location and measures the separability between two regions $R_1$ and $R_2$ given by:
• Applies the Canny (1986) edge detector to the face region and measures the fitness of iris candidates to the edge image using the Hough transform (Kawaguchi et al., 2000). We give the equation of a circle by: $(x-a)^2 + (y-b)^2 = r^2$, where $(a, b)$ is the circle center and $r$ is the radius.
• Given an iris candidate $B_i = (x_i, y_i, r_i)$, measures the fitness of iris candidates to the intensity image by placing the two templates in Fig. 2.
• Computes a cost for each pair of iris candidates $B_i$ and $B_j$, given by: where $C(i)$ and $C(j)$ are the costs computed by Eq. 6, $R(i, j)$ is the normalized cross-correlation value computed using the eye template in Fig. 3, which is produced by manually cutting the eye region out of a face image, and $t$ is a weight that balances the two terms of the cost.
[Figure caption: The templates used to compute $C_2(i)$ and $C_3(i)$.]
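To make the role of the circle equation concrete, the sketch below scores how well one candidate circle with center (a, b) and radius r agrees with a binary edge image, by sampling points on the circle and counting how many land on edge pixels. This is a simplified stand-in rather than the Hough-based cost of the cited work, whose exact form is not reproduced in this text; the function name `circle_edge_fitness` and the sample count are assumptions.

```python
import math


def circle_edge_fitness(edges, a, b, r, samples=64):
    """Fraction of sampled points on (x-a)^2 + (y-b)^2 = r^2 that hit edge pixels."""
    h, w = len(edges), len(edges[0])
    hits = 0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        # Nearest pixel to the point on the circle at angle t.
        x = round(a + r * math.cos(t))
        y = round(b + r * math.sin(t))
        if 0 <= x < w and 0 <= y < h and edges[y][x]:
            hits += 1
    return hits / samples


# Toy edge image: a ring of radius 6 around (10, 10), standing in for an iris contour.
size = 21
toy_ring = [
    [1 if abs(math.hypot(x - 10, y - 10) - 6) <= 0.75 else 0 for x in range(size)]
    for y in range(size)
]
print(circle_edge_fitness(toy_ring, 10, 10, 6))
print(circle_edge_fitness(toy_ring, 10, 10, 2))
```

A candidate whose center and radius match the ring gets a high score, while a circle of the wrong radius gets a low one, which is the behavior an edge-based fitness term is meant to capture.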
Mouth detection:
First, the proposed algorithm automatically crops the region containing the mouth from the Canny edge image by defining: The iris locations obtained in the previous steps are used to determine the left and right boundaries of the mouth region from the horizontal integral projection.
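A minimal sketch of projection-based mouth localization follows. The crop definition is not reproduced in this text, so the search band used here (starting half an inter-iris distance below the eyes and bounded horizontally by the iris x-positions) is purely an assumption, as are the function name `locate_mouth` and the rule of picking the row with the largest horizontal projection.

```python
def locate_mouth(edges, iris_a, iris_b):
    """Return (x_left, x_right, y_mouth) from a binary edge image and two iris (x, y) points."""
    (xl, yl), (xr, yr) = sorted([iris_a, iris_b])
    height = len(edges)
    eye_dist = xr - xl
    # Assumed search band: rows from half an inter-iris distance below the eyes downward.
    top = min(height - 1, max(yl, yr) + eye_dist // 2)
    # Horizontal integral projection, restricted to the columns between the irises.
    projection = [sum(edges[y][xl:xr + 1]) for y in range(top, height)]
    # Assumed rule: the mouth row is the row with the strongest edge response.
    y_mouth = top + max(range(len(projection)), key=lambda k: projection[k])
    return xl, xr, y_mouth


# Toy edge image: a short nose edge at row 9 and a wider mouth edge at row 14.
toy_face = [[0] * 20 for _ in range(20)]
for x in range(9, 11):
    toy_face[9][x] = 1
for x in range(6, 14):
    toy_face[14][x] = 1
print(locate_mouth(toy_face, (5, 4), (14, 4)))
```

In the toy image the wider mouth edge dominates the shorter nose edge in the projection, so the mouth row is selected; a thick beard or moustache would add competing peaks, which matches the failure cases the study reports.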

RESULTS AND DISCUSSION
We conducted experiments to evaluate the performance of the proposed algorithm using the European database and the South East Asian database. The European database contains 63 color images of 21 subjects, each of size 768×576. The South East Asian database contains 60 intensity images of 8 subjects, each of size 480×360. The images in both databases vary in pose, head orientation, illumination and gender. Images with moustaches and beards are included as well.
Since the successful detection rate for irises is more than 90% for both databases, as shown in (Kawaguchi and Rizon, 2003), we concentrate on mouth detection. As Table 1 shows, the proposed algorithm achieves a 74% successful mouth detection rate on the European database. Illumination effects, gender differences and the presence of a moustache or beard still contribute to detection failures, although the mouth was successfully detected in some such images in this experiment. However, the half-mouth regions detected by the algorithm in 15% of the European database images can still be useful in a classification stage using Artificial Intelligence techniques such as Neural Networks. The algorithm detected the full mouth region in 90% of the South East Asian database images, even with a beard or moustache. This may be due to the better and more stable illumination environment in the intensity images. The processing time for iris and mouth detection is 0.2 sec per image for the European database and 0.1 sec per image for the South East Asian database. Intensity images therefore have an advantage over color images in terms of processing time. Figure 4 shows examples of images for which the proposed algorithm correctly detected the mouth.

CONCLUSION
Most of the facial feature algorithms previously reported used template matching, the eigenspace method or the Hough transform. However, template matching and the eigenspace method require the face image to be normalized in size and orientation when large variations occur. These algorithms can detect facial features from faces whose patterns are similar to the training samples. To be more robust, they require a large number of facial feature models, especially when dealing with moustaches, beards and different genders or races. Hough transform algorithms need to estimate search windows for the facial features within the face region; irises, for example, are considerably small compared to the face. Thus, these algorithms require complete face region detection.
In the proposed algorithm, the iris detection rate is higher than 90%, as shown in (Kawaguchi and Rizon, 2003), even for an incomplete face region. This is due to the higher intensities of the irises compared to other features in the face region. Our algorithm is considered successful on the South East Asian database, with a 90% success rate for mouth detection under variation of pose, illumination and orientation. This may be due to the better illumination environment offered by the intensity images compared to the color images. In the European database, the illumination environment and the presence of other features such as the nose, beard and moustache contributed to the detection failures. In addition, the difference in facial texture between Europeans and Asians might also affect the mouth detection rate. Note that the half-mouth regions detected in both databases (15% for the European database and 7% for the South East Asian database) can be useful as training samples in a classification stage, because these regions contain essential geometric information that is useful for face recognition.
For future improvement, more preprocessing steps are needed to reduce the effect of beard, moustache and illumination. Geometrical distances based on the detected iris locations can be added to increase the success rate of mouth detection. Artificial Intelligence techniques such as Neural Networks can be used as classifiers, with half-mouth regions as training samples, to increase the performance of a face recognition system.