Increasing the Reliability of Fuzzy Inference System-Based Skin Detector

INTRODUCTION
Skin detection determines which pixels in an image belong to human skin. It is a widely used key technique in image processing, with applications such as face detection (Kovac et al., 2003), face tracking (Dadgostar et al., 2005), human motion analysis (Gavrila, 1999) and naked image filters (Fleck et al., 1996). One of the major issues in using skin color for skin detection is choosing a suitable color space. Color is a useful cue for extracting skin pixels, and numerous color models are in use today because color science is a broad field encompassing many application areas. The most common color space models are RGB, CMY and CMYK (Gonzalez and Woods, 2002); Hue, Saturation and Intensity (HSI) (Umbaugh, 1997; Singh et al., 2003); Hue, Saturation and Value (HSV) (Lin et al., 2003); normalized RGB (Chan et al., 1999; Vezhnevets et al., 2003) and YCbCr (Umbaugh, 1997; Lin et al., 2003). Many skin models have been developed based on colors (RGB) (Vezhnevets et al., 2003), but these approaches are not robust enough to handle different lighting conditions and complex backgrounds containing surfaces and objects with skin-like colors. Many researchers (Gasparini et al., 2005; Shirali-Shahreza et al., 2008) have used pixel-based algorithms as the main methods for skin detection. By contrast, few skin detection methods have been built on a pixel and its neighbors (region) (Ruiz-Del-Solar and Verschae, 2004). Some researchers have used traditional techniques (Jones and Rehg, 2002; Ghouzali et al., 2008; Maskooki et al., 2009), while others have used intelligent techniques (Brown et al., 2001; Bhoyar and Kakde, 2010; Subramanian et al., 2008) to detect skin pixels.

Table 1: Publications using different approaches for skin detection
Author | Title | Approach
Almohair et al. (2007) | Skin detection in luminance images using threshold technique | Threshold values based skin detection
Jiang et al. (2007) | Skin detection using color, texture and space information | Integrate color, texture and space information, using Gabor filter
Abin et al. (2008) | Skin segmentation based on cellular learning automata | Combining color and texture information of skin with cellular learning automata
Ahmed et al. (2007) | A robust fuzzy logic based approach for skin detection in colored images | Fuzzy logic using different color model
Al-Wadud and Chae (2008) | A skin detection approach based on color distance map | DM based on gray scale images
Conci et al. (2008) | Comparing color and texture-based algorithms for human skin detection | Skin detection based on texture feature using different color spaces
Ghouzali et al. (2008b) | A skin detection algorithm based on discrete cosine transform and generalized Gaussian density | Skin model based on generalized Gaussian density
Fotouhi et al. (2009) | Skin detection using contourlet-based texture analysis | Color and texture based on wavelet domain using neural network
Zafarifar et al. (2010) | Improved skin segmentation for TV image enhancement, using color and texture features | Defined in HSV color space and versus a histogram-based color detector and extract texture feature

Table 1 lists several publications that use different approaches for skin detection. This study proposes a novel, reliable Fuzzy Inference System (FIS) for skin detection, which combines both color and texture features.
Skin detection: Skin detection can be defined as the process of finding skin-colored pixels and regions in an image or a video. There are two main approaches, namely pixel-based and region-based. In pixel-based methods, the features (e.g., color) are extracted from a single pixel; in region-based methods, the features (e.g., texture) are extracted from a pixel and its neighbors. Skin detection algorithms aim to recognize skin pixels in an unconstrained input image. Skin color is a useful and discriminating feature for many applications, but it is not robust enough to deal with complex image environments. Skin tones range from dark (some Africans) to light white (Caucasians and some Europeans). In addition, both changing lighting conditions (Fig. 1) and the presence of objects with skin-like colors can cause major difficulties. Figure 2 shows different skin-color tones and skin-color-like tones. To help overcome these problems, this study proposes a novel FIS for skin detection that combines both color and texture features.

MATERIALS AND METHODS
Statistical-based texture features: Three texture features were estimated using a statistical approach based on three statistical measures: standard deviation, maximum-minimum range and entropy. These features were extracted from each pixel and its neighbors. The standard deviation was calculated using the following formula (Verzani, 2004):

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

Where:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Meanwhile, the maximum-minimum range equals (maximum pixel value - minimum pixel value) within the n-by-n neighborhood around the corresponding pixel in input image I (Gonzalez and Woods, 2002). Finally, the entropy was estimated using the formula (Gonzalez and Woods, 2002):

$$E = -\sum_{i=1}^{n} P(x_i)\,\log_2 P(x_i)$$

where P(x_i) is the probability of the pixel color (x_i) and n represents the number of pixels.
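The three measures described above can be sketched in a few lines of Python for a single neighborhood. This is only an illustration under stated assumptions: the function name `texture_features` is hypothetical, the standard deviation uses the sample form (n - 1 in the denominator), the entropy is taken over the empirical distribution of values inside the window with a base-2 logarithm, and the pixel values are made up.

```python
import math
from collections import Counter


def texture_features(window):
    """Return (standard deviation, max-min range, entropy) for a flat list
    of pixel values taken from one n-by-n neighborhood."""
    n = len(window)
    mean = sum(window) / n
    # Sample standard deviation (n - 1 denominator); an assumed convention.
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / (n - 1))
    # Maximum-minimum range of the neighborhood.
    value_range = max(window) - min(window)
    # Entropy of the empirical distribution of values in the window.
    entropy = 0.0
    for count in Counter(window).values():
        p = count / n
        entropy -= p * math.log2(p)
    return std, value_range, entropy


# Two 3x3 neighborhoods flattened row by row (made-up values).
flat_patch = [10] * 9                               # uniform region
edge_patch = [0, 0, 0, 0, 0, 255, 255, 255, 255]    # strong edge

print(texture_features(flat_patch))
print(texture_features(edge_patch))
```

A uniform patch gives zero for all three measures, while a patch containing an edge gives a large range and standard deviation and an entropy close to one bit.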
Generation of the fuzzy inference system: The term "fuzzy logic" emerged in the development of the theory of fuzzy sets by Zadeh (1965). Fuzzy logic provides a simple way of arriving at a definite decision based on vague, ambiguous, imprecise, noisy, or missing data. In this study, a Sugeno-type (Sugeno, 1985) FIS structure is generated from data using subtractive clustering.
Subtractive clustering: Data clustering is an approach for determining similarities in data and categorizing similar data into groups (Visalakshi and Thangavel, 2009). The most representative off-line clustering techniques are listed in Table 2. Fuzzy clustering is an important class of clustering algorithms. It helps find natural vague boundaries in data (Du, 2010). The mountain clustering method proposed by Yager and Filev (1994) is an example of fuzzy clustering. It is based on partitioning the data space into grids, with the density of each grid computed depending on the distance of the grid center to the data points. A grid with many nearby data points will have a high potential value. The first cluster center is the grid with the highest potential value. To ensure that any two grids that are close together do not become two different clusters, the potential of the nearby grids is reduced based on their distance from the cluster center. The next cluster center is then selected from the remaining grids with the highest potential value.

Table 2: Representative off-line clustering techniques
Technique | Reference
K-means (or hard C-means) clustering |
Hierarchical clustering | Johnson (1967)
Mixture of Gaussian | Dempster et al. (1977)
The fuzzy c-means algorithm | Bezdek (1981)
Kohonen's self-organizing map | Kohonen (1982)
Mountain method for clustering | Yager and Filev (1994)
Subtractive clustering | Chiu (1994)

Chiu (1994) further developed this idea by using actual data points as cluster centers, rather than grids. Each data point is given a potential value based on its neighboring points and the point with the highest potential value is considered the cluster center (Sampath and Shan, 2008). The subtractive clustering method thus assumes that each data point is a potential cluster center and calculates the likelihood that each data point would define the cluster center based on the density of surrounding data points.
The data point with the highest potential value is selected as the first cluster center. To determine the next data cluster and its center location, all data points near the first cluster center (as determined by the cluster center range) are then removed. This process is repeated until all data points are within the range of a cluster center. The potential of a data point is a function of the distance measure, and the data point with the highest potential is considered the cluster center. The potential of each data point is estimated using the following formula (Chiu, 1994):

$$P_i = \sum_{j=1}^{n} e^{-\alpha \lVert x_i - x_j \rVert^2}$$

Where:

$$\alpha = \frac{4}{r_a^2}$$

P_i = The potential value of data point x_i
n = The number of data points
r_a = A positive constant
||.|| = The Euclidean distance

Thus, to measure the potential value for a data point, the distance from this point to all other data points is computed. A data point with many neighboring data points will have a high potential value. The constant r_a is the radius defining a neighborhood; data points outside this radius have little influence on the potential value. Once the potential value of each data point has been computed, the point with the highest potential value is selected as the first cluster center. Let $x_1^*$ be the location of the first cluster center and $P_1^*$ be its potential value. The potential value of each data point x_i is then revised by the following formula (Chiu, 1994):

$$P_i \leftarrow P_i - P_1^* \, e^{-\beta \lVert x_i - x_1^* \rVert^2}$$

Where:

$$\beta = \frac{4}{r_b^2}$$

and r_b is a positive constant. Thus, an amount of potential is subtracted from each data point as a function of its distance from the first cluster center. Data points near the first cluster center will have greatly reduced potential values and are therefore unlikely to be selected as the next cluster center. The constant r_b is the radius defining the neighborhood that will have measurable reductions in potential value. When the potential values of all data points have been revised, the data point with the highest remaining potential value is selected as the second cluster center.
Afterwards, the potential value of each data point is reduced according to its distance to the second cluster center. In general, after the k'th cluster center is obtained, the potential of each data point is revised by the following formula (Chiu, 1994):

$$P_i \leftarrow P_i - P_k^* \, e^{-\beta \lVert x_i - x_k^* \rVert^2}$$

Where:
$x_k^*$ = The location of the k'th cluster center
$P_k^*$ = Its potential value

This process is repeated until the remaining potential values of all data points fall below some fraction of the potential value of the first cluster center $P_1^*$.
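The selection and revision steps described above can be sketched as follows. This is a minimal illustration rather than the genfis2 implementation: the function name `subtractive_clustering`, the radii `r_a` and `r_b`, the stopping fraction of 0.15 and the sample points are all assumptions chosen for the example, and the stopping rule is the simple "fraction of the first potential" criterion stated in the text.

```python
import math


def subtractive_clustering(points, r_a=0.5, r_b=0.75, stop_fraction=0.15):
    """Return cluster centers chosen among `points` (a list of tuples)."""
    alpha = 4.0 / r_a ** 2
    beta = 4.0 / r_b ** 2

    def sq_dist(p, q):
        # Squared Euclidean distance between two points.
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Initial potential: a density measure summed over all data points.
    potentials = [
        sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points
    ]
    first_potential = max(potentials)
    centers = []
    while True:
        k = max(range(len(points)), key=lambda i: potentials[i])
        p_star = potentials[k]
        # Stop once the best remaining potential is a small fraction
        # of the potential of the first cluster center.
        if p_star < stop_fraction * first_potential:
            break
        centers.append(points[k])
        # Subtract potential around the new center from every point.
        potentials = [
            p - p_star * math.exp(-beta * sq_dist(x, points[k]))
            for p, x in zip(potentials, points)
        ]
    return centers


# Two well-separated groups of made-up 2-D points.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
print(sorted(subtractive_clustering(data)))
```

With these settings each group contributes one center, namely the point with the most close neighbors. Because a selected center's own potential drops to zero after the subtraction, the loop selects each point at most once and always terminates.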

Sugeno-type fuzzy inference:
The Sugeno or Takagi-Sugeno-Kang (Sugeno, 1985) FIS algorithm is similar to the Mamdani method (Mamdani and Assilian, 1975) in many aspects. The first two parts of the fuzzy inference process, fuzzifying the inputs and applying the fuzzy operator, are exactly the same. The main difference between Mamdani and Sugeno is that the Sugeno output membership functions are either linear or constant. A typical rule in a Sugeno fuzzy model has the following form (Mamdani and Assilian, 1975): If input 1 = x and input 2 = y, then output is z = ax + by + c.
For a zero-order Sugeno model, the output level z is a constant (a = b = 0). The output level z_i of each rule is weighted by the firing strength w_i of the rule. For example, for an AND rule with input 1 = x and input 2 = y, the firing strength is:

$$w_i = \mathrm{AndMethod}\left(F_1(x), F_2(y)\right)$$

where F_1 and F_2 are the membership functions for inputs 1 and 2, respectively.
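The final output of the system is the weighted average of all rule outputs, which is computed as (Tang et al., 2007):

$$\text{Final output} = \frac{\sum_{i=1}^{N} w_i z_i}{\sum_{i=1}^{N} w_i}$$

where N is the number of rules.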
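A small sketch may make the weighted-average step concrete. It assumes Gaussian membership functions and the product as the AND operator, which are common choices for Sugeno systems but are not stated in the text; the names `gaussian_mf` and `sugeno_output`, the rule layout and all numeric values are hypothetical.

```python
import math


def gaussian_mf(x, center, sigma):
    """Gaussian membership value of x (an assumed membership shape)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def sugeno_output(inputs, rules):
    """Weighted average of rule outputs.

    Each rule is (mfs, coeffs, const): mfs holds one (center, sigma) pair
    per input, and the rule output is z = sum(coeffs * inputs) + const.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for mfs, coeffs, const in rules:
        # Firing strength: AND implemented as a product (assumption).
        w = 1.0
        for x, (center, sigma) in zip(inputs, mfs):
            w *= gaussian_mf(x, center, sigma)
        z = sum(a * x for a, x in zip(coeffs, inputs)) + const
        weighted_sum += w * z
        weight_total += w
    # Note: inputs far from every rule can make weight_total underflow to 0.
    return weighted_sum / weight_total


# Two zero-order rules (a = b = 0): one centered at (0, 0) with output 0,
# one centered at (1, 1) with output 1.
rules = [
    ([(0.0, 0.3), (0.0, 0.3)], (0.0, 0.0), 0.0),
    ([(1.0, 0.3), (1.0, 0.3)], (0.0, 0.0), 1.0),
]
print(sugeno_output((0.5, 0.5), rules))   # midway between the two rules
print(sugeno_output((0.9, 1.0), rules))   # close to the second rule
```

An input equidistant from both rule centers fires the two rules equally, so the output is the plain average of their constants; an input near one center is dominated by that rule.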

Fig. 3: The proposed FIS-based skin detection
The proposed skin detection algorithms: The proposed FIS based on Sugeno reasoning for skin detection combines both color and texture features. To increase the reliability of the skin detection process, neighborhood pixel information is incorporated into the proposed method. The color features are extracted directly from the pixels, and the texture features of the windows scanned over the image are extracted using a statistical approach to produce a feature vector. An FIS is then used to determine the decision rule for these features. The overall structure of the proposed skin detection is illustrated in Fig. 3.

Data set:
The creation of the skin and non-skin image database involved collecting samples of different human skin-colored pixels from a variety of people under different illumination conditions (skin pixels without background), as well as a variety of non-skin-colored pixels. Each image was examined manually to determine whether it contained skin. If no skin was present, the image was placed in the non-skin group. In the skin image group, regions of skin pixels were manually extracted using Adobe Photoshop. In labeling skin, the eyes, hair, clothes, mouth opening and lips were all excluded. The collected data were then divided into two subsets: constructing ("constructing_set") and testing ("testing_set"). The constructing_set was used as the primary set of data applied in constructing the FIS, with 351,228 skin pixels and 428,602 non-skin pixels. The testing_set included different images with simple and complex backgrounds, indoor and outdoor settings and different image sizes and skin colors, and was used to measure the performance of the proposed skin detection. It contained 632,379 pixels of different types.
Feature extraction: Three statistical measures were used to estimate the texture features: standard deviation, maximum-minimum range and entropy. These features were extracted from each pixel and its neighbors by moving two windows over the image; the size of the first window was 3×3 and that of the second was 9×9. The color features (red, green and blue) of the center pixel of the first window were extracted and the first two statistical features (standard deviation and range) were then estimated from the pixels within the first window. Afterwards, entropy was estimated from the pixels within the second window. These six features were used as inputs to the fuzzy inference system. All statistical measures were computed for the multi-channel image matrices (red, green and blue) and their average was determined. Figure 4 shows an example of the scheme for computing entropy for multi-channel image matrices. Entropy is a statistical measure of randomness and is used to characterize the texture of the input image (Gonzalez and Woods, 2002). Entropy is defined for the red, green and blue channels, respectively, as:

$$E_R = -\sum_{i} P(r_i)\log_2 P(r_i), \quad E_G = -\sum_{i} P(g_i)\log_2 P(g_i), \quad E_B = -\sum_{i} P(b_i)\log_2 P(b_i)$$

where P(r_i), P(g_i) and P(b_i) are the probabilities of the red, green and blue pixel values within the window. The average entropy matrix will be:

$$E_{avg} = \frac{E_R + E_G + E_B}{3}$$

Fig. 4: Computing scheme of entropy for a colored image

Fuzzy inference system: The FIS mainly consists of two phases: the construction phase and the test phase. The first two steps of both phases are inputting an image and extracting features from it.

Construction phase:
The FIS was constructed with the genfis2 MATLAB function, using a priori knowledge about skin and non-skin pixels extracted from certain images (constructing_set). The rule extraction method is based on estimating clusters in the data; each cluster obtained corresponds to a fuzzy rule that relates a region in the input space to a suitable output region. The construction phase generates a Sugeno-type FIS structure using the subtractive clustering algorithm and separate sets of input and output data. A set of rules that models the data behavior was extracted using the subtractive clustering algorithm. Throughout this phase, the FIS structure containing a set of fuzzy rules to cover the feature space (constructing_set) was obtained. Figure 5 illustrates the construction phase of the FIS.

Testing phase: A skin detector was used to test each pixel of a given image (test image) using the FIS. If a pixel was detected as skin, it was stored in a new image (skin image) at its position in the original image. After examining all image pixels, a new binary image containing only skin pixels was obtained, as shown in Fig. 6.
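The testing phase amounts to scoring every pixel and keeping those classified as skin. The sketch below illustrates this with a hypothetical scoring function standing in for the constructed FIS; the names `skin_mask` and `toy_score`, the default threshold of 0.5 and the feature values are assumptions made for the example only.

```python
def skin_mask(feature_rows, score_fn, threshold=0.5):
    """Return a binary mask (1 = skin, 0 = non-skin) with the same layout
    as feature_rows, a list of rows of per-pixel feature vectors."""
    return [
        [1 if score_fn(features) > threshold else 0 for features in row]
        for row in feature_rows
    ]


def toy_score(features):
    # Hypothetical stand-in for the FIS output: scaled red channel only.
    return features[0] / 255.0


# A 2x2 "image" of made-up (red, green, blue) feature vectors.
image_features = [
    [(200, 120, 100), (30, 30, 30)],
    [(90, 200, 40), (250, 180, 160)],
]
print(skin_mask(image_features, toy_score, threshold=0.5))
```

In the proposed detector the score would come from the FIS evaluated on the six color and texture features of each pixel, and the mask would be the binary skin image.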

RESULTS
The FIS was constructed using the MATLAB genfis2 function. The main structure of the FIS is shown in Fig. 7. It has six inputs, each representing a single feature (SD, Entropy, Range, Red, Green and Blue), and one output indicating either a skin or a non-skin pixel.
The road map of the proposed FIS for skin detection is shown in Fig. 8. It represents the rules of the system as a single figure window with 36 plots nested in it. The six plots across the top of Fig. 8 represent the antecedent and consequent of the first rule. Each rule is a row of plots and each column is a variable. The rule numbers are displayed on the left of each row. The first six columns of plots show the membership functions referenced by the antecedent, or the if-part, of each rule. The last column of plots shows the membership functions referenced by the consequent, or the then-part, of each rule. The sixth plot in the last column represents the aggregate weighted decision for the given inference system.
The skin detector was evaluated on different images with simple and complex backgrounds, indoor and outdoor settings, as well as different image sizes and skin colors. An experiment was performed on the testing set, which included 632,379 uncontrolled (different illumination, capture quality, distance to camera) pixels. Of these, 92,883 were skin pixels taken from images containing an arbitrary number of people and faces; the remaining 539,496 were non-skin pixels, including pixels from images of objects with skin-like tones (e.g., red flower, dog, chocolate).
Two different skin detectors were tested and evaluated to select the one with higher reliability; afterwards, the evaluations were compared with the performance of previous skin detectors. The first skin detection method detects skin pixels based on the threshold skin color tones (Kovac et al., 2003;Gasparini and Schettini, 2006), while the second detects skin pixels using the proposed skin detector, which combines both texture and color features.
The original images shown in Fig. 9 are segmented based on predefined color rules. The obtained images show that many non-skin pixels are detected incorrectly as skin pixels. On the other hand, no skin pixels are detected in the last set of images. The testing results of the proposed FIS-based skin detector using different threshold values are illustrated in Fig. 10. The results include high rates of true positives and true negatives with low rates of false detection. Although images 1, 2, 3 and 4 in Fig. 10 show several human skin types with different colors and textures, the skin pixels within these images were detected correctly by the proposed skin detector, except for a few scattered pixels that were incorrectly detected as non-skin pixels. Most skin pixels within the images shown in Fig. 10 were detected correctly; meanwhile, no false detections occurred within image 5 for threshold values greater than 0.5 or within image 6 for any threshold value.
A skin detection process is not perfect and different users adopt varying criteria for performance evaluation. One of the evaluation criteria consists of the general appearance of the skin zones detected. To quantify performance, True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) values were computed for all pixels in the testing_set through skin detector testing. FP is the proportion of non-skin pixels classified incorrectly as skin, whereas TP is the proportion of skin pixels classified correctly as skin (Table 3).
Four metrics (Table 4) were used to evaluate the performance of the two skin detectors. These metrics are recall, precision, specificity and accuracy (Gasparini et al., 2005; Fawcett, 2004).
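For reference, the four metrics can be computed from the confusion counts as sketched below, using their standard definitions. The function name `detection_metrics` is hypothetical and the counts in the example are made up; they are not the results reported in this study.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics from pixel counts."""
    return {
        "recall": tp / (tp + fn),            # skin pixels found
        "precision": tp / (tp + fp),         # detections that are skin
        "specificity": tn / (tn + fp),       # non-skin pixels rejected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Made-up counts for illustration only.
metrics = detection_metrics(tp=90, fp=5, tn=895, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Recall corresponds to the TP rate and 1 - specificity to the FP rate, so the pair (recall, specificity) summarizes the trade-off discussed for the two detectors.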

DISCUSSION
Considering the unconstrained nature of internet images, the performance of the proposed skin detector is surprisingly good. At its best, it detects 90% of skin pixels with a very low FP rate of 0.22% by combining both texture and color features. Although the threshold-based skin detection method can detect 84.98% of skin pixels correctly, it also has a high FP rate of 8.45%. A simple comparison between the performance of the two skin detection methods is shown in Fig. 11.
Although no two papers could be found that use the same test sets, examining previously published results may still be useful. The performance of the proposed skin detector is therefore compared with that of other skin detectors. The Bethe tree approximation of first order model proposed by Abdullah-Al-Wadud and Chae (2008) can detect 72% of skin pixels with a 5% FP rate, whereas the Bayesian model proposed by Jones and Rehg (2002) can detect 69% at the same FP rate. The latter model can also detect 80% of skin pixels with an 8.5% FP rate, or 90% with a 14.2% FP rate. The skin detection method suggested by Zafarifar et al. (2010) can detect more than 83% of skin pixels correctly with a 20% FP rate. The recall rate of the pixel-based skin color classification proposed by Gasparini et al. (2005) is 92% and the precision rate is 39%. These evaluation metric values indicate that the proposed skin detector outperforms the other methods mentioned above, especially in terms of decreasing the FP rate.

CONCLUSION
Skin detection is an important pre-processing stage in many image analysis applications; hence, we proposed an improved FIS for skin detection, which combines both color and texture features to increase the reliability of the method. Neighborhood information of each pixel was also used throughout the construction and testing phases. Two skin detectors (a threshold-based skin detector and a detector combining both color and texture features) were applied and tested. This study demonstrated that a skin detector based on both color and texture features can lead to a more efficient and more reliable skin detection method than one based on thresholds. The proposed detector reduces the FP rate to 0.22%, compared with 8.45% for the threshold-based skin detector. An essential future direction will be to validate the proposed algorithms using a standard skin data set. This will enable us to compare our detection results with those presented by other authors for the same test images. Another improvement would be to adapt our approach to the wavelet domain.