Implementation of Multi-Centroid Moment Invariants in Thermal-Based Face Identification System



INTRODUCTION
As humans, we have always relied on holistic facial identification as a means of establishing another person's identity. Because of this remarkable human capability, face identification-based security systems are gaining wide acceptance as a leading biometric approach in access control and surveillance. Face identification is non-invasive, hygienic and natural, since it matches the mode of identification that humans employ in everyday life. Although humans identify faces effortlessly, the same task is far from trivial for a security vision system. Analysis and identification of facial images acquired from a real, non-ideal imaging system still pose many complications, since the appearance of the face varies dramatically with incident angle, lighting variation, facial expression, head pose and image quality.
Most research efforts in the face identification field have focused on visible spectrum imagery (Kong et al., 2005). Even though progress in visible spectrum imagery is notable, problems inherent to the visible spectrum approach remain an open issue. Images acquired in the visible spectrum are formed primarily by reflection; they are therefore difficult to process because of their strong dependency on the incident angle and on lighting variation from external light sources. Recently, interest in thermal infrared imagery has increased among researchers. Thermal infrared (IR) imagery is based on heat emission. Since thermal IR imagery is independent of external light sources, the illumination problems encountered in visible spectrum-based systems do not arise in thermal IR imagery. This was demonstrated in Socolinsky et al. (2006) and Friedrich and Yeshurun (2002).
A more complex work was done in (Buddharaju et al., 2007), which utilizes physiological features. The obtained physiological features are closely related to the distribution of blood vessels under the facial skin. The extracted features are used to form a unique thermal faceprint. A distance transform is then used to obtain an invariant representation for face identification. This work achieved an 86% correct identification rate on the University of Houston database. Geometric moment invariants were utilized for object identification in (Rizon et al., 2006). Canny and Sobel edge detectors were employed for segmentation prior to the calculation of the geometric moment invariants. The first four of Hu's moment invariants were extracted as features from each image. That study achieved a 90% correct identification rate with the Sobel edge detector, whereas with the Canny edge detector it achieved only up to 70%. Despite the high identification rates reported in both works, the approach in (Buddharaju et al., 2007) relies on complex algorithms, which would complicate its realization in a security vision system.
Motivated by the work in (Buddharaju et al., 2007) and inspired by the use of Hu's classical moment invariants for object classification in (Rizon et al., 2006), this study presents the implementation of moment invariants (with respect to a centroid point obtained from frontal mugshot images) for a thermal-based face identification system. Most face identification systems practice a holistic analysis approach. The proposed approach differs in that analysis is done non-holistically within each decomposed thermal region. The proposed approach also offers broader possibilities for use. Because it does not depend on facial features such as the dimensions of the eyes, nose and mouth, it has the potential to be applied to the identification of other body parts, such as arms, abdomens and legs. The proposed system first filters out the background scenery via the seeded region growing method. It then decomposes the filtered infrared image into 4 thermal regions via the 3-valued threshold method. The region with the lowest thermal value is omitted from further processing, to ensure that the background scenery is not taken into consideration. Thereafter, Hu's first moment invariant (with respect to the centroid point obtained from the frontal view of each individual) is calculated as the feature to be extracted from each remaining thermal region. Minimum distance measurement is employed for classification between Hu's first moment invariants of stored and test images, which is similar to the template matching method. As will be seen throughout this study, the proposed approach does not involve any training phase.
The outline of this study is as follows: the methods section demonstrates the application of our proposed approach, followed by empirical results and discussion. Conclusions are then drawn. Finally, references are presented at the end of this study.

MATERIALS AND METHODS
An acquired facial image normally contains background scenery. If the entire image is taken into consideration for feature extraction, the background may degrade the performance of the system. Therefore, we employed the seeded region growing method to remove the background scenery. This is done alongside other conventional image pre-processing steps, such as histogram equalization and image normalization, prior to the proposed method.
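The study does not state the seed location or the growing criterion, so the following is only a minimal sketch of seeded region growing under stated assumptions: the seed is a pixel inside the face, a 4-connected neighbour joins the region when its value is within a fixed tolerance of the seed value, and everything outside the grown region is set to 0. The names `grow_region`, `remove_background` and `tolerance` are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]  # rows of mapped temperatures, indexed image[y][x]


def grow_region(image: Image, seed: Tuple[int, int], tolerance: int) -> Image:
    """Return a binary mask of the 4-connected region grown from `seed`.

    Assumption: a neighbour is accepted when its value differs from the
    seed value by at most `tolerance`; the study's criterion may differ.
    """
    height, width = len(image), len(image[0])
    sx, sy = seed
    seed_value = image[sy][sx]
    mask = [[0] * width for _ in range(height)]
    mask[sy][sx] = 1
    queue = deque([(sx, sy)])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nx < width and 0 <= ny < height
            if inside and not mask[ny][nx]:
                if abs(image[ny][nx] - seed_value) <= tolerance:
                    mask[ny][nx] = 1
                    queue.append((nx, ny))
    return mask


def remove_background(image: Image, seed: Tuple[int, int], tolerance: int) -> Image:
    """Keep pixel values inside the grown region and zero everything else."""
    mask = grow_region(image, seed, tolerance)
    return [
        [value if keep else 0 for value, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]
```

A pixel that is warm but not connected to the seed is discarded, which is the behaviour one would want for isolated warm spots in the background.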
The following subsections demonstrate the application of the 3-valued threshold for thermal region decomposition, followed by the centroid point calculation (for later use in Hu's classical moment invariants), an elaboration on Hu's classical moment invariants and the implementation of minimum distance measurement for classification.

3-valued threshold decomposition:
The general definition of threshold is represented by the following equation:

$$g(x,y) = \begin{cases} 1, & f(x,y) > T \\ 0, & f(x,y) \le T \end{cases} \qquad (1)$$

Where:
f(x,y) = The input pixel
g(x,y) = The output pixel
T = The threshold value

By inserting three threshold values rather than one, the 3-valued threshold equation can be derived from (1) as follows:

$$g(x,y) = \begin{cases} L_1, & f(x,y) \le T_1 \\ L_2, & T_1 < f(x,y) \le T_2 \\ L_3, & T_2 < f(x,y) \le T_3 \\ L_4, & f(x,y) > T_3 \end{cases}$$

Where:
$T_1$, $T_2$, $T_3$ = The three threshold values
$L_1$, $L_2$, $L_3$, $L_4$ = The labels of the generated thermal regions

This shows that an image can be decomposed into several thermal regions by substituting the threshold values with borderline thermal values. As a result, four thermal regions (four binary formatted images) are generated. To obtain these thermal regions, the values of $T_1$, $T_2$ and $T_3$ are selected based on the results of the preliminary experiments conducted in (Abas et al., 2009). Following (Abas et al., 2009), initial values for $T_1$, $T_2$ and $T_3$ are randomly selected within the ranges stated in (Buddharaju et al., 2004), where the temperatures at all pixels are mapped between 0 and 255. Mapped temperatures between 200 and 225 are said to be the common temperature on the face, while mapped temperatures of 175-200 and 225-255 are said to be the normal temperature on the cheeks and the maximum temperature on the face, respectively. Since the area of maximum mapped temperature is small and sparsely located within a face, the system would tend to identify these areas as noise. Therefore, we set the initial value of $T_3$ to 200, the lower bound of the combined range of common and maximum face temperatures.
As mentioned above, mapped temperatures between 175 and 200 are said to be normal temperatures on the cheeks. With the manual tuning done in (Abas et al., 2009), we discovered that mapped temperatures between approximately 140 and 200 cover the temperatures on the convex surfaces of a face, such as the nose, cheeks and forehead. Hence, the initial value of $T_2$ is set to 140. Using the same manual tuning technique as in (Abas et al., 2009), $T_1$ is initially set to approximately 80, which agrees with the values stated in (Buddharaju et al., 2004) (a mapped temperature between 0 and 100 normally indicates the background scenery). For maximum assurance that the background scenery is not taken into consideration, the lowest valued region (the coldest region) is omitted from further processing. Although we have proposed the use of multiple thresholds consisting of 3 threshold values, other numbers of threshold values (e.g., 4, 5, ..., N threshold values) could also be implemented, thus producing more thermal segments for further analysis.
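The decomposition described in this subsection can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation: it uses the initial threshold values quoted in the text (80, 140, 200) as defaults, and it assumes that a pixel exactly equal to a threshold falls into the cooler region, since the boundary convention is not stated. The function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of mapped temperatures in 0..255


def decompose_regions(image: Image, t1: int = 80, t2: int = 140, t3: int = 200) -> List[Image]:
    """Split a mapped-temperature image into four binary masks, coldest first."""
    if not t1 < t2 < t3:
        raise ValueError("thresholds must satisfy t1 < t2 < t3")
    masks = [[[0] * len(row) for row in image] for _ in range(4)]
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            # Number of thresholds exceeded: 0 -> region 1, ..., 3 -> region 4.
            label = (value > t1) + (value > t2) + (value > t3)
            masks[label][y][x] = 1
    return masks


def face_regions(image: Image, t1: int = 80, t2: int = 140, t3: int = 200) -> List[Image]:
    """Drop the coldest region, which the text treats as background."""
    return decompose_regions(image, t1, t2, t3)[1:]
```

Every pixel lands in exactly one mask, so the four masks partition the image; `face_regions` returns the three masks that are carried forward to feature extraction.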
Centroid point:
According to an online source (Wikipedia, 2009), the first moment invariant, Φ1 (which is elaborated in the next subsection), is roughly proportional to the moment of inertia around the image's centroid if the pixels' intensities are interpreted as physical density. The following is therefore a brief explanation of the centroid calculation.
The general definition of the centroid point of a discrete mass is as follows:

$$\mathbf{R} = \frac{\sum_i m_i \mathbf{r}_i}{\sum_i m_i}$$

where $\mathbf{r}_i$ and $m_i$ are the particle positions and masses, respectively. For a binary formatted image, the particle position in the numerator, $\mathbf{r}_i$, is substituted with the pixel's coordinate and the mass, $m_i$, is substituted with the pixel's intensity. In the denominator, $\sum_i m_i$ is substituted with the total number of pixels with an intensity of 1. The derived equations are as follows:

$$C(x) = \frac{\sum_x \sum_y x \, I(x,y)}{\sum_x \sum_y I(x,y)}, \qquad C(y) = \frac{\sum_x \sum_y y \, I(x,y)}{\sum_x \sum_y I(x,y)}$$

where C(x) and C(y) are the coordinates on the x-axis and y-axis, respectively, and the denominator equals the number of pixels with intensity 1 for a binary formatted image. The location of the centroid point of a binary formatted image is therefore (C(x), C(y)). Note that the intensity, I, holds a value of 1 or 0 for binary formatted images and 0-255 for a grayscale image.
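As a small illustration of the centroid calculation, the sketch below computes (C(x), C(y)) for a binary mask stored as a list of rows. It assumes zero-based pixel coordinates with x as the column index and y as the row index; the function name is hypothetical.

```python
from typing import List, Tuple


def centroid(mask: List[List[int]]) -> Tuple[float, float]:
    """Return (C(x), C(y)) of a binary mask indexed as mask[y][x]."""
    total = 0   # number of pixels with intensity 1
    sum_x = 0   # sum of x * I(x, y)
    sum_y = 0   # sum of y * I(x, y)
    for y, row in enumerate(mask):
        for x, intensity in enumerate(row):
            total += intensity
            sum_x += x * intensity
            sum_y += y * intensity
    if total == 0:
        raise ValueError("mask has no foreground pixels")
    return sum_x / total, sum_y / total
```

For example, a 2 x 2 block of ones occupying columns 1-2 and rows 0-1 has its centroid midway between those pixel centres.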

Moment invariants:
Image moments are weighted averages (moments) of the image pixels' intensities, or functions of those moments, usually chosen to have some attractive property or interpretation. These moments are used in numerous fields, such as image processing and computer vision, and are useful for describing objects after segmentation. Simple properties of the image that can be found via image moments include its area (or total intensity), its centroid and information about its orientation. Similarly, moment invariants are properties of connected regions in images that are invariant to translation, rotation and scale. They are useful because they define a simply calculated set of region properties that can be used for shape classification and pattern recognition. Invariants to similarity transformation, such as rotation, translation and scaling, were the first invariants to appear in the pattern recognition field. This was partly because of their simplicity and partly because of the great demand for invariant features that could be used in position-independent object classification. In this problem formulation, the degradation operator, D, is supposed to act solely in the spatial domain and to have the form of a similarity transform. Initially, the relation between the ideal image f(x,y) and the observed image g(x,y) is described as g = D(f). For invariants to similarity transformation, this reduces to the following equation:

$$g(x,y) = f(r(x,y))$$

where r(x,y) denotes an arbitrary rotation, translation and scaling. Invariants to translation and scaling are trivial in any imaging system. As early as 1962, Hu (1962) introduced a set of seven moment invariants that are additionally invariant to rotation. As mentioned above, the first moment invariant, Φ1, is roughly proportional to the moment of inertia around the image's centroid if the pixels' intensities are interpreted as physical density.
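By interpreting the pixels' intensities as physical density, we calculated the first moment invariant for each decomposed thermal region (with respect to the centroid point acquired from each background-filtered, frontal view source image). This is shown in Fig. 1. Theoretically, the spread of each thermal region (taken from different input images of the same individual) around one common centroid point will be similar, or will deviate only slightly, if the source images are similar or differ only slightly in angular pose. Therefore, the moments acquired from each respective thermal region around the image's centroid will result in similar values. The derivation of Hu's first moment invariant is as follows:

$$\Phi_1 = \eta_{20} + \eta_{02}$$

Where:

$$\eta_{ij} = \frac{\mu_{ij}}{\mu_{00}^{\,1 + (i+j)/2}}, \qquad \mu_{00} = M_{00}$$

$$\mu_{20} = M_{20} - 2\bar{x} M_{10} + \bar{x}^2 M_{00}, \qquad \mu_{02} = M_{02} - 2\bar{y} M_{01} + \bar{y}^2 M_{00}$$

where $\bar{x}$ and $\bar{y}$ are the centroid point coordinates on the x-axis and y-axis, respectively (the expressions reduce to $M_{20} - \bar{x} M_{10}$ and $M_{02} - \bar{y} M_{01}$ when the centroid is the region's own), while the raw image moments, $M_{ij}$, with pixel intensity I(x,y), are calculated as follows:

$$M_{ij} = \sum_x \sum_y x^i y^j I(x,y)$$

for i, j = 0, 1 and 2.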
Note that the centroid point is obtained from the holistic frontal view of the registered images before the moment invariant (or moment of inertia) is calculated for each decomposed thermal region.
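The feature described in this subsection can be sketched as a second-order moment taken about an externally supplied centroid rather than the region's own. The sketch below is an illustration under assumptions: the standard scale normalization by the squared zeroth moment is used, coordinates are zero-based, and the name `phi1_about` is hypothetical.

```python
from typing import List, Tuple


def phi1_about(mask: List[List[int]], center: Tuple[float, float]) -> float:
    """First moment invariant of `mask` about a given centre (cx, cy).

    Assumption: normalized as (mu20 + mu02) / M00**2, where mu20 and mu02
    are second-order moments taken about `center`.
    """
    cx, cy = center
    m00 = 0.0
    mu20 = 0.0
    mu02 = 0.0
    for y, row in enumerate(mask):
        for x, intensity in enumerate(row):
            m00 += intensity
            mu20 += (x - cx) ** 2 * intensity
            mu02 += (y - cy) ** 2 * intensity
    if m00 == 0:
        raise ValueError("region is empty")
    return (mu20 + mu02) / m00 ** 2
```

With the region's own centroid as `center` this is the classical Φ1; any other centre gives a larger value, which is how the position of a thermal region relative to the common face centroid enters the feature.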
Minimum distance measurement:
Previously, Hu's first moment invariants were calculated and extracted as the features to be classified. For classification, we employed the minimum distance measurement method between the stored and test values of Hu's first moment invariant obtained from each corresponding thermal region. The general definition of minimum distance measurement, x, via the Euclidean distance between two points, P and Q, is shown in (12):

$$x = \sqrt{(P - Q)^2} = |P - Q| \qquad (12)$$

For a higher order of points, the distance over any finite number of points, n, is shown in (13):

$$x = \sqrt{\sum_{i=1}^{n} (P_i - Q_i)^2} \qquad (13)$$

In our case, only two values are being compared; therefore, (12) is redefined as follows:

$$x = \sqrt{(r - t)^2} = |r - t|$$

where r and t are the values of Hu's first moment invariant for the registered and test images, respectively.
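A minimal sketch of the matching step follows. The text compares Φ1 values region by region but does not state how the three per-region distances are combined into one score, so summing them is an assumption made here for illustration. The function name and the feature values in the usage example are hypothetical.

```python
from typing import Dict, List, Sequence


def rank_identities(test: Sequence[float], gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Rank registered identities from the closest to the farthest match.

    `test` and each gallery entry hold one first-moment-invariant value per
    thermal region, in the same region order.
    Assumption: per-region distances |r - t| are summed into one score.
    """
    def score(name: str) -> float:
        return sum(abs(r - t) for r, t in zip(gallery[name], test))

    return sorted(gallery, key=score)


# Usage with made-up feature values for two registered individuals.
gallery = {"A": [0.20, 0.35, 0.50], "B": [0.25, 0.30, 0.60]}
best_match = rank_identities([0.21, 0.34, 0.52], gallery)[0]
```

The first element of the returned list is the identification decision, and the full ranking is what a rank-based evaluation needs. No training is involved; the gallery is simply a table of stored feature values.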

RESULTS
The OTCBVS IRIS IR facial database was used to validate the effectiveness of our proposed approach. Figure 2 shows some examples from this dataset. All calculations were done with MATLAB 7.0 Student Version on a 1.8 GHz Centrino Duo processor with 1 GB RAM.
We conducted two experiments on the OTCBVS IRIS IR facial database to evaluate the performance of the proposed face identification method. We used sets of 2 and 4 test images for each registered image. Each image is decomposed into 4 thermal regions, of which the lowest (coldest) thermal region is not taken into consideration. This results in a total of 6 and 12 decomposed test images, respectively, matched against 3 decomposed registered images per individual. Only frontal shot images were used as registered images, whereas images with a slight angular deviation in pose to the left and right were used as test images. The first set of test images (2 test images) consists of quarter-left and quarter-right profiles, whereas the second set (4 test images) consists of quarter-left, mid-left, quarter-right and mid-right profiles. Since only frontal shot images were used for the registered dataset, full left and right profiles were not included in the test dataset.
In the first experiment, we implemented the original Hu's first moment invariant, Φ1, for face identification on both the 2-image and 4-image test sets. In the second experiment, we employed the proposed approach, the first moment invariant with respect to the centroid point obtained from frontal shot source images, on both sets of test images. The performance of both experiments is shown in Fig. 3. Figure 3a shows the Cumulative Match Characteristic (CMC) curves for the first experiment and Fig. 3b shows the CMC curves for the second experiment. We also compared the identification performance of our approach with that of the work in (Buddharaju et al., 2007). The comparison between these two approaches is shown in Fig. 4.
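For readers unfamiliar with CMC curves, the value at rank k is the fraction of test images whose correct identity appears among the top k matches. The sketch below computes such a curve from ranked match lists; it is an illustration only, with a hypothetical function name and made-up data, and it assumes every true identity appears in its ranking.

```python
from typing import List, Sequence


def cmc_curve(rankings: Sequence[Sequence[str]], truths: Sequence[str]) -> List[float]:
    """Return the cumulative match rate at ranks 1..N.

    rankings[i] lists the registered identities for test image i, best
    match first; truths[i] is the correct identity for that test image.
    """
    n_ranks = len(rankings[0])
    hits = [0] * n_ranks
    for ranked, truth in zip(rankings, truths):
        position = list(ranked).index(truth)  # 0-based rank of the correct identity
        for k in range(position, n_ranks):
            hits[k] += 1
    return [h / len(truths) for h in hits]
```

The first entry of the returned list is the rank 1 identification rate, and the curve is non-decreasing by construction.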

DISCUSSION
Referring to Fig. 3a and b, the experiment conducted on the first set of 2 test images performs better than that on the second set of 4 test images. This is because test cases close to the mid-left and mid-right profiles in the second set may be matched less accurately, since only frontal shot images are used as registered images.
Referring to Fig. 4, the CMC curves show that the rank 1 identification rate of our approach is 92.5%, which exceeds that of the work in (Buddharaju et al., 2007) (approximately 83.5%). Although our approach demonstrates encouraging performance, its robustness degrades when health conditions such as fever and flu are involved. At the moment, this is considered the operational limit of the approach.

CONCLUSION
In this study, we have demonstrated a novel approach to infrared image processing. Like many other approaches, ours is preceded by conventional pre-processing steps. For maximum assurance, we employed the seeded region growing method for background filtering. A 3-valued threshold was derived and employed for region decomposition. We also introduced the use of Hu's moment invariants in multiple thermal regions as the feature to be extracted from each generated thermal region. Our approach was tested on the OTCBVS IRIS IR facial database, which is publicly available for download at www.cse.ohio-state.edu/otcbvs-bench/. Classification is done by employing a minimum distance measurement between the moment invariants acquired from the test and registered IR images. The empirical results show that, with proper enhancement of classical methods, the performance of our proposed approach surpasses that of the work by Buddharaju et al. (2007), subject to some operational limitations.