A NEW APPROACH OF LOCAL FEATURE DESCRIPTORS USING MOMENT INVARIANTS

Moment invariants have been widely used for recognizing planar objects for several decades. This is due to the robustness of moment functions in preserving the identity of an object under various two-dimensional (2D) transformations. A set of moments computed from a planar image represents a global description of an object’s shape and the geometrical features of the image. Since a global descriptor uses the information of the whole object or shape to describe its features, it does not tolerate occlusion. If the image contains regions that do not belong to the object of interest, an additional segmentation step is required to isolate the object for recognition. Hence, moment invariants are proposed here as local descriptors for object recognition, since local descriptors do not suffer from the drawbacks caused by image clutter and occlusion. A new approach to local feature descriptors using moment invariants is presented. The preliminary framework is divided into three stages. Interest points are first detected in the entire image. The local descriptors are then produced by applying moment invariants to the region around each interest point. Finally, cross-correlation is carried out for feature matching.


INTRODUCTION
Since the introduction of geometric invariants in 1962, moment invariants have been applied to object recognition, shape analysis, image description and matching (Flusser et al., 2009). The invariants provide descriptive information about an object that distinguishes it from other objects. Even when the object undergoes 2D transformations (translation, scale, rotation and skew), its identification remains invariant. Owing to these promising results, moment invariants have been extended to new areas, such as hand gesture recognition, image registration, fingerprint verification, image retrieval and action classification (Almoosa et al., 2008; Chen et al., 2013; Costantini et al., 2011; Li et al., 2012).
Moment functions interpret an object (in a 2D image) as a 2D intensity distribution, which provides global features of the object: total area, centroid coordinates and orientation. The performance of global features is seriously affected when part of the object is occluded by another object. This situation commonly occurs in natural images, where multiple objects are mixed in a scene. Segmentation has been widely used to overcome this limitation by separating the object of interest from the scene. The segmented region, however, might not represent the intensity distribution of the whole object when partial occlusion has taken place.
Meanwhile, local features do not suffer from the drawbacks caused by image clutter and occlusion. A local feature is an image pattern extracted from a particular region of an object. It represents the descriptive information associated with the change of intensity distribution in the image pattern (Shvarts and Tamre, 2012). Local features are normally extracted from the region around the key points within an object. These features remain distinctive and recognizable even when parts of the object are occluded. Therefore, geometric moment invariants are proposed as a new approach to local descriptors. This approach retains the ability of moment invariants to provide unique and distinguishable features in a natural image or a sequence of frames.
This paper presents a preliminary framework for selecting the feature points, formulating the invariant feature descriptors and matching the descriptors across a sequence of consecutive frames. The proposed framework is divided into three stages. Interest points are first detected in the entire image. The local descriptors are then produced by applying moment invariants to the region around each interest point. Finally, cross-correlation is carried out for feature matching.

Related Works
In the history of object recognition, many early approaches were based on global features. Moment invariants are among the earliest and most widely used methods, mainly because of the robustness of their invariance properties under different transformations. However, the recognition rate of global features suffers when foreground objects are mixed with the background scene in natural images (Tuytelaars and Mikolajczyk, 2008). With video surveillance systems now widespread, a tremendous number of natural images is submitted for recognition every day. Global features are no longer sufficient for recognizing an object that is partially occluded or partly outside the field of view.
In order to overcome this limitation, a few regions (blobs) with reliable descriptions are extracted from the image. The extracted regions contain descriptive information corresponding to different subparts of the image. A string of vectors is then formed from the blob descriptions. Recognition is performed by matching similarities between subparts of a foreground object, even under a changing background and partial occlusion (Krolupper and Flusser, 2007). One of the best-known approaches to extracting local features is SIFT. Lowe's (2004) method transforms an image into a multi-scale sampling of image patches centered on the interest points. Each of the feature vectors is invariant to image scale and rotation. Lowe proposed four filtering stages for SIFT: scale-space extrema detection, keypoint localization, orientation assignment and keypoint descriptor. The resulting features are used by a nearest-neighbor algorithm to identify the best-matched object in an image. Since SIFT is able to generate a large number of local features, an object remains recognizable under a substantial level of occlusion.
On the other hand, some researchers have combined independent algorithms for local feature detection and description. Recent combinations of the FAST detector with the BRIEF or BRISK descriptor offer a much more suitable alternative for real-time applications (Miksik and Mikolajczyk, 2012). This is due to the outstanding results of the FAST detector in several comparison studies (Rosten and Drummond, 2006; Miksik and Mikolajczyk, 2012; Senst et al., 2012). Compared with other existing detectors, the FAST feature detector achieves a nearly constant runtime of 2 ms per image as the number of features increases. Rosten and Drummond (2006) reported that FAST-9 is the most reliable detector, with the shortest runtime and low processing requirements. A fast and reliable detector is needed to form an efficient combination with a feature descriptor.

PROPOSED FRAMEWORK AND METHODOLOGY
Following Shvarts and Tamre (2012), a local feature is the descriptive information selected from a specific region of an object in order to avoid the drawbacks of image clutter and occlusion. The proposed framework for formulating a set of suitable local features with the moment invariant functions is shown in Fig. 1. A set of feature points is first selected from an input image. The invariant descriptors are then formulated from the neighborhood region of each feature point, which together build up a unique identity for an object. Since every object is characterized by a unique descriptor, the descriptor can be used to locate the object in consecutive video frames.

Methodology of Feature Points Detection
Several types of local invariant features have been proposed over the past decades. Image properties such as points, edges or small image patches are extracted as local features and converted into descriptors (Tuytelaars and Mikolajczyk, 2008). In the early image processing literature, several algorithms were developed for finding corner points at the extrema of functions computed on the shape. In 1980, Moravec introduced a corner detection algorithm for robot navigation. The algorithm was further improved by Harris and Stephens (1988) and Shi and Tomasi (1994) for invariant detection. Simpler and more efficient algorithms, such as SUSAN and FAST, were later developed to reduce the computational time of corner detection (Smith and Brady, 1997; Rosten and Drummond, 2006).
The FAST detector is built on the basic concept of the SUSAN detector. According to Rosten and Drummond (2006), a point is declared a corner if a sufficiently large set of contiguous pixels in its circular neighborhood is significantly brighter or darker than the central point. A circle of 16 pixels with a fixed radius is first formed around the central point, as illustrated in Fig. 2. The pixels at the north (1), east (5), south (9) and west (13) positions of the circle are compared with the central intensity plus or minus a threshold and classified as brighter, similar or darker. If at least three of these four pixels are all brighter or all darker, the central point is retained as a corner candidate and the test criterion is then applied to the remaining pixels in the circle. Otherwise, the point is rejected as a non-corner.
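The segment test described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the optimized detector of Rosten and Drummond (2006): the function names, the default threshold t=20 and the list-of-lists image layout are all hypothetical. The quick rejection uses n // 4 compass pixels as its bound, which gives the three-out-of-four rule for a 12-pixel arc and a safe bound of two for the 9-pixel arc of FAST-9.

```python
# Bresenham circle of radius 3: 16 (dx, dy) offsets, starting at north and
# moving clockwise, so indices 0, 4, 8, 12 are north, east, south, west.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def classify(img, x, y, t):
    """Label each circle pixel: +1 brighter, -1 darker, 0 similar to the centre."""
    centre = img[y][x]
    labels = []
    for dx, dy in CIRCLE:
        v = img[y + dy][x + dx]
        labels.append(1 if v > centre + t else -1 if v < centre - t else 0)
    return labels


def is_corner(img, x, y, t=20, n=9):
    """Segment test: n contiguous circle pixels all brighter or all darker.

    The caller must keep (x, y) at least 3 pixels away from the image border.
    """
    labels = classify(img, x, y, t)
    # Quick rejection on the four compass pixels: any arc of n contiguous
    # circle pixels contains at least n // 4 of them.
    compass = [labels[i] for i in (0, 4, 8, 12)]
    need = n // 4
    if compass.count(1) < need and compass.count(-1) < need:
        return False
    # Full test: look for a run of n equal non-zero labels on the wrapped circle.
    for sign in (1, -1):
        run = 0
        for lab in labels + labels[:n - 1]:
            run = run + 1 if lab == sign else 0
            if run >= n:
                return True
    return False
```

Calling is_corner(img, x, y, n=12) reproduces the three-out-of-four quick test exactly, while n=9 corresponds to the FAST-9 variant used later in the experiments.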

Invariance Features from Moment Function
A set of moments computed from a planar image represents a global description of the object shape and the geometrical features of the image. When applied to images, moment functions yield simple properties of the image, including its area, centre of mass and orientation. These properties can be generated from the geometric moments, whose general definition is given as:

$$G_{ij} = \iint_{\zeta} x^{i} y^{j} f(x, y)\,dx\,dy, \quad i, j = 0, 1, 2, 3, \ldots \qquad (1)$$
The moment function G in Equation 1, of order (i+j), consists of monomial functions over the image region ζ for the 2D density distribution f(x, y). Geometric moments were the first moment functions used to derive a set of invariant descriptors. Hu (1962) presented a set of invariant descriptors derived from geometric moments. The presented set is able to recognise images regardless of translation, scaling and rotation. Since then, Hu's publication has been referenced extensively in the moment-related literature.
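As a small worked example of how area and centre of mass follow from geometric moments, the sketch below uses the discrete form of the definition, in which the double integral becomes a sum over pixel coordinates. The function names and the 5×5 test image are illustrative assumptions, not part of the original method.

```python
def geometric_moment(img, i, j):
    """Discrete G_ij: sum of x^i * y^j * f(x, y) over all pixels."""
    return sum((x ** i) * (y ** j) * v
               for y, row in enumerate(img)
               for x, v in enumerate(row))


def area_and_centroid(img):
    """Area is G_00; the centroid is (G_10 / G_00, G_01 / G_00)."""
    g00 = geometric_moment(img, 0, 0)
    x0 = geometric_moment(img, 1, 0) / g00
    y0 = geometric_moment(img, 0, 1) / g00
    return g00, (x0, y0)


# A 2x3 block of ones inside a 5x5 binary image.
img = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]]
area, (x0, y0) = area_and_centroid(img)
```

For this image the block covers columns 1 to 3 and rows 1 to 2, so the area is 6 and the centroid lies at (2.0, 1.5).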
In order to achieve translation and scale invariance, geometric moments are defined with respect to the image centroid (x_0, y_0) as the origin, i.e., the central moments:

$$\mu_{ij} = \iint_{\zeta} (x - x_0)^{i} (y - y_0)^{j} f(x, y)\,dx\,dy \qquad (2)$$

An example of a feature descriptor generated from the translation, scale and rotation invariants for a sample image is illustrated in Table 1. The sample image consists of the letter 'F' with a size of 100×100 pixels. Based on the feature descriptor computed from Equation 4, only a minor variation of less than 0.001 occurs in each invariant function across the different transformations. The feature descriptor of the sample image is further compared with those of other images showing the same letter in closely similar font types, and the results are recorded in Table 2. The differences between the images of similar font types are larger than those in Table 1. Therefore, geometric moment invariants are capable of building a unique identity for a specific image even though it undergoes several geometrical transformations.
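The behaviour reported in Table 1 can be illustrated with a minimal sketch, assuming the standard construction of Hu (1962): central moments about the centroid, the scale normalization η_ij = μ_ij / μ_00^(1+(i+j)/2), and the first two invariants φ1 = η20 + η02 and φ2 = (η20 − η02)² + 4η11². The function names and the tiny 'F' bitmap are hypothetical, and the paper's own descriptor may use more invariants than these two.

```python
def central_moment(img, i, j):
    """Discrete central moment about the intensity centroid (x0, y0)."""
    m00 = sum(v for row in img for v in row)
    x0 = sum(x * v for row in img for x, v in enumerate(row)) / m00
    y0 = sum(y * v for y, row in enumerate(img) for v in row) / m00
    return sum((x - x0) ** i * (y - y0) ** j * v
               for y, row in enumerate(img) for x, v in enumerate(row))


def eta(img, i, j):
    """Scale-normalized central moment."""
    return central_moment(img, i, j) / central_moment(img, 0, 0) ** (1 + (i + j) / 2)


def hu_first_two(img):
    """First two Hu invariants (phi1, phi2)."""
    n20, n02, n11 = eta(img, 2, 0), eta(img, 0, 2), eta(img, 1, 1)
    return (n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2)


def rotate90(img):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


F = [[1, 1, 1, 1],
     [1, 0, 0, 0],
     [1, 1, 1, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0]]
# The same letter translated by (2, 2) inside a larger canvas.
shifted = [[0] * 7 for _ in range(2)] + [[0, 0] + row + [0] for row in F]

original = hu_first_two(F)
rotated = hu_first_two(rotate90(F))
translated = hu_first_two(shifted)
```

On a pixel grid, translation and 90-degree rotation leave the two values unchanged up to floating-point rounding. Scaling and arbitrary rotation angles involve resampling, so the invariants then agree only approximately, which is consistent with the small variation noted for Table 1.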

Formation of Proposed Local Descriptor
Once a feature point has been detected, a local descriptor is formulated from the neighborhood of the feature point.

Methodology of Feature Matching
The methodology for matching the proposed local descriptors between frames is shown in Fig. 3. After the invariant descriptors of all feature points have been formulated from Equation 6, they are ready to be used for finding matching pairs in consecutive frames. In order to determine the matching pairs, the descriptors ID(p) of a set of feature points from the previous frame, Fr(n-1), have to be related by some criterion to the target feature set from the current frame, Fr(n). The linear correlation coefficient is chosen to measure the association between the descriptor sets ID(p) from Fr(n-1) and Fr(n). Based on Pearson's correlation coefficient, r, from Equation 7, a pair of descriptors whose coefficient is positive and closest to 1 has a strong association. Thus, for each descriptor, the pair with the largest coefficient is shortlisted as a matching pair.
Instead of using all the shortlisted pairs, only the highly reliable pairs are retained in order to improve the matching performance. An efficient way of identifying the reliable pairs is the RANdom SAmple Consensus (RANSAC) algorithm. RANSAC estimates the possible homographies that describe the relation between descriptor pairs in different frames (Hartley and Zisserman, 2004). During the estimation, the less reliable pairs, which are considered outliers, are rejected. This iterative method ends with a homography that is estimated from the inliers only.
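The shortlisting step can be sketched as below, assuming that Equation 7 is the usual sample form of Pearson's coefficient and that each descriptor is a plain list of invariant values. The function names and the treatment of constant descriptors (a score of 0.0) are illustrative assumptions.

```python
import math


def pearson(a, b):
    """Pearson's linear correlation coefficient r between two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # constant vector: r is undefined, treated here as no association
    return cov / math.sqrt(va * vb)


def match(prev_desc, curr_desc):
    """For each descriptor of Fr(n-1), shortlist the Fr(n) descriptor with the largest r."""
    pairs = []
    for p, d in enumerate(prev_desc):
        scores = [pearson(d, e) for e in curr_desc]
        q = max(range(len(scores)), key=scores.__getitem__)
        pairs.append((p, q, scores[q]))
    return pairs


# Two descriptors per frame; the current frame lists them in swapped order.
prev_desc = [[1.0, 2.0, 3.0, 4.0], [4.0, 1.0, 3.0, 2.0]]
curr_desc = [[4.1, 0.9, 3.2, 2.0], [1.1, 2.0, 2.9, 4.2]]
pairs = match(prev_desc, curr_desc)
```

Each returned tuple holds the index in the previous frame, the index of its best partner in the current frame and the coefficient itself, so a later stage can still discard weak or inconsistent pairs.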
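The hypothesize-and-verify loop of RANSAC can be illustrated with a deliberately simplified sketch. Instead of a homography, which needs four point pairs and a linear solve, the model here is a pure 2D translation estimated from a single pair. This substitution, the tolerance, the iteration count and all names are assumptions made for illustration only.

```python
import random


def ransac_translation(pairs, tol=2.0, iters=100, seed=0):
    """pairs: non-empty list of ((x1, y1), (x2, y2)).

    Returns (shift, inlier_indices) for the translation with the largest consensus.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iters):
        # Minimal sample: one pair fully determines a translation.
        (x1, y1), (x2, y2) = rng.choice(pairs)
        dx, dy = x2 - x1, y2 - y1
        # Consensus set: pairs whose displacement agrees within the tolerance.
        inliers = [k for k, ((a, b), (c, d)) in enumerate(pairs)
                   if abs(c - a - dx) <= tol and abs(d - b - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit the model on the inliers only (mean displacement).
    n = len(best_inliers)
    dx = sum(pairs[k][1][0] - pairs[k][0][0] for k in best_inliers) / n
    dy = sum(pairs[k][1][1] - pairs[k][0][1] for k in best_inliers) / n
    return (dx, dy), best_inliers


# Six pairs that share the shift (5, -3) plus two false matches.
points = [(10, 10), (20, 15), (30, 40), (45, 22), (50, 60), (70, 35)]
pairs = [((x, y), (x + 5, y - 3)) for x, y in points]
pairs += [((15, 15), (80, 2)), ((60, 60), (3, 44))]
shift, inliers = ransac_translation(pairs)
```

The same structure carries over to the homography case: sample a minimal set of pairs, fit the model, count the pairs whose reprojection error is below a tolerance, keep the largest consensus set and refit on it.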

RESULTS AND DISCUSSION
The proposed framework is tested with a sequence of four sample frames, each consisting of 512×512 pixels. This sequence of images is taken from Hartley and Zisserman (2004), as shown in Fig. 4. The sample grey-scale images were captured in the corridor of a building and undergo several transformations between frames, including translation, scale and rotation.
Based on the proposed framework in Fig. 1, feature points are first detected in each frame with the FAST-9 detector. The detected feature points are highlighted in the image sequence, as shown in Fig. 5. Most of the feature points from the current frame are still detectable in the next frame even though the frames differ by a transformation. Invariant feature descriptors are subsequently formulated from the neighborhood of each feature point. A neighborhood area of 10×10 pixels around the feature point is selected for descriptor computation. In order to find the matching keys, the set of invariant descriptors from the current frame is correlated with the descriptors from the next frame. The largest coefficient identifies the strongest matching pair, but not necessarily a reliable one. Fig. 6 shows the matching pairs selected by the largest coefficient. Some of the less reliable matching pairs are not associated with the correct points in the latter frame. An iterative method, RANSAC, is therefore used to estimate a suitable homography model between the descriptor pairs. At the same time, the less reliable pairs that are considered outliers of the model are rejected. Fig. 7 shows the final reliable matching pairs across two consecutive frames. The performance of the feature matching is evaluated as the percentage of correct feature matches between two frames. The total matched points, false matches and percentage matching accuracy are listed in Table 3. Feature matching between consecutive frames (frames 1 and 2, frames 2 and 3, and frames 3 and 4) gives a promising matching accuracy of 88 to 90%. In addition, feature matching between alternate frames (frames 1 and 3 and frames 2 and 4) achieves an accuracy of 83% and above. These results support the usefulness of geometric moment invariants as local feature descriptors.
An additional test is carried out for feature matching when two or more intermediate frames are missing. In this case, however, the matching accuracy drops sharply. Between frame 1 and frame 4, several obvious transformations (rotation and zooming in) take place. They cause large changes in feature point detection and descriptor formulation, which lead to an increase in false matches.

CONCLUSION
A new approach to local feature descriptors using moment invariants is presented and tested in the proposed framework. The proposed descriptors are invariant to changes in scale, rotation and translation between consecutive frames and also between alternate frames. This approach can serve as a new contribution to feature tracking in image warping, locating moving objects in surveillance video and indoor robot navigation systems.