3D Object Recognition using Multiclass Support Vector Machine-K-Nearest Neighbor Supported by Local and Global Feature

Problem statement: This study proposes a new method for recognizing 3D objects from multiple views of the object. The method evolves from two promising methods available for object recognition. Approach: The proposed method uses both local and global features extracted from the images. For feature extraction, Hu's moment invariants are computed as the global feature representing the image, and the Hessian-Laplace detector with the PCA-SIFT descriptor provides the local feature. A multi-class SVM-KNN classifier is applied to the feature vector to recognize the object. The method is evaluated on the COIL-100 and CALTECH image databases. Results and Conclusion: The proposed method was implemented in MATLAB and tested. It gives better results than other methods such as KNN, SVM and BPN.


INTRODUCTION
This study addresses the problem of recognizing 3D objects in images. 3D object recognition has been a prominent research area for the last two decades, and many researchers have worked on real-world object recognition applications. The main objective of an object recognition system is to identify an object, if it is present in the image, and to estimate its location. The most difficult part of object recognition is identifying the object when the image contains noise, unwanted objects or multiple objects. Developing a 3D object recognition system that can recognize the object even under occlusion and clutter is a challenging task. Generally, 3D object recognition is carried out as either view-based or model-based. In a model-based system, a model library is constructed during the training phase with the 3D models of objects as features. During testing, a test image is converted into features and matched against the models in the library in order to identify the object (Mian et al., 2006). A view-based object recognition system creates a model from the object's appearance in 2D images taken from different angles. In the testing phase, the created model is used to decide whether the target object is present in the image. In recent years, view-based object recognition has attracted more attention than model-based methods. In this study, a view-based 3D object recognition model is proposed that uses a hybrid of Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classifiers with the local and global features of 2D images. The proposed work extends previous work on object recognition using local and global features of 2D images (Muralidharan and Chandrasekar, 2012). The proposed object recognition system works in two phases: a training phase and a testing phase.
During the training phase, images are given as input to the system; each image is preprocessed, both local and global features are extracted and the feature vector is constructed. The constructed feature vector is stored in the database with the label of the image and the SVM is trained. During the testing phase, the test image is given to the system and, after preprocessing, the feature vector is constructed by extracting the local and global features of the preprocessed image. The classifier is then employed to recognize the object.
Literature survey: Bhagat (2004) proposed the use of Hu and Zernike moment invariants as the feature vector for classifying 3D objects. The hybrid moment method achieves higher identification rates than methods based on view information encoded in a network architecture and than Hu moment invariants alone, when applied to approximately the same set of objects and decision rule (Bhagat, 2004). Zernike moment invariants are used to find the pose of the object and Hu moment invariants to identify it. A Euclidean distance classifier finds the closest match between the queried representation and the representations stored in a library. That method achieves 99.33% with one view and 100% with three 2D views. Roobaert and Hulle (1999) used a subset of the COIL-100 image database to compare the performance of the Support Vector Machine with different pixel-based input representations. Pontil and Verri (1998) used the Support Vector Machine for training and testing 3D object recognition with a subset of the COIL-100 image dataset (consisting of 32 objects). For training, 36 images (one for every 10°) of each of the 32 objects were chosen, and the remaining 36 images of each object were used for testing. With 20 objects randomly selected from the 32, the system achieves a recognition rate of 96.00% (Pontil and Verri, 1998). Nayar et al. (1996) used the COIL-100 image dataset for the recognition of 3D objects, applying the parametric eigenspace method to recognize 3D objects directly from their appearance. Several views of the same object are chosen as training samples and the eigenvectors are computed from the covariance matrix of the training set. Otoom et al. (2008) and Juan and Gwun (2009) use SIFT for feature extraction. Otoom et al. (2008) state the importance of SIFT keypoints in object identification.
Juan and Gwun (2009) use PCA-SIFT as the feature extraction component for deformed images. Gao et al. (2007) argue that the nearest neighbor method is the best for pattern classification. Li et al. (2008) show that kNN makes it simple to build an automatic classifier. Dudani et al. (1977) show that moment invariants play a vital role in aircraft identification. Borji and Hamidi (2007) use the Support Vector Machine for Persian font recognition. Hsu et al. (2001) suggest moment invariants as features for airport pavement distress image classification.
Rajasekaran and Pai (2000) demonstrated the use of moment invariants as a feature extractor for ARTMAP image classification. Singh et al. (2010) use the support vector machine with local features for classifying leaf images. Huang et al. (2010) suggest that the support vector machine performs well in identifying micro parts. He et al. (2007) apply different classifiers to global and local features, using Haar-like features as local features and edge features as global features, and conclude that local features play an important role in license plate detection from video. Lowe (2004) proposed the Scale Invariant Feature Transform (SIFT) descriptor, which is invariant to rotation, scaling and translation and gives good results in detecting previously learned objects in cluttered environments with changes in pose and partial occlusion. Hasan et al. (2010) constructed a Back Propagation Neural Network (BPN) for intelligent object detection and report that BPN provides efficient and accurate results. They also suggest that Principal Component Analysis (PCA) is useful only if it attains higher accuracy than the neural network alone. Lin et al. (2006) show that BPN can be applied to classify irregular shapes and that, with a small number of training iterations, it achieves fast and highly accurate classification. Mikolajczyk and Schmid (2004) proposed the Hessian-Laplace detector for interest point detection, which is scale invariant and detects blob-like patches in the image. Zhang et al. (2006) proposed SVM-KNN as a classifier for visual category recognition, applying KNN to reduce the number of classes for the SVM.
Training an SVM on the entire data set is slow; instead, the data set can first be reduced by nearest-neighbor search, and the SVM can then be trained easily and efficiently on the reduced set (Zhang et al., 2006). Based on the literature survey, the proposed 3D object recognition model is designed as in Fig. 1. Feature extraction is done in two phases: in phase I, local features are extracted from the preprocessed image and in phase II, global features are extracted. The Hessian-Laplace detector and PCA-SIFT descriptor are used for the local features and Hu's moment invariants for the global features. The extracted features are assembled to construct the feature vector. The classifier used in this study is an improvement of SVM-KNN (Zhang et al., 2006).

Background:
Feature vector construction: A feature vector is a collection of the important features used to identify an object. It is not a single unit; it consists of a number of values computed for the entire image or for a patch of the image (i.e., a region of interest). Feature extraction is the process of obtaining the important properties of the image for the purpose of recognizing the object in the image, or classifying or categorizing the image.
Generally, features are categorized into two types: local features and global features. Local features are extracted from a certain part of the image, whereas global features are computed for the entire image. Many researchers have used either local or global features in work on object, character, leaf or plankton recognition. Only a few works (Lisin et al., 2005; Shabanzade et al., 2011; Muralidharan and Chandrasekar, 2011; Murphy et al., 2006) have used both local and global features.
Local feature: For the object recognition task, interest point detection is an important part of local feature computation. Interest points usually refer to corners and blobs in an image, where the intensity is high compared with the background or other objects in the image; they are useful for finding local features in many computer vision problems. The literature survey identified the following familiar interest point detection methods: Moravec's corner detector, the Harris detector, SUSAN, Lindeberg's scale selection theory, Harris/Hessian-Laplace (Mikolajczyk and Schmid, 2004), MSER (Matas et al., 2002), SIFT (Lowe, 2004) and SURF (Bay et al., 2008). Among these, the Hessian-Laplace detector proposed by Mikolajczyk and Schmid (2004) is scale invariant and detects blob-like patches in the image.
The interest points detected by the Hessian-Laplace detector are invariant to rotation and scale changes. Keypoints are localized in space at the maxima of the Hessian determinant (Lindeberg, 1998) and in scale at the local maxima of the Laplacian-of-Gaussian. Hessian-Laplace obtains high localization accuracy in scale-space and high scale selection accuracy. The Hessian matrix, also called the Hessian, is the square matrix of second-order partial derivatives of a function; that is, it describes the local curvature of a function of many variables. For an image I it is given by Eq. 1:
$$H = \begin{bmatrix} I_{xx} & I_{xy} \\ I_{xy} & I_{yy} \end{bmatrix} \quad (1)$$
The detector computes the second-order partial derivatives $I_{xx}$, $I_{xy}$, $I_{yy}$ for each image point and then searches for points where the determinant of the Hessian, Eq. 2, becomes maximal:
$$\det(H) = I_{xx} I_{yy} - I_{xy}^{2} \quad (2)$$
In this study the Hessian-Laplace blob detector is used for detecting the interest points. Once an interest point is detected, SIFT (Lowe, 2004) is applied to extract the local features. SIFT generally has a high dimension of 128 features for each interest point detected in the image. To reduce the number of features, PCA is used, which reduces the descriptor to 36 features. In this study, the PCA-SIFT descriptor is therefore used for local feature extraction.
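The spatial part of this detector can be illustrated with a minimal sketch that evaluates the determinant of the Hessian by finite differences and picks the strongest response. It is only an illustration under simplifying assumptions: a single scale, no Gaussian smoothing and no Laplacian-of-Gaussian scale selection. The function names `hessian_determinant` and `strongest_blob` and the synthetic test image are hypothetical and not taken from the original implementation.

```python
import math


def hessian_determinant(img):
    """Return det(H) = Ixx*Iyy - Ixy^2 per pixel (borders left at 0).

    Second derivatives are approximated with central finite differences.
    `img` is a list of rows of floats.
    """
    h, w = len(img), len(img[0])
    det = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            ixx = img[r][c + 1] - 2 * img[r][c] + img[r][c - 1]
            iyy = img[r + 1][c] - 2 * img[r][c] + img[r - 1][c]
            ixy = (img[r + 1][c + 1] - img[r + 1][c - 1]
                   - img[r - 1][c + 1] + img[r - 1][c - 1]) / 4.0
            det[r][c] = ixx * iyy - ixy * ixy
    return det


def strongest_blob(img):
    """Return the (row, col) at which det(H) is largest."""
    det = hessian_determinant(img)
    _, row, col = max(
        (det[r][c], r, c) for r in range(len(det)) for c in range(len(det[0]))
    )
    return row, col


# Synthetic test image (an assumption): one Gaussian blob centred at (7, 7).
SIZE, SIGMA = 15, 2.0
blob = [
    [math.exp(-((r - 7) ** 2 + (c - 7) ** 2) / (2 * SIGMA ** 2)) for c in range(SIZE)]
    for r in range(SIZE)
]
print(strongest_blob(blob))
```

A full Hessian-Laplace detector would repeat this over a scale-space pyramid and keep only points that are also maxima of the Laplacian-of-Gaussian across scales.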
Global feature: The geometric moment invariants were introduced by Hu, based on the theory of algebraic invariants (Hu, 1962). Since their introduction, they have appeared to be among the most promising and effective features for representing an image. The image may be reconstructed from its moments. The set of seven moment invariants introduced by Hu (Eq. 3-9) is invariant to rotation, scaling and translation.
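For reference, Hu's seven invariants are commonly written as below in terms of the normalized central moments $\eta_{pq} = \mu_{pq} / \mu_{00}^{1 + (p+q)/2}$, where $\mu_{pq}$ is the central moment of order $p+q$. The symbols $\phi_1, \dots, \phi_7$ and $\eta_{pq}$ follow the usual textbook convention; they are an assumption about notation, since the typeset equations of this article are not reproduced in the text.

```latex
\begin{align}
\phi_1 &= \eta_{20} + \eta_{02} \tag{3}\\
\phi_2 &= (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2 \tag{4}\\
\phi_3 &= (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2 \tag{5}\\
\phi_4 &= (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2 \tag{6}\\
\phi_5 &= (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})
          \left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \notag\\
       &\quad + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})
          \left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \tag{7}\\
\phi_6 &= (\eta_{20} - \eta_{02})
          \left[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right]
          + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03}) \tag{8}\\
\phi_7 &= (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})
          \left[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2\right] \notag\\
       &\quad - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})
          \left[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2\right] \tag{9}
\end{align}
```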
Moment invariants are a very useful way of extracting features from two-dimensional images (Muralidharan and Chandrasekar, 2011). They are properties of connected regions in binary images that are invariant to translation, rotation and scale.
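The invariance can be checked numerically with a small sketch that computes the seven invariants from normalized central moments in their standard form. This is plain Python written for illustration, not the MATLAB code used in the study; the function names and the small binary test shape are assumptions.

```python
def normalized_central_moments(img):
    """Return eta[(p, q)] for orders 2 and 3 of a 2D grid (list of rows)."""
    pixels = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)
    xc = sum(x * v for x, _, v in pixels) / m00  # centroid, x = column index
    yc = sum(y * v for _, y, v in pixels) / m00  # centroid, y = row index
    eta = {}
    for p in range(4):
        for q in range(4):
            if 2 <= p + q <= 3:
                mu = sum((x - xc) ** p * (y - yc) ** q * v for x, y, v in pixels)
                eta[(p, q)] = mu / m00 ** (1 + (p + q) / 2)
    return eta


def hu_moments(img):
    """Return Hu's seven moment invariants as a list [phi1, ..., phi7]."""
    n = normalized_central_moments(img)
    n20, n02, n11 = n[(2, 0)], n[(0, 2)], n[(1, 1)]
    n30, n03, n21, n12 = n[(3, 0)], n[(0, 3)], n[(2, 1)], n[(1, 2)]
    a, b = n30 + n12, n21 + n03  # sums that recur in phi4..phi7
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ]


# Hypothetical binary test shape (an asymmetric "L").
shape = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
rotated = [list(row) for row in zip(*shape[::-1])]  # rotation by 90 degrees
print(hu_moments(shape))
print(hu_moments(rotated))
```

The two printed lists should agree up to floating-point rounding, because a 90° rotation of a pixel grid is exact; arbitrary rotation angles introduce resampling error on real images.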

Classifiers: K-Nearest neighbor:
In pattern recognition, the k-nearest neighbor (k-NN) algorithm is among the simplest of all machine learning algorithms. It is a supervised learning method that has been used in many applications in data mining, statistical pattern recognition and other fields. k-NN classifies an object based on the closest training examples in the feature space: the object is assigned the class chosen by a majority vote of its neighbors (Li et al., 2008). The value of k is decided based on the size of the data used for classification. If k = 1, the object is simply assigned to the class of its nearest neighbor; larger values of k reduce the effect of noise on the classification but make the boundaries between classes less distinct.
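The majority-vote rule can be sketched in a few lines. The sketch assumes the Euclidean distance as the distance measure; the function name `knn_predict`, the toy 2D feature vectors and the labels are hypothetical and serve only to show the mechanics.

```python
import math
from collections import Counter


def knn_predict(train_vectors, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training vectors."""
    # Pair every training vector's Euclidean distance with its label, nearest first.
    ranked = sorted(
        (math.dist(vector, query), label)
        for vector, label in zip(train_vectors, train_labels)
    )
    nearest_labels = [label for _, label in ranked[:k]]
    # The most frequent label among the k nearest neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (an assumption): two well-separated classes in a 2D feature space.
train_vectors = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["cup", "cup", "cup", "car", "car", "car"]

print(knn_predict(train_vectors, train_labels, (1.1, 1.0), k=3))
```

With k = 1 the function reduces to the nearest-neighbor rule described above.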
To make a prediction for a test example, the following steps are followed:
• Compute the distance of the test vector to all training vectors considered
• Arrange the distances in ascending order and find the k closest vectors
• Assign the label that occurs most often among those k vectors
To calculate the distance between two vectors, distance measures such as Euclidean distance, city-block distance, cosine distance, correlation, Hamming distance, Minkowski metric, Chebychev distance, Jaccard distance and Spearman distance can be used. The most common distance function is the Euclidean distance. In this study the k-NN algorithm is used for the first stage of classification with the Euclidean distance as the distance measure. The Euclidean distance formula is shown below Eq. 10:
$$d(x, y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2} \quad (10)$$
where, x and y are points in $R^m$.
Support vector machine: The Support Vector Machine (SVM), which was first heard of during COLT-92, was proposed by Cortes and Vapnik as a supervised machine learning technique. Since its inception it has received much attention and has achieved very good performance in a range of applications such as object recognition, pattern recognition and text classification. Support Vector Machines are used for classification and regression and belong to the family of generalized linear classifiers (Chen et al., 2010). The objective of the support vector machine is to form a hyperplane as the decision surface in such a way that the margin of separation between positive and negative examples is maximized, using an optimization approach. The SVM starts with a training sample $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the training vector and $y_i$ is its class label. The SVM aims to find the optimum weight vector w and the bias b of the separating hyperplane such that Eq. 11:
$$y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N \quad (11)$$
with w and the slack variables $\xi_i$ minimizing the cost function given below as Eq. 12:
$$\Phi(w, \xi) = \frac{1}{2} w^{T} w + C \sum_{i=1}^{N} \xi_i \quad (12)$$
where, the slack variables $\xi_i$ represent the error measures of the data, C is the penalty assigned to the errors and $\phi(\cdot)$ is a kernel mapping which maps the data into a higher dimensional feature space. Generally, linear functions are used as the separating hyperplane in the feature space. To achieve better performance, several kernel functions are used, such as the polynomial function and the radial-basis function. In this study, the Radial-Basis Function Eq. 13 is used as the kernel function:
$$k(x, y) = \exp\left( -\frac{\lVert x - y \rVert^{2}}{2\sigma^{2}} \right) \quad (13)$$
where, σ is a scalar value.
There are two ways to extend the binary SVM to multi-class classification: one-against-all and one-against-one. In one-against-all, a set of k binary SVMs is trained, each separating one class from the rest, where k is the number of classes. Each binary classifier is trained with its own labeling of the training set (i.e., the nth SVM is trained with the samples of the nth class as positive examples and all remaining samples as negative examples). In this way all k SVMs are trained, producing k decision functions. The test pattern is assigned to the class whose classifier gives the maximum output among the k classifiers.
In the one-against-one approach, every possible pair of classes i, j is used to train a corresponding SVM_ij. If there are k classes, k(k-1)/2 SVMs are trained, producing the same number of decision functions. For a test pattern, all the binary SVMs take part in a voting strategy to decide which class it belongs to. There is no theoretical proof of which of the two multi-class approaches is better; the choice is made on a trial-and-error basis.
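The two decision rules can be contrasted with a short sketch. To keep it self-contained, the binary decision functions are hypothetical stand-ins built from 1D class prototypes rather than trained SVMs; only the way their outputs are combined (maximum output versus pairwise voting) reflects the text above. All names and values are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical 1D class prototypes standing in for trained binary SVMs.
PROTOTYPES = {"cup": 0.0, "car": 5.0, "duck": 10.0}


def make_one_against_all(prototypes):
    """One decision function per class; a larger score means a stronger claim."""
    return {c: (lambda x, p=p: -abs(x - p)) for c, p in prototypes.items()}


def make_one_against_one(prototypes):
    """One decision function per class pair (i, j); positive output favours i."""
    return {
        (i, j): (lambda x, pi=prototypes[i], pj=prototypes[j]: abs(x - pj) - abs(x - pi))
        for i, j in combinations(prototypes, 2)
    }


def predict_one_against_all(machines, x):
    """Pick the class whose decision function gives the maximum output."""
    return max(machines, key=lambda c: machines[c](x))


def predict_one_against_one(machines, x):
    """Let every pairwise machine vote and return the class with most votes."""
    votes = Counter()
    for (i, j), decide in machines.items():
        votes[i if decide(x) > 0 else j] += 1
    return votes.most_common(1)[0][0]


ova = make_one_against_all(PROTOTYPES)
ovo = make_one_against_one(PROTOTYPES)
k = len(PROTOTYPES)
print(len(ova), len(ovo), k * (k - 1) // 2)
print(predict_one_against_all(ova, 4.2), predict_one_against_one(ovo, 4.2))
```

For k = 3 classes the sketch builds 3 one-against-all machines and 3 pairwise machines; the gap widens quickly, since k = 100 classes would need 4950 pairwise machines.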

Proposed method:
In this study the proposed method is designed as in Fig. 1. Its objective is to recognize 3D objects. For recognition, SVM-KNN is used as the classifier, supported by local and global features. The local feature of the given image is obtained with the Hessian-Laplace detector and the PCA-SIFT descriptor, and the global feature is given by Hu's moment invariants. Both the local and the global features used in this study have invariance properties.
The proposed method of 3D object recognition is given below.
Training Phase: Step 1: Training images are selected and placed in the folder.
Step 2: Read the training images.
Step 3: Pre-process each image by reducing the image size to 100×100, removing noise, converting the color image to a grey-scale image and applying Canny's edge detection algorithm.
Step 4: Local feature of the image is computed by applying Hessian-Laplace detector and PCA-SIFT descriptor (36 features are computed).
Step 5: Global feature of the image is computed by applying Hu's moment invariants (7 features are computed).
Step 6: The feature vector is constructed by placing the local and global features of the image as a row in a matrix.
Step 7: Repeat Steps 2 to 6 for all the training images.
Step 8: The KNN and the SVM (one-against-one and one-against-all) are trained and tuned for the testing phase.
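The feature-matrix construction in the training phase can be sketched as below. The sketch assumes that each image contributes one 36-value PCA-SIFT descriptor and the seven Hu invariants, giving a 43-value row; how descriptors from several interest points are pooled into one descriptor is not specified in the text, so that part is left out. The function names and the placeholder feature values are hypothetical.

```python
LOCAL_DIM = 36   # PCA-SIFT descriptor length stated in the text
GLOBAL_DIM = 7   # number of Hu moment invariants


def build_feature_row(local_descriptor, hu_invariants):
    """Concatenate local and global features into one row of the feature matrix."""
    if len(local_descriptor) != LOCAL_DIM:
        raise ValueError(f"expected {LOCAL_DIM} local features")
    if len(hu_invariants) != GLOBAL_DIM:
        raise ValueError(f"expected {GLOBAL_DIM} global features")
    return list(local_descriptor) + list(hu_invariants)


def build_training_matrix(samples):
    """Turn (label, local_descriptor, hu_invariants) triples into matrix and labels."""
    matrix, labels = [], []
    for label, local_descriptor, hu_invariants in samples:
        matrix.append(build_feature_row(local_descriptor, hu_invariants))
        labels.append(label)
    return matrix, labels


# Placeholder feature values (an assumption), only to show the resulting shapes.
samples = [
    ("obj1", [0.1] * LOCAL_DIM, [0.01] * GLOBAL_DIM),
    ("obj2", [0.2] * LOCAL_DIM, [0.02] * GLOBAL_DIM),
]
matrix, labels = build_training_matrix(samples)
print(len(matrix), len(matrix[0]), labels)
```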
Testing Phase: Step 1: Read the test images.
Step 2: Apply Steps 3 through 6 of the training phase to each test image.
Step 3: KNN is applied first. The nearest neighbors of the query in the training data are identified using the Euclidean distance function.
Step 4: If the K neighbors all have the same label, the query is assigned that label and the procedure exits; otherwise, the pairwise distances between the K neighbors are computed and the distance matrix is constructed.
Step 5: Using the kernel trick, the distance matrix is converted into a kernel matrix, which is then passed to the multiclass SVM for classification.
Step 6: Both the one-against-one and one-against-all multiclass SVMs are employed separately for classification/recognition.
Step 7: Classified object and the label are displayed.

Experimentation: The proposed method of recognizing 3D objects through a view-based system that combines local and global features using SVM-KNN was implemented in MATLAB 7.5 and evaluated with images from the COIL-100 database and the CALTECH-101 database (Fei-Fei et al., 2004). The COIL-100 database consists of images of 100 different objects on a black background; each object is rotated about the vertical axis at 5° intervals. Hence there are 72 images of every object, which sum to 7200 images for the whole database. The CALTECH 101 dataset (Gao et al., 2007) consists of images of 101 object categories. The significant variation in appearance, color and lighting makes this database challenging for object recognition and detection.
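Steps 3 to 5 of the testing phase can be sketched as follows. The sketch makes two assumptions that the text does not fix: the distance matrix is turned into a kernel matrix with the radial-basis function exp(-d²/(2σ²)), and the multiclass SVM itself is left out, so the function only returns the kernel matrix that would be handed to it. The name `svm_knn_query`, the parameter `sigma` and the toy data are hypothetical.

```python
import math


def svm_knn_query(train_vectors, train_labels, query, k=3, sigma=1.0):
    """Return (label, None) if the k neighbors agree, else (None, kernel_matrix)."""
    # Step 3: indices of the k training vectors closest to the query.
    order = sorted(
        range(len(train_vectors)),
        key=lambda i: math.dist(train_vectors[i], query),
    )
    neighbors = order[:k]

    # Step 4: unanimous neighbors decide the label immediately.
    neighbor_labels = {train_labels[i] for i in neighbors}
    if len(neighbor_labels) == 1:
        return neighbor_labels.pop(), None

    # Step 4 (otherwise): pairwise distance matrix between the k neighbors.
    distances = [
        [math.dist(train_vectors[i], train_vectors[j]) for j in neighbors]
        for i in neighbors
    ]
    # Step 5: convert distances to an RBF kernel matrix (assumed conversion).
    kernel = [[math.exp(-d * d / (2 * sigma ** 2)) for d in row] for row in distances]
    return None, kernel


# Toy data (an assumption): two classes in a 2D feature space.
vectors = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(svm_knn_query(vectors, labels, (0.2, 0.2)))   # neighbors agree
print(svm_knn_query(vectors, labels, (3.0, 2.5)))   # neighbors disagree
```

The early exit is what keeps the hybrid cheap: the SVM is only consulted for queries that fall near a class boundary, and then only on the k neighbors rather than the whole training set.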
For the experiments, the COIL-100 and CALTECH 101 data sets are each divided into two parts, one for training and one for testing. In COIL-100, the images of a particular object at 15, 30, 45, 60, 75, 90, 105, 120, 135 and 150° are used for training and the images at 5, 20, 35, 50, 65, 80 and 95° for testing. The proposed method, SVM, KNN and BPN were evaluated on the COIL-100 data set: 1000 images (10 images of each of 100 objects) were selected for training and 700 images were used for testing. As a pre-processing step, each training or testing image is reduced to 100×100 pixels for both data sets. Canny's edge detection is then performed to extract the important edges of the image. The local and global features are extracted from the edge-detected image. The classifiers are trained and then tested with the test images. The set of test images considered for the experimentation is shown in Fig. 2. Table 1 and Fig. 3 give the performance of the proposed classifier and the traditional classifiers considered for 3D object recognition on the COIL-100 data set. Table 2 and Fig. 4 show the performance of the classifiers on the CALTECH-101 dataset.
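The COIL-100 counts quoted above follow directly from the angle lists, as the short check below shows. Views are represented as (object id, angle) pairs; the numbering of objects from 1 to 100 and the function name `split_views` are assumptions made for the example, and no file names or image loading are involved.

```python
ANGLE_STEP = 5                              # degrees between consecutive views
VIEWS_PER_OBJECT = 360 // ANGLE_STEP        # views of one object over a full turn
NUM_OBJECTS = 100

TRAIN_ANGLES = list(range(15, 151, 15))     # 15, 30, ..., 150 degrees
TEST_ANGLES = list(range(5, 96, 15))        # 5, 20, ..., 95 degrees


def split_views(num_objects=NUM_OBJECTS):
    """Return (train, test) lists of (object_id, angle) pairs."""
    objects = range(1, num_objects + 1)
    train = [(obj, angle) for obj in objects for angle in TRAIN_ANGLES]
    test = [(obj, angle) for obj in objects for angle in TEST_ANGLES]
    return train, test


train_views, test_views = split_views()
print(VIEWS_PER_OBJECT, VIEWS_PER_OBJECT * NUM_OBJECTS)
print(len(TRAIN_ANGLES), len(TEST_ANGLES), len(train_views), len(test_views))
```

Because the training angles are multiples of 15° and the test angles are offset by 5°, no view appears in both sets.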
The classifiers were tested with different types of features (local features, global features and the combination of local and global features) in order to demonstrate the benefit of combining local and global features. The experimental results above show that the proposed method gives better results when the local and global features of the image are combined.

CONCLUSION
In this study, a 3D object recognition model is proposed. The model uses local and global features as the feature vector for the SVM-KNN classifier. The Hessian-Laplace detector and PCA-SIFT descriptor provide the local feature and Hu's moment invariants the global feature. The KNN classifier is applied first to identify the closest object from the trained features; if the nearest neighbors do not agree on a label, a multiclass SVM is applied to identify the object. In the proposed model, KNN is employed first to reduce the number of classes for the SVM. Of the one-against-one and one-against-all SVM classifications, one-against-one gives the better result. Tables 1 and 2 show that combining local and global features gives better results and that the proposed SVM-KNN classifier has greater accuracy than traditional methods such as SVM, KNN and BPN. The proposed SVM-KNN uses the Radial Basis Function as its kernel function. Future work will include increasing the efficiency by adding more features to recognize the 3D object.