SOFT CLUSTERING BASED EXPOSITION TO MULTIPLE DICTIONARY BAG OF WORDS

Object classification is an important area of computer vision with many applications, including robotics, image search, face recognition, assistance for visually impaired people and image censoring. A common feature-based classification method is the Bag of Words (BoW) approach, in which a codebook of visual words is created using a clustering method. To increase performance, the Multiple Dictionaries BoW (MDBoW) method, which draws additional visual words from different independent dictionaries instead of adding more words to the same dictionary, has previously been implemented using hard clustering. Hard clustering assigns each feature to its nearest neighbor. A given feature may be nearly the same distance from two cluster centers, yet a typical hard clustering method selects only the slightly nearer center to represent it. Ambiguous features are therefore not well represented by the visual vocabulary. To address this problem, a soft clustering based Multiple Dictionary Bag of Visual Words model for image classification is implemented, with dictionaries generated using a modified Fuzzy C-means algorithm based on the R1 norm. Performance is evaluated on images by varying the dictionary size. The proposed method works better when the number of topics and the number of images per topic are larger. The results indicate that the multiple dictionary bag of words model using fuzzy clustering improves recognition performance over the baseline method.


INTRODUCTION
One of the most important and challenging problems in machine vision is retrieving images from a large and highly varied image data set based on visual content. Smart phone applications that let users take photos have led to rapid growth in the number of digital image collections. Automatic classification of images helps in the efficient search and management of these large collections. A recent feature-based classification method is the Bag of Words approach (Lazebnik et al., 2006). It addresses recognition by starting from visual features rather than from segmentation. The first step in classifying images using Bag of Words is creating a codebook of visual words. For this, features are extracted using detectors or dense sampling, and a descriptor is calculated at every local keypoint extracted. For local feature detection, classic detectors include the Harris detector (Harris and Stephens, 1988) and its extension (Tuytelaars and Gool, 2004), among many others. For local feature description, descriptors such as the Haar descriptor (Viola and Jones, 2001), the Scale-Invariant Feature Transform (SIFT) descriptor (Lowe, 2004), the Histogram of Oriented Gradients (HOG) descriptor (Dalal and Triggs, 2005) and the Speeded Up Robust Features (SURF) descriptor (Bay et al., 2006) are commonly used.
In this study, the Bag of Words model is implemented for visual categorization of images, using the Harris corner detector for extracting features and the Scale-Invariant Feature Transform (SIFT) descriptor for representing the extracted features. After the local features, called descriptors, are obtained, a codebook is generated to represent them. The overall performance of BoW depends mainly on the dictionary generation method, so this implementation focuses on how the dictionary of visual words is generated. A novel method is implemented that combines Multiple Dictionaries for BoW (MDBoW) (Aly et al., 2011a,b), which uses more visual words, with the soft clustering algorithm Fuzzy C-means with R1 norm. For large scale image collections, this method significantly increases performance compared with the baseline Bag of Words method. In the baseline method, more words are added to the same dictionary, whereas in MDBoW more words are taken from different independent dictionaries. The resulting distribution of descriptors is quantified by vector quantization against the pre-specified codebook, which converts it to a histogram of votes for codebook centers. The K Nearest Neighbor (KNN) algorithm is used to classify images from the resulting global descriptor vector.

Methods of Soft Clustering
In the traditional bag of words model, which uses hard clustering, a given feature may be nearly the same distance from two cluster centers, and only the slightly nearer center is selected to represent that feature in the term vector. Ambiguous features are therefore not well represented by the visual vocabulary. To address this problem, soft clustering methods are used in this study to construct the codebook.

Fuzzy C-Means
Given the data set $X = \{x_1, x_2, x_3, \ldots, x_N\}$, choose the number of clusters $1 < c < N$, the weighting exponent $m > 1$, the termination tolerance $\epsilon > 0$ and the norm-inducing matrix $A$. The fuzzy C-means clustering algorithm (Cannon et al., 1986) is based on the minimization of an objective function called the C-means functional, given by Equation (1), with its terms defined in Equation (2), subject to the condition in Equation (3), which holds for all values of $k$.
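For reference, the standard form of the fuzzy C-means functional found in the clustering literature is sketched below. Equations (1) to (3) are not reproduced in this text, so their exact correspondence to these lines is an assumption; $\mu_{ik}$ denotes the membership of data point $x_k$ in cluster $i$ and $v_i$ is the center of cluster $i$.

```latex
\begin{align}
J(X; U, V) &= \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^{m} \, D_{ikA}^{2} \\
D_{ikA}^{2} &= \lVert x_k - v_i \rVert_{A}^{2} = (x_k - v_i)^{T} A \, (x_k - v_i) \\
\sum_{i=1}^{c} \mu_{ik} &= 1, \qquad \mu_{ik} \in [0, 1], \qquad k = 1, \ldots, N
\end{align}
```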

Steps for Fuzzy C-means Algorithm
The following steps implement the algorithm. $U$ is the fuzzy partition matrix; the i-th column of $U$ contains the values of the membership function of the i-th fuzzy subset of $X$. $U^{(0)}$ is the initial partition matrix, initialized randomly such that $U^{(0)} \in M_{fc}$. $X = \{x_1, x_2, x_3, \ldots, x_N\}$ is the given data set, $v = (v_1, v_2, \ldots, v_c)$ are the vectors of centers and $c$ is the number of clusters in $X$. Repeat the following for iterations $l = 1, 2, 3, \ldots$:
• Compute the cluster prototypes (means) using Equation (4), where $v_i$ are the cluster centers calculated using the membership function.
• Compute the distances using Equation (5), where $A = I$ for the Euclidean norm and $D_{ikA}^{2}$ is the distance matrix containing the squared distances between data points and cluster centers.
• Update the partition matrix using Equation (6).
The result of the partition is collected in structure arrays. $\epsilon$ is the maximum termination tolerance and $m$ is the fuzziness weighting exponent.
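The three steps can be sketched in NumPy as follows. This is a minimal illustration of the textbook FCM updates with the Euclidean norm ($A = I$), not the implementation used in this study. The function name `fuzzy_c_means`, the `max_iter` and `seed` parameters, the stopping rule (largest change in any membership below $\epsilon$) and the toy data are all assumptions. The partition matrix is stored with one column per cluster, following the convention stated above.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, eps=1e-3, max_iter=100, seed=0):
    """Textbook FCM sketch. X: (N, d). Returns U: (N, c) and V: (c, d)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Random initial partition matrix; memberships of each point sum to 1
    U = rng.random((N, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Step 1: cluster prototypes as membership-weighted means
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Step 2: squared Euclidean distances, shape (N, c)
        D2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        D2 = np.maximum(D2, 1e-12)  # guard against division by zero
        # Step 3: mu_ik proportional to D_ik^(-2/(m-1)), normalized per point
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < eps  # assumed stopping rule
        U = U_new
        if converged:
            break
    return U, V

# Toy data: two well-separated 2-D blobs (illustrative only)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
U, V = fuzzy_c_means(X, c=2, m=1.7)
print(np.round(V, 1))    # estimated cluster centers
print(np.round(U[:3], 3))  # soft memberships of the first three points
```

Each row of `U` gives the degree to which one feature belongs to every cluster, which is the information that a hard nearest-neighbor assignment discards.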

Modified Fuzzy C-Means
In the existing Fuzzy C-means algorithm, the objective function is defined in terms of mean squared error. In the proposed method, the objective function is instead defined in terms of root mean squared error using the R1 norm. The root mean squared error is more sensitive than other measures to the occasional large error and the squaring process gives disproportionate weight to very large errors. In matrix form $X = (x_{ik})$, index $i$ runs over spatial dimensions, $i = 1, \ldots, c$, and index $k$ runs over data points, $k = 1, \ldots, N$. The R1-norm is defined in Equation (7). R1-K-means has been shown to perform slightly better than standard K-means (Ding et al., 2006).
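As a small numerical illustration, the snippet below computes the R1-norm in the form given by Ding et al. (2006), that is, the sum over data points of the Euclidean length of each point's column, and compares it with the sum of squared entries. The function names and the residual matrix `E` are made up for this example.

```python
import numpy as np

def r1_norm(X):
    """R1-norm: Euclidean length of each column (data point), summed over columns."""
    return np.sqrt((X ** 2).sum(axis=0)).sum()

def squared_error(X):
    """Sum of squared entries, as used by a squared-error objective."""
    return (X ** 2).sum()

# Residuals of four 2-D points stored as columns; the last column is an outlier
E = np.array([[3.0, 0.0, 1.0, 30.0],
              [4.0, 1.0, 0.0, 40.0]])
print(r1_norm(E))        # 5 + 1 + 1 + 50 = 57.0
print(squared_error(E))  # 25 + 1 + 1 + 2500 = 2527.0
```

In this toy case the outlier accounts for about 88% of the R1-norm but about 99% of the squared error, which shows how squaring amplifies the weight of a single large residual.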
The cost function to be minimized is given by Equations (8) and (9), where $V = \{v_1, v_2, \ldots, v_c\}$, $N$ is the number of data points and $m$ is the smoothing parameter that controls fuzziness. When $m = 1$, $\mu_{ik}$ is 0 or 1 and the partition is hard; as $m$ increases, the partition becomes fuzzier.

Steps for Modified Fuzzy C-means Algorithm
The following steps implement the algorithm. Given the data set $X$, choose the number of clusters $1 < c < N$, the weighting exponent $m > 1$, the termination tolerance $\epsilon > 0$ and the norm-inducing matrix $A$. $U$ is the fuzzy partition matrix; the i-th column of $U$ contains the values of the membership function of the i-th fuzzy subset of $X$. $U^{(0)}$ is the initial partition matrix, initialized randomly such that $U^{(0)} \in M_{fc}$.
$X = \{x_1, x_2, x_3, \ldots, x_N\}$ is the given data set, $v = (v_1, v_2, \ldots, v_c)$ are the vectors of centers and $c$ is the number of clusters in $X$. The objective function $J$ is minimized such that the root mean squared error between the original vectors and the reallocated centers is minimized. Repeat the following for iterations $l = 1, 2, \ldots$:
• Compute the cluster prototypes (means) using Equation (10), where $v_i$ are the cluster centers calculated using the membership function.
• Compute the distances using Equation (11), where $A = I$ for the Euclidean norm and $D_{ikA}^{2}$ is the distance matrix containing the squared distances between data points and cluster centers.
• Update the partition matrix using Equation (12).
The result of the partition is collected in structure arrays. $\epsilon$ is the maximum termination tolerance and $m$ is the fuzziness weighting exponent.
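Equations (10) to (12) are not reproduced in this text, so the following is only one plausible reading of an R1-norm variant, not the update rules of this study. It assumes the objective $J = \sum_i \sum_k \mu_{ik}^m \lVert x_k - v_i \rVert$ with unsquared Euclidean distances. Setting the derivatives of that assumed objective to zero gives memberships proportional to $D_{ik}^{-1/(m-1)}$ and centers that are means weighted by $\mu_{ik}^m / D_{ik}$ (an iteratively reweighted update). The function name, `max_iter`, `seed` and the stopping rule are hypothetical.

```python
import numpy as np

def r1_fuzzy_c_means(X, c, m=1.7, eps=1e-3, max_iter=200, seed=0):
    """Sketch of FCM with unsquared distances (ASSUMED form of the R1 variant)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((N, c))
    U /= U.sum(axis=1, keepdims=True)
    # No distances exist yet, so the first prototype update is a plain weighted mean
    D = np.ones((N, c))
    for _ in range(max_iter):
        # Prototypes: weights mu^m / D down-weight points far from the center
        W = (U ** m) / D
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Unsquared Euclidean distances, shape (N, c)
        D = np.sqrt(((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2))
        D = np.maximum(D, 1e-12)  # guard against division by zero
        # Memberships for the assumed objective: mu_ik ~ D_ik^(-1/(m-1))
        inv = D ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < eps  # assumed stopping rule
        U = U_new
        if converged:
            break
    return U, V

# Toy data: two blobs plus one far outlier (illustrative only)
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(5.0, 0.1, (20, 2)),
               [[5.0, 60.0]]])
U, V = r1_fuzzy_c_means(X, c=2)
print(np.round(V, 2))
```

Because each point's pull on a center is divided by its distance, a distant outlier shifts the centers less than it would under a squared-distance weighted mean.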

Baseline Method
In the baseline method, features are extracted using the Harris corner detector and represented using the SIFT descriptor. The extracted feature pool is then clustered using the modified FCM to obtain a codebook with a predefined number of visual words. Features extracted from training images are assigned to the nearest code in the codebook. Each image is reduced to the set of codes it contains, represented as a histogram. The normalized histogram of codes is exactly the same as the normalized histogram of visual words. In the testing phase, the k closest points in the training data are found for each test data point and classification is done using the KNN classifier.
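The histogram construction and KNN classification described above can be sketched as follows. The feature extraction and clustering stages are left out: the codebook and descriptors are tiny hand-written 2-D arrays standing in for SIFT data, and the function names, the majority-vote rule and the labels are assumptions made for illustration.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Normalized histogram of nearest-codeword assignments for one image."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest code for each feature
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def knn_predict(train_hists, train_labels, test_hist, k=3):
    """Majority vote among the k training histograms closest in Euclidean distance."""
    dist = np.linalg.norm(train_hists - test_hist, axis=1)
    nearest = np.argsort(dist)[:k]
    labels, counts = np.unique(train_labels[nearest], return_counts=True)
    return labels[counts.argmax()]

# Hypothetical 3-word codebook and the descriptors of one image
codebook = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
image = np.array([[0.5, 0.2], [9.0, 1.0], [9.5, -0.3], [1.0, 9.0]])
hist = bow_histogram(image, codebook)
print(hist)  # word counts 1, 2, 1 out of 4 features

train_hists = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0],
                        [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]])
train_labels = np.array(["mug", "mug", "can", "can"])
pred = knn_predict(train_hists, train_labels, np.array([0.8, 0.2, 0.0]), k=3)
print(pred)
```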

Multiple Dictionary Bag of Words Model
Multiple Dictionaries for BoW (MDBoW), which uses more visual words, has significantly increased the performance of image classification on large and highly varied image data sets. In the MDBoW model implemented in this study, features are extracted using the Harris corner detector and represented using the SIFT descriptor. In multiple dictionary generation, each dictionary $D_n$ is generated from a different subset of the image features. Each training image gets a histogram $h_n$ from every dictionary $D_n$, and these histograms are concatenated to form a single histogram $h$. Every feature gets $N$ entries in the histogram $h$, one from every dictionary. In this approach, more words are taken from different independent dictionaries, whereas in the baseline method more words are taken from the same dictionary. Thus the multiple dictionary method requires less storage than the baseline approach. In this study, the separate dictionary variant of Multiple Dictionaries for BoW (MDBoW) is implemented using the modified FCM.

Steps for Separate Dictionary Generation
Figure 1 shows the schematic of the separate dictionary implementation:
• Generate $N$ random, possibly overlapping subsets of the image features.
• Compute a dictionary $D_n$ independently for each subset $S_n$ using the modified FCM. Each dictionary has a set of $K_n$ visual words.
• Compute the histogram. Every image feature gets its visual word from every dictionary $D_n$. Accumulate these visual words into an individual histogram $h_n$ for each dictionary. The final histogram is the concatenation of the individual histograms.
This process of histogram construction is carried out during both the training and the testing phases of the algorithm. The KNN classifier then finds the k closest training histograms and gives the classification result.
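The subset sampling and histogram concatenation can be sketched as below. The clustering step is deliberately omitted: the dictionaries are hand-written arrays standing in for the output of the modified FCM, and the function names, `subset_size` and `seed` are hypothetical. The histogram is returned as raw counts, since the normalization used in this study is not stated.

```python
import numpy as np

def random_subsets(features, n_subsets, subset_size, seed=0):
    """Draw N subsets independently, so different subsets may overlap."""
    rng = np.random.default_rng(seed)
    return [features[rng.choice(len(features), size=subset_size, replace=False)]
            for _ in range(n_subsets)]

def word_histogram(descriptors, dictionary):
    """Raw counts of nearest-word assignments against one dictionary."""
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    return np.bincount(d2.argmin(axis=1), minlength=len(dictionary)).astype(float)

def mdbow_histogram(descriptors, dictionaries):
    """Concatenate the individual histograms h_n into the final histogram h."""
    return np.concatenate([word_histogram(descriptors, D) for D in dictionaries])

# Hypothetical descriptors of one image and two small dictionaries
descriptors = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 9.0], [10.0, 10.0]])
D1 = np.array([[0.0, 0.0], [10.0, 10.0]])              # K_1 = 2 words
D2 = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 10.0]])  # K_2 = 3 words
h = mdbow_histogram(descriptors, [D1, D2])
print(h)        # length K_1 + K_2 = 5
print(h.sum())  # 4 features x 2 dictionaries = 8 entries

subsets = random_subsets(descriptors, n_subsets=2, subset_size=3)
```

The sum of the concatenated histogram equals the number of features times the number of dictionaries, which matches the statement that every feature gets one entry from every dictionary.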

RESULTS AND DISCUSSION
The effect of varying different parameters on the MDBoW approach for image classification is evaluated in terms of Micro Precision, Macro Precision, Micro F1-measure, Macro F1-measure and Accuracy rate (Al-Salemi and Ab Aziz, 2011) for eight topics, namely burger, spaghetti, egg, spoon, bottle, can, coffee pot and mug, from a dataset created from Google images. The dataset was created for a real-time application: visual recognition of objects by a humanoid used in a restaurant environment. The images in the dataset can be categorized as tiny images. The performance measures used in this evaluation are given in Equations (13) to (18). In these equations, TP indicates true positives, FP false positives, FN false negatives and TN true negatives of the classification result.
For the modified Fuzzy C-means and FCM, the parameter m = 1.7 and the stop condition ϵ = 0.001. The test data set includes eight topics, each containing 50 images. 200 images per concept were used to build the codebooks. The classifier is trained on another 200 images from each topic. The number of randomly formed dictionaries is varied from 1 to 5 and the number of words per dictionary is varied from 80 to 200. The distance measure used is Euclidean distance. Sample images from the dataset are shown in Fig. 2. Figures 3-7 show the variation of accuracy rate with words per dictionary as the number of randomly generated dictionaries is varied from 1 to 5, labeled Dictionary1, Dictionary2, Dictionary3, Dictionary4 and Dictionary5. In both the baseline method and the Multiple Dictionary Bag of Words model, the words are clustered using the modified Fuzzy C-means soft clustering algorithm with R1 norm. The results are compared with the Multiple Dictionary Bag of Words model with FCM.
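The performance measures named above can be computed from a confusion matrix as sketched below. Equations (13) to (18) are not reproduced in this text, so the snippet uses the standard one-vs-rest definitions of TP, FP, FN and TN. The pooled form of the accuracy rate, the function name and the example matrix are assumptions.

```python
import numpy as np

def classification_metrics(conf):
    """conf[i, j] = number of images of true class i predicted as class j.

    Assumes every class is predicted at least once, so no division by zero occurs.
    """
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp  # predicted as the class, but belong elsewhere
    fn = conf.sum(axis=1) - tp  # belong to the class, but predicted elsewhere
    tn = total - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Micro-averaging pools the counts of all classes before dividing
    micro_p = tp.sum() / (tp.sum() + fp.sum())
    micro_r = tp.sum() / (tp.sum() + fn.sum())
    return {
        "macro_precision": precision.mean(),
        "micro_precision": micro_p,
        "macro_f1": f1.mean(),
        "micro_f1": 2 * micro_p * micro_r / (micro_p + micro_r),
        # Assumed form: (TP + TN) / (TP + TN + FP + FN), pooled over classes
        "accuracy": (tp.sum() + tn.sum()) / (tp.sum() + tn.sum() + fp.sum() + fn.sum()),
    }

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted)
conf = [[5, 1, 0],
        [1, 4, 1],
        [0, 0, 6]]
for name, value in classification_metrics(conf).items():
    print(f"{name}: {value:.3f}")
```

Macro values average the per-class scores, so every class counts equally, while micro values pool the counts first, so classes with more images carry more weight.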
As the number of words per dictionary is increased from 80 to 200, accuracy increases, reaches a maximum at a particular number of words per dictionary and then decreases. The results show that the Multiple Dictionary Bag of Words model using the modified Fuzzy C-means soft clustering algorithm with R1 norm gives its maximum accuracy rate at 160 words per dictionary, and that this accuracy exceeds that of the baseline and of MDBoW with dictionaries formed using FCM. As the number of dictionaries generated increases, the classification accuracy rate increases, reaches a maximum at a particular number of dictionaries and then decreases.
The results in Tables 1-5 show that the Multiple Dictionary Bag of Words model using separate dictionaries generated with the modified FCM with R1 norm performs better than the baseline method and MDBoW using FCM. The method gives its maximum accuracy rate at 160 words per dictionary and 3 dictionaries. Table 2 shows the variation of accuracy rate at 160 words per dictionary for various numbers of dictionaries. The accuracy rate increases as the number of dictionaries is increased from 1 to 5. The parameters Macro Precision, Micro Precision, Micro F1 and Macro F1 show better values for Multiple Dictionary Bag of Words with modified FCM. The results validate that MDBoW performs better for datasets with a large number of classes and more images per topic. Macro-averaging gives equal weight to each category and is often dominated by the system's performance on rare categories. Micro-averaging is a useful measure when the categories vary in size; it gives equal weight to each document and is often dominated by the system's performance on the most common categories. The macro-average can be used to analyze how the system performs overall across the sets of data.

CONCLUSION
In this study, the performance of the Multiple Dictionary Bag of Words model with a codebook generated using the modified FCM, with the R1 norm in the objective function, is investigated. The analysis is done by varying the number of words per dictionary and the number of dictionaries generated. The model is compared with the baseline method and with MDBoW using FCM for dictionary generation. In the baseline method, more words are taken from the same dictionary, whereas in this approach more words are taken from different independent dictionaries. The method works better when the number of topics and the number of images per topic are larger.