Object Recognition Based on Image Segmentation and Clustering

Problem statement: This study deals with object recognition based on image segmentation and clustering. Prior information about an image is acquired through two separate processes. Approach: The first process detects object parts in an image and integrates the detected parts into several clusters; the cluster centers form the visual words. The second process over-segments the image into super pixels and forms larger sub-regions using a mid-level clustering algorithm, which incorporates various kinds of information to decide the homogeneity of a sub-region. Results: The outcomes of the two processes are used to build the proposed similarity graph representation for object segmentation. A matrix representation is used to model the relationship between the shape and the color or texture priors. The mask map gives the probability that each super pixel lies inside an object. Conclusion: The basic idea is to integrate all the priors into a uniform framework. Thus ORBISC can handle size, color, texture and pose variations better than methods that focus on the objects only.


INTRODUCTION
Object segmentation is one of the fundamental problems in computer vision. Its goal is to segment an image into foreground and background, with the foreground containing only objects of a class and the background containing other attributes such as color and texture. Two families of clustering algorithms are used for this purpose: supervised and unsupervised. In supervised clustering, grouping is done according to user feedback. Because a practical object recognition system needs to handle a large number of classes and objects, learning should not involve any user feedback. This makes unsupervised clustering more suitable for practical systems. To increase the robustness of the system, a Similarity Dependence Graph (SDG) is used, as it integrates object recognition and object segmentation into a unified process.
The inputs to the similarity dependence graph are the outcomes of two separate processes. The first process detects the object parts of an image and integrates the detected parts into several clusters. The second process over-segments the image into super pixels. The vertices of the similarity graph represent the super pixels and object parts of an image. These vertices are connected by directed and undirected edges. A directed edge represents the dependence between entities (for recognition), whereas an undirected edge represents the similarity between entities (for segmentation). The matrix that can be formed from the graph is known as the mask. In the mask map, it is difficult to apply shape priors directly to super pixels and color or texture priors directly to object parts, because object parts are square while super pixels are irregular (Sheng and Qi-Cong, 2009). The SDG is used to overcome this difficulty.

Related works:
Clustering is the process of organizing objects into groups based on their attributes. Clustering techniques can be classified into supervised clustering, which demands human interaction to decide the clustering criteria, and unsupervised clustering, which decides the clustering criteria by itself. Supervised clustering includes hierarchical approaches such as relevance feedback techniques, and unsupervised clustering includes density-based clustering methods. Images can be grouped based on keywords (metadata) or on content (description). In keyword-based clustering, a keyword is a textual label that describes the image and its different features. Images with similar features are grouped to form a cluster by assigning a value to each feature.
In content-based clustering, content refers to shapes, textures or any other information that can be derived from the image itself. The tools, techniques and algorithms used originate from fields such as statistics, pattern recognition and signal processing. Clustering based on the optimization of an overall measure is a fundamental approach that has been explored since the early days of pattern recognition.
These clustering techniques are used to perform image segmentation, which refers to the process of partitioning a digital image into multiple segments (based on pixels). It is a critical and essential component of an image analysis system. Its main purpose is to represent the image in a clearer way. Real-world image segmentation problems (Shirakawa and Nagao, 2009) actually have multiple objectives, such as minimizing overall deviation, maximizing connectivity, minimizing the number of features or minimizing the error rate of the classifier.
Image segmentation is a multi-objective problem (Saha and Bandyopadhyay, 2010). It involves several processes such as pattern representation, feature selection, feature extraction and pattern proximity. Considering all these objectives together is difficult and causes a gap between the nature of images. A multi-objective optimization approach is an appropriate method to bridge this gap (Guliashki et al., 2009). The objective of image segmentation is to cluster pixels into salient image regions, i.e., regions corresponding to individual surfaces, objects or natural parts of objects.

MATERIALS AND METHODS
Segmentation may be used for object recognition, image compression or image editing. The quality of the segmentation depends on the digital image. For simple images the segmentation process is clear and effective because pixel variations are small, whereas for complex images the utility of the segmentation for subsequent processing becomes questionable. To make the system practical, there is a need for unsupervised object segmentation, which does not demand any human interaction.
Images can be represented graphically in either discrete or continuous form: a discrete representation is based on pixels and a continuous one is based on points in a plane. Graph-based representations are mainly used in image analysis to represent irregular structures. Graphs have been used in segmentation, shape matching, video action description, technical drawings and fingerprint recognition. With a graph representation, simultaneous segmentation and recognition can be achieved conveniently by integrating both top-down and bottom-up information into a unified process.

Implementation:
Superpixel formation:
An image can be grouped based on properties such as brightness, color and pixel value. Grouping pixels into superpixels reduces the computational cost and complexity (Achanta et al., 2010). The main aim of superpixel formation is to achieve over-segmentation. Superpixels should be local and coherent and should preserve most of the information necessary for segmentation. The value of a superpixel is the average of all its pixel values. Each superpixel should be unique and provide accuracy improvements. The pixel difference within a superpixel should be minimal, whereas the difference between two different superpixels should be maximal.
Superpixel formation is computationally efficient: it reduces the complexity of an image from hundreds of thousands of pixels to only a few hundred superpixels. It also represents pairwise constraints between units efficiently: constraints that exist only between adjacent pixels on the pixel grid can now model much longer-range interactions between superpixels. Superpixels are perceptually meaningful: each superpixel is a perceptually consistent unit, i.e., all pixels in a superpixel are most likely uniform in, say, color and texture. The representation is near-complete: because superpixels result from an over-segmentation, most structures in the image are conserved, and very little is lost in moving from the pixel grid to the super pixel map. The Novel Recursive Clustering Algorithm is used to over-segment the image into super pixels.
NRCA: The Novel Recursive Clustering Algorithm is a gradient-ascent-based algorithm. It refines the cluster center values at each iteration. The algorithm takes as input a desired number K of approximately equally sized superpixels. For an image with N pixels, the approximate size of each superpixel is therefore N/K pixels. For roughly equally sized superpixels there is a superpixel center at every grid interval S = sqrt(N/K).
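The grid initialization described above can be illustrated with a short sketch. This is not the authors' implementation: the function name `grid_centers`, the rounding of S to an integer step and the choice K = 400 are assumptions made for illustration, and only the 440×380 resolution is taken from the Results section.

```python
import math


def grid_centers(width, height, k):
    """Return the grid interval S and evenly spaced (x, y) seed positions.

    Hypothetical helper: S = sqrt(N / K) follows the text; rounding S to an
    integer step and offsetting by half a step are illustrative choices.
    """
    n = width * height                 # N: total number of pixels
    s = math.sqrt(n / k)               # S: grid interval between centers
    step = max(1, round(s))            # integer step used to walk the grid
    centers = [
        (x, y)
        for y in range(step // 2, height, step)
        for x in range(step // 2, width, step)
    ]
    return s, centers


# Example: a 440 x 380 image with an assumed K = 400 desired superpixels.
s, centers = grid_centers(440, 380, 400)
print(f"S = {s:.2f}, number of seed centers = {len(centers)}")
```

Because S is rounded to an integer step, the number of seed centers is only approximately K, which matches the "approximately equally sized" wording of the text.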
At the onset of the algorithm, K superpixel cluster centers C_k = [R_k, G_k, B_k]^T, with k in [1, K], are chosen at regular grid intervals S. Since the spatial extent of any superpixel is approximately S^2, we can safely assume that the pixels associated with a cluster center lie within a 2S x 2S area around the superpixel center on the xy plane. This becomes the search area for the pixels nearest to each cluster center. Euclidean distances in CIELAB color space are perceptually meaningful for small distances (Eq. 1). If spatial pixel distances exceed this perceptual color distance limit, they begin to outweigh pixel color similarities, resulting in superpixels that respect only proximity in the image plane and not region boundaries.

Over segmentation:
Over segmentation is the process by which the objects being segmented from the background are themselves segmented or fractured into subcomponents. The term also refers to intentionally breaking the image into hundreds or thousands of small segments, more than are necessary to segment the objects in the scene. These segments are typically uniform in size and are commonly referred to as super pixels. Over segmentation increases the chances of extracting important boundaries, at the cost of creating many insignificant boundaries. In this case, prefiltering techniques should be used to eliminate noise, improve inter-object definition or smooth image textures, all of which might cause segmentation difficulties. If these techniques are not sufficient, a grouping process is applied after over segmentation to reassemble the objects into singular image events.
The effect of using larger segments is to increase the area of support, which usually improves the reliability and accuracy of pixel correspondence. Using segments, correct matches are possible even in the presence of noise, intensity bias, or slight deviations. The segment size needs to be at a trade-off point where the amount of information within a segment is sufficient for matching without compromising the characterization of the true disparity distribution.
If a segment is too small, it is difficult to find the correct pixel correspondence for it unambiguously. As a result, some mechanism for using information from neighboring segments is typically required to reduce the ambiguity. Over-segmentation strikes a good balance between providing segments that contain enough information for matching and reducing the risk of a segment spanning multiple objects (Shirakawa and Nagao, 2009; Guliashki et al., 2009). It also reduces the computational complexity of the algorithm, since disparities only need to be estimated per segment rather than per pixel. Given the smaller size of the segments, more information needs to be shared between segments to find correct correspondences than in other segmentation approaches. However, more confidence can be placed in simple matching functions than with single-pixel approaches.

Algorithm II:
Input: Superpixel image of n x n resolution
(1): For each cluster C_i
       For all C_j adjacent to C_i
         If (distance(C_i, C_j) < threshold)
           Then change values of C_j to C_i
(2): End For
Output: Display oversegmented image

Graph formation:
In graph-based algorithms each superpixel is treated as a node in a graph, and the edge weight between two nodes is set proportional to the similarity between the two superpixels (Yu et al., 2002; Felzenszwalb and Huttenlocher, 2004). Let G = (V, E) be an undirected graph with vertices v_i ∈ V, which represent a set of object parts, and edges (v_i, v_j) ∈ E, which represent the similarities between the superpixels. Each edge (v_i, v_j) ∈ E has a non-negative weight dis((v_i, v_j)) proportional to the similarity between v_i and v_j (Taghouti and Mami, 2010; Thilagamani and Shanthi, 2011a; 2011b). The dissimilarity is calculated based on color, motion, location or some other attribute.
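The merging step of Algorithm II can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: clusters are represented by a hypothetical dictionary of mean colors, adjacency by a hypothetical dictionary of neighbor sets, and `distance` is taken to be the Euclidean distance between mean colors.

```python
import math


def merge_adjacent(colors, adjacency, threshold):
    """Literal sketch of Algorithm II.

    colors:    {label: (r, g, b)} mean color of each cluster (assumed input)
    adjacency: {label: set of neighboring labels}            (assumed input)
    Returns {label: label of the cluster it was merged into}.
    """
    values = dict(colors)                    # working copy of cluster values
    merged = {label: label for label in colors}
    for i in sorted(colors):                 # for each cluster C_i
        for j in sorted(adjacency.get(i, ())):   # for all C_j adjacent to C_i
            if math.dist(values[i], values[j]) < threshold:
                values[j] = values[i]        # change values of C_j to C_i
                merged[j] = merged[i]
    return merged


# Toy example: two dark clusters and two bright clusters.
colors = {0: (10, 10, 10), 1: (12, 11, 10), 2: (200, 200, 200), 3: (205, 198, 201)}
adjacency = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
result = merge_adjacent(colors, adjacency, threshold=10)
print(result)
```

The single pass follows the pseudocode literally, so the result can depend on the order in which clusters are visited; a production version would more likely use a union-find structure.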
The relationship can be identified using the following entities: Conditional dependency matrix and similarity dependency matrix.

Conditional dependency matrix:
Let P be the matrix and V be the set of vertices, denoted V = {v_1, v_2, ..., v_n}, where n is the total number of nodes. The object parts are used as vertices in the conditional dependency matrix. The matrix is asymmetric because node v_1 may satisfy v_2 whereas v_2 does not satisfy v_1 in terms of conditional measures: p_ij ≠ p_ji. In matrix P, the term p_ij defines the conditional measure of v_i and v_j:

P = [p_ij]_{n x n}
The conditional dependency matrix requires a larger number of training images, which increases the time complexity for an image.
Similarity dependency matrix:
In this dependency matrix, the edge value is based on the (dis)similarity of the entities. Let E be the edge set over the superpixel nodes, denoted E = {v_1, v_2, ..., v_n}, where n is the total number of superpixel nodes. The Euclidean distance formula is used to find the distance between two superpixel nodes (Sayeed et al., 2009; Abas and Ono, 2010; Odeh et al., 2009; Jusoff, 2010; Al-Haddad et al., 2009; Al-Saqer et al., 2010; Nazif and Lee, 2010; Sleit et al., 2009; Moghaddasi et al., 2009). This formula is simple and makes the distance between nodes easy to calculate. The distance is denoted dis. The similarity matrix is symmetric because the distance between v_i and v_j equals the distance between v_j and v_i: dis(v_i, v_j) = dis(v_j, v_i). Each edge is represented by an entry a_ij, where a_ij represents the similarity between v_i and v_j.

Algorithm III:
Input: Oversegmented image of n x n resolution
(1): For each cluster C_i
       For j = 0 to all clusters
         Calculate the distance between C_i and C_j
(2):     Store the result in the two-dimensional array
(3):   End For
(4): End For
Output: Graph representation of an image.
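Algorithm III amounts to filling a symmetric matrix of pairwise distances between clusters. The sketch below assumes each cluster is summarized by a mean color tuple and uses the Euclidean distance; the function name `similarity_matrix` and the toy colors are illustrative only.

```python
import math


def similarity_matrix(colors):
    """Return the n x n matrix of Euclidean distances between cluster colors.

    colors: list of color tuples, one per cluster (assumed representation).
    """
    n = len(colors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):                       # for each cluster C_i
        for j in range(i + 1, n):            # distance is symmetric, so fill
            d = math.dist(colors[i], colors[j])  # both halves in one pass
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Toy example with three clusters.
m = similarity_matrix([(0, 0, 0), (3, 4, 0), (0, 0, 12)])
for row in m:
    print(row)
```

Exploiting the symmetry dis(v_i, v_j) = dis(v_j, v_i) halves the number of distance evaluations compared with the double loop written in the pseudocode, without changing the resulting array.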

Object identification:
A mask is a small matrix whose values are called weights. Each mask has an origin, which is usually one of its positions. The origin of a symmetric mask is usually its center pixel position. For non-symmetric masks, any pixel location may be chosen as the origin. The mask map gives the probability that object parts and super pixels lie inside an object. Its basic notion is to integrate all the priors into a unified framework. In this work the segmentation task needs only the probability of the super pixels; the object parts cannot be discarded, however, because they carry the shape priors (Al Rahedi and Atoum, 2009; AL-Salami, 2010; Harishchander et al., 2010; Maalla et al., 2009).
The mask map is first segmented: two nearby regions with the closest intensities are merged as long as the intensities lie below a certain threshold value. Using similar criteria, a threshold value T is selected and regions are grown, beginning with the regions whose intensity is greater than (1+T)/2 and merging the next adjacent region with the highest intensity, until the intensities of all adjacent regions fall below T.

Algorithm IV:
Input: Superpixel image of n x n resolution
(1): C_i = first cluster
(2): For all C_j
       If (distance(C_i, C_j) < threshold)
         Mask it to other color
(3): End For
Output: Segmented object of an image.
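Algorithm IV can be sketched as a thresholding of clusters against a seed cluster. The representation below (a 2-D list of per-pixel cluster labels, a dictionary of mean colors, and a mask value of 255) is an assumption for illustration and is not taken from the paper.

```python
import math


def mask_object(labels, colors, seed, threshold, mask_value=255):
    """Mask every pixel whose cluster is close in color to the seed cluster.

    labels: 2-D list of cluster labels, one per pixel   (assumed input)
    colors: {label: (r, g, b)} mean color per cluster    (assumed input)
    seed:   label of the first cluster C_i
    """
    # Clusters C_j with distance(C_i, C_j) < threshold belong to the object.
    selected = {
        label
        for label, color in colors.items()
        if math.dist(colors[seed], color) < threshold
    }
    # "Mask it to other color": selected pixels get mask_value, others 0.
    return [
        [mask_value if label in selected else 0 for label in row]
        for row in labels
    ]


# Toy 2 x 3 label image with three clusters.
labels = [[0, 0, 1], [0, 2, 1]]
colors = {0: (10, 10, 10), 1: (200, 200, 200), 2: (14, 12, 9)}
mask = mask_object(labels, colors, seed=0, threshold=10)
print(mask)
```

The seed cluster always selects itself because its distance to itself is zero, so the mask is never empty for a positive threshold.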

RESULTS
In this section we present the experimental results of using the similarity dependence graph model for unsupervised object identification. The input image, shown in Fig. 1, should have a resolution of 440×380. Depending on the similarity among the pixels, the input image is segmented to form a super pixel image, as shown in Fig. 2. Similar super pixels are grouped to form an over-segmented image, shown in Fig. 3. The similarity dependency graph, shown in Fig. 4, is formed by taking the over-segmented image as input; its edge values are based on the similarity of the entities. The edge set E is the collection of super pixel nodes, and the distance between super pixel nodes is calculated using the Euclidean distance formula:

Dis = sqrt((r_i - r_j)^2 + (g_i - g_j)^2)

The similarity dependency matrix is symmetric because the distance between v_i and v_j is equal to the distance between v_j and v_i. From this graph the mask map is learned and the object is identified, as shown in Fig. 5. For object identification the mask map is first segmented: two nearby regions with the closest intensities are merged as long as the intensities lie below a certain threshold value (Fig. 6-9).

DISCUSSION
The Novel Recursive Clustering Algorithm is used to over-segment the image into super pixels. A single object is identified from the given input image.

CONCLUSION
In this study we proposed an approach for identifying an object in an input image. Object identification involves super pixel formation, over segmentation and graph representation. As this study identifies only a single object in an image, it can be extended in the future to identify multiple objects.