WAVELET BASED CONTENT BASED IMAGE RETRIEVAL USING COLOR AND TEXTURE FEATURE EXTRACTION BY GRAY LEVEL CO-OCCURRENCE MATRIX AND COLOR CO-OCCURRENCE MATRIX

In this study, we propose an effective content-based image retrieval method that uses color and texture features based on wavelet coefficients to achieve good retrieval efficiency. Color features are extracted with a color histogram. Texture features are extracted with the Gray Level Co-occurrence Matrix (GLCM) or the Color Co-occurrence Matrix (CCM). The integrated features give better image retrieval results. Feature extraction by color histogram, texture extraction by GLCM and texture extraction by CCM are compared in terms of the precision performance measure.


INTRODUCTION
Due to the growth of multimedia information on the Internet and advances in photographic technology, the number of digital images is increasing rapidly. In the multimedia world, information retrieval is one of the most important research fields. Both text-based and content-based approaches can be applied to image retrieval. In the text-based approach, images are manually annotated with text descriptions and a database management system is used to retrieve them. The limitation of text-based (keyword) search is that the perspective of the textual description may differ from the perspective of the user. Manual image annotation also requires a large amount of manpower. To overcome these limitations, image retrieval is carried out according to the image contents.
Content-Based Image Retrieval (CBIR), also known as Query By Image Content (QBIC) and Content-Based Visual Information Retrieval (CBVIR), belongs to the research field of image analysis. The key technologies of image retrieval include image feature extraction, feature-based similarity calculation, semantic relevance feedback and image acquisition (Youngeun et al., 2008). CBIR is related to machine vision, pattern recognition, database technology and information retrieval.
Users want accurate and fast retrieval. Maintaining an image database and retrieving the correct images from it is a challenging task, and CBIR addresses it by retrieving images according to their content. CBIR has become an active research area in image retrieval and has progressed in four major directions: global image property based, region-level feature based, relevance feedback based and semantic based retrieval.

RELATED WORKS
In the global image property based method, global features are calculated over the entire image, for example the average gray level or the shape of the intensity histogram. The advantage of global image property based retrieval is the high speed of feature extraction and similarity measurement. This method is sensitive to location. Global features cannot handle the parts of an image that have different characteristics, so this method fails to identify important visual characteristics. Therefore, region-level feature extraction is necessary to capture the local features of the image. Image segmentation and edge detection algorithms are used to extract the local features, so feature extraction is restricted to a subset (region) of the image.
The relevance feedback based approach improves the performance of Information Retrieval (IR) by interactively asking the user whether a set of retrieved documents is relevant to the given query. The user provides judgments on the output images; if the user is not satisfied with the result, the search can be repeated iteratively. Vilvanathan and Rangaswamy (2013) proposed a decision tree based image classification and retrieval system that uses Classification and Regression Trees (CART) to classify the images.
In the semantic based approach, image retrieval is based on the underlying semantics of the images. To extract these semantics, a hierarchical, probabilistic approach has been proposed.
Content-based image retrieval technology overcomes the limitations of text-based image retrieval. CBIR supports effective searching and browsing of large digital image libraries based on automatically derived image features. A typical CBIR system views the query image and the database images as collections of features and ranks the database images by a similarity measure calculated between the query image and each target image. Figures 1 and 2 show the block diagram of conventional content-based image retrieval.
Image Retrieval Based on Color by HSV Color Model
For graphic design, RGB is not a very intuitive way to represent colors. Most artists prefer the HSV color model because it provides an intuitive way to modify the color of a region of an image.
HSV stands for Hue, Saturation and Value. This model describes colors in terms of their shade and brightness (luminance).
Conversion from RGB to HSV Color Space
After conversion to HSV, combining the different color components with the frequency bandwidth of each component gives a one-dimensional vector L, as in Equation (1).
Color histograms are widely used to retrieve results for queries. They suit such queries because they are computationally efficient and insensitive to small changes in camera position (Youngeun et al., 2008). The main problem with color histograms, however, is their coarse characterization of an image: images with different appearances may have the same histogram. Color histograms are used in systems such as QBIC and Chabot, which exploit these advantages. In this study, a modified color histogram scheme based on histogram refinement is used (Jeyanthi et al., 2010). In histogram refinement, the pixels within a given bucket are split into classes based on some local property, and the split histograms are compared bucket by bucket, just as in normal histogram matching, except that only pixels within a bucket that share the same local property are compared. The results are therefore better than those of normal histogram matching, because not only the color features of the image are used but spatial information is also incorporated to refine the histogram.
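The RGB-to-HSV conversion and the one-dimensional color index L described above can be sketched as follows. Equation (1) is not reproduced in this text, so the quantization levels (8 hues, 3 saturations, 3 values) and the combination L = H·Qs·Qv + S·Qv + V are assumptions chosen for illustration, not the study's exact formula; `colorsys.rgb_to_hsv` is the Python standard library conversion.

```python
import colorsys

# Hypothetical quantization levels (assumption): 8 hues, 3 saturations, 3 values,
# giving 8 * 3 * 3 = 72 distinct colors.
Q_H, Q_S, Q_V = 8, 3, 3

def quantize_hsv(r, g, b):
    """Map an 8-bit RGB pixel to one index L in the range [0, Q_H * Q_S * Q_V)."""
    # colorsys works on floats in [0, 1] and returns h, s, v in [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Uniform quantization per channel; min() keeps the value 1.0 in the top level.
    hq = min(int(h * Q_H), Q_H - 1)
    sq = min(int(s * Q_S), Q_S - 1)
    vq = min(int(v * Q_V), Q_V - 1)
    # Combine the three quantized channels into a single one-dimensional index.
    return hq * Q_S * Q_V + sq * Q_V + vq

# Usage: a normalized 72-bin histogram of four pixels.
pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (255, 255, 255)]
hist = [0.0] * (Q_H * Q_S * Q_V)
for p in pixels:
    hist[quantize_hsv(*p)] += 1 / len(pixels)
```

Collapsing the three channels into one index lets an ordinary one-dimensional histogram represent the HSV color distribution.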

PROPOSED SYSTEM FEATURE
In the proposed work, we use K-means clustering to classify the feature set obtained from the histogram refinement method. Histogram refinement provides a set of features for Content-Based Image Retrieval (CBIR): it refines the histogram by splitting the pixels in each bucket into several classes based on color coherence vectors. Several features are calculated for each cluster, and these features are then classified. Figure 3 shows the flow diagram of the integrated image retrieval method.

Histogram Refinement Method for Image Classification
The RGB image is converted to a grayscale image, also known as an intensity image, which is a single 2-D matrix with values from 0 to 255. After this conversion, we quantize the image to reduce the number of levels, mapping the 256 gray levels to 16 levels by uniform quantization (Jeyanthi et al., 2010). Segmentation is then performed using color histograms.
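A minimal sketch of the grayscale conversion and the uniform 256-to-16-level quantization, assuming the common ITU-R BT.601 luma weights (the study does not state its conversion formula) and an image stored as a 2-D list of (r, g, b) tuples:

```python
def rgb_to_gray(image):
    """2-D list of 8-bit (r, g, b) tuples -> 2-D list of gray values 0..255."""
    # BT.601 luma weights: an assumption, not taken from this study.
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in image]

def quantize_uniform(gray, levels=16):
    """Uniformly map 256 gray values onto `levels` equal-width bins 0..levels-1.

    `levels` is assumed to divide 256 (16 gray values per bin when levels == 16).
    """
    step = 256 // levels
    return [[v // step for v in row] for row in gray]

# Usage
img = [[(0, 0, 0), (255, 255, 255)],
       [(128, 128, 128), (16, 16, 16)]]
q = quantize_uniform(rgb_to_gray(img))  # 16-level quantized image
```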

Fig. 2. Block diagram for CBIR system
This study proposes to apply K-means clustering to the feature set obtained by the histogram refinement method, which is based on the concepts of coherence and incoherence. We compare 8 bins and 16 bins; recall is better with 16 bins than with 8 bins (Jeyanthi et al., 2010). The gray-level differences, means and sizes of the objects are considered suitable features for retrieval. For indexing the images, we propose K-means clustering, and we show that it is quite useful for relevant image retrieval queries.

Selection of Features
First, the image is converted from the RGB color space to a grayscale (intensity) image, which is a single 2-D matrix with values from 0 to 255. The intensity image is then quantized from 256 to 16 levels to reduce the number of levels in the image. Next, we find the coherent and incoherent pixels, and the color histogram buckets are partitioned based on spatial coherence: a pixel is coherent if it is part of a sizable region of similar color; otherwise it is incoherent within its bucket.
If a pixel belongs to a large group of pixels of the same color that together form at least five percent of the image, that group is called a coherent group or cluster; otherwise it is an incoherent group or cluster. The number of clusters in each coherent and incoherent bin is found, and then the average of each cluster is computed.
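The coherent/incoherent split can be sketched as a connected-component labeling of the quantized image followed by the five-percent size test. This is a minimal sketch assuming 8-connectivity (the study does not state which neighborhood it uses); the function names are hypothetical.

```python
from collections import deque

def connected_regions(q):
    """Yield (level, size) for each 8-connected region of equal quantized values."""
    rows, cols = len(q), len(q[0])
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            level, size = q[r0][c0], 0
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            while queue:  # breadth-first flood fill of one region
                r, c = queue.popleft()
                size += 1
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = r + dr, c + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and not seen[nr][nc] and q[nr][nc] == level):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            yield level, size

def split_clusters(q, fraction=0.05):
    """Return {level: {"coherent": [sizes], "incoherent": [sizes]}}.

    A cluster is coherent if it covers at least `fraction` of the image pixels.
    """
    threshold = fraction * len(q) * len(q[0])
    clusters = {}
    for level, size in connected_regions(q):
        kind = "coherent" if size >= threshold else "incoherent"
        clusters.setdefault(level, {"coherent": [], "incoherent": []})[kind].append(size)
    return clusters
```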

Color Histograms
Color histograms are used to compare images. Examples of their use in multimedia applications include scene break detection and querying an image database. Their popularity stems from several factors:
• Color histograms are trivial to compute
• Small changes in camera viewpoint tend not to affect color histograms
• Different objects have distinct color histograms
We assume that images are scaled to contain the same number of pixels M. The color space of the image is discretized so that there are n distinct colors. A color histogram is a vector (h1, h2, …, hn) in which each bucket hj contains the number of pixels of color j in the image. For a given image I, the color histogram HI is a compact summary of that image. A database of images can then be queried to find the image whose color histogram is most similar to HI.
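The histogram vector (h1, …, hn) defined above can be computed directly from a quantized image. The bucket-by-bucket L1 distance used for comparison here is an assumption for illustration; the paragraph does not name a histogram comparison metric.

```python
def color_histogram(quantized, n):
    """h[j] = number of pixels whose discretized color is j, for j in 0..n-1."""
    h = [0] * n
    for row in quantized:
        for color in row:
            h[color] += 1
    return h

def l1_distance(h1, h2):
    """Bucket-by-bucket L1 distance between two histograms (assumed metric)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Usage: both images contain the same number of pixels M = 4.
h_query = color_histogram([[0, 1], [1, 2]], 4)
h_target = color_histogram([[0, 0], [1, 1]], 4)
```

Because all images are assumed to have the same number of pixels M, raw counts can be compared without normalization.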

Feature Selection
Two more properties are then calculated for each bucket (group). First, the number of clusters is found for each coherent and incoherent bin. Second, the average of each cluster is computed. Thus each bin has six values: the percentages of coherent and incoherent pixels, the numbers of coherent and incoherent clusters, and the averages of the coherent and incoherent clusters. Some additional features are calculated only for the coherent clusters; incoherent clusters are ignored.
Based on cluster size, four features are selected: (i) the size of the largest cluster in each bin, (ii) the size of the smallest cluster in each bin, (iii) the size of the median cluster in each bin and (iv) the variance of the cluster sizes in each bin. We denote the largest cluster in each bin by Lαj, the median cluster by Mαj, the smallest cluster by Sαj and the variance of the clusters by Vαj. These features are used for image retrieval (Youngeun et al., 2008).
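The six per-bin values and the four coherent-cluster size features listed above can be gathered into one feature vector per bin. This sketch assumes that "average of each cluster" means the average cluster size and that the variance is the population variance of the coherent cluster sizes; both are interpretations, not details stated in the study.

```python
import statistics

def _mean(xs):
    return sum(xs) / len(xs) if xs else 0.0

def bin_features(bin_clusters, total_pixels):
    """bin_clusters: {"coherent": [sizes], "incoherent": [sizes]} for one bin.

    Returns the six per-bin values followed by [L_aj, S_aj, M_aj, V_aj].
    """
    coh = bin_clusters["coherent"]
    inc = bin_clusters["incoherent"]
    six = [
        100.0 * sum(coh) / total_pixels,  # percentage of coherent pixels
        100.0 * sum(inc) / total_pixels,  # percentage of incoherent pixels
        len(coh),                         # number of coherent clusters
        len(inc),                         # number of incoherent clusters
        _mean(coh),                       # average coherent cluster size (assumed meaning)
        _mean(inc),                       # average incoherent cluster size (assumed meaning)
    ]
    # Size statistics use coherent clusters only; incoherent clusters are ignored.
    if coh:
        extra = [max(coh), min(coh), statistics.median(coh), statistics.pvariance(coh)]
    else:
        extra = [0, 0, 0, 0.0]
    return six + extra
```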

The K-Means Algorithm
Algorithm: k-means. The k-means algorithm partitions the objects based on the mean value of the objects in each cluster.
Input: The number of clusters k and a database containing n objects.
Output: A set of k clusters that minimizes the squared-error criterion.
Method:
• Arbitrarily choose k objects as the initial cluster centers
• Repeat
• (Re)assign each object to the cluster to which it is most similar, based on the mean value of the objects in the cluster
• Update the cluster means, i.e., calculate the mean value of the objects in each cluster
• Until no change
The purpose of K-means clustering is to classify the data. We selected K-means clustering because it is suitable for clustering large amounts of data. Unlike the tree structure produced by hierarchical clustering, K-means creates a single level of clusters. Each observation in the data is treated as an object with a location in space, and a partition is found in which objects within each cluster are as close to each other as possible and as far from objects in other clusters as possible. Selecting the distance measure is an important step in clustering. The distance measure determines the similarity of two elements and greatly influences the shape of the clusters, since some elements may be close to one another according to one distance and farther apart according to another. We chose the quadratic distance measure, which gives the quadratic distance between the various features. We calculated the distance between all pairs of row vectors of the feature set obtained in the previous section, which gives the similarity between every pair of objects in the data set. The result is a distance matrix.
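The Method steps and the pairwise distance matrix can be sketched as plain k-means (Lloyd's iteration). This is a minimal sketch, assuming the "quadratic" distance is the squared Euclidean distance and that the arbitrary initial centers are a seeded random sample; both are assumptions for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance: our assumed reading of the 'quadratic' distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def distance_matrix(rows):
    """Distances between every pair of feature row vectors."""
    return [[sq_dist(a, b) for b in rows] for a in rows]

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means following the Method steps; returns (labels, centers)."""
    rng = random.Random(seed)
    # Arbitrarily choose k objects as the initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # (Re)assign each object to the cluster whose mean is nearest.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # until no change
            break
        labels = new_labels
        # Update the cluster means; an empty cluster keeps its previous center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Usage: two well-separated groups of feature vectors.
pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers = kmeans(pts, 2)
```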
Next, we used the member objects and the centroid to define each cluster. The centroid of each cluster is the point that minimizes the sum of distances from all objects in that cluster. The distance information generated above is used to determine the proximity of objects to each other, and the objects are grouped into K clusters using the distance between the centroids of two groups. Let Op be the number of objects in cluster p and Oq the number of objects in cluster q, and let dpi be the ith object in cluster p and dqj the jth object in cluster q. The centroid distance between clusters p and q is given as:
d(p, q) = \left\| \frac{1}{O_p} \sum_{i=1}^{O_p} d_{pi} - \frac{1}{O_q} \sum_{j=1}^{O_q} d_{qj} \right\|_2

Improved K Means Cluster
The original K-means algorithm chooses k points as the initial cluster centers, and different initial points may lead to different solutions. To reduce the sensitivity to the choice of initial points, we employ a medoid, the most centrally located object in a cluster, to obtain better initial centers (Youngeun et al., 2008). The sample should closely represent the original dataset; that is, samples drawn from the dataset should not cause distortion and should reflect the distribution of the original data. We therefore use this improved K-means clustering technique to classify the images.
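The idea of seeding K-means with medoids drawn from a representative sample can be sketched as follows. The exact seeding procedure is not given in the text, so the scheme here (start from the sample medoid, add the farthest sample object until k seeds exist, then refine with a few k-medoids steps) is a hypothetical illustration, not the study's algorithm.

```python
import math
import random

def medoid(points):
    """The most centrally located object: minimal total distance to the others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

def medoid_initial_centers(points, k, sample_size, seed=0, max_iter=20):
    """Hypothetical medoid-based seeding for k-means, computed on a random sample."""
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    # Seed with the sample medoid, then add the object farthest from chosen seeds.
    medoids = [medoid(sample)]
    while len(medoids) < k:
        medoids.append(max(sample, key=lambda p: min(math.dist(p, m) for m in medoids)))
    # Refine: assign sample objects to the nearest medoid, recompute each group's medoid.
    for _ in range(max_iter):
        groups = [[] for _ in medoids]
        for p in sample:
            groups[min(range(k), key=lambda c: math.dist(p, medoids[c]))].append(p)
        new = [medoid(g) if g else m for g, m in zip(groups, medoids)]
        if new == medoids:
            break
        medoids = new
    return medoids

# Usage: the returned medoids are actual data objects, usable as initial k-means centers.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = medoid_initial_centers(points, 2, sample_size=6)
```

Unlike a mean, a medoid is always an existing object and is less affected by outliers, which is the motivation the paragraph gives for using it.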

Performance Measures
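We measure performance using two standard metrics from the Information Retrieval literature, namely recall and precision. If we define the images in a given query category as the relevant images, then precision and recall are defined as follows:
\text{Precision} = \frac{\text{number of relevant images retrieved}}{\text{total number of images retrieved}}
\text{Recall} = \frac{\text{number of relevant images retrieved}}{\text{total number of relevant images in the database}}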
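Precision is the fraction of retrieved images that are relevant, and recall is the fraction of relevant images that are retrieved. A minimal sketch, assuming images are identified by hashable names:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant images that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Usage: 4 images retrieved, 3 relevant images in the category, 2 of them retrieved.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
```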

TEXTURE FEATURE EXTRACTION
Texture is an important feature of an image. Image processing uses three approaches to describe the texture of a region: statistical, structural and spectral.
Statistical approaches characterize textures as smooth, coarse, grainy, silky and so on. The most common second-order statistic is the gray level co-occurrence matrix.
The Gray Level Co-occurrence Matrix (GLCM) and the Color Co-occurrence Matrix (CCM) are the most commonly used statistical approaches for extracting the texture features of an image.
Haralick proposed 14 features that can be used for texture feature extraction. An image contains information about textural characteristics such as contrast, correlation, variance, inverse difference moment, sum average, sum entropy, sum variance, entropy, difference variance, information measures of correlation and angular second moment. Energy, entropy, contrast, correlation and homogeneity are the features most often used to extract texture information from images. Energy is the sum of the squared elements of the gray level co-occurrence matrix. Entropy measures the randomness of the intensity distribution. Contrast measures the amount of local variation in the image. Correlation measures the linear dependence of gray levels between neighboring pixels. Homogeneity measures the closeness of the distribution of GLCM elements to the GLCM diagonal.
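The five commonly used features can be computed from a normalized GLCM P(i, j) whose entries sum to 1. This sketch uses the standard definitions (energy ΣP², entropy −ΣP log₂P, contrast Σ(i−j)²P, correlation Σ(i−μi)(j−μj)P/(σiσj), homogeneity ΣP/(1+|i−j|)); the base-2 logarithm is an assumption, since the study does not specify it.

```python
import math

def haralick_subset(P):
    """P: normalized GLCM (square 2-D list summing to 1). Returns five texture features."""
    n = len(P)
    cells = [(i, j, P[i][j]) for i in range(n) for j in range(n)]
    # Marginal means and standard deviations along rows (i) and columns (j).
    mu_i = sum(i * p for i, _, p in cells)
    mu_j = sum(j * p for _, j, p in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * p for i, _, p in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * p for _, j, p in cells))
    energy = sum(p * p for _, _, p in cells)
    entropy = -sum(p * math.log2(p) for _, _, p in cells if p > 0)
    contrast = sum((i - j) ** 2 * p for i, j, p in cells)
    homogeneity = sum(p / (1 + abs(i - j)) for i, j, p in cells)
    cov = sum((i - mu_i) * (j - mu_j) * p for i, j, p in cells)
    # Correlation is undefined for a constant image (zero standard deviation).
    correlation = cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else float("nan")
    return {"energy": energy, "entropy": entropy, "contrast": contrast,
            "correlation": correlation, "homogeneity": homogeneity}

# Usage: all co-occurring pairs lie on the diagonal (perfectly correlated texture).
f = haralick_subset([[0.5, 0.0], [0.0, 0.5]])
```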

Gray Level Co-occurrence Matrix (GLCM)
The Gray Level Co-occurrence Matrix (GLCM) method, introduced by Haralick, is based on the conditional probability density function. It contains information about the positions of pixels that have similar gray level values. The co-occurrence matrix is defined for a given direction and distance: its element (i, j) counts how often a pixel with gray level i occurs together with a pixel with gray level j at that direction and distance (Hamza and Al-Assadi, 2012). A GLCM is represented as a matrix whose number of rows and columns equals the number of gray levels in the image. The matrix element P(i, j | d, θ) is the relative frequency with which two pixels separated by distance d, in the direction specified by the angle θ, have gray levels i and j (Xing-yuan et al., 2012).
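A minimal sketch of computing the normalized matrix P(i, j | d, θ) for a quantized image. The (row, column) offsets for θ = 0°, 45°, 90° and 135° follow the convention used by MATLAB's `graycomatrix`; counting each pair in one direction only (a non-symmetric GLCM) is an assumption.

```python
# (row, col) unit offsets for each angle θ, as in MATLAB's graycomatrix convention.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(q, levels, d=1, theta=0):
    """Normalized co-occurrence matrix P(i, j | d, θ) of a quantized image q."""
    dr, dc = OFFSETS[theta]
    dr, dc = dr * d, dc * d
    rows, cols = len(q), len(q[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:  # neighbor must lie inside the image
                counts[q[r][c]][q[r2][c2]] += 1
    total = sum(map(sum, counts))
    # Divide by the number of pixel pairs to get relative frequencies.
    return [[x / total for x in row] for row in counts] if total else counts

# Usage: a 2x3 image with 3 gray levels, horizontal neighbors at distance 1.
q = [[0, 0, 1], [1, 2, 2]]
P = glcm(q, 3)
```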
Texture features are computed from the statistical distribution of observed combinations of intensities at specified positions relative to each other in the image. Figure 1 shows an example image region and its corresponding co-occurrence matrix.
The number of intensity points (pixels) in each combination is used to classify the statistics into first-order, second-order and higher-order statistics.

Discrete Wavelet Transform (DWT)
The wavelet transform is computed by repeatedly filtering the image coefficients row by row and column by column. A two-dimensional DWT decomposition of an image contains four bands: the low-low frequency approximation band, the high-low frequency vertical detail band, the low-high frequency horizontal detail band and the high-high frequency diagonal detail band.
The discrete wavelet transform is used because it gives good image retrieval at a low computational cost.
The Discrete Wavelet Transform (DWT) transforms a discrete-time signal into a discrete wavelet representation.
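One level of the 2-D decomposition into an approximation band and three detail bands can be sketched with the Haar wavelet. The study does not name its wavelet, so Haar (and its orthonormal scaling by 1/2 per 2×2 block) is an assumption chosen because it is the simplest case; the image height and width must be even.

```python
def haar_dwt2(img):
    """One-level orthonormal 2-D Haar DWT of an image with even height and width.

    Returns (LL, H, V, D): the approximation band plus horizontal, vertical
    and diagonal detail bands, each half the size of the input in both directions.
    """
    rows, cols = len(img), len(img[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    ll, h, v, d = [], [], [], []
    for r in range(0, rows, 2):
        band_rows = ([], [], [], [])
        for c in range(0, cols, 2):
            a, b = img[r][c], img[r][c + 1]          # top pair of the 2x2 block
            e, f = img[r + 1][c], img[r + 1][c + 1]  # bottom pair of the 2x2 block
            band_rows[0].append((a + b + e + f) / 2)  # low-pass in both directions
            band_rows[1].append((a + b - e - f) / 2)  # top minus bottom: horizontal edges
            band_rows[2].append((a - b + e - f) / 2)  # left minus right: vertical edges
            band_rows[3].append((a - b - e + f) / 2)  # diagonal detail
        for band, row in zip((ll, h, v, d), band_rows):
            band.append(row)
    return ll, h, v, d

# Usage: a flat 4x4 image has energy only in the approximation band.
ll, h, v, d = haar_dwt2([[1] * 4 for _ in range(4)])
```

Applying `haar_dwt2` again to the LL band gives the next decomposition level.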

Wavelet Feature Extraction
A wavelet ψ(t) is a function that satisfies the condition in Equation (2). The Discrete Wavelet Transform (DWT) is a spectral estimation technique that decomposes a function or signal into different frequency sub-bands, yielding a set of wavelet coefficients. The low-level physical features of the original image are embedded in these wavelet coefficients, so the signal can be characterized by them.
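To characterize an image by its wavelet coefficients, each sub-band can be summarized by a few statistics. The text does not say which statistics are used, so the mean and standard deviation of the absolute coefficients per sub-band are an assumption for illustration; the input is any list of 2-D sub-bands, such as those produced by a one-level DWT.

```python
import math

def subband_features(subbands):
    """Mean and standard deviation of |coefficients| for each 2-D sub-band."""
    features = []
    for band in subbands:
        values = [abs(x) for row in band for x in row]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
        features.extend([mean, std])  # two features per sub-band
    return features

# Usage: two tiny sub-bands give a 4-element feature vector.
fv = subband_features([[[1, -3]], [[2, 2]]])
```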

Texture Feature Extraction by CCM
In most images, the main part is usually located in the center of the whole image. Based on this observation, we propose a two-level color image retrieval method that takes into account the color-spatial information, the main (central) part of the image and the retrieval speed.
The steps are as follows. The query image is divided evenly into 64 blocks, which are grouped into five regions: the center, bottom, left, right and top regions.
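The 64 blocks form an 8×8 grid, but the text does not say exactly which blocks belong to which region. The layout below (inner 4×4 blocks as the center, the top and bottom two block rows, and the remaining side blocks as left and right) is a hypothetical assignment for illustration only.

```python
def region_of_block(br, bc):
    """Assign block (br, bc) of an 8x8 grid to one of five regions (hypothetical layout)."""
    if 2 <= br <= 5 and 2 <= bc <= 5:
        return "center"  # inner 4x4 blocks
    if br <= 1:
        return "top"
    if br >= 6:
        return "bottom"
    return "left" if bc <= 1 else "right"

def block_bounds(height, width, br, bc, grid=8):
    """Pixel ranges [r0, r1) x [c0, c1) of block (br, bc) when the image is split evenly."""
    return (br * height // grid, (br + 1) * height // grid,
            bc * width // grid, (bc + 1) * width // grid)

# Usage: pixel bounds of one block in a 64x64 image.
bounds = block_bounds(64, 64, 1, 2)
```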
Different regions have different importance, so different weights are given to the different regions (Xing-Yuan et al., 2012). The retrieval procedure is:
• Set the threshold Ti for the ith level of retrieval, and let m be the number of images to be retrieved during the retrieval process S
• For each color image in the database, calculate the similarity between the query image and the database image for each region
• If the similarity of the image is greater than the threshold, mark the image as similar
• Otherwise, go to the next target image
• Let n be the total number of retrieved images
• If n < m, go to the next iteration
• Output the retrieval results
Calculate the normalized co-occurrence matrix for each of the R, G, B and I components, extract statistical values from each matrix and then calculate the normalized feature vector of the image.
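The weighted, threshold-based procedure above can be sketched as follows. The per-region similarity 1/(1 + Euclidean distance), the weighted average over regions and the rule "lower the threshold at each new level" are assumptions made for illustration; the text does not define the similarity function or how Ti changes between levels.

```python
import math

def region_similarity(fq, fd):
    """Hypothetical similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(fq, fd))

def weighted_similarity(query_regions, db_regions, weights):
    """Weighted average of per-region similarities; arguments are dicts keyed by region."""
    total = sum(weights.values())
    return sum(w * region_similarity(query_regions[r], db_regions[r])
               for r, w in weights.items()) / total

def multilevel_retrieve(query_regions, database, weights, thresholds, m):
    """Scan the database with thresholds T1 > T2 > ... until at least m images
    are marked similar (n >= m), then output the retrieved image names."""
    retrieved = []
    for t in thresholds:
        retrieved = [name for name, regions in database.items()
                     if weighted_similarity(query_regions, regions, weights) > t]
        if len(retrieved) >= m:  # enough similar images: stop iterating
            break
    return retrieved
```

Usage: with `weights = {"center": 2.0, "top": 1.0}`, the central region counts twice as much as the top region, reflecting the assumption that the main part of the image lies at the center.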
In our proposed technique, texture features are extracted by both the gray level co-occurrence matrix and the color co-occurrence matrix, and the results of the two methods are used in the Euclidean distance function to find the best-matching images.

Distance Measures
Distance measures such as the Manhattan distance and the Euclidean distance are commonly used to determine the similarity of feature vectors. In CBIR systems, the Euclidean distance is commonly used to compare the similarity between images, whereas document retrieval systems commonly use the cosine of the angle between vectors as the distance measure (Baharul and Kaliyaperumal, 2012).
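The three measures named above can be compared side by side. This is a minimal sketch; the cosine measure is written as a distance (1 minus the cosine of the angle), which is one common convention and assumes non-zero vectors.

```python
import math

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def euclidean(p, q):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cosine_distance(p, q):
    """1 - cos(angle between p and q); requires non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return 1.0 - dot / (math.hypot(*p) * math.hypot(*q))
```

The cosine measure ignores vector length, which suits term-frequency vectors in document retrieval, whereas Manhattan and Euclidean distances respond to differences in magnitude, as in histogram and texture features.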
The distance between two images is used to find the similarity between the query image and the images in the database. The proposed method uses the Euclidean distance between the feature vectors of the two images, calculated by the following formula:
D(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
where P = (p1, p2, …, pn) and Q = (q1, q2, …, qn) are two points in n-dimensional space. Table 1 compares the precision values of retrieval by the histogram refinement method and of image retrieval by texture feature extraction using GLCM and CCM. The integrated color and texture retrieval method based on CCM gives better results. Table 2 compares the retrieval times of all three methods.
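Ranking the database by the Euclidean distance D(P, Q) on an integrated color and texture feature vector can be sketched as follows. Simple concatenation of the two feature vectors without per-feature weighting is an assumption; `math.dist` computes exactly the Euclidean distance in the formula above.

```python
import math

def rank_images(query_color, query_texture, database, top=10):
    """database: {name: (color_features, texture_features)}.

    Returns up to `top` image names, nearest (most similar) first.
    """
    query = list(query_color) + list(query_texture)  # integrated feature vector
    scored = sorted((math.dist(query, list(color) + list(texture)), name)
                    for name, (color, texture) in database.items())
    return [name for _, name in scored[:top]]

# Usage: three database images with one color and one texture feature each.
db = {"x": ([0.0], [0.0]), "y": ([1.0], [1.0]), "z": ([0.1], [0.0])}
ranking = rank_images([0.0], [0.0], db)
```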

CONCLUSION
In this study, image retrieval based on the color histogram technique and on texture feature extraction using the gray level co-occurrence matrix and the color co-occurrence matrix is presented. We combine the color histogram method with the gray level co-occurrence technique, and we also integrate the color co-occurrence matrix with the color histogram technique. The Euclidean distance measure is used as a classifier to find similarity. Our experiments show that the integrated color and texture method gives better results than retrieval by color alone. Wavelet-based image retrieval using the gray level co-occurrence matrix produces better results than retrieval using the color co-occurrence matrix.