Fixed Representative Colors Feature Extraction Algorithm for Moving Picture Experts Group-7 Dominant Color Descriptor



INTRODUCTION
Overview: The term Content-Based Image Retrieval (CBIR) describes the process of retrieving desired images from a large collection of images on the basis of features such as color, shape and texture that can be automatically extracted from the images themselves [1].
The ideal approach to querying an image database is using content semantics, which applies human understanding of images [2]. Retrieval works like an information-filtering process and is expected to return a high percentage of relevant images. In general, image features capture only some aspects of an image, so retrieval systems cannot be expected to find all correct images. Instead, they select the most similar images and let the user choose the relevant ones.
Among the different types of features, color is the most straightforward: it can be retrieved from digital images with simple processing, while other features require more processing and computational effort [3].
When comparing images by color features, two properties are usually considered:

• Area of matching: count the area or number of pixels having the same or similar colors; a larger matched area means the images are more similar
• Color distance: the distance between colors, usually measured in a perceptually uniform color space such as CIELuv; closer matched colors mean the images are more similar

In a typical color similarity measure, the area of matching is taken as the similarity, while color distance is used to control the matching between colors and to adjust the similarity.
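The matched-area idea above can be sketched in a few lines of Python. This is a minimal illustration only; the function names `color_distance` and `matched_area` and the threshold value are assumptions for this sketch, not part of any standard:

```python
import math

def color_distance(c1, c2):
    """Euclidean distance between two colors (ideally computed in a
    perceptually uniform space such as CIELuv)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def matched_area(colors1, colors2, threshold=15.0):
    """Fraction of colors in colors1 that have a match in colors2 closer
    than `threshold`: a larger matched area means more similar images."""
    matches = sum(
        1 for c1 in colors1
        if any(color_distance(c1, c2) <= threshold for c2 in colors2)
    )
    return matches / len(colors1)
```

Here the color distance only gates whether two colors count as matched; the similarity itself is the matched fraction, mirroring the two roles described above.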

MPEG-7 dominant color descriptor:
An important achievement for CBIR is MPEG-7, an international standard for multimedia content description [4]. MPEG-7 provides a collection of effective descriptors for images, video, audio and other multimedia content. In its visual part, several color descriptors are defined, among which the Dominant Color Descriptor (DCD) is a compact and effective descriptor [5].
In DCD, the image feature is formed by a small number of representative colors, normally obtained by clustering and color quantization. The descriptor consists of the representative colors, their percentages in a region, the spatial coherency of the colors and the color variances [5]. The DCD is defined as (1):

F = {(c_i, p_i, v_i), s},  i = 1, 2, ..., N  (1)

where N is the number of dominant colors. Each dominant color value c_i is a vector in the corresponding color space (e.g., the RGB color space). The percentage p_i is the fraction of pixels in the image or image region corresponding to color c_i, with Σ_i p_i = 1, and v_i is the optional variance of the color values around c_i. The spatial coherency s is a single number that represents the overall spatial homogeneity of the dominant colors in the image. For the number of dominant colors N, a maximum of eight is suggested as sufficient to represent an image or an image region. The dominant color values c_i use 1-12 bits per color component, controlled by the chosen color space and color quantization level [6]. Table 1 shows the binary syntax for each component.
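The descriptor structure in (1) can be illustrated with a small data class. The field names are illustrative assumptions for this sketch, not the MPEG-7 binary syntax of Table 1:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class DominantColor:
    color: Tuple[int, int, int]    # c_i, e.g. quantized RGB components
    percentage: float              # p_i, fraction of pixels for this color
    variance: Optional[Tuple[float, float, float]] = None  # v_i (optional)

@dataclass
class DominantColorDescriptor:
    colors: List[DominantColor]    # at most N = 8 dominant colors
    spatial_coherency: float       # s, overall spatial homogeneity
```

A descriptor for an image dominated by red and blue regions would then hold two `DominantColor` entries whose percentages sum to one.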

DCD extraction:
The extraction procedure for the dominant colors uses the Generalized Lloyd Algorithm (GLA) [7] to cluster the pixel color values. It is recommended that the clustering be performed in a perceptually uniform color space such as CIELuv [8]. The distortion D_i of the i-th cluster is given as (2):

D_i = Σ_{x(n) ∈ C_i} h(n) ||x(n) − c_i||²  (2)

Where:
c_i = The centroid of cluster C_i
x(n) = The color vector at pixel n
h(n) = The perceptual weight for pixel n

The perceptual weights are calculated from local pixel statistics to account for the fact that human visual perception is more sensitive to changes in smooth regions than in textured regions.
The procedure is initialized with one cluster consisting of all pixels and one representative color computed as the centroid (center of mass) of the cluster. The algorithm then follows a sequence of centroid calculation and clustering steps until a stopping criterion (minimum distortion or maximum number of iterations) is met. The cluster with highest distortion is split by adding perturbation vectors to the centroid until the maximum distortion falls below a predefined threshold or the maximum number of clusters is generated. The percentage of pixels in each cluster of the image is then quantized to five bits. The color values are quantized according to the specifications of the color space and the associated color-quantization descriptors. Figure 1 shows the idea of DCD extraction.
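The split-and-refine procedure described above can be sketched as follows. This is a simplified illustration: the perceptual weights h(n) are taken as 1, and the perturbation size and stopping thresholds are arbitrary placeholder values, not the values recommended by the standard:

```python
def centroid(pixels):
    """Center of mass of a list of color vectors."""
    n = len(pixels)
    return tuple(sum(p[d] for p in pixels) / n for d in range(3))

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def extract_dominant_colors(pixels, max_clusters=8, max_distortion=100.0,
                            iterations=10, eps=1.0):
    """GLA-style extraction: one initial cluster, Lloyd iterations, then
    split the highest-distortion cluster until thresholds are met."""
    centroids = [centroid(pixels)]
    while True:
        # Lloyd iterations: assign each pixel to its nearest centroid,
        # then recompute centroids.
        for _ in range(iterations):
            clusters = [[] for _ in centroids]
            for p in pixels:
                idx = min(range(len(centroids)),
                          key=lambda k: sq_dist(p, centroids[k]))
                clusters[idx].append(p)
            centroids = [centroid(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        # Per-cluster distortion with unit perceptual weights h(n) = 1.
        distortions = [sum(sq_dist(p, centroids[i]) for p in c)
                       for i, c in enumerate(clusters)]
        worst = max(range(len(clusters)), key=lambda i: distortions[i])
        if distortions[worst] <= max_distortion or len(centroids) >= max_clusters:
            break
        # Split the worst cluster: perturb its centroid in two directions.
        c = centroids[worst]
        centroids[worst] = tuple(v + eps for v in c)
        centroids.append(tuple(v - eps for v in c))
    total = len(pixels)
    return [(centroids[i], len(c) / total) for i, c in enumerate(clusters)]
```

The returned (centroid, percentage) pairs correspond to the (c_i, p_i) components of the descriptor before quantization.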
Without considering the optional color variance parameter and the spatial coherency, a quadratic-histogram distance measure is used to measure the dissimilarity D²(F_1, F_2) between two descriptors F_1 = {(c_1i, p_1i), i = 1, ..., N_1} and F_2 = {(c_2j, p_2j), j = 1, ..., N_2}, defined as (6):

D²(F_1, F_2) = Σ_{i=1}^{N_1} p_1i² + Σ_{j=1}^{N_2} p_2j² − Σ_{i=1}^{N_1} Σ_{j=1}^{N_2} 2 a_{1i,2j} p_1i p_2j  (6)

where a_{1i,2j} is the similarity coefficient between the two colors c_1i and c_2j, defined as (7):

a_{1i,2j} = 1 − ||c_1i − c_2j|| / d_max  if ||c_1i − c_2j|| ≤ T_d;  0 otherwise  (7)

Where:
||c_1i − c_2j|| = The Euclidean distance between the two colors c_1i and c_2j in the CIELuv color space
T_d = The maximum distance for two colors to be considered similar
d_max = α·T_d

In particular, this means that any two dominant colors within a single description are more than T_d apart. A recommended value for T_d is between 10 and 20 in the CIELuv color space, and for α between 1.0 and 1.5. The above dissimilarity measure is very similar to the quadratic distance measure commonly used to compare two color-histogram descriptors [3].
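The dissimilarity measure (6)-(7) translates directly into code. In this sketch a descriptor is simply a list of (color, percentage) pairs, and the parameter defaults follow the recommended ranges above:

```python
import math

def similarity_coeff(c1, c2, Td=20.0, alpha=1.0):
    """Similarity coefficient a_{1i,2j} of equation (7)."""
    d = math.dist(c1, c2)       # Euclidean distance, ideally in CIELuv
    d_max = alpha * Td
    return 1.0 - d / d_max if d <= Td else 0.0

def dcd_dissimilarity(F1, F2, Td=20.0, alpha=1.0):
    """Squared quadratic distance D^2(F1, F2) of equation (6);
    F1 and F2 are lists of (color, percentage) pairs."""
    d2 = sum(p * p for _, p in F1) + sum(p * p for _, p in F2)
    for c1, p1 in F1:
        for c2, p2 in F2:
            d2 -= 2.0 * similarity_coeff(c1, c2, Td, alpha) * p1 * p2
    return d2
```

Identical descriptors give a distance of zero, and descriptors whose colors are all more than T_d apart contribute no cross terms, so the distance reduces to the sum of squared percentages.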

Evaluation:
In the MPEG-7 experiments, datasets and evaluation metrics are defined [9]. To evaluate retrieval performance, the MPEG group defined the Averaged Normalized Modified Retrieval Rank (ANMRR) metric [3]. ANMRR is designed to measure both the number of correctly retrieved images and how highly they are ranked.
The Normalized Modified Retrieval Rank (NMRR) measures the performance of each individual query. NMRR is defined by (8):

NMRR(q) = (AVR(q) − 0.5 − NG(q)/2) / (1.25·K(q) − 0.5 − NG(q)/2),  AVR(q) = (1/NG(q)) Σ_{k=1}^{NG(q)} Rank(k)  (8)

Where:
NG(q) = The size of the ground truth set for query image q
Rank(k) = The rank of the k-th ground truth image in the retrieval result; ground truth images not retrieved within the top K(q) results are assigned Rank(k) = 1.25·K(q)
K(q) = The "relevance rank" for the query

As the sizes of the ground truth sets are normally unequal, a suitable K(q) is determined by (9):

K(q) = min(4·NG(q), 2·GTM)  (9)

where GTM is the maximum of NG(q) over all queries. NMRR lies in the range [0, 1] and smaller values represent better retrieval performance. ANMRR is defined as the average NMRR over all queries (10):

ANMRR = (1/NQ) Σ_{q=1}^{NQ} NMRR(q)  (10)

where NQ is the number of query images.
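The definitions (8)-(10) can be sketched directly in code. Each query supplies its ground-truth size NG(q) and the ranks at which its ground-truth images were retrieved; the 1.25·K(q) penalty for images missing from the top K(q) follows the convention stated above:

```python
def nmrr(ranks, ng, gtm):
    """NMRR of equations (8)-(9) for one query.
    ranks: retrieval ranks of the NG(q) ground truth images."""
    k = min(4 * ng, 2 * gtm)                       # equation (9)
    # Penalize ground truth images not retrieved within the top K(q).
    avr = sum(r if r <= k else 1.25 * k for r in ranks) / ng
    return (avr - 0.5 - ng / 2.0) / (1.25 * k - 0.5 - ng / 2.0)

def anmrr(queries, gtm):
    """ANMRR of equation (10); queries is a list of (ranks, ng) pairs."""
    return sum(nmrr(ranks, ng, gtm) for ranks, ng in queries) / len(queries)
```

Perfect retrieval (ground truth at ranks 1..NG(q)) yields NMRR = 0, and retrieving none of the ground truth within the top K(q) yields NMRR = 1.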
Corel_1k Dataset: The Corel_1k dataset is commonly used in image-retrieval research, for example in SIMPLIcity [10]. It consists of 1000 images divided into 10 categories based on semantic concepts. Although each category has its own semantic content, the visual contents of images within one category can be very different. Ground truth sets for 20 sample query images from different categories are defined in [3].

RESULTS
Experiments demonstrate that FRCFE has better average performance than the GLA on the chosen dataset. Table 3 lists the performance comparison of the dominant color descriptor in terms of ANMRR for the query images. Figure 4a and b show the visual differences between the retrieval results of the two feature extraction methods for query #12, "Yellow flower"; the proposed method achieves more perceptually relevant retrieval. FRCFE depends on converting the color pixels of an image to their nearest colors from a 38-color fixed set; the eight most frequent colors, together with their percentages, are then selected to represent the DCD of that image. To evaluate the performance of the new technique, a CBIR system was implemented and compared with the existing MPEG-7 feature extraction algorithm (the GLA) on the Corel_1k dataset. Experiments showed that the retrieval results using the new algorithm achieve ANMRR = 0.3632, outperforming the GLA in most query images.
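The FRCFE idea summarized above can be sketched as follows. This is a hypothetical illustration: the tiny palette here is a placeholder, not the paper's actual 38-color fixed set:

```python
from collections import Counter

# Placeholder palette for illustration only; FRCFE uses a 38-color fixed set.
PALETTE = [(0, 0, 0), (255, 255, 255), (255, 0, 0),
           (0, 255, 0), (0, 0, 255), (255, 255, 0)]

def nearest_palette_index(pixel, palette=PALETTE):
    """Index of the palette color nearest to the pixel (squared distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(pixel, palette[i])))

def frcfe_descriptor(pixels, palette=PALETTE, n_colors=8):
    """Map every pixel to its nearest fixed color, then keep the eight
    most frequent palette colors with their percentages as the DCD."""
    counts = Counter(nearest_palette_index(p, palette) for p in pixels)
    total = len(pixels)
    return [(palette[i], c / total) for i, c in counts.most_common(n_colors)]
```

Because the representative colors are fixed in advance, no iterative clustering is needed, which is the source of the extraction speedup over the GLA.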