SRBIR: Semantic Region Based Image Retrieval by Extracting the Dominant Region and Semantic Learning

Problem statement: The Semantic Region Based Image Retrieval (SRBIR) system, which automatically segments the dominant foreground region that constitutes the semantic concept of the image (such as elephants or roses) and performs semantic learning, is proposed. Approach: The system segments an image into different regions and finds the dominant foreground region in it, which is the semantic concept of that image. It then extracts the low-level features of that dominant foreground region. The Support Vector Machine-Binary Decision Tree (SVM-BDT) is used for semantic learning and it finds the semantic category of an image. The low-level features of the dominant region of each category's images are used to find the semantic template of that category. The SVM-BDT is constructed with the help of these semantic templates. The high-level concept of the query image is obtained using this SVM-BDT. Similarity matching is done between the query image and the set of images belonging to the semantic category of the query image, and the top images with the least distances are retrieved. Results: Experiments were conducted using the COREL dataset consisting of 10,000 images and its subset with 1000 images of 10 different semantic categories. The obtained results demonstrate the effectiveness of the proposed framework, compared to those of the commonly used region based image retrieval approaches. Conclusion: Efficient image searching, browsing and retrieval are required by users from various domains, such as medicine, fashion, architecture, training and teaching. The proposed SRBIR system supports such retrieval with high accuracy and reduced access time.


INTRODUCTION
Conventional content-based image retrieval systems use low-level features, such as color, texture and shape. But there is a 'semantic gap' between the low-level image features and the high-level semantics perceived by the user. Therefore, to improve the retrieval accuracy, a CBIR system should reduce this semantic gap. There are several techniques available to reduce the semantic gap. Machine learning techniques can be used to associate the low-level features with the high-level semantics. The supervised learning technique SVM is used to solve multiclass problems. In the multiclass SVM, the SVM is trained using the DB images of known categories and the class label is predicted for the query image (Rahman et al., 2007).
Then the query image is tested only against the images of that class. Yogameena et al. (2010) used the SVM to detect an individual's abnormal behavior in a human crowd. The SVM has achieved great success in pattern recognition and it can be applied in many domains, such as handwriting recognition, face recognition, voice recognition, text classification and image processing (Yogameena et al., 2010). Rahman et al. (2007) use a relevance feedback mechanism to learn the user's intention. Object ontology can be used to define high-level semantics (Mezaris et al., 2003; Liu et al., 2007). In such a system, each region of an image is described by its average color, its position, its size and its shape. Object ontology provides a qualitative definition of the high-level query concepts (Liu et al., 2007). Semantic templates can be generated to support high-level image retrieval. A semantic template is a map between a high-level concept and the low-level visual features; it is the representative feature of a concept calculated from a collection of sample images (Zhang and Lu, 2008; Liu et al., 2007). Many systems exploit one or more of the above techniques to implement high-level semantic-based image retrieval.
Among the above mentioned techniques, machine learning tools, such as the Support Vector Machine (SVM), Artificial Neural Networks (ANN) and Bayesian Networks (BN), are often used for semantic learning. A content-based image retrieval framework that combines the SVM-PWC and Fuzzy C-Means clustering has been proposed to provide efficient retrieval and to reduce the search space and time (Rahman et al., 2007). Decision Tree (DT) learning algorithms, such as ID3, C4.5 and CART, are also used in data classification, and the DT learning method is very useful in image semantic learning. Zhang and Lu (2008) proposed a region based image retrieval system which makes use of decision tree learning. It first selects the region of interest from the image; then that region's low-level features are used to find the semantic concept of the image (Zhang and Lu, 2008). Their decision tree learning method is compared with the proposed method, and the latter is seen to perform comparatively well.
The SVM-BDT is an efficient classification technique, which combines the essential features of the SVM with the high accuracy of the binary decision tree. The binary decision tree mechanism has been used for handwritten letter and digit recognition. Mao et al. (2005) used the fuzzy support vector machine and the binary decision tree for multiclass cancer classification; it produces good results in finding the most important genes that affect certain types of cancer, with high recognition accuracy. So this SVM-binary decision tree technique is used in the SRBIR.
The SRBIR system is a region based image retrieval system using the SVM-BDT to extract the high-level image semantics. There are several methods for segmenting regions from an image. Suhasini et al. (2008) used graph based segmentation for segmenting a region from an image. Zhang and Lu (2008) developed a region based image retrieval system that selects the region of interest in the image, extracts the features of that region and compares them with the features of the regions extracted from the Database (DB) images.
Selecting the region of interest for each of the DB images requires much time for training. So, in the implemented SRBIR system, automatic segmentation of the dominant region in the image is proposed. This automatic segmentation provides the dominant foreground region in the image, which mainly constitutes the semantics of the image. The segmentation algorithm provides the solid region and not the outline alone. Therefore, the extracted dominant foreground region is less distorted and possesses less noise, and the low-level features of the image are maintained in the region without much distortion. The implemented SRBIR system uses the SVM-BDT for semantic learning. The color and texture features are extracted from the dominant region of each of the DB images. The color, texture and color-texture semantic templates are found for each semantic category. These semantic templates are used in constructing the SVM-BDT. The experimental results show that our approach provides higher accuracy than other decision tree learning methods like DT-ST, ID3 and C4.5. The remainder of this study is organized as follows: system description, automatic segmentation of the dominant foreground region, feature extraction, semantic learning based on the SVM-BDT, experimental results and discussion. Finally, the conclusion and future work are provided.

MATERIALS AND METHODS
System description: The system finds the dominant foreground region in each of the database images and extracts the low-level color and texture features of the dominant region of each image. These color-texture features are stored in a database. For each category of images, the system finds the semantic templates, namely the color template, the texture template and the color-texture template. These templates are used in building the SVM-Binary Decision Tree (SVM-BDT), which is used to find the class label of the query image. During retrieval, the proposed system finds the dominant foreground region of the query image and computes its color and texture features. These features are given as input to the SVM-BDT, which predicts the label of the query image. The distance between the color-texture features of the query image and the color-texture features of the dominant regions of the DB images of the predicted class is then found. The distance measures used in this implemented system are the Euclidean distance, the Bhattacharya distance and the Mahalanobis distance. These distance values are sorted in ascending order and the top images with the least distances are retrieved. Figure 1 is the block diagram of the proposed system, SRBIR.

Segmentation of the dominant foreground region:
The dominant foreground region of an image is the region which occupies most of the space in the image foreground. The SRBIR system extracts the dominant foreground region, which gives the semantics of the image. Also, the obtained dominant region is a solid region and not just the outline. Thus, the extracted dominant region has reduced noise, and the low-level features extracted from it have less distortion.

Algorithm for extracting the dominant foreground region of an image:
1. An RGB image is read and the indexed image is obtained from it. The indexed image is used to recover the color from the corresponding gray-scale image. Let ind_img ← indexed image.
2. The gray-scale image is obtained from the color image.
3. Noise is removed by applying median filtering.
4. The edges of the image are found by using Canny edge detection (Heath et al., 1997).
5. The detected edges are used to obtain the solid dominant region, which is then converted back into a color image using the color mapping.
Figure 2 shows the result of the automatic segmentation of the dominant region from an image. The first-column images 2a, 2c, 2e, 2g and 2i are the original images and the corresponding second-column images 2b, 2d, 2f, 2h and 2j are the dominant regions of those images, obtained by the automatic segmentation. The images 2b, 2d, 2f and 2h show perfect segmentation; Fig. 2j is with some noise.
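The steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a Sobel gradient threshold stands in for the Canny detector, and morphological closing plus hole filling produce the solid region; the function name and all parameter choices are assumptions.

```python
import numpy as np
from scipy import ndimage

def dominant_region_mask(gray):
    """Sketch of the dominant-foreground segmentation: denoise,
    detect edges, fill the outline into a solid region and keep
    the largest connected component."""
    # Step 3: suppress noise with a median filter
    smooth = ndimage.median_filter(gray.astype(float), size=3)
    # Step 4: edge map (Sobel magnitude here; the paper uses Canny)
    gx = ndimage.sobel(smooth, axis=1)
    gy = ndimage.sobel(smooth, axis=0)
    mag = np.hypot(gx, gy)
    edges = mag > 2 * mag.mean()
    # Step 5: close gaps in the outline and fill it into a solid region
    closed = ndimage.binary_closing(edges, structure=np.ones((5, 5)))
    solid = ndimage.binary_fill_holes(closed)
    # Keep only the largest connected region as the dominant foreground
    labels, n = ndimage.label(solid)
    if n == 0:
        return solid
    sizes = ndimage.sum(solid, labels, range(1, n + 1))
    return labels == (np.argmax(sizes) + 1)
```

The returned boolean mask would then be used to cut the dominant region out of the original color image via the color mapping.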
Many region based image retrieval systems are based on the selection of a region of interest. In the region based image retrieval proposed by Zhang and Lu (2008), the region of interest is selected by the user and the region in that portion is extracted; the features of this region are considered for similarity matching. In the SRBIR system, automatic segmentation of the dominant foreground region is attempted and the result is given in Fig. 2. Segmentation based on the user's selection of the region of interest has also been attempted in this study. Figure 3 shows the segmentation based on user selection: Fig. 3a and 3c are the original images and Fig. 3b and 3d are the segmented regions. The segmentation by selecting the region of interest has been carried out using the SegTool.

Feature extraction: A six-dimensional color feature vector fc = (μH, μS, μV, σH, σS, σV) is extracted from the dominant region, where μ represents the mean and σ the standard deviation of each color channel in the HSV color space (Rahman et al., 2007; Rahman et al., 2005).
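The six-dimensional HSV color feature can be sketched as below. This is an illustrative sketch, not the paper's code; the per-pixel `colorsys` conversion (Python standard library) is used for clarity and the function name is an assumption.

```python
import colorsys
import numpy as np

def color_feature(rgb):
    """6-dim color feature fc: per-channel mean and standard
    deviation in HSV space. `rgb` is a float array in [0, 1]
    of shape (h, w, 3); a real system would vectorize this."""
    hsv = np.array([colorsys.rgb_to_hsv(*px) for px in rgb.reshape(-1, 3)])
    # fc = (mean_H, mean_S, mean_V, std_H, std_S, std_V)
    return np.concatenate([hsv.mean(axis=0), hsv.std(axis=0)])
```

In the SRBIR pipeline this would be computed only over the pixels of the dominant region, not the whole image.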
The texture information is extracted from the gray-level co-occurrence matrix. A gray-level co-occurrence matrix is defined as the sample of the joint probability density of the gray levels of two pixels separated by a given displacement d and angle θ (Rahman et al., 2007; Liu et al., 2005). Four co-occurrence matrices of four different orientations (horizontal 0°, vertical 90° and the two diagonals 45° and 135°) are constructed. The co-occurrence matrix reveals certain properties about the spatial distribution of the gray levels in the image. Higher order features, namely energy, contrast, homogeneity, correlation and entropy, are measured using Eq. 1-5 on each gray-level co-occurrence matrix (Abbadi et al., 2010; Rahman et al., 2007; Haralick et al., 1973) to form a five-dimensional feature vector:

Energy = Σi Σj p(i, j)²  (1)
Contrast = Σi Σj (i − j)² p(i, j)  (2)
Homogeneity = Σi Σj p(i, j)/(1 + |i − j|)  (3)
Correlation = Σi Σj (i − μi)(j − μj) p(i, j)/(σi σj)  (4)
Entropy = −Σi Σj p(i, j) log p(i, j)  (5)

where p(i, j) is the normalized co-occurrence matrix entry. Finally, a twenty-dimensional feature vector ft is obtained by concatenating the feature vectors of the four co-occurrence matrices. Thus, a color-texture feature vector of dimension 26 is obtained (fct = fc + ft, where + denotes concatenation).
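The co-occurrence computation and the five measures of Eq. 1-5 can be sketched as follows. The 8-level quantization, displacement offsets and function names are assumptions made for illustration, not values taken from the paper.

```python
import numpy as np

def glcm_features(img, dx, dy, levels=8):
    """One gray-level co-occurrence matrix for displacement (dx, dy)
    and the five Haralick-style measures of Eq. 1-5."""
    q = (img.astype(float) / 256 * levels).astype(int)  # quantize gray levels
    h, w = q.shape
    P = np.zeros((levels, levels))
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            P[q[y, x], q[y + dy, x + dx]] += 1
    P /= P.sum()                                     # joint probability density
    i, j = np.indices(P.shape)
    energy = (P ** 2).sum()                          # Eq. 1
    contrast = ((i - j) ** 2 * P).sum()              # Eq. 2
    homogeneity = (P / (1 + np.abs(i - j))).sum()    # Eq. 3
    mi, mj = (i * P).sum(), (j * P).sum()
    si = np.sqrt(((i - mi) ** 2 * P).sum())
    sj = np.sqrt(((j - mj) ** 2 * P).sum())
    correlation = ((i - mi) * (j - mj) * P).sum() / (si * sj)  # Eq. 4
    entropy = -(P[P > 0] * np.log(P[P > 0])).sum()   # Eq. 5
    return np.array([energy, contrast, homogeneity, correlation, entropy])

def texture_vector(img):
    """20-dim ft: concatenate the five measures over four offsets
    (assumed displacements for 0°, 90° and the two diagonals)."""
    offsets = [(1, 0), (0, 1), (1, 1), (1, -1)]
    return np.concatenate([glcm_features(img, dx, dy) for dx, dy in offsets])
```

Concatenating `color_feature` and `texture_vector` outputs would give the 26-dimensional fct described above.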

Learning the image semantics using the SVM-BDT:
The purpose of constructing the SVM-BDT is to associate the low-level features of the image regions with the high-level concepts. Figure 4 is the block diagram of the construction of the SVM-BDT. The COREL dataset consisting of 1000 images of 10 different categories is used. The different categories are images of African faces, beaches, buildings, buses, dinosaurs, elephants, roses, horses, snowy mountains and dishes. The input to the SVM-BDT is the low-level color-texture features of the dominant regions. For each category i, the color template Ci and the texture template Ti are obtained as the mean of the color and texture feature vectors of the images in that category. The color-texture template is calculated as CTi = Ci + Ti, where + denotes concatenation.
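The template computation is just a per-category mean, as a short sketch (the dict layout, mapping a class label to its list of feature rows, is an assumption):

```python
import numpy as np

def semantic_templates(features_by_class):
    """Per-category semantic template CTi: the mean of the
    color-texture feature vectors of that category's images."""
    return {c: np.asarray(rows).mean(axis=0)
            for c, rows in features_by_class.items()}
```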

Construction of the SVM-BDT:
The SVM binary decision tree construction consists of two major steps. The first step involves constructing the Binary Decision Tree (BDT) by clustering the various classes of the DB images. The second step involves associating a binary class SVM at each node of the BDT obtained in the first step (Liu et al., 2007). We use this approach in our region based image retrieval framework.
If K is the number of classes, the Euclidean distance between the color-texture templates of all the K classes is found, giving a K×K distance matrix, which is used for further grouping. The two classes that have the largest Euclidean distance between them are assigned to the two clustering groups, and the color-texture template of each of these classes is taken as the cluster center of the corresponding group. After this, the pair of unassigned classes, each of which is closest to one of the cluster centers, is found and assigned to the corresponding groups, and each cluster center is updated to the color-texture template of the class most recently included in the group. The process continues by finding the next pair of unassigned classes, each of which is closest to one of the two clustering groups, assigning them to the corresponding groups and updating the cluster centers. Thus, all the classes are assigned to one of the two possible groups of classes. The SVM binary classifier is used to train the samples in the root node of the decision tree. The classes from the first clustering group are assigned to the first (left) sub tree, while the classes from the second clustering group are assigned to the second (right) sub tree. The process of recursively dividing each of the groups into two sub-groups continues until there is only one class per group, which defines a leaf in the decision tree. This procedure leads to a binary tree for the SVM-BDT that will always be balanced, resulting in the best decision efficiency. Figure 5 shows the constructed SVM-BDT. After finding the Euclidean distances between all the semantic templates, classes c5 and c7 are the farthest and are assigned to groups G1 and G2 respectively; c5 contains the images of dinosaurs and c7 the images of roses. Closest to group G1 is class c3 and closest to group G2 is c6.
In the next step, c4 is assigned to group G1 and c8 to group G2. Then c10 is assigned to group G1 and c2 to group G2, and finally c1 is assigned to group G1 and c9 to group G2. This completes the first round of grouping, which defines the classes that will be transferred to the left and right sub trees of the root. The SVM binary classifier in the root is trained by considering the samples from the classes {c5, c3, c4, c10, c1} as positive samples and the samples from the classes {c7, c6, c8, c2, c9} as negative samples.
The grouping procedure is repeated independently for the classes on the left and right sub trees of the root, which results in grouping {c4, c3, c5} in G11 and {c1, c10} in G12 on the left sub tree, and {c8, c2, c9} in G13 and {c7, c6} in G14 on the right sub tree. At the next level, {c5, c3} is grouped in G21 on the left sub tree and {c2, c9} in G22 on the right sub tree. At each non-leaf node of the SVM-BDT, the SVM binary classifier is used for training the positive and negative samples.
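The recursive grouping described above can be sketched as follows. This is an illustrative sketch under stated assumptions: templates are passed as a dict from class label to vector, the tree is returned as nested tuples, and the binary SVM training at each non-leaf node is omitted.

```python
import numpy as np

def build_bdt(templates):
    """Build the BDT by recursively splitting classes into two
    groups, seeded by the farthest pair of semantic templates.
    Leaves are single class labels; each internal node would hold
    a binary SVM trained on its two groups (not shown here)."""
    labels = list(templates)
    if len(labels) == 1:
        return labels[0]
    # seed the two groups with the farthest pair of class templates
    d = {(a, b): np.linalg.norm(templates[a] - templates[b])
         for a in labels for b in labels if a != b}
    a, b = max(d, key=d.get)
    g1, g2 = [a], [b]
    c1, c2 = templates[a], templates[b]
    unassigned = [l for l in labels if l not in (a, b)]
    while unassigned:
        # assign the class closest to each cluster center, then
        # update that center to the newly included class's template
        near1 = min(unassigned, key=lambda l: np.linalg.norm(templates[l] - c1))
        g1.append(near1); unassigned.remove(near1); c1 = templates[near1]
        if unassigned:
            near2 = min(unassigned, key=lambda l: np.linalg.norm(templates[l] - c2))
            g2.append(near2); unassigned.remove(near2); c2 = templates[near2]
    return (build_bdt({l: templates[l] for l in g1}),
            build_bdt({l: templates[l] for l in g2}))
```

With 10 classes this recursion yields 9 internal nodes, matching the 9 binary SVMs reported in the Results.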

Class prediction using the SVM-BDT and image retrieval:
For the query image, the dominant region is automatically found and the color and texture features are extracted from this region. This color-texture feature is given as the input to the SVM-BDT. The SVM binary classifier at each non-leaf node is used to branch through the SVM-BDT, and thereby the class label of the query image is predicted. Thus, the statistical similarity measures can be applied between the query image and only the images of a particular class. This reduces the search space and supports searching for the image based on the high-level concept.
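Branching through the tree is a simple walk from the root to a leaf, as sketched below. The node layout (a tuple of decision function, left subtree, right subtree) and the sign convention are assumptions; in the actual system each decision function would be a trained binary SVM.

```python
def predict_class(node, x):
    """Walk the SVM-BDT: each internal node is (decide, left, right),
    where decide(x) > 0 sends x to the left (positive) subtree.
    Leaves are class labels."""
    while isinstance(node, tuple):
        decide, left, right = node
        node = left if decide(x) > 0 else right
    return node
```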
If the feature vector of the query image is represented by Eq. 8 and the feature vector of the target image by Eq. 9, then the Euclidean distance (Rahman et al., 2007; Rahman et al., 2005) between the query and target images is given by Eq. 10:

q = (q1, q2, …, qn)  (8)
t = (t1, t2, …, tn)  (9)
DE(q, t) = √(Σi=1..n (qi − ti)²)  (10)

If the query image q and the target image t are assumed to be in different classes and their respective densities are pq(x) and pt(x), both defined on ℜn, then a popular measure of similarity between the two Gaussian distributions is the Bhattacharya distance (Rahman et al., 2007; Fukunaga, 1990), calculated using Eq. 11:

DB(q, t) = (1/8)(µq − µt)T [(Σq + Σt)/2]⁻¹ (µq − µt) + (1/2) ln(|(Σq + Σt)/2| / √(|Σq| |Σt|))  (11)

Where:
µq and µt = the mean vectors
Σq and Σt = the covariance matrices of the query image q and the target image t, respectively
When all the classes have the same covariance matrix, the Bhattacharya distance reduces to the Mahalanobis distance (Rahman et al., 2007; Rahman et al., 2005), calculated using Eq. 12:

DM(q, t) = √((q − t)T Σ⁻¹ (q − t))  (12)

Where:
Σ = the common covariance matrix

The distances are found between the feature vector of the query image and the feature vectors of the target images. These distances are sorted in increasing order, the top k images with the least distances are obtained and the corresponding images are displayed as the output.

RESULTS
In order to verify the effectiveness and efficiency of the SRBIR system, experiments were conducted on the COREL dataset consisting of 10,000 images and its subset with 1000 images of sizes 256×384 and 384×256. The training sample for the SVM consists of the fully labeled DB.
The automatic segmentation algorithm is tried for both the image sets. For the 1000 image data set, 86% of the images are correctly segmented and the dominant object in the image is obtained. The remaining 14% of the images are not accurately segmented. In the case of 10,000 images of the Corel image data set, only 70% accuracy was achieved.
For the construction of the SVM-BDT, the semantic template of each category is calculated by obtaining the mean of the feature vectors of the images in that semantic category. Thus, ten semantic templates are calculated for the 10 semantic categories considered. Then, the Euclidean distances between every pair of semantic templates are calculated. This distance matrix is used for the grouping, which continues till each group contains a single class. The SVM binary classifier is used in training at each non-leaf node, and it divides the group into two sub-groups. Only nine SVM binary classifiers are needed to perform the multiclass classification. The dominant region of the query image is found and the color-texture feature of the dominant region is given as the input to the SVM-BDT classification for predicting the semantic class of the query image. The LIBSVM package has been used for implementing the SVM-BDT in the SRBIR system (Hsu et al., 2003). The results obtained using the automatic segmentation of the dominant foreground region with the SVM-BDT and the region selection with the SVM-BDT are shown in Figs. 6 and 7.

DISCUSSION
The results of the SRBIR system show that if we train the SVM-BDT with 100% of the training set images, the system produces 100% accuracy. If the SVM-BDT is trained with 75% of the images in the training data set, it produces 95.4% accuracy, and if it is trained with 50% of the images in the training DB, the accuracy is 83%. The testing time is the same for all three cases, while the training time increases with the training set size. Training is a one-time operation and hence its cost can be neglected. The results are shown in Table 1. The results of the SRBIR system are compared with those of the other decision tree learning methods like DT-ST, ID3 and C4.5, and also with the RBIR using region selection with the SVM-BDT. The comparison is given in Table 2 and the corresponding graph is shown in Fig. 8. It shows that the implemented SRBIR produces higher accuracy than the existing RBIR techniques.

CONCLUSION
Semantic region based image retrieval looks for high-level features which are close to the human interpretation of images. The proposed system uses automatic segmentation of the dominant foreground region of the image, which provides the high-level semantics of the image. The automatic segmentation of the dominant region reduces the noise in the segmentation, and the low-level features of the region are maintained without much distortion. The low-level features are extracted from the dominant region of each of the images and these features are used in training the SVM-binary decision tree. The SVM-BDT is trained with the color-texture template of each image category and is used to predict the class label of the query image. Thus, only the images whose high-level semantics match those of the query image are considered for similarity matching. This reduces the testing time, and the accuracy of the system is promising when compared to the other region-based image retrieval techniques. If the query image is not in the data set, the SRBIR system produces some misclassifications; this is the only limitation of the SRBIR system. Our future work aims at reducing such misclassifications and at providing the relevant images from the database. Hence, the SRBIR framework using the SVM-BDT can be used as a front-end for image search, which yields high accuracy and takes less access time.