Content Based Image Retrieval of Ultrasound Liver Diseases Based on Hybrid Approach

Problem statement: In the past few years, considerable progress has been made in the field of Content-Based Image Retrieval (CBIR). Nevertheless, existing systems still fail when applied to medical image databases. Simple feature-extraction algorithms that operate on the entire image to characterize color, texture or shape cannot be related to the descriptive semantics of the medical knowledge that human experts extract from images. Approach: In this study, we introduce a hybrid approach that combines a Support Vector Machine (SVM) with Relevance Feedback (RF) for the retrieval of liver diseases from Ultrasound (US) images. SVM and RF are supervised active learning techniques used to improve the effectiveness of the retrieval system. Three kinds of liver disease are identified: cyst, alcoholic cirrhosis and carcinoma. The diagnosis scheme includes four steps: image registration, feature extraction, feature selection and image retrieval. First, the ultrasound images are registered in the database based on the modality. Then features derived from first order statistics, the gray level co-occurrence matrix and fractal geometry are obtained from the Pathology Bearing Regions (PBRs) of the normal and abnormal ultrasound images. The Correlation Based Feature Selection (CFS) algorithm selects the features relevant to specific diseases and also reduces the dimensionality of the feature space for classification. Finally, we implement our hybrid approach for the retrieval of specific diseases from the database. Results: The hybrid approach takes a query from the user and retrieves both positive and negative samples from the database; feedback obtained from the radiologist in each round helps to improve the retrieval of correct images. Conclusion: The hybrid approach (SVM+RF) offers several benefits when compared with existing medical CBIR systems based on neural network algorithms. Fractal geometry plays a crucial role in feature extraction for ultrasound liver image retrieval. CFS also reduces the dimensionality issue during storage. Image registration plays an important role in the retrieval: it reduces the redundancy of retrieved images and increases the response rate. Relevance feedback from the physician helps to improve the accuracy of the images retrieved from the database.


INTRODUCTION
The medical and healthcare sector is a large industry directly related to every citizen's quality of life. Image-based medical diagnosis is one of the important service areas in this sector. Nowadays, a large number of diverse radiological and pathological images in digital format are generated by hospitals and medical centers with sophisticated image acquisition devices and digital scanners (Deserno et al., 2009). The digital imaging revolution in the medical domain over the last three decades has paved the way for physicians and radiologists to perform image-guided diagnosis and treatment of diseases. Medical images play an important role in revealing anatomical and functional information about a body part for diagnosis, medical research and education (Xue et al., 2005). Modern standards such as Digital Imaging and Communications in Medicine (DICOM) (Guild et al., 2007) and Picture Archiving and Communication Systems (PACS) (Guild et al., 2007) make it relatively easy to store and transport these images and increase interoperability. Medical images of diverse modalities such as Computerized Tomography (CT), Magnetic Resonance Imaging (MRI), Single Photon Emission Computed Tomography (SPECT) and Ultrasound (US) from radiological departments, as well as dermatology, microscopic pathology and histology images from other departments, are generally complex in nature and require extensive image processing techniques for computer aided diagnosis (Yeh et al., 2003;Hong et al., 2002;Aube et al., 2002). For this reason, in most cases physicians or radiologists examine images in conventional ways based on their individual experience and knowledge.
Image retrieval in general, and Content-Based Image Retrieval (CBIR) in the medical domain in particular, has been one of the most exciting and fastest growing research areas over the last decade (Nadler and Smith, 1993;Oosterveld et al., 1993;Mustafa and Mostafa, 2003). The term image retrieval means finding similar images in a large database archive with the help of key attributes associated with the images or features inherently contained in them. In the medical domain, the ultimate goal of image retrieval is to provide diagnostic support to physicians or radiologists by displaying relevant past cases, along with proven pathologies as ground truth (Laws, 1999). However, medical image retrieval can also be useful as a training tool for medical students and residents, in follow-up studies for detecting the growth of tumors and for research purposes. Several existing works on content-based medical image retrieval for ultrasound liver diseases rely on neural network algorithms (Hsu and Lin, 2002).
Image registration, or geometric alignment of two-dimensional and/or three-dimensional image data, is becoming increasingly important in diagnosis, treatment planning, functional studies and content-based medical image retrieval in biomedical research (Wu et al., 1992;Sonka et al., 2008). For multimodal data, image registration is the process of determining the spatial transform that maps points from one image (the moving image) to homologous points on an object in the reference image (the fixed image). The similarity of the two images is calculated after each transform until they match. If the data are mono-modal, the images are stored directly in the database.
In recent reports, some approaches to content-based retrieval designed to support specific medical tasks have been published. Pratt (2001) describes a system for fast and effective retrieval of tumor shapes in mammogram X-rays (Kim et al., 2002). This approach restricts both the images (only mammographic X-rays) and the features (only tumor shapes) supported by the system. Likewise, the ASSERT system operates only on High Resolution Computed Tomographies (HRCTs) of the lung (Mao, 2004). A physician delineates the pathology bearing region and marks a set of anatomical landmarks when the image is entered into the database. Hence, ASSERT has extremely high data entry costs, which prohibit its application in clinical routine. Chu et al. present a knowledge-based image retrieval system with spatial and temporal constructs (Hsu and Lin, 2002). Brain lesions are automatically extracted within 3D data sets from CT and MR. Their representation model adds a knowledge-based fourth layer to the semantic model. This layer provides a mechanism for accessing and processing spatial, evolutionary and temporal queries. However, these concepts for medical image retrieval are task specific and not transferable to other medical applications. Tagare et al. point out some of the unique challenges confronting retrieval engines with medical image collections (Park et al., 2004).
The liver is an important organ that plays a vital role in the human organ system, and liver diseases have therefore attracted much attention for a long time. Diagnostic ultrasound is a useful clinical tool for visualizing organs and soft tissues in the human abdomen without any harmful effects. It enables the physician to select the right image plane to display pathological anatomy accurately. It is safe to handle, involves no ionizing radiation and is non-invasive. One such application of diagnostic ultrasound is liver imaging. Liver diseases are best identified using these gray scale images. Traditionally, determining whether liver tissue is normal or abnormal relies on specialized radiologists. The decisions made by radiologists depend heavily on their experience, which may relate to certain characteristics in the visual interpretation of the image or to comparisons with different pathologies (Huang and Ling, 2005). However, several studies have shown that the rate of accurate decisions based on simple visual interpretation of liver diseases is only about 54%.
Image analysis techniques are widely used in medical fields. In Computer Aided Diagnosis (CAD), image analysis and statistics play the most important roles. Image analysis techniques are used to extract the PBRs from images and calculate the features, while statistics provides the theoretical support for feature classification. Our proposed hybrid classification approach for liver tissues consists of three stages. The first stage involves image registration; the second involves feature extraction and selection. The characterization of liver images based on texture analysis techniques has been developed over several years. Some features are appropriate for the classification of specific diseases, while other features suit other diseases. Features such as first order statistics, the gray level co-occurrence matrix and fractal geometry are suitable for identifying cyst, alcoholic cirrhosis and carcinoma. The Correlation based Feature Selection (CFS) algorithm can therefore select the features relevant to specific diseases and further reduce the feature space stored in the database. After feature selection, the selected features are fed into the third stage, the SVM classifier, to identify the diseases (Gletsos et al., 2003). In addition, relevance feedback from the physician helps to improve the effectiveness of the retrieval system.

Properties of ultrasound modality of liver diseases:
The block diagram of the proposed liver disease diagnostic retrieval system is shown in Fig. 1. First, the PBR is extracted manually from the input query image by a physician or radiologist and registered into the database. The PBRs are then fed into the feature extraction and selection module. In this module, the appropriate features are evaluated and processed, and the features relevant to specific diseases are selected. These features are then classified by the SVM to determine which disease they belong to and finally passed to the retrieval model with the help of relevance feedback.
We categorize liver diseases into liver cysts, alcoholic cirrhosis and carcinoma. Based on expert knowledge, the physiological properties of these diseases are described as follows.
Liver cyst: In general, cysts are thin-walled structures that contain fluid. Most cysts are single, although some patients may have several cysts, a condition called Polycystic Liver Disease (PLD). A simple liver cyst is always benign (Lee et al., 2002). The liver cyst contour is smooth, and the contrast between normal liver and cyst tissue is high, i.e., the gray level of cyst tissue is much darker than that of normal tissue. Liver cysts occur in approximately 5% of the population.
Alcoholic cirrhosis: A liver normally contains a certain amount of fat due to alcohol intake, but if fat represents over 5-10% of the weight of the liver, the person is said to have cirrhosis (Lee et al., 2003). Most gray levels of cirrhotic tissue are darker than those of normal tissue, but in some cases they may be similar. Cirrhotic tissue contours vary from case to case, so cirrhosis cannot be distinguished from normal tissue simply on the basis of gray levels or contours. Cirrhosis by itself does not cause symptoms. It is often diagnosed when something abnormal is seen in other types of tests, such as blood tests. A doctor may also find that the liver appears enlarged during an examination by ultrasound scan.

Carcinoma: Carcinoma involves benign hepatic masses and consists of large thin-walled blood vessels, lined with flattened epithelium and separated by fibrous spaces filled with venous blood; it commonly occurs in women (Chang and Lin, 2001;Jain et al., 2000). When carcinoma is not treated early or does not respond to treatment, the liver progressively shuts down, or fails.

Image registration:
In this study, we develop a monomodal image registration technique based on ultrasound images of the liver. In the pre-processing step, we remove noise and normalize the size of the image. Registration is performed using intensity-based registration with mutual information. Mutual information is an automatic, intensity-based metric that does not require the definition of landmarks or features such as surfaces and can be applied retrospectively. Furthermore, it is one of the few intensity-based measures that is also well suited to the registration of multimodal images. Unlike measures based on the correlation of gray values or on differences of gray values, mutual information does not assume a linear relationship among the gray values in the images. All images are stored in the database only through image registration.
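The mutual information metric can be illustrated with a minimal Python sketch that estimates it from a joint gray-level histogram. The function name `mutual_information`, the bin count and the toy images are assumptions made for illustration; the search over spatial transforms that a full registration would perform is not shown.

```python
import math
from collections import Counter


def mutual_information(fixed, moving, bins=8, levels=256):
    """Mutual information (in bits) between two equally sized gray-level images.

    Images are lists of rows of integer gray levels in [0, levels).
    """
    width = levels // bins  # number of gray levels per histogram bin
    pairs = [(a // width, b // width)
             for row_f, row_m in zip(fixed, moving)
             for a, b in zip(row_f, row_m)]
    n = len(pairs)
    joint = Counter(pairs)                 # joint histogram
    p_f = Counter(a for a, _ in pairs)     # marginal histogram, fixed image
    p_m = Counter(b for _, b in pairs)     # marginal histogram, moving image
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((p_f[a] / n) * (p_m[b] / n)))
    return mi


# Toy 2 x 4 "images" (hypothetical data): a two-tone pattern, its inverse, a flat image.
fixed = [[0, 0, 255, 255], [0, 0, 255, 255]]
inverted = [[255 - v for v in row] for row in fixed]
flat = [[100] * 4 for _ in range(2)]

mi_same = mutual_information(fixed, fixed)
mi_inverted = mutual_information(fixed, inverted)
mi_flat = mutual_information(fixed, flat)
```

In this toy case the inverted image carries as much mutual information as the identical one, which reflects the point that the metric does not assume a linear relationship between gray values, while the flat image carries none.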

Feature extraction method:
Feature extraction is a crucial step for any pattern recognition task, especially for the classification of ultrasonic liver tissues, since liver images are highly complex and it is difficult to define a reliable and robust feature vector. Generally, ultrasound B-scan images present various granular structures as texture, so the analysis of an ultrasound image is analogous to a texture analysis problem. Textural features are characteristics such as the smoothness, fineness and coarseness of certain patterns associated with the image. Three methods are used for feature extraction.
Fig. 1: Block diagram for content based image retrieval of ultrasound liver diseases by the hybrid approach

Fractal geometry: Fractals provide a measure of the complexity of the gray level structure in a certain pathology bearing region, having the property of self-similarity at different scales. This feature plays an important role in the ultrasound image modality. Every texture, characterized through the intensity I, can be represented as a reproduction of copies of N basic elements. One of the ways to express the fractal dimension is the Hurst coefficient.
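One common way to estimate a Hurst coefficient H from a gray-level region, sketched below in Python, models the intensity surface as fractional Brownian motion, for which the mean absolute intensity difference between pixels grows as distance raised to the power H, with fractal dimension D = 3 - H. Whether this study used that estimator is not stated, so the function name `hurst_coefficient`, the choice of distances and the test region are assumptions.

```python
import math


def hurst_coefficient(region, distances=(1, 2, 4)):
    """Estimate H as the slope of log(mean |I(p) - I(q)|) against log(distance)."""
    rows, cols = len(region), len(region[0])
    xs, ys = [], []
    for d in distances:
        diffs = []
        for r in range(rows):
            for c in range(cols):
                if c + d < cols:  # horizontal pixel pair at distance d
                    diffs.append(abs(region[r][c + d] - region[r][c]))
                if r + d < rows:  # vertical pixel pair at distance d
                    diffs.append(abs(region[r + d][c] - region[r][c]))
        if not diffs:
            continue  # distance larger than the region
        mean_diff = sum(diffs) / len(diffs)
        if mean_diff > 0:
            xs.append(math.log(d))
            ys.append(math.log(mean_diff))
    if len(xs) < 2:
        raise ValueError("region is too flat or too small to fit a slope")
    # Least-squares slope of ys against xs.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


# Hypothetical test region: a smooth intensity ramp, the least rough surface possible.
ramp = [[r + c for c in range(8)] for r in range(8)]
h = hurst_coefficient(ramp)
fractal_dimension = 3 - h
```

For the smooth ramp the estimate is H = 1, giving D = 2; rougher textures yield smaller H and a dimension closer to 3.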
First order statistics: The first order statistics of gray levels considered appropriate for liver tissue characterization in various pathological stages were the mean gray level, the maximum and minimum gray levels and the autocorrelation function, defined on a region of size N × N, where I is the intensity function of the ultrasound image. The spatial autocorrelation represents the correlation of a variable with its own spatial location. It measures the level and strength of interdependence between the values of that variable at different points in space. It is also characteristic of the texture granularity.
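As a rough sketch of these first order features, the Python code below computes the mean, minimum and maximum gray level and a normalized autocorrelation on an N × N region. The normalization used here (shifted products averaged over the overlapping area, divided by the mean squared intensity) is a widely used texture-analysis form and is assumed, not taken from this study; the function names and the sample regions are likewise hypothetical.

```python
def first_order_stats(region):
    """Mean, minimum and maximum gray level of a region (list of rows)."""
    pixels = [v for row in region for v in row]
    return {"mean": sum(pixels) / len(pixels), "min": min(pixels), "max": max(pixels)}


def autocorrelation(region, dr, dc):
    """Normalized autocorrelation of a square N x N region at shift (dr, dc).

    Assumed form: mean of I(i, j) * I(i + dr, j + dc) over the overlap,
    divided by the mean of I(i, j)^2 over the whole region.
    """
    n = len(region)
    shifted = sum(region[i][j] * region[i + dr][j + dc]
                  for i in range(n - dr) for j in range(n - dc))
    shifted /= (n - dr) * (n - dc)
    energy = sum(v * v for row in region for v in row) / (n * n)
    return shifted / energy


# Hypothetical 4 x 4 regions: a uniform patch and a fine checkerboard.
uniform = [[5] * 4 for _ in range(4)]
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]

stats = first_order_stats(checker)
rho_uniform = autocorrelation(uniform, 1, 0)
rho_checker = autocorrelation(checker, 1, 0)
```

The uniform patch stays fully correlated under a one-pixel shift, whereas the checkerboard drops to zero, which is the sense in which the autocorrelation reflects texture granularity.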
Gray level co-occurrence matrix: The Gray Level Co-occurrence Matrix (GLCM) has been used in texture extraction. Compared with an image gray level histogram, the GLCM captures the spatial correlation between pairs of gray levels. Specifically, it can be used as input to the SOM for classification. Let [...] invariably providing a good performance in classification problems. Compared with a conventional neural network, the SVM has the advantage of being usable with different kernel functions and of giving highly accurate classification based on parameter selection. SVM is originally a method for binary classification, in which the two categories are called positive samples and negative samples (Lee et al., 2002;Lee et al., 2003;Chang and Lin, 2001;Jain et al., 2000); in medical practice, however, the number of possible disease types is rarely restricted to two. Combining feature extraction methods and SVM should thus be promising for distinguishing among ultrasound liver diseases. The features selected by CFS are then fed to the SVM classifier. The SVM classifies both positive and negative samples based on the input query.
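The co-occurrence matrix can be sketched in a few lines of Python. The code counts gray-level pairs at a fixed pixel offset, normalizes the counts to probabilities and derives two standard texture descriptors, contrast and energy. Which GLCM descriptors this study actually used is not stated, so the descriptor choice, the function names and the 4 × 4 sample region are assumptions; the classification step is not shown.

```python
def glcm(region, dr, dc, levels):
    """Normalized co-occurrence matrix for pixel pairs separated by offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(region), len(region[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[region[r][c]][region[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast and energy of a normalized co-occurrence matrix p."""
    levels = len(p)
    contrast = sum(p[i][j] * (i - j) ** 2
                   for i in range(levels) for j in range(levels))
    energy = sum(v * v for row in p for v in row)
    return {"contrast": contrast, "energy": energy}


# Hypothetical 4 x 4 region quantized to four gray levels.
region = [[0, 0, 1, 1],
          [0, 0, 1, 1],
          [0, 2, 2, 2],
          [2, 2, 3, 3]]
p = glcm(region, 0, 1, levels=4)  # horizontal neighbor, one pixel to the right
features = glcm_features(p)
```

In practice the matrix is usually computed for several offsets and directions, and the resulting descriptors are concatenated into the feature vector passed on to feature selection.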

Relevance feedback method:
Relevance feedback with only positive samples: When the user's feedback contains only positive samples, that is, when only the relevant images are considered, several schemes can make use of this information to improve retrieval accuracy, as shown in Fig. 3. In this context, the physician's suggestion is easily incorporated by means of feedback: the newly obtained positive samples are added to the query set and the algorithm is rerun to retrieve results (Lee et al., 2001). In this way, the vector y will have multiple non-zero components that spread their ranking scores in the propagation process, and the sequence f(t) converges to $f^* = (1 - \alpha S)^{-1} y = \sum_{i=1}^{n} (1 - \alpha S)^{-1} y_i$, where y_i is an n-dimensional vector with the i-th component equal to 1 and the others equal to 0, and n is the number of positive samples fed back by the physician. These samples therefore spread their ranking scores independently and assign large values to images belonging to their corresponding neighborhoods; the ultimate ranking score is the sum of these individual scores.
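The additivity of the ranking scores can be checked numerically with a small Python sketch. It assumes that S is a symmetrically normalized Gaussian affinity matrix and that the scores are obtained by iterating f ← αSf + y, which approaches (1 − αS)⁻¹y for 0 < α < 1. The affinity construction, the parameter values and the toy feature points are assumptions, since the text does not specify them.

```python
import math


def normalized_affinity(points, sigma=1.0):
    """S = D^(-1/2) W D^(-1/2) for a Gaussian affinity W with zero diagonal (assumed form)."""
    n = len(points)
    w = [[0.0 if i == j
          else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    d = [sum(row) for row in w]
    return [[w[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]


def propagate(s, y, alpha=0.9, iterations=200):
    """Iterate f <- alpha * S f + y; the limit is (1 - alpha S)^(-1) y."""
    f = list(y)
    for _ in range(iterations):
        f = [alpha * sum(s_ij * f_j for s_ij, f_j in zip(row, f)) + y_i
             for row, y_i in zip(s, y)]
    return f


# Hypothetical 2-D feature points forming two clusters of three images each.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.2)]
s = normalized_affinity(points)

y_first = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # positive sample: image 0
y_second = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # positive sample: image 1
y_both = [a + b for a, b in zip(y_first, y_second)]

f_first = propagate(s, y_first)
f_second = propagate(s, y_second)
f_both = propagate(s, y_both)
```

Because the iteration is linear in y, `f_both` matches `f_first + f_second` element by element, and image 2, which lies in the same cluster as the two positive samples, receives a far higher score than the images of the other cluster.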
Relevance feedback with positive and negative samples: Because of the asymmetry between relevant and irrelevant images, they should be processed differently. For example, in the Rocchio formula (Gletsos et al., 2003), the initial query is moved towards positive samples and away from negative samples by different degrees; in MEGA (Mustafa and Mostafa, 2003), positive samples are used to learn the target concept in k-CNF, while negative samples are used to learn a k-DNF that bounds the uncertain region; some researchers have even introduced different penalizing factors for positive and negative samples into the SVM optimization problem. A deeper reason for this asymmetry is that relevant images tend to form clusters in the feature space, while irrelevant images occupy the remaining feature space.
To accommodate this asymmetry, in the hybrid approach positive and negative samples spread their ranking scores differently. Concretely, we first define two vectors, y+ and y-. An element of the former is set to 1 if the corresponding image is the query or a positive sample, while an element of the latter is set to -1 if the corresponding image is a negative sample (Lee et al., 2003). All other elements of the two vectors are set to 0. Generally, positive samples should contribute more to the final ranking score than negative samples. Second, we modify the neighborhood of a negative sample by changing the iteration value, which also controls the size of the neighborhood within which points have a large similarity value to the center point. The formula for propagating negative ranking scores is:
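As an illustration only, and not the formula referred to above, the two-vector scheme can be sketched in Python. The sketch assumes that negative scores are propagated for only a few iterations, which confines them to a small neighborhood, and that the final score is the positive score plus a down-weighted negative score. The weight `gamma`, the iteration counts and the five-image chain graph are all hypothetical.

```python
import math


def propagate(s, y, alpha, iterations):
    """Iterate f <- alpha * S f + y for a fixed number of steps."""
    f = list(y)
    for _ in range(iterations):
        f = [alpha * sum(a * b for a, b in zip(row, f)) + y_i
             for row, y_i in zip(s, y)]
    return f


def hybrid_scores(s, positives, negatives, alpha=0.9,
                  pos_iters=100, neg_iters=2, gamma=0.5):
    """Combine positive and negative propagation (assumed combination rule)."""
    n = len(s)
    y_pos = [1.0 if i in positives else 0.0 for i in range(n)]
    y_neg = [-1.0 if i in negatives else 0.0 for i in range(n)]
    f_pos = propagate(s, y_pos, alpha, pos_iters)
    f_neg = propagate(s, y_neg, alpha, neg_iters)  # few steps: small neighborhood
    return [p + gamma * q for p, q in zip(f_pos, f_neg)]


# Hypothetical normalized affinity of five images linked in a chain 0-1-2-3-4.
a = 1 / math.sqrt(2)
s = [[0.0, a,   0.0, 0.0, 0.0],
     [a,   0.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.5, 0.0],
     [0.0, 0.0, 0.5, 0.0, a],
     [0.0, 0.0, 0.0, a,   0.0]]

scores = hybrid_scores(s, positives={0}, negatives={4})
positive_only = hybrid_scores(s, positives={0}, negatives=set())
```

With two negative iterations the penalty reaches only images 2 to 4, so the score of the positive sample itself is unchanged while the negative sample and its close neighbors are pushed down the ranking.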

RESULTS AND DISCUSSION
The image datasets used in the experiment were provided by GEM Hospital, Coimbatore and several radiological centres. Currently, our work uses 150 images of liver diseases: 80 liver cyst, 45 alcoholic cirrhosis and 25 carcinoma images. First, all images are registered into the database through the intensity-based image registration method. Then, three groups of features are collected from the registered images, namely first order statistics, the gray level co-occurrence matrix and fractal geometry, which are the more relevant features for ultrasound modality images, as shown in Fig. 2. It is known that certain features are appropriate for the classification of a specific disease and other features are suitable for other diseases. Therefore the CFS algorithm, a feature subset selection technique, is used in our study. The goal of feature subset selection is to identify and select the most influential variables from a large pool of variables and to reduce the feature space for classification. The images are 512×512 pixels in size and are saved at 12 bits per pixel of gray level. The training set for the SVM classification algorithm is created by manually segmenting sample regions of four patterns: liver cyst, alcoholic cirrhosis, carcinoma and normal liver.
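The idea behind CFS can be sketched in Python using the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation of a subset of k features. The sketch assumes absolute Pearson correlation as the correlation measure and a greedy forward search; the study states neither choice, and the feature names and values below are invented for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a and var_b else 0.0


def cfs_merit(subset, features, labels):
    """CFS merit: high class correlation, low redundancy among the chosen features."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], labels)) for f in subset) / k
    pairs = [(f, g) for i, f in enumerate(subset) for g in subset[i + 1:]]
    r_ff = (sum(abs(pearson(features[f], features[g])) for f, g in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(features, labels):
    """Greedy forward search: add the best feature while the merit keeps improving."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        merit, name = max((cfs_merit(selected + [f], features, labels), f)
                          for f in remaining)
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


# Hypothetical feature values for six regions; labels 0 = normal, 1 = diseased.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "mean_gray": [1, 2, 1, 5, 6, 5],  # strongly related to the label
    "max_gray": [3, 2, 4, 6, 9, 5],   # related, but redundant with mean_gray
    "noise": [3, 1, 2, 3, 1, 2],      # unrelated to the label
}
selected, merit = cfs_forward_select(features, labels)
```

On this toy data the search keeps only `mean_gray`: adding the redundant `max_gray` or the uninformative `noise` lowers the merit, which is how the selection reduces the feature space before classification.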
The input query image is fed into the proposed hybrid system; after image registration, feature extraction and feature selection, it is stored in the database and compared with the liver images in the database pool, and the resulting images are retrieved, as shown in Fig. 3. In further iterations, the physician selects appropriate images from the retrieved images by applying the relevance feedback technique. The result of the RF falls into two categories, positive samples and negative samples, as shown in Fig. 4 and 5. The hybrid approach (SVM+RF) offers several benefits when compared with an existing medical CBIR system based on a neural network back-propagation algorithm, as shown in Table 1.

CONCLUSION
A hybrid approach for the retrieval of liver diseases has been proposed in this study. The hybrid approach (SVM+RF) offers several benefits when compared with existing medical CBIR systems. In a general CBIR system, image registration is not necessary for the retrieval of similar images, but in medical image retrieval it plays an important role: it reduces the redundancy of retrieved images and increases the response rate. Beyond the gray level co-occurrence matrices adopted to extract features of the PBRs, fractal geometry features are also useful as additional features to differentiate liver cyst from carcinoma and alcoholic cirrhosis. Through the selection of significant features by CFS, the input space forwarded to the SVM is simplified. Relevance feedback obtained from the physician at each iteration helps to improve the accuracy of the images retrieved from the database.