A ROBUST APPROACH TO CLASSIFY MICROCALCIFICATION IN DIGITAL MAMMOGRAMS USING CONTOURLET TRANSFORM AND SUPPORT VECTOR MACHINE

Mammography is the best available radiographic method for detecting breast cancer at an early stage. However, detecting microcalcification clusters at an early stage is a difficult task for the radiologist. Herein we present a novel approach for classifying microcalcification in digital mammograms using the Nonsubsampled Contourlet Transform (NSCT) and Support Vector Machine (SVM). Microcalcification features are extracted from the Contourlet coefficients of the image and used as input to the SVM for classification. The system classifies mammogram images as normal or abnormal and the abnormal severity as benign or malignant. The system is evaluated using the Mammography Image Analysis Society (MIAS) database. The experimental results show that the proposed method provides an improved classification rate.


INTRODUCTION
Breast cancer is presently one of the leading causes of death in the world. Mammography is the most common procedure for detecting non-palpable cancers. A mammogram is an X-ray examination of the breast. Among the various radiographic indications related to breast cancer, microcalcification clusters play a vital role because they are present in 30-50% of all cancers identified mammographically. The diagnosis result of tissue is classified into three categories: normal, benign and malignant. Normal represents a mammogram without any cancerous cells, benign represents a mammogram showing a tumor that is not produced by cancerous cells and malignant represents a mammogram showing a tumor with cancerous cells. It is difficult to distinguish a benign microcalcification from a malignant one.
Stylianos et al. (2010) proposed an algorithm for the classification of mammograms based on breast density estimation and detection of asymmetry, with support vector machines employed for classification. Dheeba and Selvi (2011) proposed an algorithm for the classification of microcalcification in digital mammograms using a Support Vector Machine. To improve the classification rate, Laws' texture energy measures are taken from the image Region of Interest (ROI). Tirtajaya and Santika (2010) presented the use of the dual-tree complex wavelet transform as the feature extraction technique and SVM as the classifier. Wang et al. (2010) applied three approaches to feature selection for classifying microcalcification in mammograms: feature selection using a neural classifier, a clustering criterion and a combined scheme. To evaluate the performance of these feature selection approaches, the same neural classifier is applied to the selected features and the classification results are then compared. De Melo et al. (2010) proposed to identify a set of features that allows for the best automatic classification. Groups with different numbers of features are generated using scalar feature selection. Fisher's discriminant ratio and the area under the receiver operating curve are used as auxiliary distance measurements. For classification purposes, different architectures of feed forward neural networks are employed. Huddin et al. (2011) presented a new method to extract features for classifying microcalcification clusters using steerable pyramid decomposition. The method is motivated by the fact that microcalcification clusters can be of arbitrary sizes and orientations. Thus, it is important to extract the features in all possible orientations to capture most of the distinguishing information for classification. Ma et al. (2010) proposed a shape analysis method to aid radiologists in classifying regions of interest that are difficult to diagnose. Region growing and gradient vector flow methods are used to obtain an ordered set of contour points of each microcalcification. A three level wavelet transform frequency analysis provides a band pass approximation of the normalized distance signature. A novel metric derived from the normalized distance signature is proposed to quantify the roughness of a microcalcification. Eltoukhy et al. (2010) presented an approach for breast cancer diagnosis in digital mammograms using the curvelet transform. After decomposing the mammogram images in the curvelet basis, a special set of the biggest coefficients is extracted as the feature vector. The Euclidean distance is then used to construct a supervised classifier.
Ramos et al. (2012) evaluated texture classification using features derived from co-occurrence matrices, wavelet and ridgelet transforms of mammographic images. A false positive reduction in computer-aided detection of masses is also proposed. The data set consisted of 120 cranio-caudal mammograms, half containing a mass, rated as abnormal images and half with no lesions. The following texture descriptors are then calculated to analyze the texture patterns of the regions of interest: entropy, energy, sum average, sum variance and cluster tendency. Barjoei and Bahadorzadeh (2012) proposed the method of wavelet thresholding for denoising medical images. The idea is to transform the data into the wavelet basis, in which the large coefficients are mainly the signal and the smaller ones represent the noise. Fuzzy rough feature selection with the Π Membership Function is proposed by Thangavel and Roselin (2012) for classifying mammograms. The selected features are used to classify the abnormalities with the help of the Ant Miner and Weka tools.
The main goal of this study is to develop a better CAD technique for the classification of microcalcification in digital mammograms using the Contourlet transform and Support Vector Machine. First, the features are extracted from the Contourlet coefficients, which represent the unit of classification. Second, the mammogram images are classified using a Support Vector Machine (SVM). The purpose of the system is to determine the abnormal severity of the microcalcification as benign or malignant.

MATERIALS AND METHODS
The proposed system is built on the Contourlet transform of the image and applies SVM to build the classifiers. The theoretical background of both approaches is introduced below.

Non Sub Sampled Contourlet Transform
The Contourlet transform is an extension of the wavelet transform that uses multi scale and directional filter banks. Here the basis images are oriented at various directions in multiple scales, with flexible aspect ratios. The Contourlet transform effectively captures smooth contours, which are the dominant feature in natural images. The main difference between the Contourlet transform and other multi scale directional systems is that the Contourlet transform allows for a different and flexible number of directions at each scale, while achieving nearly critical sampling. In addition, the Contourlet transform uses iterated filter banks, which makes it computationally efficient. The Contourlet transform (Do and Vetterli, 2004) is a multidirectional and multi scale transform that is constructed by combining the Laplacian pyramid (Burt and Adelson, 1983;Do and Petteril, 2003) with the Directional Filter Bank (DFB) proposed in (Do and Petteril, 2003). Due to the down samplers and up samplers present in both the Laplacian pyramid and the DFB, the Contourlet transform is not shift-invariant. The NSCT is a shift-invariant version of the Contourlet transform. Figure 1a displays an overview of the NSCT (Do and Vetterli, 2004). The structure consists of a bank of filters that splits the 2-D frequency plane into the sub bands illustrated in Fig. 1b. This transform can thus be divided into two shift-invariant parts: (1) a Nonsubsampled pyramid structure that ensures the multi scale property and (2) a Nonsubsampled DFB structure that gives directionality.

AJEAS
The multi scale property of the NSCT is obtained from a shift-invariant filtering structure that achieves a sub band decomposition similar to that of the Laplacian pyramid. This is achieved by using two-channel non sub sampled 2-D filter banks. Figure 2a and b illustrate the Nonsubsampled pyramid (NSP) decomposition with $J = 3$ stages. The ideal pass band support of the low-pass filter at the $j$th stage is the region $[-\pi/2^{j}, \pi/2^{j}]^2$. Accordingly, the ideal support of the equivalent high-pass filter is the complement of the low-pass, i.e., the region $[-\pi/2^{j-1}, \pi/2^{j-1}]^2 \setminus [-\pi/2^{j}, \pi/2^{j}]^2$. The filters for subsequent stages are obtained by up sampling the filters of the first stage. This gives the multi scale property without the need for additional filter design. This structure is thus different from the separable Nonsubsampled Wavelet Transform (NSWT). In particular, one band pass image is produced at each stage, resulting in a redundancy of $J+1$. By contrast, the NSWT produces three directional images at each stage, resulting in a redundancy of $3J+1$.
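The up sampling of the first-stage filters and the two redundancy counts can be illustrated with a short sketch. It assumes that stage 1 uses the original filter and that each later stage doubles the zero spacing between taps; the 2×2 filter values and the function name `upsample_filter` are illustrative placeholders, not the filters used in this study.

```python
import numpy as np

def upsample_filter(h, stage):
    """Insert zeros between the taps of a 2-D filter.

    Assumption: stage 1 returns the filter unchanged and stage j
    spreads the taps by a factor of 2**(j - 1) in both dimensions.
    """
    factor = 2 ** (stage - 1)
    if factor == 1:
        return h.copy()
    rows = (h.shape[0] - 1) * factor + 1
    cols = (h.shape[1] - 1) * factor + 1
    up = np.zeros((rows, cols), dtype=h.dtype)
    up[::factor, ::factor] = h  # original taps, zeros everywhere else
    return up

# Illustrative first-stage filter (placeholder values)
h1 = np.array([[1.0, 2.0], [3.0, 4.0]])
h2 = upsample_filter(h1, stage=2)  # 3x3 with zeros between taps
h3 = upsample_filter(h1, stage=3)  # 5x5 with zeros between taps

# Redundancy for J = 3 stages, as stated in the text
J = 3
nsp_redundancy = J + 1       # one band pass image per stage plus the low-pass
nswt_redundancy = 3 * J + 1  # three directional images per stage plus the low-pass
```

Because the taps are only spread apart and never changed, no new filter has to be designed for the later stages.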
Non sub sampled Directional Filter Bank (NSDFB): The directional filter bank of Bamberger and Smith (1992) and Arthur et al. (2006) is constructed by combining critically-sampled two-channel fan filter banks and resampling operations. The result is a tree-structured filter bank that splits the 2-D frequency plane into directional wedges. A shift-invariant directional expansion is obtained with a non sub sampled DFB (NSDFB). The NSDFB is constructed by eliminating the down samplers and up samplers in the DFB. This is done by switching off the down samplers/up samplers in each two-channel filter bank in the DFB tree structure and up sampling the filters accordingly. This results in a tree composed of two-channel NSFBs. Figure 3a and b illustrate the four channel decomposition. The synthesis filter bank is obtained similarly. The NSCT is flexible in that it allows any number of directions in each scale. In particular, it can satisfy the anisotropic scaling law. This property is ensured by doubling the number of directions in the NSDFB expansion at every other scale. The NSCT is constructed by combining the NSP and the NSDFB as shown in Fig. 1a. Eight sub bands have been produced and some of the Nonsubsampled Contourlet coefficients.

Support Vector Machine
Support Vector Machines (SVMs) are a set of related supervised learning methods that analyze data and recognize patterns and are used for classification and regression analysis. The standard SVM is a non-probabilistic binary linear classifier, i.e., it predicts, for each given input, which of two possible classes the input belongs to. A classification task usually involves training and testing data consisting of data instances. Each instance in the training set contains one "target value" (class label) and several "attributes" (features) (Ireaneus et al., 2009). SVM has the extra advantage of automatic model selection, in the sense that both the optimal number and locations of the basis functions are automatically obtained during training. The performance of SVM largely depends on the kernel (Dheeba and Selvi, 2011).
SVM is essentially a linear learning machine. The input training sample set is defined in Equation 1 and the classification hyperplane is given in Equation 2. The classification margin is thus $2/\|\omega\|$. To maximize the margin, that is, to minimize $\|\omega\|$, the optimal hyperplane problem is transformed into the quadratic programming problem in Equation 3. After the introduction of Lagrange multipliers, the dual problem is given in Equation 4. According to the Kuhn-Tucker conditions, the optimal solution must satisfy Equation 5, with the optimal solution written as in Equations 6 and 7. For every training sample point $x_i$ there is a corresponding Lagrange multiplier $a_i$, and the sample points that correspond to $a_i \neq 0$ are called support vectors. Hence the optimal hyperplane is given in Equation 8 and the hard classifier in Equation 9. For the nonlinear situation, SVM constructs an optimal separating hyperplane in a high dimensional space by introducing the kernel function $K(x, y) = \varphi(x) \cdot \varphi(y)$; the nonlinear SVM is given in Equation 10.

Thus the optimal hyperplane equation is determined by the solution of the optimization problem.
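The bodies of the numbered SVM equations are not reproduced in this text. As a reading aid, the standard hard-margin formulation is sketched below using the symbols that appear in this section ($\omega$, $a_i$, $K$). The sample count $l$, the bias $b$, the starred optimal values and the grouping of lines are assumptions, and the lines are not guaranteed to match the original equation numbering one to one.

```latex
\begin{align*}
&\text{Training set:} && \{(x_i, y_i)\}_{i=1}^{l}, \quad x_i \in \mathbb{R}^n,\; y_i \in \{-1, +1\} \\
&\text{Hyperplane:} && \omega \cdot x + b = 0 \\
&\text{Primal problem:} && \min_{\omega, b}\; \tfrac{1}{2}\|\omega\|^2
   \quad \text{s.t.}\quad y_i(\omega \cdot x_i + b) \ge 1,\; i = 1, \dots, l \\
&\text{Dual problem:} && \max_{a}\; \sum_{i=1}^{l} a_i
   - \tfrac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} a_i a_j y_i y_j (x_i \cdot x_j)
   \quad \text{s.t.}\quad a_i \ge 0,\; \sum_{i=1}^{l} a_i y_i = 0 \\
&\text{Kuhn-Tucker condition:} && a_i\,\bigl[y_i(\omega \cdot x_i + b) - 1\bigr] = 0 \\
&\text{Optimal parameters:} && \omega^{*} = \sum_{i=1}^{l} a_i^{*} y_i x_i, \qquad
   b^{*} = y_j - \sum_{i=1}^{l} a_i^{*} y_i (x_i \cdot x_j) \;\; \text{for any } j \text{ with } a_j^{*} > 0 \\
&\text{Hard classifier:} && f(x) = \operatorname{sgn}\Bigl(\sum_{i=1}^{l} a_i^{*} y_i (x_i \cdot x) + b^{*}\Bigr) \\
&\text{Nonlinear SVM:} && f(x) = \operatorname{sgn}\Bigl(\sum_{i=1}^{l} a_i^{*} y_i K(x_i, x) + b^{*}\Bigr)
\end{align*}
```

The Kuhn-Tucker condition shows why only some samples matter: a nonzero $a_i$ forces $y_i(\omega \cdot x_i + b) = 1$, so the support vectors lie exactly on the margin and all other samples drop out of the sums.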

Proposed System
The proposed system mainly consists of three different stages which include the preprocessing stage, feature extraction stage and classification stage. All the stages are explained in detail in the following sub sections.

Preprocessing System
In the pre-processing stage, undesired distortion is suppressed and image features are enhanced to improve the image data. The preprocessing stage comprises three sublevels, as described in Fig. 4.

ROI Selection
The MIAS dataset has very large images of size 1024×1024. Almost 50% of the whole image consists of background with a lot of noise. To eliminate the background information and the noise, an ROI image of size 800×800 is cropped from the input image. The original image is shown in Fig. 5a and the cropped image is shown in Fig. 5b.
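A minimal sketch of the cropping step follows. The text does not state where the 800×800 window is placed, so the centred window used here is an assumption; the function name `crop_roi` and the all-zero placeholder image are likewise illustrative.

```python
import numpy as np

def crop_roi(image, size=800):
    """Crop a size x size window from the middle of a 2-D image.

    Assumption: the window is centred; the source does not say how
    the ROI is positioned within the 1024x1024 mammogram.
    """
    height, width = image.shape
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]

# Placeholder standing in for a 1024x1024 MIAS image
mammogram = np.zeros((1024, 1024), dtype=np.uint8)
roi = crop_roi(mammogram)
```

If the breast region is off-centre, as it often is in mammograms, the offsets `top` and `left` would need to be chosen per image rather than fixed.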

Global Gray Level Thresholding
In this stage, an upper threshold (240) and a lower threshold (120) are selected. The pixels between the pre-selected upper and lower thresholds of the gray level histogram are retained and all others are set to zero. The upper and lower thresholds are predetermined to make sure that the pixel values of the region of interest lie between them. The operation returns the intensity values of the specified image pixels. The thresholded image is shown in Fig. 6a.
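The thresholding rule can be sketched as follows, using the thresholds 120 and 240 given in the text. Treating both bounds as inclusive is an assumption, and the function name `global_threshold` and the sample array are illustrative.

```python
import numpy as np

def global_threshold(image, lower=120, upper=240):
    """Keep gray levels in [lower, upper] and set all other pixels to zero.

    Assumption: both bounds are inclusive; the source only says
    that pixels "between" the thresholds are retained.
    """
    mask = (image >= lower) & (image <= upper)
    return np.where(mask, image, 0).astype(image.dtype)

sample = np.array([[50, 120, 180],
                   [240, 250, 130]], dtype=np.uint8)
thresholded = global_threshold(sample)
# thresholded is [[0, 120, 180], [240, 0, 130]]
```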

Adaptive Histogram Equalization
Adaptive histogram equalization is a technique used to improve contrast in images. It differs from ordinary histogram equalization in that the adaptive method computes several histograms, each corresponding to a distinct section of the image, and uses them to redistribute the lightness values of the image. Ordinary histogram equalization simply uses a single histogram for the entire image. Adaptive histogram equalization is applied to the thresholded image and the resulting equalized image is shown in Fig. 6b.
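The idea of computing one histogram per image section can be sketched in a simplified form. The code below equalizes each tile independently and omits the interpolation between neighbouring tiles and the contrast limiting that practical implementations usually add, so it is not the routine used in this study. The tile size and the function names are assumptions.

```python
import numpy as np

def equalize_tile(tile, levels=256):
    """Ordinary histogram equalization of one uint8 tile."""
    hist = np.bincount(tile.ravel(), minlength=levels)
    cdf = np.cumsum(hist)
    cdf_min = cdf[cdf > 0][0]      # CDF value at the darkest gray level present
    denom = tile.size - cdf_min
    if denom == 0:                 # flat tile: nothing to redistribute
        return tile.copy()
    lut = np.round((cdf - cdf_min) / denom * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[tile]               # map each pixel through the lookup table

def adaptive_equalize(image, tile=100):
    """Equalize each tile x tile section with its own histogram (no blending)."""
    out = np.empty_like(image)
    for r in range(0, image.shape[0], tile):
        for c in range(0, image.shape[1], tile):
            block = image[r:r + tile, c:c + tile]
            out[r:r + tile, c:c + tile] = equalize_tile(block)
    return out

demo = np.array([[10, 20],
                 [30, 40]], dtype=np.uint8)
equalized = adaptive_equalize(demo, tile=2)
# equalized is [[0, 85], [170, 255]]: the four gray levels are spread over the full range
```

Without blending, tile borders can become visible in the output, which is why interpolated variants are generally preferred in practice.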

Feature Extraction
Feature extraction is an essential pre-processing step for pattern recognition and machine learning problems. It is often decomposed into feature construction and feature selection. In our approach, Contourlet coefficients are used in preference to DWT coefficients as features to classify the mammogram images, due to the following notable properties of NSCT:
• NSCT is capable of capturing the directional edges of the image at different scales better than DWT. NSCT possesses the property of directionality, i.e., it has basis functions at many directions, whereas the wavelet transform possesses only three directions
• NSCT is more efficient than the wavelet transform in representing smooth contours in different directions of an image
• NSCT improves the representation sparsity of images over the wavelet transform
• A key feature of NSCT is its ability to efficiently handle 2D singularities, i.e., edges, unlike wavelets, which can deal only with point singularities
• Another important property is anisotropy, meaning that the basis functions appear at various aspect ratios (depending on the scale), whereas wavelets are separable functions and thus their aspect ratio equals 1
The following section gives an overview of feature extraction from the digital mammogram. The Feature Extraction stage is shown in Fig. 7.

Contourlet Coefficients Extraction
The enhanced image is decomposed using the NSCT at three different scales: 2, 3 and 4. For an $R$ level NSCT, we have $2^R$ directional sub bands ($W$). The Contourlet coefficients of all the sub bands are used as feature vectors individually. These feature vectors are given to the SVM classifier as input.

Normalization
Normalization is the process that changes the range of pixel intensities to a new range and is used to simplify the coefficient values. This is achieved by dividing each feature vector by its maximum value. As a result, all vector values become less than or equal to one. The normalization process is defined by Equation 12:
$$\mathrm{NORM}^{k}_{ij} = \frac{w^{k}_{ij}}{\max_{i,j}\left(w^{k}_{ij}\right)} \qquad (12)$$
where $\mathrm{NORM}^{k}$ is the normalized $k$th directional sub band and $w^{k}_{ij}$ is the $k$th directional sub band coefficient at location $(i, j)$, $1 \le k \le 2^R$ and $2 \le R \le 4$.

Energy Computation
We compute the energy for each vector by squaring every element in the vector. The resulting values are used as features for the classification process. The energy computation is defined by Equation 13:
$$\mathrm{ENERGY}^{k}_{ij} = \left(\mathrm{NORM}^{k}_{ij}\right)^{2} \qquad (13)$$
where $\mathrm{ENERGY}^{k}$ is the energy of the $k$th directional sub band and $\mathrm{NORM}^{k}_{ij}$ is the normalized $k$th directional sub band coefficient at location $(i, j)$, $1 \le k \le 2^R$ and $2 \le R \le 4$.

Feature Reduction
The ROI image is of size 800×800 and produces a large number of coefficients. The Contourlet coefficients are stored in a two Dimensional (2D) array $W_k$. To reduce the number of features, the energy values in the 2D array are converted into a 1D array $X_k$ and a predefined number of energy values are summed together. In the proposed technique, summation of 100 and 1000 energy values per feature is used. The feature reduction process is defined by Equation 14:
$$\mathrm{FEAT}_{k}(i) = \sum_{j=i}^{i+T-1} X_{k}(j) \qquad (14)$$
where $\mathrm{FEAT}_k$ is the reduced feature set of the $k$th directional sub band, $X_k$ is the 1D array that contains the energy of the $k$th directional sub band, $X_k(j)$ is the energy of the $k$th directional sub band at location $j$, $1 \le k \le 2^R$, $2 \le R \le 4$, $T$ = 100 or 1000, $i = i + T$, $1 < i < MN$, and $M$ and $N$ are the width and height of the $k$th directional sub band.
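The three steps described above (normalization by the maximum, squaring to obtain energies, and summing groups of T energies) can be sketched for a single sub band as follows. The function name `subband_features`, the toy 3×4 array and the handling of a trailing partial group are assumptions, and the sketch presumes the sub band maximum is positive.

```python
import numpy as np

def subband_features(w, T=100):
    """Reduce one directional sub band to a vector of summed energies.

    Steps: divide by the sub band maximum, square each value,
    flatten to 1-D, then sum consecutive groups of T energies.
    Assumptions: max(w) > 0, and any trailing group shorter than T
    is dropped.
    """
    norm = w / np.max(w)              # values become <= 1
    energy = norm ** 2                # element-wise energy
    x = energy.ravel()                # 2-D array -> 1-D array
    usable = (x.size // T) * T
    return x[:usable].reshape(-1, T).sum(axis=1)

# Toy "sub band" with values 1..12, grouped four at a time
w = np.arange(1, 13, dtype=float).reshape(3, 4)
feat = subband_features(w, T=4)
# feat equals [30, 174, 446] / 144
```

If a sub band has the same size as the 800×800 ROI, it holds 640,000 values, so T = 100 would give 6,400 sums per sub band and T = 1000 would give 640.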

Classification Stage
The SVM classifier is built in two stages. In the first stage, the classifier is applied to classify mammograms into normal or abnormal categories. A mammogram is considered abnormal if it contains a tumor (microcalcification). If the image is abnormal, it enters the second stage, where it is further classified as malignant or benign. The classification stage is shown in Fig. 8.
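The control flow of the two-stage scheme can be sketched independently of the SVM training itself. In the code below the two classifiers are passed in as functions; `stage1_stub` and `stage2_stub` are hypothetical threshold rules standing in for the trained SVMs and carry no diagnostic meaning.

```python
def classify_mammogram(features, stage1, stage2):
    """Two-stage cascade: normal/abnormal first, then benign/malignant."""
    if stage1(features) == "normal":
        return "normal"
    return stage2(features)  # reached only for abnormal images

# Hypothetical stand-ins for the two trained SVM classifiers
def stage1_stub(features):
    return "abnormal" if sum(features) > 1.0 else "normal"

def stage2_stub(features):
    return "malignant" if max(features) > 0.8 else "benign"

labels = [classify_mammogram(f, stage1_stub, stage2_stub)
          for f in ([0.1, 0.2], [0.6, 0.7], [0.9, 0.5])]
# labels is ["normal", "benign", "malignant"]
```

One consequence of this design is that an error in the first stage cannot be corrected later: an abnormal image labelled normal never reaches the benign/malignant classifier.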

RESULTS AND DISCUSSION
To assess the performance of the proposed system, many computer simulations and experiments with mammogram images were performed. The system was implemented in MATLAB version 7.6. Figure 9 shows the screenshot of the proposed system. The recognition training and tests were run on a modern standard PC (1.66 GHz INTEL processor, 1 GB of RAM) running under Windows XP.
The MIAS database is used to evaluate the proposed system. The MIAS database contains 322 mammograms of the left and right breasts of 161 patients. Among the 322 mammogram images, 25 contain microcalcification clusters. A sample microcalcification cluster from the MIAS database (mdb219) is shown in Figure 10a and its magnified view is shown in Figure 10b. All the microcalcification images and 100 normal images are used in this study. The performance of the proposed approach for the classification of microcalcification is measured by classification accuracy, which is defined by Equation 15. The numbers of training and testing sets are shown in Table 1. The simulations are performed by summing 100 and 1000 Contourlet coefficients per feature and training the 2 stage SVM classifier. The results from the classifier are listed in Table 2.
The proposed classification algorithm based on NSCT and SVM is tested on all microcalcification images of the MIAS database. Mousa et al. (2005) proposed a system based on wavelet analysis and a fuzzy-neural approach; the maximum classification rate obtained was 87.5%. Zyod and Abdel-Qader (2011) proposed a system using GLCM features with a PSO-KNN feature selection method. The classification rate achieved was 88%. Lakshmi and Manoharan (2011) used a set of Jacobi moments with wavelet features and achieved a classification rate of 91.99%.
Table 2 shows the successful classification rate for normal and abnormal cases at scales 2, 3 and 4 using 100 and 1000 features. The maximum successful classification rate is 98.5%, achieved using 1000 features at scale 2. At scale 2, the accuracy rates of normal and abnormal classification were 99 and 96%, respectively, with one case misclassified. Table 3 shows the successful classification rate for benign and malignant cases at scales 2, 3 and 4 using 100 and 1000 features. The maximum successful classification rate is 95.8%, achieved using 100 features at scales 3 and 4.
Among the combined scale features, the summation of 1000 features produces better results than 100 features for both cases. For normal and abnormal cases, the maximum classification rate obtained is 95.5% at scales 2-3, whereas for benign and malignant cases it is 96.15% at scales 3-4 and 2-4. The graphical representation of our results is shown in Fig. 11-14.
The maximum successful classification rate using wavelets (Mousa et al., 2005) was 87.5%, obtained with the features extracted at decomposition levels 2-3. For NSCT, the maximum successful classification accuracy rate obtained is 96.15%. The experimental results demonstrate the efficacy of NSCT in mammogram analysis, since the NSCT is able to capture the directional edges of the image at different scales better than DWT. The proposed method focuses on classifying the whole enhanced image as either normal or abnormal (benign or malignant). Comparison of our methodology with the methods described in (Papadopoulos et al., 2005a;Yu and Huang, 2010a) is not straightforward because their classification focuses on each cluster as either normal or abnormal. In the proposed classification system, the maximum average classification rate of 96.15% is achieved for the combined scale features of 3-4 and 2-4 with the summation of 1000 features. The proposed method compares favorably with the method presented in (Papadopoulos et al., 2005b), which achieved a classification rate of 83% for an Artificial Neural Network classifier and 81% for SVM classifiers based on 33 statistical features. The method described in (Yu and Huang, 2010b) achieved a classification rate of 94% based on combined model-based and statistical textural features.

CONCLUSION
In this study we have presented an effective method for building a computer-aided diagnosis system for the classification of abnormalities in digital mammograms. We have developed and analyzed the Contourlet transform for feature extraction and the support vector machine for the classification process. The maximum accuracy rate of normal and abnormal classification is 98.5% at scale 2. The success rate of benign and malignant classification is 96.15% for the combined scale features at scales 3-4 and 2-4. From the experimental results, it is concluded that the summation of 1000 features produces better results than 100 features. Our classification system produces a very promising classification rate. The evaluation of the system is carried out on the MIAS dataset. Future work is to extend the feature set to the classification of masses in digital mammograms.