A Novel Approach in Malignancy Detection of Computer Aided Diagnosis

Problem statement: Breast cancer is one of the most dangerous diseases and causes a large number of deaths among women. Early detection is the only way to reduce mortality. Owing to a variety of factors, manual reading of mammograms sometimes results in misdiagnosis, so the diagnosis rate varies from 65-85%. Various computer aided detection techniques have been proposed over the past 20 years, yet the detection rate is still not high. Approach: The proposed method consists of the following steps: preprocessing, segmentation, feature extraction and classification. Noise, artifacts and the pectoral region are removed in the preprocessing step. Contrast enhancement and the Sobel operator with a segmentation algorithm are used to segment the mass region. Feature extraction is performed on the segmented image using the gray level co-occurrence matrix and the local binary pattern method. The extracted features are classified using a support vector machine. The performance of the proposed system is evaluated using the partest method. Results: The proposed algorithm shows 98.8% sensitivity and 97.4% specificity. Conclusion: The proposed algorithm is fully automatic and will help radiologists detect malignancy efficiently.


INTRODUCTION
Cancer has become one of the most dangerous threats to mankind over the past two decades, and a treatment that cures the disease completely is still not available. According to the World Health Organization, 7.6 million deaths were caused by cancer in 2005. Tumor growth of more than 2 mm every three months is taken to indicate cancer. The tumor multiplies out of control, spreads to other parts of the body and destroys healthy tissue. A cancer is generally named after the part of the body where it started.
Breast cancer ranks second among the causes of cancer death, particularly among women. According to Tata Memorial Hospital, breast cancer was reported in 1 woman out of 1000 during the 1970s, whereas recent studies report 1 woman out of 10, which shows the importance of taking preventive steps against breast cancer. At present, vaccination is available for some kinds of cancer such as lung cancer and cervical cancer. The root cause of breast cancer is still unknown; hence proper preventive measures are absent. If it is detected at an early stage, however, the survival rate of the patient can be improved by 95%. Mass and microcalcification are two signs in the mammogram that are easily confused. A microcalcification is a collection of calcium deposits. A mass is cancerous tissue with varying shapes and boundaries. Benign and malignant are another pair of easily confused terms: benign refers to the growth of a non-cancerous tumor, while malignant refers to cancerous tumor growth.
Other techniques such as MRI, CT and ultrasound are also available for detecting cancer cells, but the agreement between their findings and histopathology is low, whereas the agreement between mammography and histopathological diagnosis is quite high. Mammography is the clinical gold standard for early detection. It is inexpensive and works fairly well. Another advantage of digital mammography is that it stores the result in digital form, which allows enhancement to be performed for efficient diagnosis (Gonzalez and Woods, 2002;Hu et al., 2011;Jain, 1989;Liu et al., 2009;Mencamttini et al., 2008;Parthiban and Subramanian, 2009;Polat and Gunes, 2007;Zheng, 2010).
At present, mammograms are read by radiologists. Owing to factors such as poor image quality, the benign appearance of lesions and eye fatigue, the performance of radiologists is greatly affected. To overcome this problem, many Computer Aided Detection (CAD) techniques have been developed. Even so, the detection rate is still not high because of the high variance in the shape and size of tumors and the disturbance caused by fatty tissue, veins and glands. A standard general algorithm that produces good results for all kinds of mammograms is still not available. Hence, much more work needs to be done to develop a more efficient and effective CAD system.
Many attempts have been made to use fuzzy logic, neural networks and genetic algorithms to improve the diagnostic efficiency of cancer detection. Morphological operations and the seeded region growing method have been used to segment the pectoral muscle (Nagi et al., 2010). Contrast Limited Adaptive Histogram Equalization (CLAHE) and the multiscale contrast enhancement algorithm are among the effective methods of enhancing the mammogram (Khuzi et al., 2009). Morphological component analysis was designed to detect the suspicious region (Gao et al., 2010). A hybrid technique that incorporates seeded region growing with the ASB algorithm was designed to isolate benign and malignant regions in the breast tissue (Maitra et al., 2011). A comparative study found that the Sobel operator produces better results than the Prewitt and Kirsch edge operators (Kekre et al., 2010). The Gray Level Co-occurrence Matrix (GLCM) is generally used for extracting features, but it has also been used to identify the mass region in mammograms (Khuzi et al., 2009). Various features such as GLCM and intensity histogram features are used for breast cancer diagnosis, and a comparative study reported that GLCM outperformed the other methods (Nithya and Santhi, 2011). A Support Vector Machine with different kernels can produce better results than other existing classifiers (Hussain et al., 2011).
Preprocessing: A mammogram is a medical image that contains a lot of noise, such as digitization noise and noise introduced during image capture, which makes it difficult to interpret. In the proposed method a median filter is used to remove the noise. In 2D median filtering, each output pixel is replaced with the median value of the 3×3 neighborhood around the corresponding pixel in the input image. The median filter provides comparatively better results than other noise removal processes. Edges are the most important factor in the segmentation process, and a further advantage of the median filter is that it removes noise without disturbing the edges. Figure 1 shows the raw mammogram and the result of the median filtering process is shown in Fig. 2.
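The 3×3 median filtering step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `median_filter_3x3` is hypothetical, and border pixels are handled by replicating the nearest edge pixel, which is an assumption because the text does not say how borders are treated.

```python
from statistics import median


def median_filter_3x3(image):
    """Replace every pixel with the median of its 3x3 neighbourhood.

    `image` is a list of rows of gray levels. Indices outside the image are
    clamped to the nearest edge pixel (border replication, an assumption).
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            out[r][c] = median(window)  # always 9 values, so the median is a pixel value
    return out


# Toy image: one impulse-noise pixel (255) and a horizontal edge (10 -> 80).
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
    [80, 80, 80, 80],
]
filtered = median_filter_3x3(noisy)
```

In this toy image the isolated bright pixel is replaced by the surrounding gray level, while the step between the third and fourth rows is left in place, which is the edge-preserving behavior described above.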

MATERIALS AND METHODS
Artifact removal: A raw mammogram image contains wedges and labels. These may produce unnecessary disturbances during the mass detection process and should therefore be removed in preprocessing. In the proposed method, thresholding and morphological opening, closing, dilation and erosion are used to remove these artifacts. The median filtered gray scale image is converted to a binary image by thresholding it at the value T = 18. The area of each region is then calculated using a Matlab function, and the largest region is extracted while the small regions are discarded. In this way the smaller artifact regions are easily removed from the mammogram. Morphological erosion and dilation are applied using structuring elements. Finally, the holes produced in the binary image are filled using the 'imfill' function in Matlab. The resulting binary image is multiplied with the gray scale image. The mammogram after the artifact removal process is shown in Fig. 3.
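The core of the artifact removal step (threshold, keep the largest region, multiply with the gray scale image) might look like the Python sketch below. It is an illustration under stated assumptions: pixels strictly above T = 18 are treated as foreground, regions are 4-connected, and the morphological erosion, dilation and hole filling are left out. The function names are hypothetical.

```python
from collections import deque


def largest_region_mask(gray, threshold=18):
    """Binarize at `threshold` and keep only the largest 4-connected region."""
    rows, cols = len(gray), len(gray[0])
    binary = [[gray[r][c] > threshold for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill collects one connected region.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    mask = [[0] * cols for _ in range(rows)]
    for y, x in best:
        mask[y][x] = 1
    return mask


def apply_mask(gray, mask):
    """Multiply the gray scale image with the binary mask, pixel by pixel."""
    return [[g * m for g, m in zip(g_row, m_row)] for g_row, m_row in zip(gray, mask)]


# Toy image: a bright one-pixel "label" at the top left and a larger breast region.
gray = [
    [200, 0, 0, 0, 0],
    [0, 0, 90, 95, 0],
    [0, 0, 100, 110, 0],
    [0, 0, 105, 0, 0],
]
mask = largest_region_mask(gray)
cleaned = apply_mask(gray, mask)
```

The isolated bright pixel is discarded because its region is smaller than the breast region, even though it is brighter.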

Pectoral region removal:
The term pectoral relates to the chest. The pectoral muscle is a large fan shaped muscle that covers much of the front upper chest, so it is also captured when the mammogram is taken. The pectoral muscle appears as a predominant density region and therefore severely affects the result of image processing. For better detection accuracy the pectoral region should be removed from the mammogram image.
The orientation of the breast must be determined before the pectoral region can be removed. To do this, the binary image of the breast is cropped so that the breast touches all four borders of the window. The sums of the first five and the last five columns of the cropped image are calculated, and the breast orientation is determined with a simple if-else operation: if Sum_first5 > Sum_last5 then the breast is right oriented, else it is left oriented. The contrast of the grayscale mammogram is enhanced using the adaptive histogram equalization method.
After breast orientation determination and contrast enhancement, a seed is placed in the pectoral muscle of the grayscale mammogram image. If the breast is left oriented, the seed is placed in the last row and last column. If the breast is right oriented, the seed is placed in the first row and first column. The seed traverses all unvisited neighbor pixels. The difference between the pixel intensity and the mean intensity of the region is calculated. If the difference is small, the pixel is allocated to the region. When the intensity difference is larger than the threshold value, the process stops. The result of the seeded region growing method is subtracted from the binary image of the original mammogram. The resulting image is multiplied with the gray scale image to obtain the pectoral segmented mammogram image, which is shown in Fig. 4.
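The orientation test and the seeded region growing described in this section can be sketched in Python as follows. This is a simplified illustration, not the authors' code: the function names are hypothetical, 4-connected neighbors are assumed, and growth simply stops along any direction where a pixel differs from the running region mean by more than the threshold. The toy example compares two columns instead of five because the image is only six pixels wide.

```python
from collections import deque


def breast_orientation(binary, width=5):
    """Compare the sums of the first and last `width` columns of the cropped mask."""
    first = sum(sum(row[:width]) for row in binary)
    last = sum(sum(row[-width:]) for row in binary)
    return "right" if first > last else "left"


def seed_for(orientation, rows, cols):
    """Seed position as stated in the text for each orientation."""
    return (0, 0) if orientation == "right" else (rows - 1, cols - 1)


def grow_region(gray, seed, threshold):
    """Grow a region from `seed`, adding neighbours close to the region mean."""
    rows, cols = len(gray), len(gray[0])
    region, visited = {seed}, {seed}
    total = gray[seed[0]][seed[1]]  # running sum of intensities in the region
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if not (0 <= ny < rows and 0 <= nx < cols) or (ny, nx) in visited:
                continue
            visited.add((ny, nx))
            if abs(gray[ny][nx] - total / len(region)) <= threshold:
                region.add((ny, nx))
                total += gray[ny][nx]
                queue.append((ny, nx))
    return region


binary = [
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
]
gray = [
    [200, 198, 90, 85, 0, 0],
    [199, 92, 88, 0, 0, 0],
    [95, 90, 0, 0, 0, 0],
    [93, 91, 89, 0, 0, 0],
]
orientation = breast_orientation(binary, width=2)
seed = seed_for(orientation, len(gray), len(gray[0]))
pectoral = grow_region(gray, seed, threshold=20)
# Remove the grown region from the gray scale image.
removed = [[0 if (r, c) in pectoral else v for c, v in enumerate(row)]
           for r, row in enumerate(gray)]
```

Here the three bright corner pixels are grown into one region and zeroed, while the darker breast tissue next to them is left untouched.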

Region of interest extraction:
The region of interest is extracted to reduce the processing time. Masses normally appear as whiter regions in the mammogram. A mass cannot be detected in the darker region of the mammogram, so the darker region can simply be ignored during processing. The process of extracting the brighter region alone and neglecting the darker region is called Region of Interest Extraction.
The mammogram image (Ms) is mapped onto a special target image (Mt). The mapped image is then binarized with a threshold value. Connected objects containing more than 8 pixels are taken as the binary mask. Holes inside the region of interest are filled using the morphological closing operation. The region of interest is extracted using Eq. 1, where µ and σ denote the mean and standard deviation and S and T refer to the source and target images. The extracted region of interest image is shown in Fig. 5.

Contrast enhancement:
Contrast enhancement is performed using the Contrast Limited Adaptive Histogram Equalization (CLAHE) method. CLAHE works on tiles, which are small regions into which the image is divided. It adjusts the intensity values within the different tiles using an adaptive histogram equalization method, so that the contrast of each pixel is enhanced adaptively relative to its local neighborhood. As a result, improved contrast is produced for all levels in the image. CLAHE also helps to reduce the noise produced in homogeneous areas. Figure 6 shows the result of the contrast enhancement process.
Edge detection: Edge detection is a useful process for understanding image features. Edges occur at the boundaries of objects in the image, so edge detection strongly supports image segmentation. In mammogram segmentation in particular, it helps to enhance the tumor area. The Sobel operator has been reported to perform better than other methods such as the Prewitt and Kirsch operators and the watershed algorithm. It performs a 2D spatial gradient measurement on the image, emphasizes regions of high spatial frequency and finds the absolute gradient magnitude of every pixel in the input image. The 3×3 convolution kernels shown in Fig. 7 are used in the edge detection process.
In the convolution process, two arrays of numbers of different sizes are multiplied together. The kernel shown in Fig. 7 is convolved over the image. Each output pixel is calculated by multiplying every kernel value with the input image pixel beneath it and summing the products. Let i and j denote the number of rows and columns of the kernel, and I and J the number of rows and columns of the image. The output image then has M = I-i+1 rows and N = J-j+1 columns. The convolution operation can be written mathematically as shown in Eq. 2, where x runs from 1 to M and y runs from 1 to N.
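The sliding sum of products and the resulting output size can be checked with a short Python sketch. The kernel values below are the commonly used horizontal Sobel kernel; this is an assumption, since Fig. 7 is not reproduced here. The kernel is applied without flipping, and the function name `convolve_valid` is hypothetical.

```python
def convolve_valid(image, kernel):
    """Slide `kernel` over `image`, summing element-wise products at each position.

    For an I x J image and an i x j kernel the output has
    (I - i + 1) rows and (J - j + 1) columns.
    """
    img_rows, img_cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    return [
        [
            sum(
                image[x + k][y + l] * kernel[k][l]
                for k in range(k_rows)
                for l in range(k_cols)
            )
            for y in range(img_cols - k_cols + 1)
        ]
        for x in range(img_rows - k_rows + 1)
    ]


# Assumed standard Sobel kernel that responds to intensity changes along a row.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# 4 x 5 image with a vertical step edge between the second and third columns.
image = [
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
    [0, 0, 10, 10, 10],
]
response = convolve_valid(image, SOBEL_X)
```

For this 4×5 image and 3×3 kernel the output has 4-3+1 = 2 rows and 5-3+1 = 3 columns. The response is large where the window straddles the step edge and zero over the flat region.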
The derivatives in the x and y directions are given by Eq. 3 and 4. For the x direction:

$$G_X = \{K(x+1, y-1) + 2K(x+1, y) + K(x+1, y+1)\} - \{K(x-1, y-1) + 2K(x-1, y) + K(x-1, y+1)\}$$

The size of the gradient is given by Eq. 5. Applying the Sobel operators S_X and S_Y row by row over the original image yields the gradient matrices, and the total gradient value is obtained using Eq. 5. If G(x, y) > threshold value, the pixel is an edge pixel. If G(x, y) < threshold value, the pixel is not on an edge of the image. The algorithm identifies not only the presence of edges but also their direction. The result of the edge detection process is shown in Fig. 8.
Segmentation algorithm: G_X and G_Y are calculated using Eq. 3 and 4 respectively. By fixing the minimum and maximum values, the pixel values are set in the image:
• max = 0 and min = 0
• y varies from 1 to the height of the input image
• x varies from 1 to the width of the input image
• If G(x, y) > max then max = G(x, y)
• If G(x, y) < max then min = G(x, y)
The output value is obtained by following the three steps given below:
• For y → 1 to the height of the input image
• For x → 1 to the width of the input image
• The value is calculated using Eq. 6
Statistics calculated between two neighboring pixels give a second order relationship. The Gray Level Co-occurrence Matrix (GLCM) used in the proposed method is a second order texture measure. The segmentation output is given as input to the GLCM. The GLCM gives the probability that two pixels with gray levels i and j occur in a defined spatial relationship within an image. The spatial relationship is defined by a distance d and an angle θ. The GLCM is constructed at distances d = 1, 2, 3, 4 and four angles θ = 0°, 45°, 90° and 135°. If the two points tend to have the same gray level, the texture is coarse; if the texture is fine, the points tend to have different gray levels. Features such as contrast, energy, homogeneity and correlation can be derived from the GLCM using Eq. 7-10.
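A minimal Python sketch of the GLCM construction and the four texture features follows. Because Eq. 7-10 are not reproduced in this text, the feature formulas are an assumption: they are the standard definitions, as used for example by Matlab's `graycoprops`. The mapping from angle to pixel offset follows the usual convention (0° is one step to the right, 90° is one step up). Function names are hypothetical.

```python
import math


def angle_offset(theta, d=1):
    """Pixel offset (row step, column step) for angle theta in degrees at distance d."""
    steps = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}
    dy, dx = steps[theta]
    return dy * d, dx * d


def glcm(image, levels, dy, dx):
    """Normalized co-occurrence matrix of gray-level pairs at offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[n / total for n in row] for row in counts]


def glcm_features(p):
    """Contrast, energy, homogeneity and correlation of a normalized GLCM.

    Correlation is undefined (division by zero) for a constant image.
    """
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "energy": sum(v * v for _, _, v in cells),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
        "correlation": sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells) / (sd_i * sd_j),
    }


# Small 4-level test image, d = 1 and theta = 0 degrees.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4, dy=angle_offset(0)[0], dx=angle_offset(0)[1])
features = glcm_features(p)
```

Repeating the call for each of the four distances and four angles gives one feature set per (d, θ) pair; how those sets are combined into the final feature vector is not stated in the text.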
Contrast measures the intensity contrast between a pixel and its neighbor. Energy, or uniformity, is the sum of the squared elements of the GLCM. Homogeneity measures the closeness of the distribution of the GLCM elements to the GLCM diagonal. Correlation shows how correlated a pixel is to its neighbor over the whole image. These features are defined in Eq. 7-10. The variance of the intensities, σ², is calculated using Eq. 12.
Feature extraction by Local Binary Pattern: The Local Binary Pattern (LBP) is a type of feature used for classification. It thresholds the neighborhood of each pixel and treats the result as a binary number, and it labels each pixel according to that binary value. LBP is not affected by monotonic gray scale changes of the image. The segmentation output is given as input to LBP separately. Within the kernel, the center pixel value is compared with each neighboring pixel to obtain the thresholded values. Each thresholded value is multiplied by the weight of its pixel and the products are summed to produce the LBP code. An orthogonal measure of local contrast is used together with the LBP: the difference between the average of the gray levels above the center pixel value and the average of the gray levels below it is calculated. The two dimensional distributions of the LBP and local contrast measures are used as features.
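The LBP code and the accompanying local contrast measure for a single 3×3 neighborhood can be sketched in Python as below. The ordering of the neighbors (and hence the weight 2^k assigned to each one) and the convention that a neighbor equal to the center counts as 1 are assumptions; the text does not fix them. Function names are hypothetical.

```python
# Neighbour offsets, clockwise from the top-left pixel. The k-th neighbour
# carries the weight 2**k (ordering assumed for illustration).
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """Threshold the 8 neighbours against the centre and sum the weighted bits."""
    center = image[r][c]
    code = 0
    for bit, (dy, dx) in enumerate(NEIGHBOURS):
        if image[r + dy][c + dx] >= center:
            code += 1 << bit
    return code


def local_contrast(image, r, c):
    """Mean of neighbours at or above the centre minus mean of those below it."""
    center = image[r][c]
    values = [image[r + dy][c + dx] for dy, dx in NEIGHBOURS]
    above = [v for v in values if v >= center]
    below = [v for v in values if v < center]
    if not above or not below:
        return 0.0  # contrast taken as zero when one group is empty (assumption)
    return sum(above) / len(above) - sum(below) / len(below)


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
code = lbp_code(patch, 1, 1)
contrast = local_contrast(patch, 1, 1)

# A monotonic gray scale change (here v -> 2v + 3) leaves the code unchanged.
brighter = [[2 * v + 3 for v in row] for row in patch]
code_brighter = lbp_code(brighter, 1, 1)
```

The last two lines illustrate the invariance to monotonic gray scale changes mentioned above: the code depends only on the sign of each neighbor-minus-center difference, while the contrast measure does change with the gray scale.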
Classification: Classifiers are used to diagnose medical data quickly and clearly. The Support Vector Machine (SVM) produces comparatively better results than other classifiers. The results obtained from both the GLCM and the Local Binary Pattern are given as input data to the SVM classifier. Based on statistical learning theory, the SVM classifies the given input data into two separable classes {1, -1}, using a separating hyperplane to divide them. Training data are given as input to the SVM classifier. They consist of N data points $(x_1, y_1), \ldots, (x_N, y_N)$ with $x_i \in \mathbb{R}^n$ and $y_i \in \{1, -1\}$ (Eq. 13). The inequality $y_i (w \cdot x_i + w_0) \ge 1$ holds for both y = 1 and y = -1.
The hyperplanes are formed using Eq. 14. The data points that satisfy the above inequality with equality form the support vectors, and the classification process is performed on the basis of the support vectors. The margin of the hyperplane obeys the inequality shown in Eq. 15:

$$\frac{y_k \, D(x_k)}{\lVert w \rVert} \ge \Gamma, \quad k = 1, 2, \ldots, n$$

The margin can be maximized by minimizing the norm of w using Eq. 16. In the case of non-separable data, a slack variable ξ_i is added using Eq. 17. In the case of nonlinear data, the nonlinear input is mapped to a high dimensional linear feature space via kernels. In the proposed method RBF kernels, given in Eq. 18, are used, where σ is a positive real number.
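The RBF kernel can be sketched in Python as follows. Since Eq. 18 is not reproduced in this text, the sketch assumes the common Gaussian form exp(-||x - z||² / (2σ²)); some libraries use the equivalent parameter γ = 1/(2σ²) instead. The function name `rbf_kernel` is hypothetical.

```python
import math


def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel exp(-||x - z||^2 / (2 * sigma^2)), assumed form of Eq. 18."""
    if sigma <= 0:
        raise ValueError("sigma must be a positive real number")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2 * sigma ** 2))


# Identical feature vectors give 1; the value decays with distance.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], sigma=1.0)
far = rbf_kernel([0.0, 0.0], [3.0, 4.0], sigma=5.0)  # distance 5, so exp(-0.5)
```

A larger σ makes the kernel decay more slowly, so distant training samples have more influence on the decision boundary.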
The summarization of the overall process in the proposed method is given as a flowchart in Fig. 9.

RESULTS
Experiments were conducted on images taken from both the MIAS and DDSM databases. A total of 400 mammograms were used, of which 150 are normal and 250 are abnormal. 50% of the images are used for training and 50% for testing. Samples of the results for the benign case are shown in figure form. Sample results of the feature extraction process for 20 mammograms (10 benign and 10 malignant) are shown in Table 1. The classification result obtained from the SVM classifier is shown in Table 2. All 400 mammograms were taken into account for the classification process.

DISCUSSION
The partest method is one of the methods used in ROC curve analysis and is used here to evaluate the performance of the designed algorithm. The result obtained from the classification process is given as input to partest. The output is shown graphically in Fig. 18. The performance test also measures the sensitivity and specificity of the proposed method.
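Sensitivity and specificity are computed from the counts of true and false positives and negatives, as in the short Python sketch below. The counts are hypothetical and chosen only for illustration; they are not the results reported in this study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of abnormal cases detected.
    specificity = TN / (TN + FP): fraction of normal cases correctly cleared.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for illustration only.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=40, fp=10)
```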

CONCLUSION
In the proposed work a new approach to the computer aided diagnosis of breast cancer has been designed. It is fully automatic and does not need any human intervention. The preprocessed image is enhanced and segmented using the Sobel operator with the proposed segmentation algorithm. The texture of the segmented image is extracted using the Gray Level Co-occurrence Matrix and the Local Binary Pattern method. The extracted features are classified using a support vector machine. The performance of the proposed method is evaluated using the partest method of ROC curves. The result shows 98.8% sensitivity and 97.4% specificity, which is comparatively better than previous results. Hence, the proposed method would be helpful in assisting radiologists in the early and accurate detection of malignancy.