A Computer Aided Diagnosis System for Lung Cancer Detection Using Support Vector Machine

Problem statement: Computed Tomography (CT) has been considered the most sensitive imaging technique for early detection of lung cancer. Approach: However, an automated methodology is required to make use of the large amount of data obtained from CT images. Computer Aided Diagnosis (CAD) can be used efficiently for the early detection of lung cancer. Results: Existing CAD systems for early detection of lung cancer from CT images have been unsatisfactory because of their low sensitivity and high False Positive Rates (FPR). This study presents a CAD system that automatically detects lung cancer nodules with reduced false positive rates. Different image processing techniques are first applied to obtain the lung region from chest CT scan images. Segmentation is then carried out with the Fuzzy Possibilistic C-Means (FPCM) clustering algorithm. Conclusion/Recommendations: Finally, a Support Vector Machine (SVM) is used for automatic detection of cancer nodules, which helps achieve better classification. The proposed technique was evaluated on 1000 CT images collected from a reputed hospital.


INTRODUCTION
Lung cancer is considered a notable cancer because it claims more than a million deaths every year. This has led to the need for early detection of lung nodules in chest Computed Tomography (CT) images (Armato et al., 2001). A Computer Aided Diagnosis (CAD) system (Yamamoto et al., 1996; Wiemker et al., 2002) is therefore essential for the early detection of lung cancer. Early detection of the disease is critical, yet only 20% of cases are detected in the first stage. Radiologists can miss up to 30% of lung nodules (which may develop into cancer) in chest radiographs because the background anatomy of the lungs can hide the nodules. CAD helps radiologists by preprocessing the images and suggesting the most likely locations of nodules. Lung nodule detection relies on techniques for suppressing background structures in the lungs, including the blood vessels, ribs and bronchi. The resulting images show the chest structure more clearly, exposing candidate nodule regions that can be further classified by characteristics such as size, contrast and shape. However, simple rule-based classification on such features tends to produce many false positives.
To overcome these problems, this study proposes a Computer Aided Diagnosis (CAD) (Ginneken et al., 2001) system for the detection of lung nodules (Lin and Yan, 2002). The study first applies image processing techniques such as Bit-Plane Slicing, Erosion, Median Filter, Dilation, Outlining, Lung Border Extraction and Flood-Fill algorithms to extract the lung region. The Fuzzy Possibilistic C-Means (FPCM) algorithm is then used for segmentation, and a Support Vector Machine (SVM) is used for learning and classification.
Yamamoto et al. (1996) proposed an image processing method for computer-aided diagnosis of lung cancer by CT (LSCT). LSCT is a recently developed mobile CT scanner for mass screening of lung cancer. One important difficulty in this system is the increase in image information from one X-ray film to about 30 slices per person. To address this, the authors used image processing algorithms to significantly reduce the image information displayed to the doctor.
Yim et al. (2005) described hybrid lung segmentation in chest CT images (Fiebich et al., 2001) for computer-aided diagnosis. They proposed an automatic segmentation technique for accurately identifying lung surfaces in chest CT images, consisting of three steps. First, the lungs and airways are extracted by inverse seeded region growing and connected component labeling. Next, the trachea and large airways are delineated from the lungs by three-dimensional region growing. Finally, accurate lung region borders are obtained by subtracting the result of the second step from that of the first.
Another reported approach uses two cascaded Artificial Neural Networks (ANNs). The first ANN detects suspicious regions in a low-resolution image. The input to the second ANN is the curvature peaks computed for all pixels in each suspicious region. This choice follows from the observation that small tumors have an identifiable signature in curvature-peak feature space, where curvature is the local curvature of the image data viewed as a relief map. The output of this network is thresholded at a selected level of significance to give a positive detection. Tests were carried out on 60 radiographs from a routine clinic containing 90 real nodules and 288 simulated nodules. The results were evaluated with the free-response receiver operating characteristic (FROC) method, using sensitivity and the mean number of False Positives (FPs) per image as performance indexes. The combination of the two networks gives 89-96% sensitivity at 5-7 FPs/image, depending on nodule size (Gurcan et al., 2002).
Kanazawa et al. (1996) described a computer aided diagnosis system for lung cancer based on helical CT images. The computer assisted automatic diagnosis system (Hara et al., 1999) detects tumor candidates at an early stage from helical Computed Tomography (CT) images. Automating the process reduces the time required for diagnosis and increases diagnostic confidence. The algorithm consists of an analysis part and a diagnosis part. The analysis part extracts the lung and pulmonary blood vessel regions and analyzes their features using image processing techniques. The diagnosis part defines diagnosis rules based on these features and detects tumor candidates using these rules. The algorithm was applied to data from 450 patients for mass screening, and the experimental results indicate that it detected lung cancer candidates successfully.

Related work:
Yamamoto et al. (2000) described a computer aided diagnosis system with functions to assist comparative reading for lung cancer based on helical CT images. The authors had previously presented a prototype Computer-Aided Diagnosis (CAD) system (Kanazawa et al., 1998) that automatically detects suspicious regions in chest CT images; the CT screening system used was the TCT-900 Super Helix (Toshiba Corporation). In this work, they propose a new automatic technique for the early diagnosis of lung cancer based on a CAD system that reads all the CT images. The CAD system is also equipped with functions that automatically detect suspicious regions in chest CT images and assist retrospective comparative reading. Its main components are a slice-matching algorithm that compares each slice image of the present and past CT scans, and an interface that displays features of the suspicious regions. The experimental results show that the CAD system works effectively.
Cheran and Gargano (2005) presented computer aided diagnosis for lung CT using artificial life models. This study introduces a novel computer assisted detection method for lung cancer in CT images. The technique combines several algorithms, including 3D region growing, active contour and shape models, and centre of maximal balls, but at its core are biological models of ants, also known as artificial life models. In the first step, the images undergo 3D region growing to identify the ribcage. Once the ribcage is recognized, an active contour is used to build a confined area for the ants, which are deployed to produce a clean and accurate reconstruction of the bronchial and vascular tree. The branches of the reconstructed trees are then checked with active shape models to determine whether they contain nodules, and the pleura of the lungs is checked for attached nodules (centre of maximal balls). Finally, the trees are removed so that the nodules can be localized more cleanly, which is achieved by applying snakes and dot enhancement algorithms.
El-Baz et al. (2007) proposed a new CAD system for the early diagnosis of detected lung nodules. Because the growth rate of a detected lung nodule can be estimated from its volumetric change over time, it is important to measure nodule volume accurately. In this study, the authors introduce a novel Computer Assisted Diagnosis (CAD) system for the early diagnosis of lung cancer. Yogameena et al. (2010) proposed a method that classifies behaviors such as people running in a crowded environment, a person bending down while most others are walking or standing, a person carrying a long bar and a person waving a hand in a crowd.
The CAD system of El-Baz et al. (2007) involves five main steps: (1) segmentation of lung tissues from Computed Tomography (CT) images; (2) identification of lung nodules in the segmented lung tissues; (3) a non-rigid registration technique that aligns two successive low-dose CT (LDCT) scans and corrects motion artifacts caused by breathing and patient motion; (4) segmentation of the detected lung nodules; and (5) quantification of the volumetric changes.
Preliminary classification results based on the growth-rate analysis of benign and malignant nodules in 10 patients (6 diagnosed as malignant and 4 as benign) were 100% correct at the 95% confidence interval. These results suggest that the proposed image analysis could supplement current technologies for diagnosing lung cancer.

MATERIALS AND METHODS
The first stage of the proposed technique extracts the lung region using several image processing techniques. The second stage segments (Armatur et al., 1992) the extracted lung region using the Fuzzy Possibilistic C-Means (FPCM) algorithm. Next, diagnosis rules for detecting false positive regions are formulated. Finally, a Support Vector Machine (SVM) is applied to classify the cancer nodules.

Lung region extraction:
The first stage of the proposed Computer Aided Diagnosis (CAD) technique (Wiemker et al., 2003; Wiemker et al., 2002) is extraction of the lung region from the CT scan image. Basic image processing techniques are used for this purpose: Bit-Plane Slicing, Erosion, Median Filter, Dilation, Outlining, Lung Border Extraction and Flood-Fill algorithms.
A chest CT image contains not only the lung region but also the background, heart, liver and other organs. The aim of the lung region extraction process is to detect the lung region and the Regions of Interest (ROIs) in the CT scan image.
The first step in lung region extraction is to apply bit-plane slicing to the CT scan image. This algorithm produces a set of binary slices, one per bit plane. The slice with the best accuracy and sharpness is chosen for further enhancement of the lung region.
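Bit-plane slicing can be sketched in a few lines of NumPy. This is a minimal illustration assuming an 8-bit, non-negative integer image (raw CT values would first need windowing into that range); the name `bit_plane_slices` is hypothetical, and the choice of the "best" slice, which the study bases on accuracy and sharpness, is left to the caller.

```python
import numpy as np


def bit_plane_slices(image: np.ndarray, n_bits: int = 8) -> list:
    """Split a non-negative integer image into binary bit planes (index 0 = least significant bit)."""
    img = np.asarray(image)
    if not np.issubdtype(img.dtype, np.integer) or img.min() < 0:
        raise ValueError("expected a non-negative integer image")
    img = img.astype(np.int64)
    # Plane k holds bit k of every pixel as a 0/1 array.
    return [((img >> k) & 1).astype(np.uint8) for k in range(n_bits)]


# Usage: high-order planes keep coarse structure, low-order planes are mostly noise.
ct_slice = np.array([[0, 255], [128, 1]], dtype=np.uint8)
planes = bit_plane_slices(ct_slice)
print(planes[7])
```

Summing `plane_k << k` over all planes reconstructs the original image, so no information is lost by the slicing itself; the selection step is where the study discards detail.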
Next, erosion is applied to enhance the sliced image by removing noise. Dilation and median filtering are then applied to the enhanced image to suppress remaining distortion. An outlining algorithm is then applied to the noise-reduced image to determine the outlines of the regions, and the lung border is obtained with the lung border extraction technique.
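The erosion, dilation, median filtering and outlining steps can be sketched with NumPy sliding windows. This is a minimal sketch assuming a 3 x 3 square neighbourhood and zero padding for the binary operations (the study does not state its structuring element); all function names are hypothetical. Erosion followed by dilation is a morphological opening, which removes specks smaller than the neighbourhood while restoring larger shapes.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(img: np.ndarray, size: int, mode: str) -> np.ndarray:
    """All size x size neighbourhoods of img, one per pixel (padded at the border)."""
    r = size // 2
    return sliding_window_view(np.pad(img, r, mode=mode), (size, size))


def erode(mask: np.ndarray, size: int = 3) -> np.ndarray:
    # A pixel survives only if its whole neighbourhood is foreground.
    return _windows(mask.astype(bool), size, "constant").all(axis=(-2, -1))


def dilate(mask: np.ndarray, size: int = 3) -> np.ndarray:
    # A pixel is set if any pixel in its neighbourhood is foreground.
    return _windows(mask.astype(bool), size, "constant").any(axis=(-2, -1))


def median_filter(img: np.ndarray, size: int = 3) -> np.ndarray:
    # Replace each pixel by the median of its neighbourhood (edge values repeated).
    win = _windows(img.astype(float), size, "edge")
    return np.median(win, axis=(-2, -1)).astype(img.dtype)


def outline(mask: np.ndarray) -> np.ndarray:
    # Boundary pixels: foreground pixels that one erosion step would remove.
    m = mask.astype(bool)
    return m & ~erode(m)


def clean_slice(binary_slice: np.ndarray):
    """Erosion, then dilation and median filtering, then outlining."""
    opened = dilate(erode(binary_slice))
    smoothed = median_filter(opened.astype(np.uint8)) > 0
    return smoothed, outline(smoothed)
```

Note that the median filter also rounds off sharp corners, so the cleaned mask can be slightly smaller than the opened one.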
Finally, a flood-fill algorithm fills the area enclosed by the extracted lung border. After these algorithms are applied, the lung region is extracted from the CT scan image and is then used in the segmentation step to detect cancer nodules.
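One common way to fill a closed border is to flood the background in from the image frame and keep everything the flood cannot reach. The sketch below assumes the lung border is a closed boolean contour and uses a 4-connected breadth-first flood fill; `fill_enclosed` is a hypothetical name, and the study does not specify which flood-fill variant it uses.

```python
from collections import deque

import numpy as np


def fill_enclosed(border: np.ndarray) -> np.ndarray:
    """Return the border plus every pixel it encloses (4-connected flood fill)."""
    border = np.asarray(border, dtype=bool)
    h, w = border.shape
    outside = np.zeros((h, w), dtype=bool)
    queue = deque()
    # Seed the flood with every non-border pixel on the image frame.
    frame = [(r, c) for r in range(h) for c in (0, w - 1)]
    frame += [(r, c) for r in (0, h - 1) for c in range(w)]
    for r, c in frame:
        if not border[r, c] and not outside[r, c]:
            outside[r, c] = True
            queue.append((r, c))
    # Spread through non-border pixels; anything never reached is enclosed.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w and not border[nr, nc] and not outside[nr, nc]:
                outside[nr, nc] = True
                queue.append((nr, nc))
    return ~outside
```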

Lung regions segmentation:
After the lung region is detected, it is segmented in order to find cancer nodules. This step identifies the Regions of Interest (ROIs) that help determine the cancer region. In this study, Fuzzy Possibilistic C-Means (FPCM) (Gomathi and Thangaraj, 2010) is used for segmentation.

Fuzzy Possibilistic C-Means (FPCM):
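FPCM is a clustering algorithm that combines the characteristics of fuzzy c-means and possibilistic c-means. Memberships and typicalities are both important for correctly characterizing the substructure of data in a clustering problem. The FPCM objective function therefore depends on both the memberships $u_{ij}$ and the typicalities $t_{ij}$ of the data points $x_j$ ($j = 1, \dots, n$) with respect to the cluster centers $v_i$ ($i = 1, \dots, c$):

$$ J_{m,\eta}(U, T, V) = \sum_{i=1}^{c} \sum_{j=1}^{n} \left( u_{ij}^{m} + t_{ij}^{\eta} \right) \lVert x_j - v_i \rVert^{2} $$

with the following constraints ($m > 1$, $\eta > 1$):

$$ \sum_{i=1}^{c} u_{ij} = 1 \;\; \forall j, \qquad \sum_{j=1}^{n} t_{ij} = 1 \;\; \forall i, \qquad 0 \le u_{ij},\, t_{ij} \le 1 $$

A solution of the objective function can be obtained via an iterative process in which the degrees of membership, the typicalities and the cluster centers are updated via:

$$ u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x_j - v_i \rVert}{\lVert x_j - v_k \rVert} \right)^{2/(m-1)} \right]^{-1} $$

$$ t_{ij} = \left[ \sum_{k=1}^{n} \left( \frac{\lVert x_j - v_i \rVert}{\lVert x_k - v_i \rVert} \right)^{2/(\eta-1)} \right]^{-1} $$

$$ v_i = \frac{\sum_{j=1}^{n} \left( u_{ij}^{m} + t_{ij}^{\eta} \right) x_j}{\sum_{j=1}^{n} \left( u_{ij}^{m} + t_{ij}^{\eta} \right)} $$

FPCM produces memberships and possibilities simultaneously, along with the usual point prototypes (cluster centers) for each cluster. As a hybrid of Possibilistic C-Means (PCM) and Fuzzy C-Means (FCM), it avoids both the noise sensitivity of FCM and the coincident-cluster problem of PCM.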
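A compact NumPy sketch of the FPCM iteration: memberships are normalised over clusters, typicalities are normalised over data points, and each center is the mean of the data weighted by $u_{ij}^m + t_{ij}^\eta$. Assumptions (not taken from the study): Euclidean distance, initial centers drawn at random from the data, a small floor on squared distances to avoid division by zero, and $m = \eta = 2$ by default; `fpcm` is a hypothetical name. For intensity-based segmentation, pixels can be passed as a column vector, e.g. `image.reshape(-1, 1).astype(float)`.

```python
import numpy as np


def fpcm(X, c, m=2.0, eta=2.0, max_iter=100, tol=1e-5, seed=0):
    """Fuzzy Possibilistic C-Means on data X of shape (n, d).

    Returns centers V (c, d), memberships U (c, n) and typicalities T (c, n).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    V = X[rng.choice(n, size=c, replace=False)].copy()
    for _ in range(max_iter):
        # D2[i, j] = ||x_j - v_i||^2, floored so a center sitting on a point is safe.
        D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        D2 = np.fmax(D2, 1e-12)
        # Membership update: each column (data point) sums to 1 over clusters.
        inv_u = D2 ** (-1.0 / (m - 1.0))
        U = inv_u / inv_u.sum(axis=0, keepdims=True)
        # Typicality update: each row (cluster) sums to 1 over data points.
        inv_t = D2 ** (-1.0 / (eta - 1.0))
        T = inv_t / inv_t.sum(axis=1, keepdims=True)
        # Center update: weighted mean with weights u^m + t^eta.
        W = U ** m + T ** eta
        V_new = (W @ X) / W.sum(axis=1, keepdims=True)
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return V, U, T
```

Assigning each pixel to the cluster with the largest membership (`U.argmax(axis=0)`) then gives a hard segmentation map.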

Features extraction and formulation of diagnostic rules:
After the lung region is segmented, features are extracted from it and diagnosis rules are designed to detect the cancer nodules in the lungs accurately. These diagnosis rules eliminate false detections of cancer nodules produced by segmentation and give a better diagnosis.

Feature extraction:
The features used in this study to generate the diagnosis rules are:
• Area of the candidate region
• Mean intensity value of the candidate region

Area of the candidate region:
This feature is used to:
• Eliminate isolated pixels
• Eliminate very small candidate objects
With the help of this feature, detected regions that have no chance of forming a cancer nodule are identified and eliminated. This reduces both the processing and the time required in the subsequent steps.
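The area feature requires the candidate regions to be labelled first. The sketch below assumes 4-connected components on the binary segmentation output and uses a plain breadth-first labelling so that it depends only on NumPy; `label_regions` and `region_areas` are hypothetical helper names, not functions from the study.

```python
from collections import deque

import numpy as np


def label_regions(mask):
    """4-connected component labelling: background = 0, regions = 1..k (raster order)."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    h, w = mask.shape
    current = 0
    for r0 in range(h):
        for c0 in range(w):
            if mask[r0, c0] and labels[r0, c0] == 0:
                current += 1
                labels[r0, c0] = current
                queue = deque([(r0, c0)])
                while queue:
                    r, c = queue.popleft()
                    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                        if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and labels[nr, nc] == 0:
                            labels[nr, nc] = current
                            queue.append((nr, nc))
    return labels, current


def region_areas(labels, n_regions):
    """Area (pixel count) of each region j = 1..n_regions."""
    return np.bincount(labels.ravel(), minlength=n_regions + 1)[1:]
```

An isolated pixel shows up as a region of area 1, which is exactly what the area feature is meant to expose.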

Mean intensity value of the candidate region:
In this feature, the mean intensity value of each candidate region is calculated, which helps reject further regions that do not indicate a cancer nodule. The mean intensity value is the average intensity of all pixels that belong to the same region and is calculated using the formula:

$$ \text{MeanIntensity}(j) = \frac{1}{n} \sum_{i=1}^{n} \text{Intensity}(i) $$

where j is the region index, ranging from 1 to the total number of candidate regions in the whole image, Intensity(i) is the CT intensity value of pixel i, and i ranges from 1 to n, where n is the total number of pixels belonging to region j.
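The mean intensity of region j (the average of Intensity(i) over its n pixels) maps directly onto a per-label sum divided by a per-label pixel count. A minimal NumPy sketch, assuming a label image in which 0 is background and candidate regions are numbered 1 to `n_regions`; the helper name `region_mean_intensity` is hypothetical.

```python
import numpy as np


def region_mean_intensity(image, labels, n_regions):
    """Mean CT intensity of each region j = 1..n_regions."""
    flat_labels = np.asarray(labels).ravel()
    values = np.asarray(image, dtype=float).ravel()
    # Sum of Intensity(i) over the pixels of each label, and the pixel count n per label.
    sums = np.bincount(flat_labels, weights=values, minlength=n_regions + 1)
    counts = np.bincount(flat_labels, minlength=n_regions + 1)
    # Drop label 0 (background); the max() guards against labels with no pixels.
    return sums[1:] / np.maximum(counts[1:], 1)
```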

Formulation of diagnostic rules:
After the necessary features are extracted, the following diagnosis rules are applied to detect the occurrence of cancer nodules.
Three rules are involved, as follows:
Rule 1: A threshold value T1 is set for the area of a region. If the area of a candidate region exceeds this threshold, the region is eliminated from further consideration. This rule reduces the steps and time required in the upcoming stages.
Rule 2: Threshold values T3 and T4 are set as the range for the mean intensity value of a candidate region, and the mean intensity value of each candidate region is calculated. If the mean intensity of a candidate region falls below the minimum threshold or exceeds the maximum threshold, the region is assumed to be non-cancerous.
By applying the above rules, most of the regions that are not cancerous nodules are eliminated, and the remaining candidate regions are considered cancerous. This helps the CAD system discard false positive cancer regions and detect the cancer regions more accurately. These rules are then passed to the Support Vector Machine (SVM) in order to detect the cancer nodules in the supplied lung image.
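Rules 1 and 2 can be expressed as vectorised filters over the per-region features. This sketch is illustrative only: the study does not report values for T1, T3 or T4, so `t1`, `t3` and `t4` are placeholders to be tuned, and `apply_diagnosis_rules` is a hypothetical name. Rule 1 is implemented exactly as stated (regions whose area exceeds T1 are dropped).

```python
import numpy as np


def apply_diagnosis_rules(areas, mean_intensities, t1, t3, t4):
    """Boolean mask of candidate regions that survive Rules 1 and 2.

    t1, t3 and t4 are placeholder thresholds; the study does not give their values.
    """
    areas = np.asarray(areas)
    means = np.asarray(mean_intensities)
    keep_rule1 = areas <= t1                     # Rule 1: drop regions larger than T1
    keep_rule2 = (means >= t3) & (means <= t4)   # Rule 2: keep mean intensity in [T3, T4]
    return keep_rule1 & keep_rule2


# Usage with made-up feature values (not data from the study).
print(apply_diagnosis_rules([5, 500, 20], [100, 120, 10], t1=100, t3=50, t4=200))
```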

Classification:
Support Vector Machine (SVM): SVM, introduced by Cortes and Vapnik, is commonly used for classification tasks. For binary classification, SVM finds an Optimal Separating Hyperplane (OSH) that produces the maximum margin between two categories of data. To construct an OSH, SVM maps the data into a higher-dimensional feature space; this nonlinear mapping is performed through a kernel function. SVM then constructs a linear OSH between the two categories of data in that feature space. The data vectors nearest to the OSH in the feature space are called Support Vectors (SVs) and contain all the information required for classification. In brief, the theory of SVM is as follows.
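Consider a training set of L samples, where each input $x_i \in \mathbb{R}^n$ has an associated output $y_i \in \{-1, +1\}$. Each input x is first mapped into a higher-dimensional feature space F by $z = \varphi(x)$ via a nonlinear mapping $\varphi: \mathbb{R}^n \to F$. When the data are not linearly separable in F, there exist a vector $w \in F$ and a scalar b that define the separating hyperplane as:

$$ y_i \left( w \cdot z_i + b \right) \ge 1 - \xi_i, \qquad i = 1, \dots, L $$

where $\xi_i \ge 0$ are called slack variables. The hyperplane that optimally separates the data in F is the one that:

$$ \text{minimises} \quad \frac{1}{2}\, w \cdot w + C \sum_{i=1}^{L} \xi_i $$

where C is called the regularization parameter and determines the trade-off between maximum margin and minimum classification error. By constructing a Lagrangian, the optimal hyperplane can be shown to be the solution of:

$$ \text{maximise} \quad W(a) = \sum_{i=1}^{L} a_i - \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} a_i a_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad \sum_{i=1}^{L} y_i a_i = 0, \;\; 0 \le a_i \le C $$

where $a_1, \dots, a_L$ are the non-negative Lagrange multipliers. The data points $x_i$ that correspond to $a_i > 0$ are the SVs. The weight vector w is then given by:

$$ w = \sum_{i=1}^{L} a_i y_i z_i $$

For any test vector $x \in \mathbb{R}^n$, the classification output is then given by:

$$ f(x) = \operatorname{sign}\left( w \cdot z + b \right) = \operatorname{sign}\left( \sum_{i=1}^{L} a_i y_i K(x_i, x) + b \right) $$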
To build an SVM classifier, a kernel function and its parameters need to be chosen. So far, no analytical or empirical study has conclusively established the superiority of one kernel over another. In this study, the following three kernel functions have been applied to build SVM classifiers:
• Linear kernel function: $K(x, z) = \langle x, z \rangle$
• Polynomial kernel function: $K(x, z) = (\langle x, z \rangle + 1)^d$, where d is the degree of the polynomial
• Radial basis function kernel: $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$, where $\gamma > 0$ sets the kernel width

Linear kernel: The linear kernel is the simplest kernel function. It is given by the inner product $\langle x, y \rangle$ plus an optional constant c. Kernel algorithms using a linear kernel are often equivalent to their non-kernel counterparts:

$$ k(x, y) = x^{T} y + c $$

Polynomial kernel: The polynomial kernel is a non-stationary kernel. Polynomial kernels are well suited to problems where all the training data are normalized:

$$ k(x, y) = \left( \alpha\, x^{T} y + c \right)^{d} $$

Adjustable parameters are the slope $\alpha$, the constant term c and the polynomial degree d; with $\alpha = c = 1$ this reduces to the polynomial kernel listed above.
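The three kernels and the decision rule $f(x) = \operatorname{sign}(\sum_i a_i y_i K(x_i, x) + b)$ can be sketched directly in NumPy. This assumes the Lagrange multipliers $a_i$ and the bias $b$ have already been obtained by solving the dual problem (for example with an off-the-shelf SVM solver, not shown). The RBF form $\exp(-\gamma \lVert x - z \rVert^2)$, the function names, and the defaults for `d`, `alpha`, `c` and `gamma` are illustrative assumptions rather than the study's settings.

```python
import numpy as np


def linear_kernel(x, z, c=0.0):
    # K(x, z) = <x, z> + c  (c = 0 gives the plain inner product)
    return float(np.dot(x, z)) + c


def polynomial_kernel(x, z, d=2, alpha=1.0, c=1.0):
    # K(x, z) = (alpha <x, z> + c)^d ; alpha = c = 1 gives (<x, z> + 1)^d
    return (alpha * float(np.dot(x, z)) + c) ** d


def rbf_kernel(x, z, gamma=1.0):
    # K(x, z) = exp(-gamma ||x - z||^2), gamma > 0 sets the kernel width
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))


def svm_decision(x, support_vectors, alphas, labels, b, kernel):
    """f(x) = sign(sum_i a_i y_i K(x_i, x) + b); a score of exactly 0 is mapped to +1."""
    score = sum(a * y * kernel(sv, x) for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if score + b >= 0 else -1


# Usage: hard-margin solution for the points (1, 0) -> +1 and (-1, 0) -> -1.
svs = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
print(svm_decision([2.0, 0.0], svs, [0.5, 0.5], [1, -1], 0.0, linear_kernel))
```

The multipliers $a_1 = a_2 = 0.5$ with $b = 0$ in the usage line can be checked by hand: they give $w = \sum_i a_i y_i z_i = (1, 0)$, satisfy $\sum_i a_i y_i = 0$, and place both points exactly on the margin.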
After the learning process is completed, the proposed technique detects the occurrence of cancer in the lung region automatically.

RESULTS AND DISCUSSION
The experiments were conducted on the proposed computer-aided diagnosis system using lung images obtained from a reputed hospital. The experimental data consist of 1000 lung images, which are passed to the proposed CAD system. The diagnosis rules are generated from these images and passed to the Support Vector Machine (SVM) for learning. After learning, a lung image supplied to the proposed CAD system passes through its processing steps, and the system finally determines whether the image shows cancer. The dataset yields 2441 slices, from which the best-suited slice is chosen for further processing. The dataset contains 15 cancerous nodules, 7 of which are smaller than 2 mm. The proposed technique detects 9 cancer nodules correctly, and the proposed CAD system detects 117 false positive regions. This detection performance is better than that of the conventional CAD system.
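For reference, the reported counts correspond to the following nodule-level sensitivity, assuming each of the 15 cancerous nodules is counted once and the 6 undetected nodules are the false negatives. The 117 false positive regions are a separate count and do not enter this ratio.

$$ \text{Sensitivity} = \frac{\text{TP}}{\text{TP} + \text{FN}} = \frac{9}{9 + 6} = \frac{9}{15} = 0.60 = 60\% $$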

CONCLUSION
This study presents an improved Computer Aided Diagnosis (CAD) system for the automatic detection of lung cancer. The first step detects the lung region by applying basic image processing techniques such as Bit-Plane Slicing, Erosion, Median Filter, Dilation, Outlining, Lung Border Extraction and Flood-Fill algorithms to the CT scan images. After the lung region is detected, it is segmented using the Fuzzy Possibilistic C-Means (FPCM) clustering algorithm. Features are then extracted from the segmented regions and diagnosis rules are generated. These rules are used for learning with a Support Vector Machine (SVM). The experiments were performed with 1000 images obtained from a reputed hospital. The experimental results show that the proposed CAD system can correctly identify false positive nodules, and that the use of SVM increases the accuracy of cancer nodule classification.
SVM kernel functions: Three main kernel functions are used in this study. Local kernel functions mainly influence data near the test points. The most widely used kernel function for SVM is the Radial Basis Function (RBF). B-Spline kernel: The B-Spline kernel is defined on the interval [-1, 1] and is given by a recursive formula; it can also be computed using an explicit expression.