A Novel Approach to Medical Image Segmentation

Problem statement: Segmentation is a vital aspect of medical imaging. It aids in the visualization of medical data and in the diagnosis of various diseases. Ultrasound image segmentation, and echocardiographic image segmentation in particular, is required to identify regions of interest such as the Left Ventricle (LV) and other cardiac cavities. Existing methods do not address the drawbacks of segmentation speed and quality. A faster method is required for effective, accurate and scalable clinical analysis and diagnosis. Approach: In this research, a novel approach is used to segment 2D echo images of various views. A modified K-Means clustering algorithm, called “Fast SQL K-Means”, is proposed that uses the power of SQL in a DBMS environment. In K-Means, the Euclidean distance computation is the most time-consuming step; here, however, it is computed with a single database table and no joins. This method takes less than 10 sec to cluster an image of size 400×250 (100K pixels), whereas the running time of direct K-Means is around 900 sec. Since the entire processing is done within the database, no additional overhead for importing and exporting data is incurred. The 2D echo images used in the experiments were acquired from the local Cardiology Hospital. Results: The proposed algorithm was tested on a number of echo images in apical four-chamber, long-axis and short-axis views. We compared the direct K-Means implementation with the proposed algorithm by varying the data size from 10K to 100K pixels and found that our results outperform those reported by other authors. The pattern of the data and the number of clusters had almost no impact on the clustering time. Conclusion: An efficient and non-traditional model for echo image segmentation using SQL is presented. Fast algorithms are required for immediate analysis of echo images in ICUs, in remote locations and in telemedicine.
The challenge is that ultrasound images are prone to speckle noise, and segmented echo images contain gaps in the cardiac regions, which in turn make boundary tracing and the selection of seed values for K-Means difficult. Future research can enhance the speed by partitioning the database tables and using parallel SQL statements.


INTRODUCTION
Medical image segmentation subdivides an image into its constituent regions or objects. Automatic LV segmentation is a difficult task due to the relatively poor quality (speckle noise) of echocardiography images (Lim and Goh, 2009). Many researchers have proposed algorithms for image segmentation tasks in the past (e.g., active contours or snakes), but all of them consume extensive computation time and are better suited to natural images. One of the main objectives of our work is to improve the computational efficiency of the segmentation process while also enhancing the quality of the output. Segmentation is equally important in other medical images such as mammograms, where it is used to identify the region of interest (ROI) (Abdallah et al., 2008). Similarly, in echo images the ROI is the Left Ventricle. The current work performs automatic segmentation, meaning that no prior information is required. Semiautomatic Seeded Region Growing (SRG) based image segmentation has also been used in the literature (Tamilselvi and Thangaraj, 2011).
Our focus is on designing a simple, elegant yet robust algorithm that segments a cardiac image for extracting its clinically relevant features (Muda et al., 2010). For this purpose, the K-Means clustering algorithm is selected; it partitions a data set into several groups such that points within a cluster are similar and points in different clusters are dissimilar. K-Means is well suited to biomedical image segmentation, since the number of clusters (k) is usually known for images of particular regions of human anatomy. Although K-Means has been shown to produce good clustering results, one of its main drawbacks is its poor time complexity, O(Inkd), where I is the number of iterations, n is the number of data points, k is the number of clusters and d is the number of dimensions (Nandagopalan et al., 2010a). Integrating the K-Means algorithm with SQL has several advantages: the image data can easily be stored in a relational DBMS, and the computations can be performed efficiently in SQL. Since image resolution is normally large, handling such large data sets without the help of a DBMS is a daunting task. With properly designed SQL join statements (Ossama, 2010; Ordonez and Pitchaimalai, 2010), it is possible to make K-Means faster. As discussed by Rajam and Valli (2011), the dominant foreground region can be extracted with semantic learning for CBIR applications.
The authors' contribution in this research is an implementation of the conventional K-Means algorithm in SQL that achieves an upper bound of O(n log n). Since the computation is delegated to the DBMS, whose query processing is already built on efficient algorithms, the proposed design can be expected to produce accurate output. We have kept table joins to a minimum to avoid unnecessary delays. This algorithm can be used for both 2D echo and color Doppler images. To achieve a further gain in speed, a novel TRUNCATE-INSERT combination is used for all table updates (Nandagopalan et al., 2010b).
The K-Means algorithm has been successfully used for biomedical image segmentation in combination with adaptive techniques and morphological operations. Muda et al. applied the K-Means algorithm to Intrusion Detection Systems (IDS). Carlos Ordonez proposed three algorithms using DBMS SQL and C++ and demonstrated how K-Means can be of practical importance for clustering large data sets (Jaradat et al., 2009). Another approach to speeding up K-Means (Patel and Sinha, 2010) was based on a k-d tree structure.

MATERIALS AND METHODS
Standard K-Means clustering algorithm: The input to the K-Means algorithm is the number of clusters (k), which is decided by the user depending upon the problem domain. The algorithm works as follows: first, it randomly selects k of the objects, each of which initially represents a cluster mean. Each of the remaining objects is then assigned to the cluster to which it is most similar, based on the Euclidean distance between the object and the cluster mean. The algorithm then computes the new mean of each cluster, and the process iterates until the criterion function converges. The quality of clustering is determined by the following error function:

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - m_i \rVert^2 \qquad (1)$$

where E is the sum of the squared error for all objects in the data set D, p is an object and m_i is the mean of cluster C_i. Given an initial set of k means m_1^{(1)}, …, m_k^{(1)}, which may be specified randomly, each observation is assigned to the cluster with the closest mean:

$$C_i^{(t)} = \left\{ p : \lVert p - m_i^{(t)} \rVert \le \lVert p - m_{i^*}^{(t)} \rVert \ \text{for all } i^* = 1, \dots, k \right\} \qquad (2)$$

The new means are then calculated as the centroids of the observations in each cluster:

$$m_i^{(t+1)} = \frac{1}{\lvert C_i^{(t)} \rvert} \sum_{p \in C_i^{(t)}} p \qquad (3)$$

Clearly, the conventional K-Means algorithm (Nandagopalan et al., 2010a) for clustering n data objects into k clusters is not efficient.
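For reference, the conventional algorithm of Eqs. 1-3 can be sketched in a few lines of Python for one-dimensional data such as gray levels. This is a minimal in-memory sketch, not the authors' implementation; the function names and the optional `init` argument (used here only to fix the initial means so the run is reproducible) are hypothetical. Each iteration visits all n points and all k means, which is where the O(Inkd) cost comes from.

```python
import random


def kmeans_1d(values, k, iterations, init=None, seed=0):
    """Conventional K-Means on scalar values (d = 1), following Eqs. 1-3."""
    # Initial means m_1..m_k: either supplied, or k objects picked at random.
    means = list(init) if init is not None else random.Random(seed).sample(values, k)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step (Eq. 2): nearest mean; for d = 1 the Euclidean distance is |p - m|.
        for idx, p in enumerate(values):
            labels[idx] = min(range(k), key=lambda c: abs(p - means[c]))
        # Update step (Eq. 3): each mean becomes the centroid of its cluster.
        for c in range(k):
            members = [p for p, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous mean
                means[c] = sum(members) / len(members)
    return means, labels


def squared_error(values, means, labels):
    """Error function E of Eq. 1: sum of squared distances to the assigned means."""
    return sum((p - means[lab]) ** 2 for p, lab in zip(values, labels))


pixels = [10, 12, 11, 128, 130, 129, 250, 252, 251]
means, labels = kmeans_1d(pixels, k=3, iterations=3, init=[0, 100, 255])
print(means, labels, squared_error(pixels, means, labels))
```

Like the SQL version described later, the sketch runs a fixed number of iterations rather than testing E for convergence; cluster labels here are 0-based, whereas the text numbers clusters from 1.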
The input to K-Means is a data set D containing n points with d dimensions, D = {i_1, i_2, i_3, …, i_n}. In our case, we assume k = 3, because an echo image is typically segmented into three regions: the cardiac cavity (black region), the area near the endocardium (white region) and the rest (gray region). The data set consists of the pixel values of the given image f(x, y) of size M × N (so n = M × N), where f(x, y) is the gray-scale value of the pixel at location (x, y). The spatial details of these pixels are not taken into account for clustering. The matrices (tables) listed in Table 1 are used throughout this study.
Each tuple in Data represents a pixel with its spatial coordinates and its gray-scale intensity value in [0, 255]. This means that the number of dimensions is just one: the intensity value of the pixel. Since k = 3, the Centroid table always contains 3 rows, holding the centroids selected in each iteration. To store the Euclidean distances d1, d2 and d3, the table Eucl is used; each entry gives the distance of the ith pixel to the respective centroids of the k clusters. Our algorithm uses the Euclidean distance to find the nearest centroid to each pixel, i.e., the distance between C_j and D_i, as shown in Eq. 4:

$$d(C_j, D_i) = \sqrt{\sum_{l=1}^{d} \left( D_{il} - C_{jl} \right)^2} \qquad (4)$$

The table CVCD stores the pixels and their assigned cluster number (j) during every iteration. At the end of the predefined iterations, the CVCD table contains the desired segmented pixel data. We use the following subscripts in this study: i = 1..n indexes the data points (pixels), j = 1..k indexes the clusters and l = 1..d indexes the dimensions.
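Eq. 4 can be written as a small Python function for illustration (the name `euclidean` is ours, not the paper's). With d = 1, as in this study, the square root of a single squared difference is simply the absolute difference of the two gray levels, which is why each pixel-to-centroid distance is so cheap to compute.

```python
import math


def euclidean(p, c):
    """Eq. 4: distance between pixel D_i and centroid C_j summed over dimensions l = 1..d."""
    return math.sqrt(sum((p_l - c_l) ** 2 for p_l, c_l in zip(p, c)))


# d = 1: a pixel is a 1-tuple holding its gray level, so Eq. 4 reduces to |D_i - C_j|.
print(euclidean((200,), (150,)))  # 50.0
print(euclidean((3, 4), (0, 0)))  # 5.0, a 2-D case for comparison
```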

Proposed algorithm: Fast SQL K-Means:
The design follows almost the same conventional approach as given in (Patel and Sinha, 2006; Muda et al., 2010; Ordonez and Pitchaimalai, 2010), except that the procedure is implemented in SQL; an overview appears in Fig. 1. The shaded boxes represent the names of the database tables used for segmentation.
Image preprocessing: An echo image is acquired from an ultrasound and echocardiography system with a resolution of 800×564 (a Philips machine was used in our experiments). It is then median filtered to remove speckle noise (Lim and Goh, 2009). These images are in JPG format with 24 bpp grayscale: the R, G and B values of each pixel are the same, so a single one-byte value is taken as the one-dimensional intensity of each pixel. This filtered image is given as input to the K-Means algorithm.
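The preprocessing step can be sketched as follows, assuming the image has already been decoded into rows of (R, G, B) tuples (JPEG decoding itself is out of scope here). The 3×3 window, the border rule (only neighbours inside the image are used) and the use of `statistics.median_low`, which always returns an actual pixel value, are our assumptions; the text does not specify the filter size.

```python
from statistics import median_low


def to_intensity(rgb_image):
    """Keep one byte per pixel: in these 24 bpp grayscale images R = G = B, so R is used."""
    return [[r for (r, _g, _b) in row] for row in rgb_image]


def median_filter3(image):
    """3x3 median filter to suppress speckle; border pixels use only in-image neighbours."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = median_low(window)
    return out


rgb = [[(10, 10, 10)] * 3,
       [(10, 10, 10), (255, 255, 255), (10, 10, 10)],
       [(10, 10, 10)] * 3]
filtered = median_filter3(to_intensity(rgb))
print(filtered)  # the isolated bright speckle at the centre is removed
```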

Relational DBMS tables:
The following normalized database tables are required for clustering the pixel data. The Data table stores the pixel data; its id i is declared as the primary key and is also indexed. To store the mean value of each cluster, a separate table, Centroid, is used; in this table j acts as the primary key, which references i in the Data table. The Euclidean distance is computed between each data point in Data and the rows in Centroid, and the computed values for all clusters are stored in the Eucl table. Since k = 3, the Centroid table has 3 tuples at any point in time. The CVCD table stores each pixel and its assigned cluster number (1, 2 or 3).
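A sketch of this table layout, using SQLite only because it ships with Python (the paper uses Oracle 10g). The table and column names follow the text and the SQL statements in this section (i, j, x, y, val, d1 to d3); the column types, the row-major numbering of pixel ids and the helper name `create_and_load` are our assumptions, and the reference from Centroid.j to Data.i is omitted. In SQLite an INTEGER PRIMARY KEY is the rowid, so it is indexed without a separate index.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Data     (i INTEGER PRIMARY KEY, x INTEGER, y INTEGER, val INTEGER);
CREATE TABLE Centroid (j INTEGER PRIMARY KEY, x INTEGER, y INTEGER, val REAL);
CREATE TABLE Eucl     (i INTEGER PRIMARY KEY, d1 REAL, d2 REAL, d3 REAL);
CREATE TABLE CVCD     (i INTEGER PRIMARY KEY, j INTEGER, val INTEGER);
CREATE TABLE SI       (i INTEGER PRIMARY KEY, j INTEGER, val INTEGER);
"""


def create_and_load(image):
    """Create the tables in an in-memory database and load image f(x, y) into Data.

    Pixel ids i run from 1 to n = M * N in row-major order (an assumed numbering).
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    width = len(image[0])
    rows = ((y * width + x + 1, x, y, val)
            for y, row in enumerate(image)
            for x, val in enumerate(row))
    conn.executemany("INSERT INTO Data (i, x, y, val) VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


conn = create_and_load([[0, 255, 128], [64, 32, 16]])
print(conn.execute("SELECT COUNT(*) FROM Data").fetchone()[0])  # 6
```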

Algorithm:
In order to speed up processing, the SQL statements must use as few tables and joins as possible. Figure 2 shows the proposed algorithm for Fast SQL-based K-Means clustering.
Steps 1-4 are initialization steps and do not depend on n. The for loop in line 5 iterates Q times and executes 3 INSERT statements. The first sub-step deletes the data in the Centroid table (using TRUNCATE) and inserts the newly computed mean of each cluster. The next sub-step computes the distance between each pixel in the Data table and the cluster centroids and inserts the results into the Eucl table. Finally, the third sub-step finds the minimum of d1, d2 and d3, assigns the cluster id of each pixel and inserts the results into CVCD. The final table, SI, quantizes the pixels into 3 gray levels (0, 150 and 255) for the final image.
Delete table data: Before populating the database tables, all existing rows must be deleted. The fastest way to do this in Oracle 10g is by executing the following statements:
TRUNCATE TABLE CVCD;
TRUNCATE TABLE EUCL;
TRUNCATE TABLE CENTROID;
TRUNCATE TABLE DATA;
Note that TRUNCATE is faster than DELETE, because the former does not store the deleted tuples in rollback segments. Next, the image data are loaded into the Data table. Note that the only input to this algorithm is Q, the number of iterations; k is set to its default value of 3. Experiments showed that a maximum of 6-8 iterations is sufficient to obtain good segmentation results. The Euclidean distance d is calculated with a single SQL statement for all pixels with respect to the centroids, without a join (Nandagopalan et al., 2010b).
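The SQL for the first sub-step (the centroid update) is not reproduced in the text, so the following is only a sketch of what it could look like. It assumes SQLite in place of Oracle 10g (SQLite has no TRUNCATE; a DELETE without a WHERE clause, which SQLite can run with its truncate optimization, stands in for it), assumes that the new mean of cluster j is the average intensity of the CVCD rows assigned to j, and keeps only the Centroid columns needed here. The LEFT OUTER JOIN against the list of cluster ids keeps one Centroid row per cluster even when a cluster is empty, in which case its val is NULL. The function name `update_centroids` and the fixture rows are hypothetical.

```python
import sqlite3


def update_centroids(conn):
    """Sub-step 1 (sketch): empty Centroid, then insert the mean intensity of each cluster."""
    conn.execute("DELETE FROM Centroid")  # stands in for Oracle's TRUNCATE TABLE CENTROID
    conn.execute("""
        INSERT INTO Centroid (j, val)
        SELECT k.j, AVG(c.val)
        FROM (SELECT 1 AS j UNION ALL SELECT 2 UNION ALL SELECT 3) AS k
        LEFT OUTER JOIN CVCD AS c ON c.j = k.j
        GROUP BY k.j
    """)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Centroid (j INTEGER PRIMARY KEY, val REAL);
    CREATE TABLE CVCD     (i INTEGER PRIMARY KEY, j INTEGER, val INTEGER);
""")
# Hypothetical assignments from a previous iteration: cluster 3 received no pixels.
conn.executemany("INSERT INTO CVCD (i, j, val) VALUES (?, ?, ?)",
                 [(1, 1, 10), (2, 1, 12), (3, 2, 128), (4, 2, 130)])
update_centroids(conn)
centroids = conn.execute("SELECT j, val FROM Centroid ORDER BY j").fetchall()
print(centroids)  # [(1, 11.0), (2, 129.0), (3, None)]
```

Replacing a whole table with DELETE plus INSERT, rather than issuing row-by-row UPDATEs, mirrors the TRUNCATE-INSERT idea described above.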

Update of database tables:
We must update all tables except the Data table to cluster the data points. Depending upon the pixel data distribution and the initial seed values, certain clusters may receive no pixels and thus contain NULL values, which would cause incorrect join operations. To overcome this problem, left outer joins and NVL are used in the queries, so that they handle such cases robustly. The distance sub-step of statement 5 in Fig. 2 is:
INSERT INTO Eucl (i, d1, d2, d3) (SELECT i, sqrt(power((e2.val - c1.val), 2)) as d1, sqrt(power((e2.val - c2.val), 2)) as d2, sqrt(power((e2.val - c3.val), 2)) as d3 FROM Data e2, (SELECT j, x, y, NVL(val, 0) as val FROM Centroid WHERE j = 1) c1, (SELECT j, x, y, NVL(val, 0) as val FROM Centroid WHERE j = 2) c2, (SELECT j, x, y, NVL(val, 0) as val FROM Centroid WHERE j = 3) c3);
The advantage of SQL is evident here: the distances for all clusters and all data points are computed with a single query, which results in a fast running time.
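A runnable approximation of the distance sub-step on SQLite, under two stated substitutions: NVL is replaced by the standard COALESCE, and, because d = 1, sqrt(power(a - b, 2)) is replaced by ABS(a - b), which gives the same value without relying on SQLite's optional math functions. Each Centroid subquery returns a single row, so the FROM clause is a Cartesian product of Data with three one-row tables: exactly one output row per pixel, with no join predicate. The fixture rows are hypothetical.

```python
import sqlite3

DISTANCE_SQL = """
INSERT INTO Eucl (i, d1, d2, d3)
SELECT e2.i,
       ABS(e2.val - c1.val) AS d1,
       ABS(e2.val - c2.val) AS d2,
       ABS(e2.val - c3.val) AS d3
FROM Data e2,
     (SELECT COALESCE(val, 0) AS val FROM Centroid WHERE j = 1) c1,
     (SELECT COALESCE(val, 0) AS val FROM Centroid WHERE j = 2) c2,
     (SELECT COALESCE(val, 0) AS val FROM Centroid WHERE j = 3) c3
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Data     (i INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE Centroid (j INTEGER PRIMARY KEY, val REAL);
    CREATE TABLE Eucl     (i INTEGER PRIMARY KEY, d1 REAL, d2 REAL, d3 REAL);
""")
conn.executemany("INSERT INTO Data (i, val) VALUES (?, ?)", [(1, 10), (2, 130), (3, 250)])
# Cluster 3 is empty, so its centroid value is NULL; COALESCE maps it to 0.
conn.executemany("INSERT INTO Centroid (j, val) VALUES (?, ?)",
                 [(1, 11.0), (2, 129.0), (3, None)])
conn.execute("DELETE FROM Eucl")  # stands in for TRUNCATE TABLE EUCL
conn.execute(DISTANCE_SQL)
rows = conn.execute("SELECT i, d1, d2, d3 FROM Eucl ORDER BY i").fetchall()
print(rows)
```

One consequence of mapping NULL to 0 is that an empty cluster's centroid is effectively placed at intensity 0 (black), so it can attract dark pixels in the next iteration.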

Update CVCD table:
Finally, each pixel is assigned to a cluster by finding the least distance in the Eucl table, and the CVCD table is updated accordingly:
INSERT INTO CVCD (i, j, val) (SELECT v1.i, Case when d1 <= d2 and d1 <= d3 then 1 when d2 <= d3 and d2 <= d1 then 2 when d3 <= d2 and d3 <= d1 then 3 end as j, v1.val FROM Eucl v2, Data v1 WHERE v2.i = v1.i);
As shown above, a CASE expression makes this assignment straightforward.
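The same assignment statement can be run on SQLite to see how the CASE branches treat ties: because the WHEN clauses are tested in order, a pixel equidistant from two centroids goes to the lower cluster number. Apart from keyword casing and dropping the parentheses around the SELECT (SQLite's INSERT ... SELECT syntax does not use them), the SQL is the statement above; the pixels and their distances to centroids 11, 129 and 251 are hypothetical fixtures.

```python
import sqlite3

ASSIGN_SQL = """
INSERT INTO CVCD (i, j, val)
SELECT v1.i,
       CASE WHEN d1 <= d2 AND d1 <= d3 THEN 1
            WHEN d2 <= d3 AND d2 <= d1 THEN 2
            WHEN d3 <= d2 AND d3 <= d1 THEN 3
       END AS j,
       v1.val
FROM Eucl v2, Data v1
WHERE v2.i = v1.i
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Data (i INTEGER PRIMARY KEY, val INTEGER);
    CREATE TABLE Eucl (i INTEGER PRIMARY KEY, d1 REAL, d2 REAL, d3 REAL);
    CREATE TABLE CVCD (i INTEGER PRIMARY KEY, j INTEGER, val INTEGER);
""")
# Hypothetical pixels and their distances to centroids 11, 129 and 251.
conn.executemany("INSERT INTO Data VALUES (?, ?)", [(1, 10), (2, 130), (3, 70), (4, 190)])
conn.executemany("INSERT INTO Eucl VALUES (?, ?, ?, ?)",
                 [(1, 1, 119, 241), (2, 119, 1, 121), (3, 59, 59, 181), (4, 179, 61, 61)])
conn.execute(ASSIGN_SQL)
assigned = conn.execute("SELECT i, j, val FROM CVCD ORDER BY i").fetchall()
print(assigned)  # pixel 3 ties clusters 1 and 2, pixel 4 ties clusters 2 and 3
```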
Next, to create an image for display, we assign 0 (black) to all pixels in cluster 1, 150 (gray) to all pixels in cluster 2 and 255 (white) to all pixels in cluster 3. This task is also carried out with a single SQL statement:
INSERT INTO SI (i, j, val) (SELECT i, j, DECODE(j, 1, 0, 2, 150, 3, 255) val FROM CVCD);
With these data, the segmented image can be constructed and returned to the calling routine. Sample segmentation results are shown in the Results section. Since the number of dimensions is just 1 and the data size is small, the K-Means algorithm converges very quickly.
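DECODE is Oracle-specific; a simple CASE expression is the standard equivalent. The sketch below, again on SQLite with a hypothetical 2×2 image, fills SI with the three display levels and then rebuilds the image by joining SI with Data to recover each pixel's (x, y) position. The join with Data, the row-major layout and the helper name `build_image` are our assumptions about how the calling routine could reconstruct the image.

```python
import sqlite3

QUANTIZE_SQL = """
INSERT INTO SI (i, j, val)
SELECT i, j, CASE j WHEN 1 THEN 0 WHEN 2 THEN 150 WHEN 3 THEN 255 END
FROM CVCD
"""


def build_image(conn, width, height):
    """Rebuild a height x width gray image from SI, taking (x, y) from Data."""
    image = [[0] * width for _ in range(height)]
    for x, y, val in conn.execute("SELECT d.x, d.y, s.val FROM SI s JOIN Data d ON d.i = s.i"):
        image[y][x] = val
    return image


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Data (i INTEGER PRIMARY KEY, x INTEGER, y INTEGER, val INTEGER);
    CREATE TABLE CVCD (i INTEGER PRIMARY KEY, j INTEGER, val INTEGER);
    CREATE TABLE SI   (i INTEGER PRIMARY KEY, j INTEGER, val INTEGER);
""")
# Hypothetical 2 x 2 image (row-major ids) and its cluster labels.
conn.executemany("INSERT INTO Data VALUES (?, ?, ?, ?)",
                 [(1, 0, 0, 12), (2, 1, 0, 131), (3, 0, 1, 248), (4, 1, 1, 9)])
conn.executemany("INSERT INTO CVCD VALUES (?, ?, ?)",
                 [(1, 1, 12), (2, 2, 131), (3, 3, 248), (4, 1, 9)])
conn.execute(QUANTIZE_SQL)
segmented = build_image(conn, width=2, height=2)
print(segmented)  # [[0, 150], [255, 0]]
```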

RESULTS
The following experiments were used to demonstrate the improved efficiency of the proposed algorithm:
• 2D echo images of different views but of the same size (i.e., n remains constant)
• A single image at different resolutions (10K to 100K pixels, i.e., varying n)
• A data size of 100K pixels with k = 5 (varying k)
These real test data were processed with two algorithms: conventional K-Means and Fast SQL K-Means. The running times are also compared with earlier results. Figure 3 shows five 2D echo images of different views with resolution 400×250 (n = 100K) and their segmented outputs.
The objective of the next experiment is to measure the execution time for 10 echo images with the conventional algorithm and with our algorithm. As Table 2 shows, conventional K-Means is significantly slower than the SQL implementation.
These timings cover only the statements inside the for loop (with Q = 4) and do not include the set-up time or other overheads of the clustering process.
Another surprising finding of this analysis is that, with k = 5, there is no significant change in running time for the same data size. This indicates that the running time of the SQL statements is largely independent of the number of clusters. The table data are plotted as a bar chart in Fig. 4. To understand the relative efficiency of this algorithm under more practical circumstances, both algorithms were executed with data sizes varying from 10K to 100K pixels. For this experiment, the second image of Fig. 3 was used as input at different resolutions. The results appear in Table 3 and Fig. 5.

DISCUSSION
The proposed Fast SQL K-Means is faster than conventional K-Means and SQL K-Means by factors of 90 and 10, respectively, and its running time does not change significantly as k increases. The results of these experiments were compared with those of other authors; this comparison shows that our algorithm is considerably faster than the others and that the quality of clustering is good for visual interpretation. It is also useful for ventricle border tracing and for extracting clinically relevant features.

CONCLUSION
The main task of this research is echo image segmentation, for which an efficient implementation of the K-Means clustering algorithm, called Fast SQL K-Means, is presented. The traditional K-Means algorithm presents scalability problems as the number of clusters or the number of points increases, and its performance graphs exhibit nonlinear behavior. The SQL-based algorithm requires neither extensive set-up nor extra time for importing and exporting data during segmentation, because the patient data are already available in the database.
The algorithm has been implemented in the C#.NET framework. We have demonstrated the practical efficiency of this algorithm both theoretically, through a data-sensitive analysis, and empirically, through experiments on both synthetically generated and real data sets, including live patient data.