BRAIN TUMOR CLASSIFICATION BASED ON CLUSTERED DISCRETE COSINE TRANSFORM IN COMPRESSED DOMAIN

This study presents a novel method for classifying brain tumors that integrates several efficient techniques in order to increase classification accuracy. In conventional systems the problem is the same: feature sets are extracted from the image database and tumors are classified on the basis of those feature sets. The main aim of much earlier research on classification methods has been to increase classification accuracy. The actual need is to achieve better accuracy by extracting more relevant feature sets after dimensionality reduction, since there is a trade-off between accuracy and the number of features. This study therefore applies the Discrete Cosine Transform (DCT) to brain tumor images of various classes. The DCT by itself offers a fair reduction in the dimension of the feature sets. The K-means algorithm is then applied to the DCT coefficients to cluster them. The resulting cluster information is treated as a refined feature set and classified using a Support Vector Machine (SVM). Using the DCT makes it possible to adjust the classification performance through the number of DCT coefficients taken into account. There is considerable demand for automatic classification of brain tumors, which greatly helps the process of diagnosis. In this work, an average classification accuracy of 97% and a maximum of 100% have been achieved. This research opens a new way of performing classification in the compressed domain, and may therefore be well suited to diagnosis in mobile computing and internet-based medical settings.


1. INTRODUCTION
Since the major task is classification, the foremost requirement is to distinguish images on the basis of their content. The differences between healthy and tumor images can be very small, which makes classification highly challenging. Tumor images are acquired with a very critical imaging technique, followed by extensive mathematical computation before the image is formed. Typically, the images obtained represent the energy emitted by the cells or tissues in the exposed test area.
Consequently, abnormalities in brain images may not be reflected properly as distinct intensity levels.
Although MR images can show the location and size of tumors, they may not be sufficient to classify the types of tumors. An improved wavelet-based classification was therefore carried out by Meenakshi and Anandhakumar (2013), and the results were verified with different types of tumors.
Local binary patterns, gray level co-occurrence features, gray level features and wavelet features were extracted, trained and classified using a Support Vector Machine classifier in the work of Kumar and Kumar (2014). Principal component analysis was performed by Othman and Basri (2011), after which the classification was carried out using a probabilistic neural network (PNN). In their work, it was observed that the PNN classifier showed good accuracy along with a short training time. Moreover, the PNN is robust to changes in weights and has negligible retraining time. Depending on the spread values, accuracies ranged from 73% to a maximum of 100%. Magnetic Resonance Spectroscopy (MRS) is another relevant source of data through which tumors can be detected and classified. Arizmendi et al. (2011) showed that MRS data could be used and that a combination of DWT and PCA gives fair classification results.
Neuro-fuzzy methods were used by Murugavalli and Rajamani (2007) to detect tumors in MRI data. Specifically, a hierarchical self-organizing map and fuzzy C-means were used to classify the image layer by layer. Nanthagopal and Sukanesh (2013) proposed a wavelet-based statistical approach to extract feature vectors and classify brain tumors from MRI images with an SVM; they performed classification on a database of 108 slices. The work of Sridhar and Krishna (2013) uses the DCT to extract features and a probabilistic neural network to classify the images. Zhang et al. (2011) reported a time of 0.0451 seconds for extracting the feature vectors from each image using wavelet transforms, PCA and a neural network. Zhang et al. (2013) used wavelet transforms and an SVM for efficient classification of brain tumors. Ortiz et al. (2012) proposed a neural network approach based on Self Organized Mapping (SOM), with a Genetic Algorithm (GA) used to select the feature sets. Loukas et al. (2013) employed an integrated procedure consisting of three pattern recognition algorithms (K-NN, SVM and PNN) to classify and characterize breast cancer. Diabetes diagnosis with an accuracy of 93.2% was reported by Purnami et al. (2009). Wavelet features were considered in the work of Sebri et al. (2007).
The earlier works discussed above clearly emphasize a process flow for obtaining better results, as given in Fig. 1. Resizing the database images to a common size is essential so that the number of features obtained is the same for all images, irrespective of their original size. This common size helps when the feature sets kept in the feature database are compared one to one; otherwise there would be a dimensional mismatch. However, it is not advisable to resize the images to too small a size, because resizing causes data loss; a trade-off is therefore always maintained to preserve the accuracy level. Reducing the dimension of the feature sets is also motivated by the desire to achieve fair accuracy with fewer mathematical operations when a large database of images is presented.
This study is organized as follows. Section 2 describes the methodology, the classification system and its pseudo code. Sections 3 and 4 present the results obtained and the discussion.

Discrete Cosine Transform (DCT)
A Discrete Cosine Transform (DCT) expresses a time domain signal as a sum of cosine functions oscillating at different frequencies. Krishneswari and Arumugam (2012) explained the method of applying the DCT for feature extraction in biometric applications, and its essence is presented below to give a good idea of the DCT. The DCT is similar to the Discrete Fourier Transform (DFT), except that in the DCT more energy is concentrated in the lower order coefficients. For 2D image data, the DCT is purely real (it has no imaginary part). Performing a DCT on a square matrix of pixels produces coefficients that are similar to the frequency domain coefficients produced by a DFT. An N-point DCT is closely related to a 2N-point DFT: the N frequencies of a 2N-point DFT correspond to N points on the upper half of the unit circle in the complex plane. The DCT of an image is typically given by Equation 1:
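A standard two-dimensional DCT-II that is consistent with the symbols defined in this section can be written as follows. The unnormalized scaling shown here is an assumption, since normalization factors differ between references and implementations.

```latex
X_{k_1,k_2} = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x_{n_1,n_2}
  \cos\!\left[\frac{\pi}{N_1}\left(n_1+\tfrac{1}{2}\right)k_1\right]
  \cos\!\left[\frac{\pi}{N_2}\left(n_2+\tfrac{1}{2}\right)k_2\right]
```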

where x_{n1,n2} represents the pixel value at position (n1, n2), N1 and N2 represent the total numbers of rows and columns taken into consideration while applying the DCT, and k1 = 0, …, N1-1 and k2 = 0, …, N2-1. For 2D images, after transformation, the major part of the signal energy is held by just a few lower order DCT coefficients. Quantization of the DCT coefficients is based on their energy content: the lower order coefficients can be quantized with more bits than the higher order coefficients. In addition to this dimension reduction, higher order coefficients beyond a threshold may be quantized to 0. The input image is divided into several horizontal and vertical blocks, each of size 8×8, so N1 and N2 are typically 8, and the DCT is applied to each 8×8 block. The result is an 8×8 matrix of transformed coefficients in which the top left element (1,1), shown in Fig. 2, is the DC (zero frequency) component and entries with increasing vertical and horizontal index values represent higher vertical and horizontal spatial frequencies. The DCT coefficients are reordered by the zig-zag scan, which orders them from low to high frequencies. In image processing, neglecting the higher order frequencies by quantizing them with zero bits has little effect on the appearance of the images, because human eyes are more sensitive to the low frequency components. Hence a good dimensionality reduction is achieved in the DCT stage itself by discarding the higher order frequencies.
The zig-zag scanning process is performed on the DCT coefficients shown in Fig. 2; after zig-zag scanning, a 1×64 vector is produced. This vector contains the coefficients of the DCT block ordered from low frequency to high frequency. Low-pass, band-pass and high-pass filters are applied to this vector to construct a feature vector that considers the whole spectrum range of the sub-block.
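The block transform and zig-zag scan described above can be illustrated with a minimal Python sketch. It is not the Matlab implementation used in this study: the function names are hypothetical, the DCT is a naive unnormalized DCT-II (library implementations often apply an orthonormal scaling instead), and the scan follows the JPEG-style zig-zag order, which is assumed here.

```python
import math

def dct_2d(block):
    """Naive unnormalized 2D DCT-II of a square block (list of rows)."""
    n = len(block)
    # cos_table[k][i] = cos(pi/n * (i + 1/2) * k)
    cos_table = [[math.cos(math.pi / n * (i + 0.5) * k) for i in range(n)]
                 for k in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for k1 in range(n):
        for k2 in range(n):
            total = 0.0
            for n1 in range(n):
                for n2 in range(n):
                    total += (block[n1][n2]
                              * cos_table[k1][n1] * cos_table[k2][n2])
            out[k1][k2] = total
    return out

def zigzag_indices(n):
    """(row, col) pairs of an n x n block in JPEG-style zig-zag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col selects one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # Even diagonals are walked from bottom-left to top-right,
        # odd diagonals from top-right to bottom-left.
        if s % 2 == 0:
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order

def zigzag_scan(coeffs):
    """Flatten a square coefficient block from low to high frequency."""
    return [coeffs[r][c] for r, c in zigzag_indices(len(coeffs))]

# A constant 8x8 block has all of its energy in the DC coefficient.
flat_block = [[100.0] * 8 for _ in range(8)]
vector = zigzag_scan(dct_2d(flat_block))
```

For the constant block, the first entry of `vector` is the DC term (the sum of the 64 pixel values under this scaling) and the remaining 63 entries are zero up to rounding error, which is the energy compaction that motivates keeping only the leading coefficients.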

K-Means Algorithm
The K-means algorithm can be explained as follows. Assume a dataset of n data points x_1, x_2, …, x_n, each in R^d. The aim is to group the data into k clusters, each containing a certain number of elements, by finding k points {m_j} (j = 1, 2, …, k) in R^d such that
$$\frac{1}{n}\sum_{i=1}^{n}\min_{1\le j\le k} d^{2}(x_i, m_j) \qquad (2)$$
is minimized, where d(x_i, m_j) denotes the Euclidean distance between x_i and m_j, so that the clusters have minimum variance. The points {m_j}, j = 1, 2, …, k, are known as cluster centroids. The problem in Equation 2 is to find k cluster centroids such that the average squared Euclidean distance, or Mean Squared Error (MSE), between a data point and its nearest cluster centroid is minimized.
The K-means algorithm can be understood as a gradient descent procedure that iteratively updates the cluster centroids to reduce the objective function in Equation 2. K-means always converges to a local minimum. As noted by Velmurugan and Santhanam (2010), the particular local minimum found depends on the starting cluster centroids.
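As an illustration of the procedure and of the objective in Equation 2, the following is a minimal sketch of the standard Lloyd iteration in plain Python. The function names, the seeded random initialization and the choice to keep the previous centroid when a cluster becomes empty are assumptions made for this example, not details of the Matlab "kmeans" function used in this study.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members)
                     for d in range(dim)])
            else:
                # Assumed policy: an empty cluster keeps its old centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids, labels

def mse(points, centroids):
    """Objective of Equation 2: mean squared distance to nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids)
               for p in points) / len(points)

data = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
centroids, labels = kmeans(data, k=2)
error = mse(data, centroids)
```

Changing `seed` changes the starting centroids, which is the source of the run-to-run variation in the local minimum mentioned above; on well separated data such as `data` the result is the same for any start.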

Support Vector Machines (SVM)
The basic idea behind the SVM classification technique is to identify the class of the input test vectors. It is a supervised learning algorithm: the training vectors are used to train the system by mapping them, using standard kernel functions, into a space in which the classes are separated by clear gaps, and the input test vectors are then mapped into the same space to predict their possible class, as given in Bharathi and Natarajan (2011). The linear kernel scenario is shown in Fig. 3.
Consider some training data D, a set of n points of the form
$$D = \{(x_i, y_i) \mid x_i \in \mathbb{R}^{p},\ y_i \in \{-1, 1\}\}_{i=1}^{n}$$
where y_i is either 1 or -1, indicating the class to which the point x_i belongs. Each x_i is assumed to be a p-dimensional real vector. The goal is to find the maximum-margin hyperplane that divides the points having y_i = 1 from those having y_i = -1. Samples on the margin are called the support vectors. Any hyperplane can be written as the set of points x satisfying w·x - b = 0, where · denotes the dot product and w is the normal vector to the hyperplane. The parameter b/||w|| determines the offset of the hyperplane from the origin along the normal vector. If the training data are linearly separable, two hyperplanes can be selected in such a way that they separate the data with no points between them, and their distance is then maximized. The region bounded by them is called "the margin". These hyperplanes can be described by the equations w·x - b = 1 and w·x - b = -1. The data points x_i are separated using the following constraints: w·x_i - b ≥ 1 for x_i of the first class, or w·x_i - b ≤ -1 for x_i of the second class.
The discussion above concerns a two-class SVM. In our proposed method, the target is a multiclass SVM, where the number of classes is more than 2. A conventional two-class SVM can be extended into a multiclass SVM by two methods: (1) by comparing each class against all the remaining classes taken together as another class (one-versus-all); (2) by comparing every pair of classes (one-versus-one). SVM classification methods are well discussed in Yogameena et al. (2010).
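The linear decision rule and the one-versus-all extension can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the weight vectors and offsets below are hypothetical values, not parameters trained in this study, and the function names are invented for the example.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x - b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b

def margin_width(w):
    """Distance 2/||w|| between the planes w.x - b = 1 and w.x - b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

def predict_one_vs_all(classifiers, x):
    """Pick the class whose class-versus-rest SVM gives the largest score.

    classifiers maps a class label to the (w, b) pair of its binary SVM.
    """
    def score(label):
        w, b = classifiers[label]
        return decision_value(w, b, x)
    return max(classifiers, key=score)

# Hypothetical (w, b) pairs for three classes in a 2-D feature space.
classifiers = {
    "normal": ([1.0, 0.0], 0.0),
    "glioma": ([-1.0, 1.0], 0.0),
    "meningioma": ([0.0, -1.0], 0.0),
}
first = predict_one_vs_all(classifiers, [2.0, 0.5])
second = predict_one_vs_all(classifiers, [-1.0, 3.0])
```

With k classes, one-versus-all needs k binary classifiers, whereas one-versus-one needs k(k-1)/2 of them, one for every pair of classes.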

Description of Classification System
The tumor images were obtained from the Harvard Medical School database (Web: http://www.med.harvard.edu/aanlib/home.html). The brain dataset consists of 102 T2-weighted MR brain images in the axial plane with 256×256 in-plane resolution. The 102 images belong to 5 different classes: Normal, Glioma, Meningioma, Alzheimer and Alzheimer with visual agnosia. The images are resized to 150×150 in our case. Resizing is done to reduce the processing time (Fig. 4).
The preprocessed images were decomposed into their discrete cosine terms. The number of coefficients was taken as an option that decides the accuracy of the classification system. The cluster counts and cluster centers are considered as the feature elements representing an image. Training sets of 40 such vectors are chosen and used to train the SVM. Reducing the dimension of the feature vectors is a common technique for reducing processing time. In our classification, however, dimension reduction is achieved both by using the DCT and by a novel construction of the feature vectors, in which the element counts and centroids of each cluster serve as features.
The schematic in Fig. 5 shows the typical image processing flow for one image. The down arrows represent the contents of the corresponding block. In our implementation, all the processes are carried out on individual blocks of size 8×8.

Pseudo Code of Classification System
Step 1: Collect the database.
Step 2: Preprocessing (this includes resizing the image and noise elimination).
Step 3: Divide the images into horizontal and vertical blocks of size 8×8 and apply the pass band DCT to every 8×8 window.
Step 4: Cluster the DCT coefficients using the k-means algorithm.
Step 4.1: Initial group centroids are placed into the 2D space.
Step 4.2: Each object of the data set is placed into the group that has the closest centroid.
Step 4.3: Recalculate the positions of the centroids.
Step 4.4: If the positions of the centroids did not change, advance to the next step; otherwise go to Step 4.2.
Step 4.5: End.
Step 5: Construct the SVM using the default linear kernel.
Step 6: Input the test MRI to the trained SVM and output the prediction.

RESULTS
The experiments were conducted on an IBM P4 hardware platform with a 3.3 GHz processor and 2 GB RAM, running the Windows 7 operating system. The algorithm was developed as a Matlab script using the Image Processing and Mathematical toolboxes. The test sets are chosen and the classification accuracy is obtained as given in Table 1. When a resized image of size 150×150 is divided into 8×8 blocks, we obtain 324 blocks (18 horizontal and 18 vertical blocks), since the block count per side, 150/8 = 18.75, is rounded down to 18 in the Matlab code. The 64 coefficients of each window are then mapped to only one DCT coefficient, which leads to 324 coefficients for the whole image. The number of coefficients taken from each window can, however, be increased when the accuracy obtained is low.
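The block arithmetic above can be checked with a short Python sketch. It assumes that partial blocks at the image border are simply discarded (floor division), which is consistent with the 18×18 grid reported but is not confirmed as the exact Matlab behaviour; the function name is hypothetical.

```python
def block_grid(image_side, block_side=8):
    """Whole blocks per side and in total for a square image.

    Assumes partial blocks at the border are dropped (floor division).
    """
    per_side = image_side // block_side
    return per_side, per_side * per_side

per_side, n_blocks = block_grid(150)

# Number of coefficients per image for a given number kept per block.
coeffs_kept_one = n_blocks * 1    # one coefficient per 8x8 window
coeffs_kept_all = n_blocks * 64   # all 64 coefficients per window
```

Keeping one coefficient per window therefore reduces the 64 × 324 = 20736 block coefficients to 324 values per image before clustering.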
MR images of various diseases, as named by medical experts, were assembled into the final image database for our classification. It is well known that each individual brain disease type is reflected in the appearance of the image. Hence the classification process can greatly help in identifying the disease associated with the input MR image (Fig. 6).
Thus the classification was done with a total of 40 features, of which 20 are cluster centroids and 20 are the numbers of elements in each cluster. The classification accuracy obtained is 97.05%. As taking the cluster centroids and the number of elements in each cluster as features of each DCT image is a new method, there is no direct basis for comparison with earlier methods. The values presented in Table 1 are the accuracy values obtained through our method, which are fair.
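The construction of the 40-element feature vector can be sketched as follows in Python. Several details are assumptions made only for this illustration: the clustered values are treated as scalars (one DCT coefficient per block), the cluster labels are taken as given from a previous k-means run, the clusters are sorted by centroid value to give a stable feature order, and an empty cluster contributes a centroid of 0. The function name is hypothetical.

```python
def cluster_features(values, labels, k):
    """Return k centroids followed by k member counts.

    values: scalar DCT coefficients, one per block (assumption).
    labels: cluster index in range(k) for each value.
    """
    sums = [0.0] * k
    counts = [0] * k
    for v, lab in zip(values, labels):
        sums[lab] += v
        counts[lab] += 1
    # Assumed policy: an empty cluster is given a centroid of 0.0.
    centroids = [sums[j] / counts[j] if counts[j] else 0.0
                 for j in range(k)]
    # Assumed ordering: sort clusters by centroid so that the feature
    # order does not depend on the arbitrary cluster numbering.
    order = sorted(range(k), key=lambda j: centroids[j])
    return ([centroids[j] for j in order]
            + [float(counts[j]) for j in order])

small = cluster_features([1.0, 3.0, 10.0, 12.0, 11.0], [1, 1, 0, 0, 0], k=2)

# With k = 20 clusters the vector has 20 centroids + 20 counts = 40 entries.
values = [float(i) for i in range(40)]
labels = [i % 20 for i in range(40)]
features = cluster_features(values, labels, k=20)
```

Whatever ordering rule is chosen, it has to be identical for training and test images so that corresponding entries of the feature vectors remain comparable.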

DISCUSSION
Table 2 shows the variation in accuracy for different numbers of DCT coefficients and clusters.
From Table 2 and Fig. 7 it can be seen that the classification accuracy is largely insensitive to the number of DCT coefficients. Instead, it depends on the number of features, which in turn depends on the number of clusters. It might be expected that classification accuracy depends on the number of DCT coefficients, but a minimum accuracy of 96% is achieved with only one DCT coefficient per block, in contrast to the earlier methods discussed in previous sections. Figure 8 shows that the performance increases linearly with the number of cluster features. Nevertheless, better performance is obtained when a higher number of DCT coefficients is considered for classification. Table 3 compares the classification accuracies of existing methods and our proposed method. Our proposed method yields up to 100% accuracy while operating in the compressed domain. Hence this research work opens a new way of classifying medical images.

CONCLUSION
In this study, a new brain tumor classification method has been developed in which dimensionality reduction by pass band DCT and k-means clustering is combined with SVM classification. As discussed in the previous sections, this brings several advantages. In designing a classification algorithm, the earlier works reviewed here have not focused on memory utilization. Although they focus on the speed of computation, they make use of the full dimension of the actual image or of its coefficients. In our study, an attempt has been made to reduce memory usage, and hence the feature vectors are extracted in the compressed domain. By reducing the number of DCT coefficients and applying k-means clustering, a two-stage dimension reduction has been achieved. In our work the linear kernel has been chosen for SVM classification. The cluster centroids and the number of elements per cluster obtained through k-means depend fully on the initial random seeds of the centroid values. Besides this, the k-means algorithm can introduce Not a Number (NaN) values during clustering because of convergence problems. This can be avoided by setting the input argument options of the "kmeans" function provided in Matlab when running the algorithm. As a result of this NaN problem and the random seed values, the number of elements in the same cluster and the centroid values may differ when clustering is repeated at a different instant for the same DCT-transformed image. From our observations, however, this randomness is bounded to less than 5% and does not affect our classification results. In future, our work will focus on other clustering techniques and on optimizing the SVM parameters while using different kernel functions. Using such kernel functions may accommodate noisy images as well. In that case, classification performance is expected to improve further even under visual and non-visual changes in image characteristics.
The authors of this study strongly believe that this method of classification could become popular for automatic classification of other image databases as well.

ACKNOWLEDGEMENT
The researcher would like to thank Bharath Scans, Scans World, Chennai and Harvard Medical School for their database support of MRI brain images.