Classification of Squamous Cell Carcinoma Based On Color and Textural Features in Microscopic Images of Esophagus Tissues

This paper presents a method for feature extraction using color and texture from microscopic images of esophagus tissues obtained from the abnormal regions of human esophagus detected through endoscopy. This method is used for classification of Squamous Cell Carcinoma (SCC) of esophagus, namely, poorly differentiated SCC, moderately differentiated SCC, and well differentiated SCC. Three different color spaces, namely, HSV, YCbCr, and Lab, are used for color texture analysis to test the classification of SCC of esophagus. The texture features are extracted from the luminance channel and the color features are extracted from the chrominance channels. The color and textural features are fused to characterize texture properties of the image. The experimental results show that a classification accuracy of 100% is obtained using the YCbCr color space. Also, the proposed method is robust enough to yield a 100% classification rate even with a small training/testing sample in the case of poorly differentiated SCC in all three color spaces. This is a significant result, since the number of training images is small in most cases and the number of testing images of a patient may also be small.


INTRODUCTION
As more and more of today's digital images are color images, color image segmentation and classification have become an important problem in image processing and analysis. Applications include medical image analysis, face recognition, object-based image and video coding, and hyperspectral image analysis. Object detection, segmentation, and classification are the key building blocks of a computer vision system for image analysis. The goal of detection and segmentation is to locate and extract meaningful objects from the image. For example, in cytological and histological images, detection, segmentation, and classification play important roles in squamous cell carcinoma classification of diseased tissues.
Medical images are used as an important tool for determination of pathological condition of the vital organs of the body like lung, brain, esophagus etc. In this study our focus is on microscopic images of esophagus tissue obtained from the abnormal regions of human esophagus detected through endoscopy.
Segmentation is the first step towards automatic processing for analysis and evaluation of medical images. Image segmentation is the technique which partitions an image into units which are homogeneous with respect to one or more characteristics. Texture is one of the important characteristics used in identifying an object or a region of interest in an image. Robust segmentation results generally require the gray scale/color and textural information simultaneously. Texture features play an important role in image classification and analysis. In classification, texture features can be used to discriminate and label areas of an image, for example, crop identification in an aerial photograph and medical diagnosis of an X-ray photograph. Texture features can also be used in scene segmentation and identification in an image understanding or computer vision system, for example, in robot vision and industrial inspection. Therefore, the choice of texture features is the key in these applications [1]. Microscopic image analysis is important for diagnosing many diseases and assessing their behavior. For pathologists, this is heavy and complicated work that is both time-consuming and expensive.
In pathology, diagnoses of disease are based on the recognition, by a highly trained observer, of visual clues or diagnostic criteria from various tissue specimens. Although many of these criteria have been clearly defined for each disease, they are often interpreted differently by each pathologist confronted with a specific specimen. With the hope of introducing more objective and accurate diagnostic criteria to the practice of pathology, many quantitative techniques have been developed. In quantitative pathology, the most important and, in fact, the most difficult task in image morphometry is the recognition or segmentation/classification of cells. Although the interactive manual tracing method is still the most reliable approach for the segmentation/classification of cells, it involves considerable user participation and is very time-consuming. To simplify the tracing process, more efficient approaches make use of a priori knowledge of medical cells, require less work from the user in tracing, and are effective in practice.
Squamous cell carcinoma is by far the most common malignant tumor of the esophagus. It is usually graded as well, moderately, or poorly differentiated. Well-differentiated tumors are those with abundant amounts of keratin, easily demonstrated intercellular bridges, and minimal nuclear and cellular pleomorphism. Poorly differentiated tumors are those with no or virtually no keratin and intercellular bridges or with marked cellular and nuclear pleomorphism. Moderately differentiated tumors are those intermediate between well and poorly differentiated [2].
Most studies concern the development of diagnosis support systems intended to address the shortage of pathologists. One example is the Pap smear screening systems for cervical cancer, which have already been released. Many image processing techniques have been proposed to handle problems such as nucleus segmentation and classification in the development of these systems [3]. Chen et al. [4] used a spatial adaptive filter and watershed. Anoraganingrum [5] used a combination of median filter and mathematical morphology operations. Hazem Refai et al. [6] used an approach similar to that of Anoraganingrum [5] for cell segmentation, and P. S. Hiremath and Humnabad Iranna Y. [7] proposed an automated cell nuclei segmentation and classification of squamous cell carcinoma from microscopic images of esophagus tissue using moment based textural features.
In this paper, we present a novel method for feature extraction using color and textural information from microscopic images of esophagus tissues obtained from the abnormal regions of human esophagus detected through endoscopy. This method is used for classification of Squamous Cell Carcinoma (SCC) of esophagus, namely, poorly differentiated SCC, moderately differentiated SCC, and well differentiated SCC. Three different color spaces, namely, HSV, YCbCr, and Lab, are used for color texture analysis to test the classification of SCC of esophagus.
The texture features are extracted from the luminance channel and the color features are extracted from the chrominance channels. The color and textural features are fused to generate feature vectors that characterize texture properties of the image. In most cases of medical images of patients, the number of images available for training/testing would be small. The proposed method is therefore evaluated on both small and large samples of training/testing images.

PATHOLOGICAL FEATURES OF BENIGN AND MALIGNANT CELLS
This section deals with the basic concepts of pathology related to the present work. It is not the aim of this paper to describe in detail the general pathology of benign and malignant tumors [8]; the pathological features are explained only in brief. The majority of neoplasms can be categorized morphologically into benign and malignant on the basis of certain characteristics, the most important being the degree of differentiation of the tumour cells. Differentiation is defined as the extent of morphological and functional resemblance of parenchymal tumour cells to corresponding normal cells. If the deviation of a neoplastic cell in structure and function is minimal as compared to the normal cell, the tumour is described as well-differentiated, as are most benign and low-grade malignant tumors. Poorly differentiated and undifferentiated are synonymous terms for poor structural and functional resemblance to the corresponding normal cell. Lack of differentiation is termed anaplasia, which is a characteristic feature of most malignant tumors. As a result of anaplasia, noticeable morphological and functional alterations in the neoplastic cells are observed. These are considered below [8]:

(i) Pleomorphism:
The term pleomorphism means variation in size and shape of the tumour cells. The extent of cellular pleomorphism generally correlates with the degree of anaplasia. Tumour cells are often bigger than normal but they can be of normal size or smaller than normal.

(ii) Nucleocytoplasmic (NC) changes:
The nuclei of tumour cells show the most conspicuous changes compared to normal cells. (a) Generally, the nuclei of malignant tumour cells are enlarged, disproportionate to the cell size, so that the NC ratio is increased. (b) Just like cellular pleomorphism, the nuclei, too, show variation in size (anisonucleosis) and shape in malignant tumour cells. (c) Characteristically, the nuclear chromatin of the malignant cell is increased and coarsely clumped. This is due to an increase in the amount of nucleoprotein, resulting in dark-staining nuclei, referred to as hyperchromatism. Besides, a prominent nucleolus or nucleoli may be present in these nuclei, reflecting increased nucleoprotein synthesis. (d) The cytoplasm of tumour cells in better-differentiated cancers and in benign tumors may show the normal constituents from which the tumour is derived, e.g. the presence of mucus, keratin, cross striations, etc.

Squamous cell carcinoma:
This is the most commonly occurring malignant tumor of the esophagus. It is usually graded as well, moderately, or poorly differentiated. Well-differentiated tumours are those with abundant amounts of keratin, easily demonstrated intercellular bridges, and minimal nuclear and cellular pleomorphism. Poorly differentiated tumours are those with no or virtually no keratin and intercellular bridges or with marked cellular and nuclear pleomorphism. Moderately differentiated tumours are those intermediate between well and poorly differentiated tumours [2].

MATERIALS AND METHOD
Image data: The histological material used in this study was collected from Gulbarga Diagnostic and Research Laboratory, Gulbarga. The image data set comprises 120 image samples from H&E (Haematoxylin and Eosin) stained tissue sections of esophagus. The digital images of stained tissue slides were captured using a light microscopy imaging system (Olympus BX51 with DP12 camera) at a magnification of ×40. For experimentation, the 120 color microscopic images, each of size 256×256 pixels, are used.

Features:
The proposed method, presented in the following, is based on the cooccurrence matrix and Haralick features that characterize color and texture of images. This method is classical in the pattern recognition community and has been used extensively on gray scale images. In the present approach, we extend this method to color texture analysis of images. We briefly recall the definitions. Let I be a grayscale image coded on m gray levels. Let s be the position of a pixel in I and t be a translation vector. The cooccurrence matrix $M_t$ is an m × m matrix whose (i, j)th element is the number of pairs of pixels separated by the translation vector t that have the pair of gray levels (i, j):
$$M_t(i, j) = \operatorname{card}\{(s, s+t) \mid I[s] = i,\ I[s+t] = j\} \qquad (1)$$
The choice of the relative position vector is the same as Haralick's: a distance of one pixel in eight directions, to take into account the eight nearest neighbors of each pixel. The eight matrices obtained are then summed to obtain a rotation-invariant matrix M. Haralick assumed that the texture information is contained in this matrix, and texture features are then calculated from it. He extracted 14 parameters from the cooccurrence matrix, but only five are commonly used because it was shown that the 14 are highly correlated with each other and that the five suffice to give good results in a classification task [9]. The features are homogeneity (E), contrast (C), correlation (Cor), entropy (H) and local homogeneity (LH).
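For reference, the five features are commonly written as shown in the block that follows. This is a sketch in the notation of equation (1), assuming that M has been normalized so that its entries sum to one; the exact forms and equation numbers used in the original are not reproduced here. In this usual formulation the correlation is divided by the product of the row and column standard deviations.

```latex
\begin{align*}
E   &= \sum_{i}\sum_{j} M(i,j)^{2} \\
C   &= \sum_{i}\sum_{j} (i-j)^{2}\, M(i,j) \\
Cor &= \frac{1}{\sigma_i \sigma_j}\sum_{i}\sum_{j} (i-\mu_i)(j-\mu_j)\, M(i,j) \\
H   &= -\sum_{i}\sum_{j} M(i,j)\,\log M(i,j) \\
LH  &= \sum_{i}\sum_{j} \frac{M(i,j)}{1+(i-j)^{2}}
\end{align*}
```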
where $\mu_i$ and $\sigma_i$ are the horizontal mean and variance and $\mu_j$ and $\sigma_j$ are the vertical statistics.
Secondly, the YCbCr color space is used. This color space is widely used for digital video. In this format, luminance information is stored as a single component (Y), and chrominance information is stored as two color-difference components (Cb and Cr). Cb represents the difference between the blue component and a reference value. Cr represents the difference between the red component and a reference value. These components are defined for video processing purposes and so do not correspond directly to human color perception.
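The separation into one luminance and two color-difference components can be illustrated with a short sketch. The text does not state which YCbCr variant was used; the code below assumes the full-range BT.601 (JFIF) coefficients, and the function name `rgb_to_ycbcr` is hypothetical.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """Convert an H x W x 3 RGB image (values 0..255) to YCbCr.

    Assumption: full-range BT.601 (JFIF) coefficients; other variants
    (e.g. studio-range video) use different offsets and scales.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Luminance: weighted sum of the three primaries.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chrominance: blue and red differences, offset by the reference value 128.
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)
```

For a neutral gray pixel the two chrominance components equal the reference value, so all of the information sits in Y; the result is returned as floating point and is neither rounded nor clipped.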

Feature extraction algorithm
Step 1: Input the RGB microscopic image I of esophagus tissue.
Step 2: Transform the RGB color space of I to YCbCr / HSV / Lab and choose quantization level q.
Step 3: Compute the cooccurrence matrix for the luminance channel in the color space chosen in Step 2 and extract Haralick features (as described in Section 3.2).
Step 4: Extract the color textural features for the chrominance channels (as described in Section 3.3).
Step 5: Fuse the color and Haralick textural features to yield a feature vector with 9 features (i.e. 5 Haralick features + 2 statistical moment features x 2 chrominance channels, as shown in Figure 1) and store it in the feature library.
Step 6: Repeat Step 1 through Step 5 for all the training images and build the feature library completely.
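Steps 3 to 5 can be sketched as follows, for channels that have already been converted in Step 2. This is an illustration under stated assumptions, not the authors' implementation: channels are taken to be scaled to 0..255, quantization to q levels is uniform, the cooccurrence matrix is normalized before the features are computed, and the two statistical moments per chrominance channel are taken to be the mean and the standard deviation. All function names are hypothetical.

```python
import numpy as np

# Eight nearest-neighbour translations (dy, dx) at a distance of one pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def cooccurrence(channel, q):
    """Sum of the eight cooccurrence matrices of a channel quantized to q levels."""
    channel = np.asarray(channel, dtype=np.float64)
    # Assumption: uniform quantization of the 0..255 range into q bins.
    levels = np.clip((channel * q / 256.0).astype(int), 0, q - 1)
    h, w = levels.shape
    m = np.zeros((q, q), dtype=np.float64)
    for dy, dx in OFFSETS:
        # src holds pixels s, dst holds s + t; pairs leaving the image are dropped.
        src = levels[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
        dst = levels[max(0, dy):h - max(0, -dy), max(0, dx):w - max(0, -dx)]
        np.add.at(m, (src.ravel(), dst.ravel()), 1)
    return m

def haralick_features(m):
    """Return [E, C, Cor, H, LH] from a cooccurrence matrix of counts."""
    p = m / m.sum()  # normalize counts so that the entries sum to one
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    energy = (p ** 2).sum()
    contrast = ((i - j) ** 2 * p).sum()
    # Undefined (division by zero) for a perfectly uniform channel.
    correlation = ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j)
    nonzero = p[p > 0]  # skip empty cells, since log(0) is undefined
    entropy = -(nonzero * np.log(nonzero)).sum()
    local_homogeneity = (p / (1.0 + (i - j) ** 2)).sum()
    return np.array([energy, contrast, correlation, entropy, local_homogeneity])

def feature_vector(luma, chroma1, chroma2, q=32):
    """Fuse 5 texture features with 2 moments from each chrominance channel."""
    texture = haralick_features(cooccurrence(luma, q))
    # Assumption: the two moments are the mean and the standard deviation.
    colour = [stat(np.asarray(c, dtype=np.float64))
              for c in (chroma1, chroma2) for stat in (np.mean, np.std)]
    return np.concatenate([texture, colour])
```

Calling `feature_vector` on the three channels of a converted image gives the 9-element vector of Step 5; repeating the call over the training images and storing each vector with its class label gives the feature library of Step 6.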

TRAINING AND CLASSIFICATION
For experimentation, the data set consists of 40 microscopic images, each of size 256×256, for each category of Squamous Cell Carcinoma (SCC), namely, poorly differentiated, moderately differentiated, and well differentiated. The images are of esophagus tissue obtained from the abnormal regions of human esophagus detected through endoscopy. Thus, the dataset contains 120 images. The entire image is used for feature computation.
Training: In the training phase, the color and texture features are extracted (as described in Section 3) from t randomly selected sample images of each category, namely, poorly differentiated, moderately differentiated, and well differentiated SCC. These features are stored in the feature library, which is then used for classification of SCC images.

Classification:
In the classification phase, the remaining 40 − t images of each category are used (of the 40 images in each category, t have been used for training). The color and texture features are extracted for each test image as described in Section 3 and then compared with the features of all the images in the feature library. The Canberra distance measure is used for computing the distance between image features. This distance measure allows the feature set to be in unnormalized form. The Canberra distance is given by
$$\mathrm{Canb}(x, y) = \sum_{i=1}^{d} \frac{|x_i - y_i|}{|x_i| + |y_i|}$$
where x and y are the feature vectors of a test image and a library image, respectively, and d is the dimension of the feature vectors.
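The comparison against the feature library can be sketched as follows. The Canberra distance sums, over the features, the absolute difference divided by the sum of the absolute values, which is why features on different scales need no prior normalization. The decision rule is an assumption: the text reports results for a parameter K, which is taken here to be the number of nearest library entries that vote on the label. The names `canberra` and `classify` are hypothetical.

```python
from collections import Counter

def canberra(x, y):
    """Canberra distance between two equal-length feature vectors."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        # Convention: a term with both values zero contributes nothing.
        if denom > 0:
            total += abs(a - b) / denom
    return total

def classify(test_vec, library, k=1):
    """Label a test vector by majority vote among its k nearest library entries.

    library is a list of (feature_vector, label) pairs built during training.
    Assumption: a k-nearest-neighbour rule; ties go to the nearer entry.
    """
    ranked = sorted(library, key=lambda entry: canberra(test_vec, entry[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Each term of the distance lies between 0 and 1, so a feature with large raw values (such as contrast) cannot dominate one with small values (such as energy).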

EXPERIMENTAL RESULTS
Experiments were carried out with both a small and a large training sample of images in three color spaces, YCbCr, HSV and Lab, with varying quantization levels. The experimental results of the proposed method are presented in Table 1 and Table 2, which show the percentage classifications for three different image classes, namely, poorly differentiated, moderately differentiated, and well differentiated SCC of esophagus tissue. Five sample images of each class are shown in Figure 2. The analysis of the experimental results shows that a classification accuracy of 100% is obtained using the YCbCr color space.
Small training sample: Table 1 shows that, for a small training sample, a classification rate of 100% is achieved in all three color spaces with varying quantization levels for poorly differentiated SCC. However, the overall classification rate is better in the YCbCr color space than in the other color spaces.
Large training sample: Table 2 shows that, for large training samples, the classification rate is 100% in all three color spaces with varying quantization levels for the three disease classes of SCC, with one exception: for well differentiated SCC, the classification rate is 80-90% in the HSV and Lab spaces. In the YCbCr space, the classification rate is 100% for all three disease classes of SCC up to K=7 (except for quantization level 128 at K=7). Thus, in general, the classification rate of the proposed method is 100% in the YCbCr space. Further, the proposed method is robust enough to yield a classification rate of 100% even with a small training/testing sample of images in the case of poorly differentiated SCC in all three color spaces. These results are significant in view of the fact that, in most cases of medical images of patients, the number of images available for training/testing would be small.

CONCLUSION
In this paper, a novel method for feature extraction using color and texture of microscopic images of esophagus tissue obtained from the abnormal regions of human esophagus is presented. The proposed method changes the color space of the images in order to obtain one channel containing the luminance information and two others containing the chrominance information. Texture (Haralick) features are then computed from the luminance channel, and statistical moment features are computed from the chrominance channels. This method is used for classification of SCC of esophagus, namely, poorly differentiated SCC, moderately differentiated SCC, and well differentiated SCC. Three different color spaces, namely, HSV, YCbCr, and Lab, are used for color texture analysis. The experimental results show that a classification accuracy of 100% is obtained using the YCbCr color space. The proposed method is robust enough to yield a 100% classification rate even with a small training/testing sample in the case of poorly differentiated SCC in all three color spaces. This is a significant result, since the number of training images is small in most cases and the number of testing images of a patient may also be small.