Dermoscopic Image Segmentation using Machine Learning Algorithm



INTRODUCTION
Dermoscopy, also known as epiluminescence microscopy, is a non-invasive skin imaging technique that uses optical magnification and either liquid immersion or cross-polarized lighting, making subsurface structures more easily visible than in conventional clinical images. Dermoscopy allows the identification of dozens of morphological features such as pigment networks, dots/globules, streaks, blue-white areas and blotches. This reduces screening errors and provides greater differentiation between difficult lesions such as pigmented Spitz nevi and small, clinically equivocal lesions. The standard approach in automatic dermoscopic image analysis usually has three stages: (1) image segmentation; (2) feature extraction and feature selection; (3) lesion classification. The segmentation stage is one of the most important, since it affects the accuracy of the subsequent steps.
Segmentation subdivides an image into its constituent regions or objects and should stop when the objects or regions of interest in an application have been detected. It is used in image analysis and recognition. For instance, the automated detection of cancerous cells in mammographic images requires segmentation followed by recognition or classification. Most segmentation algorithms are based on one of two basic properties of intensity values: discontinuity and similarity. In the first category, the approach is to partition an image based on abrupt changes in intensity, such as edges. The principal approaches in the second category partition an image into regions that are similar according to a set of predefined criteria. A large number of algorithms have been proposed in previous years. One of the conventional image segmentation approaches is clustering, in which a region with homogeneous properties is grown around a given pixel.
Crisp segmentation methods such as thresholding (Ganster et al., 2001), region growing, edge-based approaches (Kapur et al., 1996), k-means and split-and-merge methods are generally used for image segmentation. Soft segmentation methods have also proved effective. These methods are drawn from the artificial intelligence field and rely in particular on fuzzy and neural network approaches. The present survey is intended as a more comprehensive study of the existing fuzzy and neural-network-based segmentation techniques. In this study we propose and evaluate several fuzzy and neural network based clustering techniques: Fuzzy C Means Algorithm (FCM), Possibilistic C Means Algorithm, Hierarchical C Means Algorithm, C-means based Fuzzy Hopfield Neural Network, Adaline Neural Network and Regression Neural Network. These algorithms are applied to a dermoscopic image and compared with the expected lesion segmentation (ground truth). The evaluation is based on several parameters and quality metrics that take into account different types of error.

MATERIALS AND METHODS
Intelligent fuzzy and neural network based clustering techniques: Intelligent systems are a branch of computer science concerned with making computers behave like humans. Cluster analysis is a technique for classifying data, i.e., for dividing a given dataset into a set of classes or clusters. In classical cluster analysis each datum must be assigned to exactly one cluster. Intelligent system cluster analysis relaxes this requirement by allowing gradual memberships, which makes it possible to deal with data that belong to more than one cluster at the same time. Crisp clustering assigns each datum to a single cluster, whereas in fuzzy clustering the membership function measures the degree to which each feature belongs to a cluster. Most fuzzy clustering algorithms are objective function based: they determine an optimal classification by minimizing an objective function.
The degrees of membership with which a given data point belongs to the different clusters are computed from the distances of the data point to the cluster centers. Several fuzzy clustering algorithms can be distinguished depending on the additional size and shape information contained in the cluster prototypes, the way in which the distances are determined and the restrictions that are placed on the membership degrees (Silveira et al., 2009; Lin et al., 1996). Here we focus on the fuzzy c-means algorithm (Zhang, 2006), which uses only cluster centers and a Euclidean distance function, and the Gustafson-Kessel algorithm, which uses cluster centers, covariance matrices and a Mahalanobis distance function. A Hopfield network is a recurrent network in which all neurons are connected to each other, with the exception that no neuron has any connection to itself (Bezdek and Pal, 1992).
Several related segmentation systems have been reported. A Computer Aided Diagnosis (CAD) system for the detection of brain tumors uses a parallel implementation of an ACO system for medical image segmentation, chosen for its rapid execution when extracting the Region of Interest (ROI) from images for diagnostic purposes (Jaya and Thanushkodi, 2011). A new, simple and efficient modified Gaussian mixture model based clustering algorithm has been presented for color-texture segmentation; the proposed mixture model introduces a new component density function which incorporates spatial information, and the weighting factor for the neighborhood effect is fully adaptive to the image content (Sujaritha and Annadurai, 2011). A genetic algorithm based segmentation has been implemented for computer vision and object classification, with the objective of developing a robust technique for the automatic segmentation and classification of touching objects (Scavino et al., 2009). Another survey describes the current state of an ongoing breast cancer (BC) automated diagnosis research program and a software system that provides expert diagnosis of breast cancer based on three steps of cytological image analysis (Sebri et al., 2007).
This work proposes and compares various intelligent fuzzy (Bezdek and Pal, 1992) and neural network based clustering techniques, which are described in the following subsections. Intelligent clustering differs from conventional hard computing in that, unlike the latter, it is tolerant of imprecision, uncertainty, partial truth and approximation.

Fuzzy C Means algorithm (FCM):
The most prominent fuzzy clustering algorithm is the Fuzzy C Means (FCM) algorithm. The FCM algorithm receives the data or sample space in matrix format and assigns pixels to each category by using fuzzy memberships. The algorithm is an iterative optimization that minimizes a cost function. The number of clusters c, the assumed initial partition matrix U and the convergence value E must all be given to the algorithm. The first step is to calculate the cluster centers. The second step is to calculate the distance matrix d, which holds the Euclidean distance between every pixel and every cluster center. The partition matrix is then recalculated from these distances. If the difference between the initial partition matrix and the calculated partition matrix is greater than the convergence value, the entire process, from calculating the cluster centers to calculating the partition matrix, is repeated. The final partition matrix is used to reconstruct the image.
The cluster centroid V_i for each cluster is given by Eq. 1. The objective function is minimized when pixels close to the centroids are assigned high membership values and pixels far from the centroids are assigned low membership values. The standard FCM objective function is given by Eq. 2, where X = {X_1, X_2, ..., X_j, ..., X_N} is a p×N input data matrix, p represents the dimension of each feature vector and N represents the number of feature vectors. C is the number of clusters, U_ij represents the membership of the jth datum in the ith cluster C_i, d is the distance between input and centroid, V_i is the ith cluster center and m is a constant. The membership functions and cluster centers are updated by Eq. 3, where m is a weighting factor which controls the degree of fuzziness. A measure of similarity between X_j and V_i is given by Eq. 4. Convergence can be detected by comparing the changes in the membership function or the cluster center at two successive iteration steps.
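The iteration described above can be sketched in Python with NumPy. This is a minimal illustration rather than the implementation used in this study: it assumes the textbook FCM updates (membership-weighted means for the centers and inverse-distance memberships), a random initial partition matrix, and hypothetical names such as `fuzzy_c_means`, `eps` and `max_iter`.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Cluster the rows of X (N samples x p features) into c fuzzy clusters.

    Returns (V, U): V holds the c cluster centers (c x p) and U is the
    c x N partition matrix whose columns sum to 1.
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # Assumed initial partition matrix: random memberships, normalized per sample.
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Step 1: cluster centers as membership-weighted means of the data.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Step 2: distance matrix (c x N) of Euclidean distances to each center.
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        # Step 3: membership update U_ij = 1 / sum_k (d_ij / d_kj)^(2/(m-1)).
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        # Stop when the partition matrix changes by less than the convergence value.
        converged = np.max(np.abs(U_new - U)) < eps
        U = U_new
        if converged:
            break
    return V, U
```

For a gray-level image, `X` would be the pixel intensities reshaped to an N x 1 array, and `U.argmax(axis=0)` reshaped to the image size gives a hard label per pixel for display.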

Possibilistic C Means algorithm (PCM):
In possibilistic fuzzy clustering (Zhang and Jiang, 2009) one tries to achieve a more intuitive assignment of degrees of membership by dropping the probability constraint of FCM, which is responsible for the undesirable effect. However, this leads to the mathematical problem that the objective function is now minimized by assigning U_ij = 0 for all i ∈ {1, ..., c} and j ∈ {1, ..., n}. In order to avoid this trivial solution, a penalty term is introduced which forces the membership degrees away from zero. That is, the objective function J is modified to Eq. 5, where d_ij is the distance between the jth datum and the ith cluster center, µ_ij is the degree of belonging of the jth datum to the ith cluster, m is the degree of fuzziness, η_i is a suitable positive number, c is the number of clusters and N is the number of data. µ_ij is obtained from Eq. 6, with the same notation. The value of η_i determines the distance at which the membership value of a point in a cluster becomes 0.5 and is obtained from Eq. 7. It can be fixed or changed in each iteration by changing the values of µ_ij and d_ij. This method is more robust in the presence of noise, in finding valid clusters and in giving a robust estimate of the centers.
At first sight this approach looks very promising. However, a closer look shows that the objective function J defined above is, in general, truly minimized only if all cluster centers are identical. The reason is that the formula for the membership degree of a datum to a cluster depends only on the distance of the datum to that cluster, not on its distance to other clusters. Hence, if there is a single optimal point for a cluster center (as will usually be the case, since multiple optimal points would require a high symmetry in the data), all cluster centers will converge to this point. More formally, consider two cluster centers β_1 and β_2 which are not identical and let z_i be defined by Eq. 8. That is, let z_i be the amount that cluster β_i contributes to the value of the objective function. Except in very rare cases of high data symmetry, either z_1 > z_2 or z_2 > z_1 will hold. We can therefore improve the value of the objective function by setting both cluster centers to the same value, namely the one which yields the smaller z-value, because the two z-values do not interact. In the probabilistic approach the cluster centers are driven apart, because a cluster, in a way, consumes part of the weight of a datum and thus leaves less that may attract other cluster centers. Hence sharing a datum between clusters is disadvantageous. In the possibilistic approach there is nothing equivalent to this effect. Nevertheless, possibilistic fuzzy clustering (Steck and Balakrishnan, 1994) usually leads to acceptable results, although it suffers from stability problems if it is not initialized with the corresponding probabilistic algorithm. We assume that results other than identical cluster centers are achieved only because the algorithm gets stuck in a local minimum of the objective function.
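The membership behaviour discussed above can be illustrated with a short Python sketch. It assumes the commonly used possibilistic forms, µ_ij = 1 / (1 + (d_ij² / η_i)^(1/(m-1))) and η_i = K Σ_j µ_ij^m d_ij² / Σ_j µ_ij^m, which may differ in detail from Eq. 6 and Eq. 7; the function names and the constant `K` are hypothetical.

```python
import numpy as np


def pcm_eta(U, d2, m=2.0, K=1.0):
    """Assumed form of eta_i: membership-weighted mean squared distance per cluster."""
    Um = U ** m
    return K * (Um * d2).sum(axis=1) / Um.sum(axis=1)


def pcm_membership(d2, eta, m=2.0):
    """Assumed possibilistic membership for squared distances d2 (c x N).

    Each row uses only the distances to its own cluster center, so the
    columns are not forced to sum to 1 as they are in FCM.
    """
    return 1.0 / (1.0 + (d2 / eta[:, None]) ** (1.0 / (m - 1.0)))
```

With η_i = 1 and m = 2, a point at squared distance 1 from the center receives membership 0.5, which matches the role of η_i described above. Because no term couples the rows, nothing in this update pushes two cluster centers apart.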

Hierarchical C Means algorithm (HCM):
Given a set of elements X, a mixed approach is applied to build a fuzzy hierarchical structure. The process starts by building a fuzzy partition of X with fuzzy c-means. This results in a set of fuzzy membership functions µ_i, each built on a centroid v_i. This fuzzy partition bootstraps the process. An iterative process is then applied to build the hierarchical clustering following a bottom-up strategy: the set of membership functions is partitioned using a partitive clustering method for fuzzy sets, which returns a new fuzzy partition µ_i′ that is used as the starting point of the next step.
In this algorithm, the fuzzy c-means algorithm is used to build the initial fuzzy partition by applying it to X with a large number of clusters (i.e., c is large), so that the fuzzy hierarchy has a large number of leaves. In the iterative process, a fuzzy c-means based clustering method is used as well. The difference lies in the way the distance ||x_k - v_i|| is computed. Here, x_k and v_i represent fuzzy sets. More specifically, x_k stands for the kth fuzzy set to be partitioned and v_i is one of the fuzzy sets in the new partition. Accordingly, ||x_k - v_i|| is a distance between fuzzy sets. Following the standard approach in fuzzy c-means, the fuzzy membership of a fuzzy set with centroid v is defined considering all other centroids v_i. In our case, the membership of the fuzzy set with centroid x_k is computed for all x taking into account all other centroids x_j (Eq. 9). Similarly, the membership of the fuzzy set with centroid v_i is computed for all x taking into account all other centroids v_j (Eq. 10). Note that here the x_j are the centroids of the fuzzy sets being clustered and the v_j are the centroids of the clusters being constructed with fuzzy c-means. Similarly, c denotes the number of centroids x_j in Eq. 9 and the number of centroids v_j in Eq. 10. The distance between a fuzzy set with centroid x_k and another with centroid v_i is then computed, which determines the new v_i (Eq. 11). Note that this approach leads to different membership values. The method thus builds hierarchies of clusters in which membership to clusters is fuzzy.
Fuzzy Hopfield Neural Network method (FHNN method): This method uses the fuzzy c-means algorithm to eliminate the need for finding weighting factors in the Lyapunov energy function. The number of neurons used to construct the network depends on the image size; the larger the image, the more neurons are required. These neurons are fully interconnected. The total input of neuron (i, k), denoted Net_{i,k}, can be formulated as Eq. 12, where N is the number of data points, c is the number of clusters, V_{j,q} denotes the binary state of neuron (j, q), W_{i,k;j,q} is the interconnection weight between neuron (i, k) and neuron (j, q) and I_{i,k} is the external bias for neuron (i, k). The Hopfield neural network consists of N × c neurons that can be conceived as a 2-D array for the image-segmentation problem, and the Lyapunov energy function is given (Roozbahani et al., 2001) as Eq. 13:
$$E = -\frac{1}{2}\sum_{k=1}^{N}\sum_{q=1}^{N}\sum_{i=1}^{c}\sum_{j=1}^{c} V_{i,k}\,W_{i,k;j,q}\,V_{j,q} - \sum_{k=1}^{N}\sum_{i=1}^{c} I_{i,k}\,V_{i,k}$$
When the Lyapunov energy function is minimized, the neural network reaches a stable state. The optimization problem can be mapped onto a 2-D fully interconnected Hopfield neural network with the fuzzy c-means algorithm. The total input for neuron (i, k) can be modified (Ganster et al., 2001) as Eq. 14 and the Lyapunov energy can be changed accordingly (Roozbahani et al., 2001), where m is the fuzzification parameter, $\sum_{q=1}^{N} W_{i,k;i,q}\,\mu_{i,q}^{m}$ is the total weighted input received from the neurons (i, q), x_k is the kth pixel value of the image and the membership value µ_{i,k} is the output state of neuron (i, k). A neuron (i, k) in a maximum membership state indicates that pixel x_k belongs to class i. The 2-D Hopfield neural network represents cluster centroids in columns and image pixels in rows. In order to generate an adequate classification under the constraints, the Lyapunov energy function is defined as in Eq. 15 (Roozbahani et al., 2001), where E is the total intra-class scatter energy that accounts for the scattered energies distributed by all pixels in the same class.
More specifically, the first term, the within-class scatter energy, minimizes the intra-class Euclidean distance from a sample to the cluster center in any given cluster. The second term, which guarantees that the N data points in the image can only be distributed among these c classes, imposes constraints on the objective function (Roozbahani et al., 2001). The quality of the classification result is very sensitive to the weighting factors, and searching for optimal values of these weighting factors is expected to be time-consuming and laborious. To alleviate this problem, a Hopfield neural network with a fuzzy c-means clustering method, called FHNN, is proposed.
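As a small numerical illustration of the Lyapunov energy of Eq. 13, the following Python sketch evaluates the standard Hopfield energy for a 2-D array of neuron states. It assumes the usual form with a factor of -1/2 on the quadratic term; the function name and the storage of the weights as a four-index array `W[i, k, j, q]` are choices made for this sketch only.

```python
import numpy as np


def hopfield_energy(W, V, I):
    """Energy of a 2-D Hopfield network with states V (c x N).

    W has shape (c, N, c, N) with W[i, k, j, q] the weight between neuron
    (i, k) and neuron (j, q); I (c x N) is the external bias.
    """
    W = np.asarray(W, dtype=float)
    V = np.asarray(V, dtype=float)
    I = np.asarray(I, dtype=float)
    # Quadratic term: sum over all pairs of neurons of V_ik * W_ik;jq * V_jq.
    pairwise = np.einsum("ik,ikjq,jq->", V, W, V)
    # Bias term: sum over all neurons of I_ik * V_ik.
    bias = np.sum(I * V)
    return -0.5 * pairwise - bias
```

A state that switches on two neurons joined by a positive weight has lower energy than a state that switches on only one of them, which is the sense in which minimizing E drives the network to a stable configuration.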
Because each image pixel can only be occupied by one class, the summation of states in the same row equals 1. This also ensures that only N data points will be classified into these c clusters. That is, the network must satisfy the constraints of Eq. 16. The energy function can therefore be further simplified to Eq. 17. The normalization operation guarantees that each image pixel is absorbed by several classes with certain probability degrees, so that N data points are assigned among c clusters. The minimization of the energy E is greatly simplified because it contains only one term, and hence the requirement of having to determine the weighting factors A and B vanishes. The synaptic interconnection weights and the bias input are obtained from Eq. 18, the input to neuron (i, k) is expressed by Eq. 19 and the membership function for the kth pixel is given by Eq. 20. This membership function is effective in minimizing the new objective function at each iteration. The new objective function, which consists of the average distance between image pixels and cluster centroids for separate and compact clustering, is given by Eq. 21.
The steps of the FHNN algorithm are given below:
1. Given the data set X, choose the number of clusters 1 < c < N; the weighting exponent m > 1 (membership functions for large m are fuzzier than those for small m, but the interconnection weights are updated slowly); the termination tolerance ε > 0 (used as the stopping criterion on the objective function: the larger the threshold ε, the fewer the iterations, but the optimal membership function may not be found); and the norm-inducing matrix A.
2. Normalize the gray levels of the image.
3. Calculate the primary centroids v_0.
4. Compute the distances.
5. Compute the initial membership value.
6. Compute the new membership value (fuzzy c-means) (Eq. 23).
7. Compute J_t (Eq. 24).
8. If |J_{t+1} - J_t| > ε, go to step 6; otherwise stop.
Adaline neural network: ADALINE (Adaptive Linear Neuron or, later, Adaptive Linear Element) is based on the McCulloch-Pitts neuron. It consists of a weight, a bias and a summation function. It was first developed to recognize binary patterns so that, if it was reading streaming bits from a phone line, it could predict the next bit. The difference between Adaline and the standard perceptron is that in the learning phase the weights are adjusted according to the weighted sum of the inputs (the net). In the standard perceptron, the net is passed to the activation (transfer) function and the function's output is used for adjusting the weights. Adaline is a neural network with multiple nodes where each node accepts multiple inputs and generates one output. Given the following variables:
• x is the input vector
• w is the weight vector
• n is the number of inputs
• θ is some constant
• y is the output
the output is given by Eq. 25. If we further assume that x_{n+1} = 1 and w_{n+1} = θ, the output reduces to the dot product of x and w, y = x · w.
The Adaline network is a simple neural network with two neuron layers: one input layer and one output layer. The output layer has only one neuron node. The Adaline network is also the first neural network we built that "learns". The learning rule is simple: we give it some input values, fire the network and compare the output value with the desired value. If there is any discrepancy, the links in the link layer adjust their weights until the rate of error is smaller than our tolerance. The weight vector w can be obtained by minimizing the least-squares-error criterion. The delta learning rule adopted in ADALINE is a data-adaptive technique for deriving a least-squares-error solution. Let us assume:
• η is the learning rate (some constant)
• d is the desired output
• o is the actual output
Then the weights are updated as w ← w + η(d - o)x. The ADALINE converges to the least squares error, which is E = (d - o)^2.
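The delta rule above can be sketched in a few lines of Python. This is a minimal single-unit illustration under stated assumptions, not the network trained in this study: the bias θ is folded in as an extra weight on a constant input of 1, the weights start at zero, and the names `train_adaline`, `adaline_output`, `eta` and `epochs` are hypothetical.

```python
import numpy as np


def train_adaline(X, d, eta=0.01, epochs=50):
    """Train one ADALINE unit with the delta rule w <- w + eta * (d - o) * x."""
    X = np.asarray(X, dtype=float)
    # Append a constant input of 1 so that the bias theta is the last weight.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xb, d):
            o = x @ w                    # linear output (the net), no activation
            w += eta * (target - o) * x  # delta-rule weight update
    return w


def adaline_output(X, w):
    """Linear output x . w for each row of X, using the same bias convention."""
    X = np.asarray(X, dtype=float)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w
```

For pixel classification, one possible use is to train on pixel features with targets of 1 for lesion and 0 for skin and then threshold `adaline_output` at 0.5; that thresholding step is an assumption of this sketch.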
Regression neural network: Generalized regression neural networks (GRNN) are a kind of radial basis network that is often used for function approximation. The radial basis transfer function calculates a layer's output from its net input. It takes one input, N, an S×Q matrix of net input (column) vectors, and returns each element of N passed through a radial basis function. The probability density function used in GRNN is the normal distribution (Eq. 26). Each training sample X_j is used as the mean of a normal distribution. The distance D_j between the training sample and the point of prediction is used as a measure of how well each training sample can represent the position of prediction, X. If the distance D_j between the training sample and the point of prediction is small, exp(-D_j^2 / 2σ^2) becomes large. For D_j = 0, exp(-D_j^2 / 2σ^2) becomes one and the point of evaluation is represented best by this training sample. The distance to all the other training samples is larger. A larger distance D_j causes the term exp(-D_j^2 / 2σ^2) to become smaller, and therefore the contribution of the other training samples to the prediction is relatively small. The term Y_j exp(-D_j^2 / 2σ^2) for the jth training sample is then the largest one and contributes most to the prediction. The standard deviation, or smoothness parameter, σ is subject to a search. For a larger smoothness parameter, the training sample can represent the point of evaluation over a wider range of X; for a small value of the smoothness parameter, the representation is limited to a narrow range of X. With a GRNN it is possible to:
• Predict the behavior of systems based on few training samples
• Predict smooth multi-dimensional curves
• Interpolate between training samples
Performance measures: Different parameters were used to analyze the performance of the various fuzzy clustering algorithms. They are the False Positive Error (FPE), the False Negative Error (FNE), the coefficient of similarity and the spatial overlap. To define the first two quality metrics, let SR denote the result of an automatic segmentation method and GT denote the ground truth segmentation obtained by the medical expert. Both SR and GT are binary images such that all the pixels inside the curve have label 1 and all others have label 0. The metrics are calculated as follows:

False Positive Error (FPE):
This metric measures the rate of pixels classified as lesion by the automatic segmentation that were not classified as lesion by the medical expert (Eq. 27).

False Negative Error (FNE):
The FNE measures the rate of pixels classified as lesion by the medical expert that were not classified as lesion by the automatic segmentation (Eq. 28). Clinically, this is the worse of the two types of error.

Coefficient of Similarity:
The coefficient of similarity is the measure of relatedness between the automatic and manual segmentation (12) and is given by Eq. 29. A value of 1 represents perfect overlap and 0 represents no overlap.

Spatial overlap:
The measure of spatial overlap between the automatic (algorithmic) and the manual segmentation is given by Eq. 30, which is expressed in terms of the pixel counts of the intersection, the manual segmentation and the algorithmic segmentation. Spatial overlap is a more accurate measure of agreement than the coefficient of similarity because it takes into account the spatial properties of the segmented region. It is more sensitive to small unmatched errors.
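The error and overlap measures can be computed from the two binary masks as in the Python sketch below. The exact forms of Eq. 27, Eq. 28 and Eq. 30 are not reproduced here, so the sketch makes explicit assumptions: the error rates are normalized by the ground truth lesion area, and the spatial overlap is taken to be twice the intersection count divided by the sum of the manual and algorithmic counts. The coefficient of similarity is left out because its form cannot be inferred from the text. The function name and dictionary keys are hypothetical.

```python
import numpy as np


def segmentation_metrics(sr, gt):
    """Compare a binary segmentation result sr with a binary ground truth gt.

    Assumes gt contains at least one lesion pixel.
    """
    sr = np.asarray(sr, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    n_sr = int(sr.sum())                 # pixels labelled lesion by the algorithm
    n_gt = int(gt.sum())                 # pixels labelled lesion by the expert
    n_fp = int(np.sum(sr & ~gt))         # lesion in SR only
    n_fn = int(np.sum(~sr & gt))         # lesion in GT only
    n_int = int(np.sum(sr & gt))         # lesion in both
    return {
        "fp_count": n_fp,
        "fn_count": n_fn,
        "fpe": n_fp / n_gt,              # assumed normalization by GT area
        "fne": n_fn / n_gt,              # assumed normalization by GT area
        "spatial_overlap": 2.0 * n_int / (n_gt + n_sr),  # assumed overlap form
    }
```

Under these assumptions a perfect segmentation gives zero for both error rates and a spatial overlap of 1, and the raw counts are returned alongside the rates because either convention may be reported.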

RESULTS AND DISCUSSION
Six different segmentation methods were simulated for melanoma diagnosis. The evaluation was based on performance measures calculated against a manually segmented ground truth image. The proposed clustering algorithms were simulated using MATLAB and tested against the ground truth image to determine experimentally the segmentation accuracy of each technique.
The input malignant melanoma image shown in Fig. 1 was selected randomly from the database. The size of the image is 256×1200. Fig. 2-7 show the segmentation results for the malignant melanoma image using the various intelligent system based clustering techniques.
In the segmented output, the white region indicates the infected region (portions of skin affected by malignant melanoma) and the black trace or spots indicate the non-infected region (portions of the skin free from malignant melanoma). The Adaline network was constructed and trained with about 200 training samples and the corresponding training targets. The training samples are pixels taken from the melanoma digital image. The GRNN (Fig. 7) was trained with input samples and corresponding target vectors taken from the malignant melanoma image. In these cases, all the methods produce segmentation results which are close to the ground truth segmentation shown in Fig. 8. This happens when there is good contrast between the lesion and the skin, so that the lesion boundaries are well defined. To obtain a more accurate performance analysis, several parameters must be considered. The segmentation results were compared with the reference image (ground truth), and the Coefficient of Similarity, Spatial Overlap and False Positive and Negative Error were evaluated, as shown in Fig. 9-11.
The Coefficient of Similarity shows how close the segmented result is to the ground truth image. Among the six segmentation methods, the Hierarchical C Means algorithm performs best on this measure, with a value of 0.9612, which is close to 1.
The best method according to the Spatial Overlap is also the Hierarchical C Means algorithm, with a value of 0.9421. This measure is more sensitive to small unmatched errors.
The best false positive and negative error (4459) is obtained by the Hierarchical C Means algorithm. Of the two types of error, the false negative error is the worse. The Hierarchical C Means method was therefore considered the most suitable from a clinical point of view.

CONCLUSION
We proposed and evaluated six methods (intelligent fuzzy and neural network based clustering techniques) for the segmentation of skin lesions in dermoscopic images. The segmentation methods employed are the Fuzzy C Means Algorithm (FCM), the Possibilistic C Means Algorithm (PCM), the Hierarchical C Means Algorithm (HCM), the Fuzzy Hopfield Neural Network method (FHNN method), the Regression Neural Network (GRNN) and the Adaline Neural Network. The segmentation results were compared with the manually segmented image (the ground truth image) using the Coefficient of Similarity, Spatial Overlap and False Positive and Negative Error to evaluate the performance of the proposed intelligent system based clustering techniques. The experimental results show that the fuzzy Hierarchical C Means algorithm provides better segmentation than the other clustering algorithms (Fuzzy C Means, Possibilistic C Means, Adaline Neural Network, FHNN and GRNN). The Hierarchical C Means approach can thus handle the uncertainties that exist in the data efficiently and is useful for lesion segmentation in a computer aided diagnosis system that assists dermatologists in clinical diagnosis.