A New Cooperative Algorithm Based on PSO and K-Means for Data Clustering

Problem statement: Data clustering has been applied in multiple fields such as machine learning, data mining, wireless sensor networks and pattern recognition. One of the most famous clustering approaches is k-means, which has been used effectively in many clustering problems, but it has drawbacks such as convergence to local optima and sensitivity to initial points. Approach: Particle Swarm Optimization (PSO) is a swarm intelligence algorithm that can be applied to determine the optimal cluster centers. In this study, a cooperative algorithm based on PSO and k-means is presented. Result: The proposed algorithm utilizes both the global search ability of PSO and the local search ability of k-means. The proposed algorithm, PSO, PSO with Constriction Factor (CF-PSO), k-means and the KPSO hybrid algorithm were used to cluster six datasets and their efficiencies were compared with each other. Conclusion: Experimental results show that the proposed algorithm has acceptable efficiency and robustness.


INTRODUCTION
Clustering is an unsupervised classification technique in which a dataset, often a set of vectors in a multi-dimensional space, is divided into clusters based on a similarity criterion. Data clustering has wide application in data categorization (Memarsadeghi and Leary, 2003; Velmuruqan and Santhanam, 2010), data compression (Celebi, 2011), data mining (Pizzuti and Talia, 2003), pattern recognition (Wong and Li, 2008), compacting (Marr, 2003), machine learning (Yang et al., 2007) and image segmentation (Vannoorenberghe and Flouzat, 2006). The importance of data clustering in various sciences has led to the introduction of various clustering methods (Hartigan, 1975). In clustering, a set of objects whose attributes are usually represented as vectors in a multi-dimensional space is grouped into clusters. When the predefined number of clusters is K and there are N m-dimensional data vectors, a clustering algorithm assigns each vector to one of the clusters such that, with respect to a specific criterion, the vectors assigned to a cluster are more similar to each other than to the vectors in other clusters.
The k-means clustering algorithm, developed by Hartigan (1975), is one of the earliest and simplest clustering approaches and has been widely used. K-means starts with K cluster centers and divides a set of objects into K subsets. It is one of the most famous and widely applied clustering techniques because it is easy to understand and implement and its time complexity is linear. However, k-means has major weaknesses. One of them is its high sensitivity to the initial values of the cluster centers. The objective function of k-means has multiple local optima and k-means cannot guarantee escaping them. Therefore, if the initial positions of the cluster centers in the problem space are chosen inappropriately, the algorithm can converge to a local optimum.
Data clustering is an NP-hard problem. Swarm intelligence algorithms are among the most widely applied methods for finding suitable solutions to such problems. Particle Swarm Optimization (PSO) is one of the most famous swarm intelligence algorithms and was presented by Kennedy and Eberhart (1995). It is an effective population-based technique for solving optimization problems that works according to probabilistic rules. So far, different PSO-based methods for solving the data clustering problem have been presented (Esmin et al., 2008; Kao and Lee, 2009; Tsai and Kao, 2010). Merwe and Engelbrecht (2003) presented a hybridized algorithm based on k-means and PSO, called KPSO. In KPSO, k-means is executed first and its outcome is then used as one of the particles in the initial population of PSO. This method therefore first exploits the high convergence rate of k-means and, after k-means converges, applies PSO to escape local minima and improve the k-means result. In this study, a cooperative algorithm based on PSO and k-means is proposed. In the proposed algorithm, the particles first perform the optimization process in PSO. After the particle swarm converges, the cluster centers obtained by the particles are used as the initial cluster centers of k-means. After PSO's output is forwarded to k-means, the particles are reinitialized and perform clustering again. In effect, PSO is used for global search and k-means is used for local search. The proposed algorithm, k-means, PSO, CF-PSO (Eberhart and Shi, 2000) and KPSO are applied to cluster 6 real datasets: Iris, Glass, Wine, Sonar, Pima and WDBC. Comparison of the experimental results shows that the proposed algorithm has acceptable efficiency.

K-means algorithm:
Clustering in D-dimensional Euclidean space is a process in which a set of N members is divided into K groups, or clusters, based on a similarity criterion. Various clustering methods have been proposed so far. The basis of clustering algorithms is measuring the similarity between data vectors, where a function determines how similar two data vectors are. K-means is one of the oldest and most famous clustering methods. It partitions data vectors in D-dimensional space into a predetermined number of clusters, using the Euclidean distance between each data vector and the cluster centers as the similarity criterion.
The Euclidean distance between a data vector and the center of its own cluster is less than its Euclidean distance to the other cluster centers. The standard k-means algorithm is as follows:

The initial positions of the K cluster centers are determined randomly. The following phases are then repeated:

a) For each data vector: the vector is allocated to the cluster whose center is closest to it in Euclidean distance. The distance to a cluster center is calculated by Eq. 1:

$$d(X_p, Z_j) = \sqrt{\sum_{i=1}^{D} (X_{pi} - Z_{ji})^2} \tag{1}$$

In Eq. 1, $X_p$ is the p-th data vector, $Z_j$ is the j-th cluster center and D is the dimension of the data vectors and cluster centers.

b) The cluster centers are updated by Eq. 2:

$$Z_j = \frac{1}{n_j} \sum_{X_p \in C_j} X_p \tag{2}$$

In Eq. 2, $n_j$ is the number of data vectors belonging to the j-th cluster and $C_j$ is the subset of the data vectors that constitutes the j-th cluster.

Phases (a) and (b) are repeated until the stop criterion is satisfied (Hartigan, 1975).
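The two phases can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the `max_iter` cap and the rule that an empty cluster keeps its previous center are assumptions made here, and the stop criterion used is "the centers no longer move".

```python
import math
import random


def euclidean(x, z):
    """Eq. 1: Euclidean distance between data vector x and center z."""
    return math.sqrt(sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def assign(data, centers):
    """Phase (a): index of the nearest center for every data vector."""
    return [min(range(len(centers)), key=lambda j: euclidean(x, centers[j]))
            for x in data]


def update(data, labels, centers):
    """Phase (b), Eq. 2: each center becomes the mean of its members.

    Assumption: an empty cluster keeps its old center, since the text
    does not say how empty clusters are handled.
    """
    new_centers = []
    for j, old in enumerate(centers):
        members = [x for x, lab in zip(data, labels) if lab == j]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(list(old))
    return new_centers


def kmeans(data, centers, max_iter=100):
    """Repeat phases (a) and (b) until the centers stop moving."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(data, centers)
        new_centers = update(data, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(data, centers)


# Usage: initial centers are drawn at random from the data vectors.
data = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0]]
centers, labels = kmeans(data, centers=random.sample(data, 2))
```

Because the result depends only on the initial centers, running the sketch from a poor starting point is an easy way to observe the sensitivity to initialization discussed above.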
Particle swarm optimization algorithm: PSO is a swarm intelligence method and evolutionary optimization technique that was proposed by Kennedy and Eberhart (1995). PSO was inspired by the social interactions of animals such as bird flocks and fish schools. In this method, there is a swarm of particles, each of which represents a feasible solution to the optimization problem. Every particle tries to move toward the final solution by adjusting its path and moving toward its best personal experience and the best experience of the swarm.
Suppose that the population size is N.
During the optimization process, the velocity and position of each particle are updated at each step by Eq. 3 and 4:

$$v_{i,j}(t+1) = w\,v_{i,j}(t) + c_1 R_1 \left(Pbest_{i,j}(t) - x_{i,j}(t)\right) + c_2 R_2 \left(Gbest_j(t) - x_{i,j}(t)\right) \tag{3}$$

$$x_{i,j}(t+1) = x_{i,j}(t) + v_{i,j}(t+1) \tag{4}$$

where $x_{i,j}$ is component j of particle i, $c_1$ and $c_2$ are acceleration coefficients and w is the inertia weight, which can be a constant or a positive function (Shi and Eberhart, 1998). $R_1$ and $R_2$ are random numbers uniformly distributed in the interval [0, 1]. $Pbest_i(t)$ is the best position found by particle i until time t (the best individual experience of particle i) and $Gbest(t)$ is the best position found by all members of the swarm until time t (the best swarm experience). At each iteration, the best individual experience of particle i is given by Eq. 5:

$$Pbest_i(t+1) = \begin{cases} Pbest_i(t) & \text{if } f(x_i(t+1)) \ge f(Pbest_i(t)) \\ x_i(t+1) & \text{otherwise} \end{cases} \tag{5}$$

where f(x) is the fitness value of vector x. The best swarm experience is given by Eq. 6:

$$Gbest(t+1) = \arg\min_{Pbest_i(t+1)} f\left(Pbest_i(t+1)\right), \quad 1 \le i \le N \tag{6}$$

Clerc presented another version of PSO in which the convergence rate is improved by using a constriction factor (CF-PSO). In this version of PSO, the particle velocity is updated by Eq. 7:

$$v_{i,j}(t+1) = \chi \left[ v_{i,j}(t) + c_1 R_1 \left(Pbest_{i,j}(t) - x_{i,j}(t)\right) + c_2 R_2 \left(Gbest_j(t) - x_{i,j}(t)\right) \right] \tag{7}$$

Eberhart showed that appropriate values are χ = 0.729843788 and $c_1 = c_2 = 2.05$ (Eberhart and Shi, 2000). Because of the way particles move in PSO, they may leave the search space, which decreases the efficiency and convergence rate of the algorithm. To remove this problem, constraints are placed on the values of the velocity components. In each iteration, after the velocity is computed by Eq. 3, each of its components is checked in every dimension. The value of each velocity component can be clamped to the range $[-V_{max}, V_{max}]$ to reduce the likelihood of particles leaving the search space. The value of $V_{max}$ is usually chosen to be $K \times X_{max}$, where $X_{max}$ is the length of the search interval in each dimension and K is a scaling constant (not the number of clusters) with 0.1 ≤ K ≤ 1 (Bergh and Engelbrecht, 2004).

To find the optimal cluster centers, the PSO algorithm applies Eq. 8 as the fitness function (Tsai and Kao, 2010). Eq. 8 defines the Sum of Intra-Cluster Distances (SICD), which is one of the best-known criteria for evaluating a clustering. The lower the SICD, the higher the quality of the clustering. Therefore, for data clustering, the PSO algorithm should minimize the fitness function in Eq. 8:

$$SICD = \sum_{i=1}^{K} \sum_{X_j \in C_i} \left\| X_j - Z_i \right\| \tag{8}$$

In Eq. 8, the Euclidean distance between each data vector in a cluster and the centroid of that cluster is calculated and summed up. Here, there are K clusters $C_i$ (1 ≤ i ≤ K), and each of the N data vectors $X_j$ (1 ≤ j ≤ N) is clustered on the basis of its distance from the cluster centers $Z_i$ (1 ≤ i ≤ K). A data vector belongs to the cluster whose center is closer to it in Euclidean distance than any other cluster center. Thus, PSO's objective is to determine the cluster centers that minimize Eq. 8. Since the data vectors and cluster centers are d-dimensional and there are K clusters, each particle must represent K cluster centers in d-dimensional space; consequently, its vector has K×d components. Fig. 1 shows the vector of a particle that contains K d-dimensional cluster centers.
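To make the particle encoding, the fitness function and the constriction-factor update concrete, the following is a minimal Python sketch. It assumes a particle is stored as a flat list of K×d numbers, draws a fresh random number for every component, and clamps each velocity component after the update; the function names and the `rng` parameter are illustrative choices, not part of the original method description.

```python
import math
import random


def sicd(particle, data, k):
    """Eq. 8: sum over all data vectors of the distance to the nearest center.

    `particle` is a flat list of k*d numbers holding k centers of dimension d.
    """
    d = len(particle) // k
    centers = [particle[i * d:(i + 1) * d] for i in range(k)]
    return sum(min(math.dist(x, z) for z in centers) for x in data)


def clamp(v, v_max):
    """Restrict one velocity component to the range [-v_max, v_max]."""
    return max(-v_max, min(v_max, v))


def cf_pso_step(x, v, pbest, gbest, v_max,
                chi=0.729843788, c1=2.05, c2=2.05, rng=random):
    """One constriction-factor update of a single particle (Eq. 7, then Eq. 4)."""
    new_x, new_v = [], []
    for j in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # assumed: one draw per component
        vj = chi * (v[j]
                    + c1 * r1 * (pbest[j] - x[j])
                    + c2 * r2 * (gbest[j] - x[j]))
        vj = clamp(vj, v_max)
        new_v.append(vj)
        new_x.append(x[j] + vj)
    return new_x, new_v


# Usage: two 2-dimensional centers encoded in one 4-component particle.
data = [[0.0, 1.0], [10.0, 12.0]]
fitness = sicd([0.0, 0.0, 10.0, 10.0], data, k=2)
```

Assigning every data vector to its nearest center inside `sicd` is what makes the fitness depend only on the particle position, so no separate cluster membership has to be stored with the particle.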
Proposed algorithm: In this section, a new cooperative algorithm based on PSO and k-means is described. The purpose of the proposed algorithm is to take advantage of both algorithms and remove their weaknesses. K-means has a high convergence rate, but it is very sensitive to the initialization of the cluster centers; if inappropriate initial cluster centers are selected, it can converge to a local optimum. PSO can escape local optima to some extent but cannot guarantee reaching the global optimum. Moreover, PSO's computational complexity for data clustering is much higher than that of k-means. The proposed algorithm removes the weaknesses of these two algorithms and exploits their advantages as follows. First, the particles are initialized in PSO. Each particle contains K cluster centers, which are moved through the problem space by running PSO. PSO continues until the particles converge. After the convergence of PSO, the Gbest position, which holds the best cluster centers found by the particles so far, is taken as the input of k-means. K-means then runs until it converges. In this way, PSO searches globally and escapes local optima as far as it can. After the particles converge, PSO's output provides appropriate initial cluster centers for k-means, and k-means then searches locally from them. Consequently, the proposed algorithm uses the global search ability of PSO and, once PSO has converged, hands a large part of the optimization process to k-means in order to exploit its strong local search capability and high convergence rate. Since the initial cluster centers for k-means are obtained by PSO and k-means is used only for local search, the sensitivity of k-means to the initial cluster centers is mitigated.
However, PSO's capability may not be enough to avoid being trapped in local optima. If PSO is trapped in a local optimum, it cannot provide proper initial cluster centers to k-means and, given the low ability of k-means to escape local optima, the obtained result will not be acceptable. To address this problem, after the convergence of PSO its output is sent to k-means and, simultaneously with the start of k-means, the PSO particles are reinitialized and start a global search again. Thus, within a single execution of the proposed algorithm, PSO gets many chances to perform an acceptable global search. It should be noted that in each execution of PSO the particles only search globally and converge after a short time, while k-means undertakes the remainder of the optimization process, which is the local search. Therefore, owing to the low computational complexity of k-means, a large amount of computation for local search is avoided.
The proposed algorithm uses this saved computational load to give PSO new opportunities, so that it can perform an acceptable global search in at least one of them. Hence, for each execution of the global search by PSO, k-means is also performed once. To determine the convergence of the particle swarm, the difference between the results obtained in consecutive iterations is used. When the particles converge, this difference decreases, so convergence can be detected by applying a threshold to the difference between the Gbest fitness values in iterations i and j. Because PSO and k-means are performed multiple times, the algorithm must always keep the best cluster centers found so far. For this purpose, a bulletin is used: each time k-means finishes after a convergence of PSO, its result is compared with the result saved in the bulletin and, if the obtained cluster centers are better, the bulletin is updated. K-means execution finishes when the cluster centers are not displaced in two consecutive iterations. Pseudo code of the proposed algorithm is presented in Fig. 2.
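The cooperation between the two algorithms can be sketched as follows. This is a simplified illustration under several assumptions and not a transcription of Fig. 2: PSO and k-means run one after the other rather than simultaneously, a fixed number of `rounds` replaces the fitness-evaluation budget, velocity clamping is omitted, and the parameter names and default values (`eps`, `patience`, `n_particles`, `max_iter`, `seed`) are chosen here for illustration.

```python
import math
import random


def split(p, k):
    """Cut a flat particle of k*d numbers into k centers of dimension d."""
    d = len(p) // k
    return [p[i * d:(i + 1) * d] for i in range(k)]


def sicd(centers, data):
    """Sum of intra-cluster distances (Eq. 8) with nearest-center assignment."""
    return sum(min(math.dist(x, z) for z in centers) for x in data)


def kmeans(data, centers, max_iter=100):
    """Local search: iterate until the centers stop moving."""
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for x in data:
            j = min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
            groups[j].append(x)
        # An empty cluster keeps its old center (assumption).
        new = [[sum(c) / len(g) for c in zip(*g)] if g else z
               for g, z in zip(groups, centers)]
        if new == centers:
            break
        centers = new
    return centers


def pso_round(data, k, n_particles, max_iter, rng, eps=0.1, patience=5,
              chi=0.729843788, c=2.05):
    """Global search: CF-PSO from a fresh random swarm until Gbest stalls."""
    xs = [[v for z in rng.sample(data, k) for v in z] for _ in range(n_particles)]
    vs = [[0.0] * len(x) for x in xs]
    pbest = [list(x) for x in xs]
    pfit = [sicd(split(x, k), data) for x in xs]
    g = min(range(n_particles), key=pfit.__getitem__)
    gbest, gfit, stall = list(pbest[g]), pfit[g], 0
    for _ in range(max_iter):
        prev = gfit
        for i in range(n_particles):
            for j in range(len(xs[i])):
                vs[i][j] = chi * (vs[i][j]
                                  + c * rng.random() * (pbest[i][j] - xs[i][j])
                                  + c * rng.random() * (gbest[j] - xs[i][j]))
                xs[i][j] += vs[i][j]
            fit = sicd(split(xs[i], k), data)
            if fit < pfit[i]:
                pbest[i], pfit[i] = list(xs[i]), fit
                if fit < gfit:
                    gbest, gfit = list(xs[i]), fit
        # Converged when Gbest improves by less than eps for `patience` iterations.
        stall = stall + 1 if prev - gfit < eps else 0
        if stall >= patience:
            break
    return split(gbest, k)


def cooperative(data, k, rounds=5, n_particles=10, max_iter=50, seed=0):
    """Alternate PSO (global) and k-means (local); keep the best in a bulletin."""
    rng = random.Random(seed)
    bulletin, best_fit = None, math.inf
    for _ in range(rounds):
        centers = pso_round(data, k, n_particles, max_iter, rng)  # fresh swarm
        centers = kmeans(data, centers)
        fit = sicd(centers, data)
        if fit < best_fit:  # update the bulletin only on improvement
            bulletin, best_fit = centers, fit
    return bulletin, best_fit
```

Each round restarts the swarm from new random data vectors, so a round that ends in a poor basin does not affect later rounds, and the bulletin guarantees that the returned result is never worse than the best round.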

Experiments:
Experiments were performed on 6 datasets, and the efficiency of k-means, PSO, CF-PSO, the hybridized algorithm of PSO and k-means called KPSO (Merwe and Engelbrecht, 2003) and the proposed method was compared on these datasets. In all the methods, the objective function is Eq. 8, which calculates the sum of intra-cluster distances. All 6 datasets were selected from the standard real datasets of the UCI repository (http://archive.ics.uci.edu/ml/): Iris, WDBC, Sonar, Glass, Wine and Pima. Brief specifications of the datasets, including name, size, number of attributes, number of classes and number of data vectors in every class, are given in Table 1. The performance of the five algorithms is evaluated and compared using the following criteria: Sum of Intra-Cluster Distances (SICD): The distance between each data vector within a cluster and the center of that cluster is calculated and summed up. Eq. 8 is used for calculating the SICD, which has to be minimized.
Error rate: It is defined as the number of misplaced points over the total number of points in the dataset, which is given by Eq. 9:

$$Err = \frac{1}{N} \sum_{i=1}^{N} e_i, \qquad e_i = \begin{cases} 0 & \text{if } Class(i) = Cluster(i) \\ 1 & \text{otherwise} \end{cases} \tag{9}$$

Here N is the total number of points in the dataset, Class(i) is the class number to which point i belongs and Cluster(i) is the cluster number to which point i was assigned. Eq. 9 thus gives the number of misplaced points divided by the total number of points.
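As a small worked example of this criterion, the following Python sketch counts the misplaced points and divides by N. It assumes that the cluster numbers have already been matched to the class numbers, since the text does not describe how that matching is done; the function name is illustrative.

```python
def error_rate(classes, clusters):
    """Fraction of points whose cluster number differs from their class number.

    Assumption: cluster numbers were already mapped onto class numbers.
    """
    if len(classes) != len(clusters):
        raise ValueError("classes and clusters must have the same length")
    if not classes:
        raise ValueError("at least one point is required")
    misplaced = sum(1 for c, k in zip(classes, clusters) if c != k)
    return misplaced / len(classes)


# Usage: one of four points is placed in the wrong cluster.
rate = error_rate([0, 0, 1, 1], [0, 1, 1, 1])
```

In this example one of the four points is misplaced, so the error rate is 1/4.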
In the proposed algorithm, since PSO should perform a fast global search and converge quickly, a version of PSO with a high convergence rate should be used. Among the versions of PSO, PSO with constriction factor (Shi and Eberhart, 1998) has a higher convergence rate; therefore, this version is applied in the proposed algorithm. The parameters of the algorithms are set as follows. The initial positions of the cluster centers in all algorithms are selected randomly among the data vectors. The population size in PSO, CF-PSO, KPSO and the proposed algorithm is 5 times the number of problem space dimensions, following Kao et al. (2008). In the proposed algorithm and CF-PSO, c 1 = c 2 = 2.05 and χ = 0.729843788. In PSO and KPSO, c 1 = c 2 = 2 and the inertia weight is obtained by "w = 0.5 + rand/2" at each iteration (Kao et al., 2008). Based on various experiments, the particle swarm is considered to have converged if the change in the SICD of Gbest is less than 0.1 over 5 iterations. In Kao et al. (2008), the number of iterations of the PSO-based algorithms is equal to 10 times the number of problem space dimensions. For instance, for clustering the Iris dataset, which has 3 classes of four-dimensional data, the problem space is 12-dimensional, as in Fig. 1 (D = 12). Therefore, the algorithms run for 120 iterations. Another possible stop criterion is based on the number of fitness evaluations. In this study, for fairness of comparison, the number of fitness evaluations (i.e., SICD calculations) that PSO executes in 10×D iterations is used as the stop criterion of all algorithms. Thus, all the algorithms perform the same number of fitness evaluations before finishing.
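The parameter settings above translate into a simple budget calculation, sketched here in Python. The total number of fitness evaluations is taken as population size times number of iterations, which assumes one evaluation per particle per iteration and ignores the evaluation of the initial swarm; the function name is illustrative.

```python
def evaluation_budget(n_clusters, n_features):
    """Return (D, population size, iterations, fitness evaluations)."""
    dims = n_clusters * n_features  # D = K x d components per particle
    population = 5 * dims           # 5 times the problem dimension
    iterations = 10 * dims          # 10 times the problem dimension
    # Assumption: one fitness evaluation per particle per iteration.
    return dims, population, iterations, population * iterations


# Usage: Iris has 3 classes and 4 attributes.
dims, population, iterations, evaluations = evaluation_budget(3, 4)
```

For Iris this gives D = 12, a population of 60 particles, 120 iterations and, under the stated assumption, 7200 fitness evaluations shared by every algorithm in the comparison.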
Each algorithm was run 50 times, and the best, mean and standard deviation of the SICD obtained for the 6 datasets are presented in Table 2. Fig. 3 illustrates the convergence behavior of the five algorithms for the Iris and Glass datasets. As observed in Fig. 3, the proposed algorithm executes fewer PSO iterations than the other algorithms, because during its execution part of the evaluation budget is used for k-means. In terms of the number of fitness evaluations, each iteration of PSO with N particles is equal to N iterations of k-means. According to Table 2, the proposed algorithm has better efficiency than the other tested algorithms. Nevertheless, the best result obtained by CF-PSO is better than that of the proposed algorithm in three cases, because CF-PSO has greater local search ability than k-means. Indeed, although CF-PSO has a higher computational load than k-means for local search, its results are more accurate. As a result, its best results are sometimes better than those of the proposed algorithm, in which local search is done by k-means. However, the mean results of the proposed algorithm are better than those of the other tested algorithms in all cases. The reason is the strategy used for global search in this algorithm. The proposed algorithm succeeds in finding the global optimum in most runs and can prevent the final result from being trapped in local optima, whereas this ability is observed less in the other algorithms, which cannot guarantee escaping local optima. Because of this weakness, the other algorithms are less robust and do not reach nearly the same results across runs.
For instance, CF-PSO, which obtained better best results in some cases, could not reproduce these results across runs. This weakness is more pronounced in k-means than in the other algorithms, owing to its sensitivity to the initial center positions and its low capability of escaping local optima. In KPSO, since PSO is performed after k-means, an improper result from k-means may in some cases even cause the PSO particles to be trapped in that local optimum. Therefore, KPSO is also less robust in clustering some datasets. The results obtained by CF-PSO are better than those of PSO with random inertia weight in all conditions, which shows that applying CF-PSO instead of PSO with random inertia weight can give better results. Overall, the results in Table 2 show that the proposed algorithm is very robust in comparison with the other tested algorithms, which is confirmed by the lower standard deviations of its results. Table 3 shows the mean and standard deviation of the error rate of the 5 algorithms on the datasets of Table 1 over 50 runs. In 4 cases, the mean error rate of the proposed algorithm is lower than that of the other algorithms. On Pima it does not have the lowest error rate, although in this case its mean SICD is better than that of the other algorithms. This is because there is no absolute correlation between the SICD and the error rate.

CONCLUSION
In this study, a new cooperative algorithm based on k-means and PSO is presented. In the proposed algorithm, PSO performs global search and k-means is responsible for local search. The algorithm is designed so that robustness and the ability to avoid being trapped in local optima are improved. The proposed algorithm, along with four other algorithms, was used for clustering 6 standard datasets and the obtained results were compared with each other. Experimental results show that the proposed algorithm has higher robustness and better efficiency than the other tested algorithms. To improve the results of the proposed algorithm further, the local search ability around the best position found by the algorithm could be increased. This is an issue that merits further research.