Single Pass Seed Selection Algorithm for k-Means
K. Karteeka Pavan, Allam Appa Rao, A.V. Dattatreya Rao and G. R. Sridhar
DOI : 10.3844/jcssp.2010.60.66
Journal of Computer Science
Volume 6, Issue 1
Problem statement: The k-means method is one of the most widely used clustering techniques for various applications. However, the k-means often converges to local optimum and the result depends on the initial seeds. Inappropriate choice of initial seeds may yield poor results. k-means++ is a way of initializing k-means by choosing initial seeds with specific probabilities. Due to the random selection of first seed and the minimum probable distance, the k-means++ also results different clusters in different runs in different number of iterations. Approach: In this study we proposed a method called Single Pass Seed Selection (SPSS) algorithm as modification to k-means++ to initialize first seed and probable distance for k-means++ based on the point which was close to more number of other points in the data set. Result: We evaluated its performance by applying on various datasets and compare with k-means++. The SPSS algorithm was a single pass algorithm yielding unique solution in less number of iterations when compared to k-means++. Experimental results on real data sets (4-60 dimensions, 27-10945 objects and 2-10 clusters) from UCI demonstrated the effectiveness of the SPSS in producing consistent clustering results. Conclusion: SPSS performed well on high dimensional data sets. Its efficiency increased with the increase of features in the data set; particularly when number of features greater than 10 we suggested the proposed method.
© 2010 K. Karteeka Pavan, Allam Appa Rao, A.V. Dattatreya Rao and G. R. Sridhar. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.