Single Pass Seed Selection Algorithm for k-Means

K. Karteeka Pavan; Allam Appa Rao; A.V. Dattatreya Rao; G. R. Sridhar

doi:10.3844/jcssp.2010.60.66

Research Article Open Access

Single Pass Seed Selection Algorithm for k-Means

K. Karteeka Pavan, Allam Appa Rao, A.V. Dattatreya Rao and G. R. Sridhar

Abstract

Problem statement: The k-means method is one of the most widely used clustering techniques for various applications. However, the k-means often converges to local optimum and the result depends on the initial seeds. Inappropriate choice of initial seeds may yield poor results. k-means++ is a way of initializing k-means by choosing initial seeds with specific probabilities. Due to the random selection of first seed and the minimum probable distance, the k-means++ also results different clusters in different runs in different number of iterations. Approach: In this study we proposed a method called Single Pass Seed Selection (SPSS) algorithm as modification to k-means++ to initialize first seed and probable distance for k-means++ based on the point which was close to more number of other points in the data set. Result: We evaluated its performance by applying on various datasets and compare with k-means++. The SPSS algorithm was a single pass algorithm yielding unique solution in less number of iterations when compared to k-means++. Experimental results on real data sets (4-60 dimensions, 27-10945 objects and 2-10 clusters) from UCI demonstrated the effectiveness of the SPSS in producing consistent clustering results. Conclusion: SPSS performed well on high dimensional data sets. Its efficiency increased with the increase of features in the data set; particularly when number of features greater than 10 we suggested the proposed method.

Journal of Computer Science

Volume 6 No. 1, 2010, 60-66

DOI: https://doi.org/10.3844/jcssp.2010.60.66

Submitted On: 27 June 2009 Published On: 31 January 2010

How to Cite: Pavan, K. K., Rao, A. A., Rao, A. D. & Sridhar, G. R. (2010). Single Pass Seed Selection Algorithm for k-Means. Journal of Computer Science, 6(1), 60-66. https://doi.org/10.3844/jcssp.2010.60.66

Copyright: © 2010 K. Karteeka Pavan, Allam Appa Rao, A.V. Dattatreya Rao and G. R. Sridhar. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

6,667 Views
7,011 Downloads
20 Citations

Download

Keywords

Clustering
k-means
k-means++
local optimum
minimum probable distance
SPSS