Research Article Open Access

Single Pass Seed Selection Algorithm for k-Means

K. Karteeka Pavan1, Allam Appa Rao1, A.V. Dattatreya Rao1 and G. R. Sridhar2
  • 1 , Afghanistan
Journal of Computer Science
Volume 6 No. 1, 2010, 60-66

DOI: https://doi.org/10.3844/jcssp.2010.60.66

Submitted On: 27 June 2009 Published On: 31 January 2010

How to Cite: Pavan, K. K., Rao, A. A., Rao, A. D. & Sridhar, G. R. (2010). Single Pass Seed Selection Algorithm for k-Means. Journal of Computer Science, 6(1), 60-66. https://doi.org/10.3844/jcssp.2010.60.66

Abstract

Problem statement: The k-means method is one of the most widely used clustering techniques in a variety of applications. However, k-means often converges to a local optimum, and the result depends on the initial seeds. An inappropriate choice of initial seeds may yield poor results. k-means++ initializes k-means by choosing the initial seeds with specific probabilities. Because of the random selection of the first seed and the minimum probable distance, k-means++ can also produce different clusters in different runs, with different numbers of iterations.

Approach: In this study we proposed the Single Pass Seed Selection (SPSS) algorithm, a modification of k-means++ that initializes the first seed and the probable distance for k-means++ based on the point that is close to the largest number of other points in the data set.

Result: We evaluated its performance by applying it to various data sets and comparing it with k-means++. SPSS was a single-pass algorithm that yielded a unique solution in fewer iterations than k-means++. Experimental results on real data sets from UCI (4-60 dimensions, 27-10945 objects and 2-10 clusters) demonstrated the effectiveness of SPSS in producing consistent clustering results.

Conclusion: SPSS performed well on high-dimensional data sets. Its efficiency increased with the number of features in the data set. We recommend the proposed method particularly when the number of features is greater than 10.
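The abstract contrasts randomized k-means++ seeding with a deterministic, density-based choice of seeds. The standard-library Python sketch below shows both ideas. `kmeanspp_seeds` follows the standard k-means++ rule: it picks the first seed uniformly at random and samples each later seed with probability proportional to D(x)², the squared distance to the nearest seed already chosen. `deterministic_seeds` is hypothetical and is not the authors' SPSS algorithm. It takes the first seed to be the point with the smallest total distance to all other points, which is one possible reading of "close to the largest number of other points." Because the abstract does not spell out SPSS's probable-distance rule, the sketch substitutes farthest-first selection (the largest D²) for the later seeds. All function names are illustrative.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _update_d2(d2: List[float], data: List[Point], seed: Point) -> List[float]:
    """D(x)^2: squared distance from each point to its nearest chosen seed."""
    return [min(old, sq_dist(p, seed)) for old, p in zip(d2, data)]


def kmeanspp_seeds(data: List[Point], k: int, rng: random.Random) -> List[int]:
    """Standard k-means++: first seed uniform at random, each later seed drawn
    with probability proportional to D(x)^2. Returns indices into data."""
    first = rng.randrange(len(data))
    seeds = [first]
    d2 = _update_d2([math.inf] * len(data), data, data[first])
    while len(seeds) < k:
        target = rng.random() * sum(d2)
        acc, chosen = 0.0, -1
        for i, w in enumerate(d2):
            if w <= 0:
                continue  # chosen seeds (and their duplicates) have zero weight
            acc += w
            chosen = i  # falls back to the last positive-weight point
            if acc >= target:
                break
        if chosen < 0:
            raise ValueError("k exceeds the number of distinct points")
        seeds.append(chosen)
        d2 = _update_d2(d2, data, data[chosen])
    return seeds


def deterministic_seeds(data: List[Point], k: int) -> List[int]:
    """Hypothetical deterministic seeding, NOT the published SPSS rule:
    first seed = point with the smallest total distance to all points;
    later seeds = point with the largest D(x)^2 (farthest-first)."""
    n = len(data)
    totals = [sum(math.dist(p, q) for q in data) for p in data]
    first = min(range(n), key=totals.__getitem__)
    seeds = [first]
    d2 = _update_d2([math.inf] * n, data, data[first])
    while len(seeds) < k:
        nxt = max(range(n), key=d2.__getitem__)  # ties resolve to the lowest index
        if d2[nxt] <= 0:
            raise ValueError("k exceeds the number of distinct points")
        seeds.append(nxt)
        d2 = _update_d2(d2, data, data[nxt])
    return seeds


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11), (11, 10)]
    # k-means++ seeds depend on the random generator's state.
    print([kmeanspp_seeds(data, 2, random.Random(s)) for s in range(3)])
    # The deterministic sketch returns [3, 6] on every run.
    print(deterministic_seeds(data, 2))
```

Calling `kmeanspp_seeds` with differently seeded generators can return different seed sets. This is the run-to-run variability the abstract attributes to k-means++. The deterministic sketch has no random component, so repeated calls on the same data always return the same seeds.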

Keywords

  • Clustering
  • k-means
  • k-means++
  • local optimum
  • minimum probable distance
  • SPSS