DOI: 10.1145/2345396.2345414
research-article

Far efficient K-means clustering algorithm

Published: 03 August 2012

ABSTRACT

Clustering in data analysis means grouping data with similar features into the same cluster. Each cluster consists of data that are more similar to one another than to data in other clusters. From a machine learning perspective, clustering can be viewed as a form of unsupervised learning. In this paper, we propose an effective method that obtains better clustering with much reduced complexity. We evaluate the performance of the classical K-Means approach to data clustering and of the proposed Far Efficient K-Means method. The accuracy of both algorithms was examined on several data sets taken from the UCI repository of machine learning databases [13]. Their clustering efficiency was compared using two typical cluster validity indices, the Davies-Bouldin Index and Dunn's Index, for different numbers of clusters. Our experimental results demonstrate that the clustering produced by the proposed method is much more efficient than that of the K-Means algorithm when larger data sets with more attributes are considered.
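To make the evaluation setup concrete, the following is a minimal sketch of the classical K-Means baseline together with the two validity indices named in the abstract. It does not implement the proposed Far Efficient K-Means method, which the abstract does not describe. The function names (`kmeans`, `davies_bouldin`, `dunn_index`), the explicit `init` argument, and the toy data are assumptions made for illustration, and the Dunn index is computed here in one common form (smallest between-cluster point distance divided by largest within-cluster diameter).

```python
import math
from itertools import combinations

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, init, iters=100):
    """Classical K-Means from caller-supplied initial centroids (an assumption;
    the result depends on this choice). Returns (centroids, labels)."""
    centroids = list(init)
    k = len(centroids)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable, so centroids are stable too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return centroids, labels

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index (lower is better).
    Assumes k >= 2, no empty clusters, and distinct centroids."""
    k = len(centroids)
    clusters = [[p for p, l in zip(points, labels) if l == j] for j in range(k)]
    # Scatter of a cluster: mean distance of its members to its centroid.
    scatter = [sum(dist(p, centroids[j]) for p in clusters[j]) / len(clusters[j])
               for j in range(k)]
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in range(k) if j != i)
             for i in range(k)]
    return sum(worst) / k

def dunn_index(points, labels):
    """Dunn index (higher is better): minimum distance between points of
    different clusters divided by the maximum within-cluster distance.
    Assumes at least two clusters and one cluster with two distinct points."""
    intra, inter = [], []
    for (p, lp), (q, lq) in combinations(list(zip(points, labels)), 2):
        (intra if lp == lq else inter).append(dist(p, q))
    return min(inter) / max(intra)

# Toy data (hypothetical): two well-separated square blobs.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
centroids, labels = kmeans(points, init=[(0.0, 0.0), (11.0, 11.0)])
print(labels)                                                # [0, 0, 0, 0, 1, 1, 1, 1]
print(round(davies_bouldin(points, labels, centroids), 3))   # 0.1
print(round(dunn_index(points, labels), 3))                  # 9.0
```

A lower Davies-Bouldin value and a higher Dunn value both indicate compact, well-separated clusters, so the two indices can be read together when comparing clusterings for different numbers of clusters. The initial centroids are passed in explicitly because K-Means can converge to a poor local optimum from an unlucky start.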

References

  1. Z. Li, J. Yuan, H. Yang and Ke Zhang, "K-Mean Algorithm with a Distance Based on the Characteristic of Differences", "IEEE International conference on Wireless communications, Networking and mobile computing", pp. 1–4, Oct. 2008.
  2. S. Saha, S. Bandyopadhyay and C. Singh, "A New Line Symmetry Distance Based Pattern Classifier", "International joint conference on Neural networks as part of 2008 IEEE WCCI", pp. 1426–1433, 2008.
  3. Shi Na, L. Xumin, G. Yong, "Research on K-Means clustering algorithm-An Improved K-Means Clustering Algorithm", "IEEE Third International Symposium on Intelligent Information Technology and Security Informatics", pp. 63–67, Apr. 2010.
  4. D. L. Davies and D. W. Bouldin, "A Cluster Separation Measure", "IEEE Trans. Pattern Analysis and Machine Intelligence", vol. 1, pp. 224–227, 1979.
  5. J. C. Dunn, "A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters", J. Cybernetics, vol. 3, pp. 32–57, 1973.
  6. T. Kanungo, D. Mount, N. Netanyahu, C. Piatko and A. Wu, "An Efficient K-Means Clustering Algorithm: Analysis and Implementation", "IEEE Transactions on Pattern analysis and Machine intelligence", vol. 24, no. 7, 2002.
  7. R. Xu and D. Wunsch, "Survey of Clustering Algorithms", "IEEE Transactions on Neural networks", vol. 16, no. 3, May 2005.
  8. Y. M. Cheung, "A New Generalized K-Means Clustering Algorithm", "Pattern Recognition Letters, Elsevier", vol. 24, issue 15, pp. 2883–2893, Nov. 2003.
  9. C. S. Li, "Cluster Center Initialization Method for K-means Algorithm Over Data Sets with Two Clusters", "2011 International Conference on Advances in Engineering, Elsevier", vol. 24, pp. 324–328, 2011.
  10. M. Erisoglu, N. Calis and S. Sakallioglu, "A new algorithm for initial cluster centers in K-Means algorithm", "Pattern Recognition Letters", vol. 32, issue 14, Oct. 2011.
  11. D. Napoleon and P. G. Laxmi, "An Efficient K-Means Clustering Algorithm for Reducing Time Complexity using Uniform Distribution Data Points", "IEEE Trendz in Information science and computing", pp. 42–45, Feb. 2011.
  12. J. MacQueen, "Some methods for classification and analysis of multivariate observations", "Fifth Berkeley Symposium on Mathematics, Statistics and Probability", pp. 281–297, University of California Press, 1967.
  13. C. Merz and P. Murphy, UCI Repository of Machine Learning Databases. Available: ftp://ftp.ics.uci.edu/pub/machine-learning-databases.

    • Published in

      ICACCI '12: Proceedings of the International Conference on Advances in Computing, Communications and Informatics
      August 2012
      1307 pages
      ISBN: 9781450311960
      DOI: 10.1145/2345396

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States
