DOI: 10.1145/1980022.1980143
research-article

A survey on clustering in data mining

Published: 25 February 2011

ABSTRACT

Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines, which reflects its broad appeal and its usefulness as one of the steps in exploratory data analysis. Unsupervised learning (clustering) deals with instances that have not been pre-classified in any way and so have no class attribute associated with them. The aim of applying a clustering algorithm is to discover useful but previously unknown classes of items. Unsupervised learning is an approach in which instances are automatically placed into meaningful groups based on their similarity. This paper addresses the fundamental concepts of unsupervised learning and surveys recent clustering algorithms and their complexities.
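As one concrete illustration of placing unlabeled instances into groups by similarity, here is a minimal sketch of Lloyd's k-means algorithm. It is not taken from the paper and is not presented as the authors' method. It assumes numeric feature vectors, squared Euclidean distance as the dissimilarity measure, and random initialization from the data points. The function names `kmeans` and `_sq_dist` and their parameters are hypothetical. Each iteration costs O(n·k·d) for n points, k clusters, and d dimensions, which gives a sense of the kind of complexity a survey like this compares.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Hypothetical Lloyd's k-means sketch: returns (labels, centroids).

    Assumes every point has the same dimension d; one iteration is O(n * k * d).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise centroids with k input points sampled without replacement.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (ties go to the lower centroid index).
        new_labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids
```

For example, `kmeans([(1.0,), (1.5,), (2.0,), (10.0,), (10.5,), (11.0,)], k=2)` puts the first three points in one cluster and the last three in the other, with centroids at 1.5 and 10.5. Because the result depends on the initial centroids, practical implementations usually add smarter seeding (such as k-means++) and several restarts.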


        Published in

          ACM Other conferences
          ICWET '11: Proceedings of the International Conference & Workshop on Emerging Trends in Technology
          February 2011
          1385 pages
          ISBN:9781450304498
          DOI:10.1145/1980022

          Copyright © 2011 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

          Publisher

          Association for Computing Machinery

          New York, NY, United States
