research-article

A survey on clustering in data mining

Authors:
M. A. Dalal, M.G.M's college of Engineering and Technology, Kamothe, Navi Mumbai
N. D. Harale, M.G.M's college of Engineering & Technology, Kamothe, Navi Mumbai

ICWET '11: Proceedings of the International Conference & Workshop on Emerging Trends in Technology
February 2011, Pages 559–562
https://doi.org/10.1145/1980022.1980143

Published: 25 February 2011

ABSTRACT

Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. Unsupervised learning (clustering) deals with instances that have not been preclassified in any way and so have no class attribute associated with them. The aim of applying a clustering algorithm is to discover useful but previously unknown classes of items. Unsupervised learning is an approach in which instances are automatically placed into meaningful groups based on their similarity. This paper addresses the fundamental concepts of unsupervised learning and surveys recent clustering algorithms and their complexities.
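
As an illustration of what "placing instances into groups based on their similarity" can look like in practice, the following is a minimal sketch of k-means, one of the algorithms named in the author tags. It is not taken from the paper: the function name `kmeans`, the sample points, the choice of initial centroids, and the use of Euclidean distance as the similarity measure are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, initial_centroids, max_iterations=100):
    """Illustrative k-means: alternate nearest-centroid assignment and centroid update."""
    centroids = [tuple(c) for c in initial_centroids]

    def nearest(p):
        # Index of the centroid closest to point p.
        return min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))

    for _ in range(max_iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)

        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for old, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # no centroid moved, so the assignment is stable
            break
        centroids = updated

    # Label every point with the index of its final nearest centroid.
    labels = [nearest(p) for p in points]
    return centroids, labels


# Hypothetical, unlabeled 2-D data: no class attribute is given to the algorithm.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The number of clusters and the starting centroids are supplied by the caller here; in general k-means results depend on that initialization, so the grouping above should be read as one possible outcome rather than a guaranteed one.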

Index Terms

A survey on clustering in data mining
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
  2. Information systems applications
    1. Data mining
      1. Clustering

Published in
ICWET '11: Proceedings of the International Conference & Workshop on Emerging Trends in Technology
February 2011
1385 pages
ISBN:9781450304498
DOI:10.1145/1980022
Program Chair:
B. K. Mishra
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
data mining
hierarchical clustering
k-means
unsupervised learning