ABSTRACT
Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. Unsupervised learning (clustering) deals with instances that have not been pre-classified in any way and so do not have a class attribute associated with them. The purpose of applying a clustering algorithm is to discover useful but previously unknown classes of items. Unsupervised learning is an approach to learning in which instances are automatically placed into meaningful groups based on their similarity. This paper addresses fundamental concepts of unsupervised learning and surveys recent clustering algorithms and their complexities.
Index Terms
- A survey on clustering in data mining