ABSTRACT
Semi-supervised clustering employs a small amount of labeled data to aid unsupervised learning. Previous work in the area has utilized supervised data in one of two approaches: 1) constraint-based methods that guide the clustering algorithm towards a better grouping of the data, and 2) distance-function learning methods that adapt the underlying similarity metric used by the clustering algorithm. This paper provides new methods for the two approaches as well as presents a new semi-supervised clustering algorithm that integrates both of these techniques in a uniform, principled framework. Experimental results demonstrate that the unified approach produces better clusters than both individual approaches as well as previously proposed semi-supervised clustering algorithms.
- Bansal, N., Blum, A., & Chawla, S. (2002). Correlation clustering. Proceedings of the 43rd IEEE Symposium on Foundations of Computer Science (FOCS-02) (pp. 238--247). Google ScholarDigital Library
- Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2003). Learning distance functions using equivalence relations. Proceedings of 20th International Conference on Machine Learning (ICML-2003) (pp. 11--18).Google Scholar
- Basu, S., Banerjee, A., & Mooney, R. J. (2002). Semi-supervised clustering by seeding. Proceedings of 19th International Conference on Machine Learning (ICML-2002) (pp. 19--26). Google ScholarDigital Library
- Basu, S., Bilenko, M., & Mooney, R. J. (2004). A probabilistic framework for semi-supervised clustering. In submission, available at http://www.cs.utexas.edu/~ml/publication. Google ScholarDigital Library
- Bilenko, M., & Mooney, R. J. (2003). Adaptive duplicate detection using learnable string similarity measures. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003) (pp. 39--48). Google ScholarDigital Library
- Bilmes, J. (1997). A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models (Tech. Report ICSI-TR-97-021). ICSI.Google Scholar
- Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.Google Scholar
- Cohn, D., Caruana, R., & McCallum, A. (2003). Semi-supervised clustering with user feedback (Tech. Report TR2003-1892). Cornell University.Google Scholar
- Demiriz, A., Bennett, K. P., & Embrechts, M. J. (1999). Semi-supervised clustering using genetic algorithms. Artificial Neural Networks in Engineering (ANNIE-99) (pp. 809--814).Google Scholar
- Domeniconi, C. (2002). Locally adaptive techniques for pattern classification. Doctoral dissertation, University of California, Riverside. Google ScholarDigital Library
- Klein, D., Kamvar, S. D., & Manning, C. (2002). From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. Proceedings of the The Nineteenth International Conference on Machine Learning (ICML-2002) (pp. 307--314). Google ScholarDigital Library
- Kleinberg, J., & Tardos, E. (1999). Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. Proceedings of the 40th IEEE Symposium on Foundations of Computer Science (FOCS-99) (pp. 14--23). Google ScholarDigital Library
- Saul, L., & Roweis, S. (2003). Think globally, fit locally: unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4, 119--155. Google ScholarDigital Library
- Segal, E., Wang, H., & Koller, D. (2003). Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19, i264--i272.Google ScholarCross Ref
- Schultz, M., and Joachims, T. (2004). Learning a distance metric from relative comparisons. Advances in Neural Information Processing Systems 16.Google Scholar
- Wagstaff, K., Cardie, C., Rogers, S., & Schroedl, S. (2001). Constrained K-Means clustering with background knowledge. Proceedings of 18th International Conference on Machine Learning (ICML-2001) (pp. 577--584). Google ScholarDigital Library
- Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2003). Distance metric learning, with application to clustering with side-information. Advances in Neural Information Processing Systems 15 (pp. 505--512).Google Scholar
- Integrating constraints and metric learning in semi-supervised clustering
Recommendations
Semi-supervised clustering with metric learning: An adaptive kernel method
Most existing representative works in semi-supervised clustering do not sufficiently solve the violation problem of pairwise constraints. On the other hand, traditional kernel methods for semi-supervised clustering not only face the problem of manually ...
Semi-supervised Hierarchical Clustering
ICDM '11: Proceedings of the 2011 IEEE 11th International Conference on Data MiningSemi-supervised clustering (i.e., clustering with knowledge-based constraints) has emerged as an important variant of the traditional clustering paradigms. However, most existing semi-supervised clustering algorithms are designed for partitional ...
Density-based semi-supervised clustering
Semi-supervised clustering methods guide the data partitioning and grouping process by exploiting background knowledge, among else in the form of constraints. In this study, we propose a semi-supervised density-based clustering method. Density-based ...
Comments