skip to main content
10.1145/1015330.1015360acmotherconferencesArticle/Chapter ViewAbstractPublication PagesicmlConference Proceedingsconference-collections
Article

Integrating constraints and metric learning in semi-supervised clustering

Published:04 July 2004Publication History

ABSTRACT

Semi-supervised clustering employs a small amount of labeled data to aid unsupervised learning. Previous work in the area has utilized supervised data in one of two approaches: 1) constraint-based methods that guide the clustering algorithm towards a better grouping of the data, and 2) distance-function learning methods that adapt the underlying similarity metric used by the clustering algorithm. This paper provides new methods for the two approaches as well as presents a new semi-supervised clustering algorithm that integrates both of these techniques in a uniform, principled framework. Experimental results demonstrate that the unified approach produces better clusters than both individual approaches as well as previously proposed semi-supervised clustering algorithms.

References

  1. Bansal, N., Blum, A., & Chawla, S. (2002). Correlation clustering. Proceedings of the 43rd IEEE Symposium on Foundations of Computer Science (FOCS-02) (pp. 238--247). Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Bar-Hillel, A., Hertz, T., Shental, N., & Weinshall, D. (2003). Learning distance functions using equivalence relations. Proceedings of 20th International Conference on Machine Learning (ICML-2003) (pp. 11--18).Google ScholarGoogle Scholar
  3. Basu, S., Banerjee, A., & Mooney, R. J. (2002). Semi-supervised clustering by seeding. Proceedings of 19th International Conference on Machine Learning (ICML-2002) (pp. 19--26). Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Basu, S., Bilenko, M., & Mooney, R. J. (2004). A probabilistic framework for semi-supervised clustering. In submission, available at http://www.cs.utexas.edu/~ml/publication. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Bilenko, M., & Mooney, R. J. (2003). Adaptive duplicate detection using learnable string similarity measures. Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003) (pp. 39--48). Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Bilmes, J. (1997). A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models (Tech. Report ICSI-TR-97-021). ICSI.Google ScholarGoogle Scholar
  7. Blake, C. L., & Merz, C. J. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.Google ScholarGoogle Scholar
  8. Cohn, D., Caruana, R., & McCallum, A. (2003). Semi-supervised clustering with user feedback (Tech. Report TR2003-1892). Cornell University.Google ScholarGoogle Scholar
  9. Demiriz, A., Bennett, K. P., & Embrechts, M. J. (1999). Semi-supervised clustering using genetic algorithms. Artificial Neural Networks in Engineering (ANNIE-99) (pp. 809--814).Google ScholarGoogle Scholar
  10. Domeniconi, C. (2002). Locally adaptive techniques for pattern classification. Doctoral dissertation, University of California, Riverside. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Klein, D., Kamvar, S. D., & Manning, C. (2002). From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. Proceedings of the The Nineteenth International Conference on Machine Learning (ICML-2002) (pp. 307--314). Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Kleinberg, J., & Tardos, E. (1999). Approximation algorithms for classification problems with pairwise relationships: Metric labeling and Markov random fields. Proceedings of the 40th IEEE Symposium on Foundations of Computer Science (FOCS-99) (pp. 14--23). Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Saul, L., & Roweis, S. (2003). Think globally, fit locally: unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4, 119--155. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Segal, E., Wang, H., & Koller, D. (2003). Discovering molecular pathways from protein interaction and gene expression data. Bioinformatics, 19, i264--i272.Google ScholarGoogle ScholarCross RefCross Ref
  15. Schultz, M., and Joachims, T. (2004). Learning a distance metric from relative comparisons. Advances in Neural Information Processing Systems 16.Google ScholarGoogle Scholar
  16. Wagstaff, K., Cardie, C., Rogers, S., & Schroedl, S. (2001). Constrained K-Means clustering with background knowledge. Proceedings of 18th International Conference on Machine Learning (ICML-2001) (pp. 577--584). Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Xing, E. P., Ng, A. Y., Jordan, M. I., & Russell, S. (2003). Distance metric learning, with application to clustering with side-information. Advances in Neural Information Processing Systems 15 (pp. 505--512).Google ScholarGoogle Scholar
  1. Integrating constraints and metric learning in semi-supervised clustering

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Other conferences
      ICML '04: Proceedings of the twenty-first international conference on Machine learning
      July 2004
      934 pages
      ISBN:1581138385
      DOI:10.1145/1015330
      • Conference Chair:
      • Carla Brodley

      Copyright © 2004 Author

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 4 July 2004

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate140of548submissions,26%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader