DOI: 10.1145/2339530.2339738
research-article

Integrating meta-path selection with user-guided object clustering in heterogeneous information networks

Published: 12 August 2012

ABSTRACT

Real-world, multiple-typed objects are often interconnected, forming heterogeneous information networks. A major challenge for link-based clustering in such networks is that it can produce many different results with quite diverse semantic meanings. To generate the desired clustering, we propose to use meta-paths, i.e., paths that connect object types via a sequence of relations, to control clustering with distinct semantics. However, it is easier for a user to provide a few examples ("seeds") than a weighted combination of sophisticated meta-paths to specify her clustering preference. Thus, we propose to integrate meta-path selection with user-guided clustering to cluster objects in networks: a user first provides a small set of object seeds for each cluster as guidance; the system then learns meta-path weights that are consistent with the clustering implied by the guidance, and generates clusters under the learned weights. We propose a probabilistic approach to this problem and an effective and efficient iterative algorithm, PathSelClus, to learn the model, in which the clustering quality and the meta-path weights mutually enhance each other. Experiments on several clustering tasks in two real networks demonstrate the effectiveness of the algorithm compared with the baselines.
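The abstract rests on two ideas: a meta-path induces a relation between objects, and seeds can be used to weight several meta-paths. The block below is a minimal, hypothetical sketch of both ideas under stated assumptions. Relations are dense 0/1 matrices. A meta-path's relation matrix is taken to be the product of the relation matrices along the path (the commuting-matrix construction used in PathSim). The clustering step is a simple hard-assignment seed propagation with a consistency-based weight update. This is not PathSelClus, which the abstract describes as a probabilistic model. The helper names (`meta_path_matrix`, `seeded_clustering`) and the toy author/paper/venue/term network are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of an n x k matrix and a k x m matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def meta_path_matrix(relations: List[Matrix]) -> Matrix:
    """Chain relation matrices along a meta-path; entry (i, j) counts the
    path instances that start at object i and end at object j."""
    result = relations[0]
    for rel in relations[1:]:
        result = matmul(result, rel)
    return result


def row_normalize(m: Matrix) -> Matrix:
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total for x in row] if total else [0.0] * len(row))
    return out


def seeded_clustering(paths: List[Matrix], seeds: Dict[int, int],
                      k: int, iters: int = 10) -> Tuple[List[int], List[float]]:
    """Hypothetical simplified loop (not PathSelClus): assign each non-seed
    object to the cluster with the most weighted meta-path mass, then
    reweight each meta-path by how much of its row-normalized mass stays
    inside clusters. Seeds keep their given labels; -1 means unassigned."""
    norm = [row_normalize(m) for m in paths]
    n = len(norm[0])
    weights = [1.0 / len(paths)] * len(paths)
    labels = [seeds.get(i, -1) for i in range(n)]
    for _ in range(iters):
        # Combined similarity under the current meta-path weights.
        sim = [[sum(w * p[i][j] for w, p in zip(weights, norm)) for j in range(n)]
               for i in range(n)]
        new_labels = list(labels)
        for i in range(n):
            if i in seeds:
                continue
            scores = [0.0] * k
            for j in range(n):
                if labels[j] >= 0:
                    scores[labels[j]] += sim[i][j]
            if max(scores) > 0:
                new_labels[i] = scores.index(max(scores))
        labels = new_labels
        # Average within-cluster mass of each meta-path = its consistency score.
        consistency = [
            sum(p[i][j] for i in range(n) for j in range(n)
                if labels[i] >= 0 and labels[i] == labels[j]) / n
            for p in norm
        ]
        total = sum(consistency)
        if total > 0:
            weights = [c / total for c in consistency]
    return labels, weights


# Toy network (hypothetical): 6 authors, one paper each.
AP = [[1 if i == j else 0 for j in range(6)] for i in range(6)]
PV = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]       # papers 0-2 at venue 0
PT = [[1, 0, 0], [0, 1, 0], [0, 0, 1],
      [1, 0, 0], [0, 1, 0], [0, 0, 1]]                      # terms cut across venues
apvpa = meta_path_matrix([AP, PV, transpose(PV), transpose(AP)])  # A-P-V-P-A
aptpa = meta_path_matrix([AP, PT, transpose(PT), transpose(AP)])  # A-P-T-P-A
labels, weights = seeded_clustering([apvpa, aptpa], seeds={0: 0, 3: 1}, k=2)
print(labels)                            # [0, 0, 0, 1, 1, 1]
print([round(w, 3) for w in weights])    # [0.667, 0.333]
```

With one seed per cluster (authors 0 and 3), the venue meta-path agrees with the seeds while the term meta-path links the two seeds to each other, so the sketch ends up weighting A-P-V-P-A about twice as heavily as A-P-T-P-A. The hard assignments and the ratio-based reweighting were chosen for readability; a probabilistic model would instead use soft memberships and a learned weight prior.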


Supplemental Material

best_paper_2.mp4 (MP4, 341.2 MB)


Published in

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012
1616 pages
ISBN: 9781450314626
DOI: 10.1145/2339530

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
