DOI: 10.1145/3097983.3098036

metapath2vec: Scalable Representation Learning for Heterogeneous Networks

Published: 04 August 2017

Abstract

We study the problem of representation learning in heterogeneous networks. Its unique challenges come from the existence of multiple types of nodes and links, which limit the feasibility of conventional network embedding techniques. We develop two scalable representation learning models, namely metapath2vec and metapath2vec++. The metapath2vec model formalizes meta-path-based random walks to construct the heterogeneous neighborhood of a node and then leverages a heterogeneous skip-gram model to learn node embeddings. The metapath2vec++ model further enables the simultaneous modeling of structural and semantic correlations in heterogeneous networks. Extensive experiments show that metapath2vec and metapath2vec++ not only outperform state-of-the-art embedding models in various heterogeneous network mining tasks, such as node classification, clustering, and similarity search, but also discern the structural and semantic correlations between diverse network objects.
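To make the first stage of the abstract concrete, the following is a minimal Python sketch of a meta-path-guided random walk and of the (center, context) pairs a skip-gram model would consume. The abstract does not give the transition rule, so the sketch assumes the walker moves uniformly at random to a neighbor whose type matches the next position in the meta-path. The toy author/paper/venue graph and the names `NODE_TYPE`, `EDGES`, `build_adjacency`, `metapath_walk`, and `skipgram_pairs` are all hypothetical; the embedding training step is not shown.

```python
import random

# Hypothetical toy academic network: A = author, P = paper, V = venue.
NODE_TYPE = {
    "a1": "A", "a2": "A", "a3": "A",
    "p1": "P", "p2": "P", "p3": "P",
    "v1": "V", "v2": "V",
}
EDGES = [
    ("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p3"),
    ("p1", "v1"), ("p2", "v1"), ("p3", "v2"),
]


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Walk that follows a symmetric meta-path such as A-P-V-P-A.

    Assumption: the next node is drawn uniformly from the neighbors of
    the current node whose type matches the next meta-path position.
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("meta-path must start and end with the same type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the meta-path")
    cycle = metapath[:-1]  # the pattern that repeats along the walk
    walk = [start]
    while len(walk) < walk_length:
        wanted = cycle[len(walk) % len(cycle)]
        candidates = [n for n in adj.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:  # dead end: stop the walk early
            break
        walk.append(rng.choice(candidates))
    return walk


def skipgram_pairs(walk, window):
    """Emit (center, context) pairs within `window` steps of each node."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    rng = random.Random(0)
    walk = metapath_walk(adj, NODE_TYPE, "a1", ["A", "P", "V", "P", "A"], 9, rng)
    print(walk)
    print(skipgram_pairs(walk, window=2)[:6])
```

Because the walk is constrained by node type, the context of an author always mixes papers, venues, and co-authors in the proportions the meta-path dictates, instead of being dominated by whichever node type is most frequent in the graph.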

Published In

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. feature learning
  2. heterogeneous information networks
  3. heterogeneous representation learning
  4. latent representations
  5. network embedding

Qualifiers

  • Research-article

Acceptance Rates

KDD '17 Paper Acceptance Rate: 64 of 748 submissions, 9%
Overall Acceptance Rate: 605 of 4,597 submissions, 13%

Article Metrics

  • Downloads (last 12 months): 2,913
  • Downloads (last 6 weeks): 280

Reflects downloads up to 22 Feb 2025.

Cited By

  • (2025) High-Reputation Food Formulas: A Heterogeneous Information Network Representation and Semantic Analysis Approach. Applied Sciences 15:5 (2375). DOI: 10.3390/app15052375. Online publication date: 23-Feb-2025
  • (2025) Synergistic Multi-Drug Combination Prediction Based on Heterogeneous Network Representation Learning with Contrastive Learning. Tsinghua Science and Technology 30:1 (215-233). DOI: 10.26599/TST.2023.9010149. Online publication date: Feb-2025
  • (2025) Restage: Relation Structure-Aware Hierarchical Heterogeneous Graph Embedding. Tsinghua Science and Technology 30:1 (198-214). DOI: 10.26599/TST.2023.9010147. Online publication date: Feb-2025
  • (2025) Link prediction of heterogeneous complex networks based on an improved embedding learning algorithm. PLOS ONE 20:1 (e0315507). DOI: 10.1371/journal.pone.0315507. Online publication date: 7-Jan-2025
  • (2025) EHG: efficient heterogeneous graph transformer for multiclass node classification. Advances in Continuous and Discrete Models 2025:1. DOI: 10.1186/s13662-025-03885-0. Online publication date: 4-Feb-2025
  • (2025) Predicting drug combination side effects based on a metapath-based heterogeneous graph neural network. BMC Bioinformatics 26:1. DOI: 10.1186/s12859-024-06028-6. Online publication date: 15-Jan-2025
  • (2025) MetapathVis: Inspecting the Effect of Metapath in Heterogeneous Network Embedding via Visual Analytics. Computer Graphics Forum. DOI: 10.1111/cgf.15285. Online publication date: 31-Jan-2025
  • (2025) A Context-Aware Clustering Approach for Assisting Operators in Classifying Security Alerts. IEEE Transactions on Software Engineering 51:1 (153-171). DOI: 10.1109/TSE.2024.3497588. Online publication date: 1-Jan-2025
  • (2025) Graph Cross-Correlated Network for Recommendation. IEEE Transactions on Knowledge and Data Engineering 37:2 (710-723). DOI: 10.1109/TKDE.2024.3491778. Online publication date: Feb-2025
  • (2025) Contextual Inference From Sparse Shopping Transactions Based on Motif Patterns. IEEE Transactions on Knowledge and Data Engineering 37:2 (572-583). DOI: 10.1109/TKDE.2024.3452638. Online publication date: Feb-2025
