ABSTRACT
With the emergence of web-based social and information applications, entity similarity search in information networks, which aims to find entities with high similarity to a given query entity, has gained wide attention. However, because heterogeneous information networks contain multi-typed entities and relationships that carry diverse semantic meanings, similarity measurement can be ambiguous without context. In this paper, we investigate entity similarity search and the resulting ambiguity problems in heterogeneous information networks. We propose to use a meta-path-based ranking model ensemble to represent semantic meanings for similarity queries, and we exploit the possibility of using user guidance to understand the user's query. Experiments on real-world datasets show that our framework significantly outperforms competitor methods.
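The ambiguity described above can be illustrated with a minimal sketch. It assumes a toy bibliographic network (authors, papers, venues), uses PathSim as the per-meta-path similarity measure, and combines two meta-paths with fixed, hand-picked weights that stand in for what user guidance would supply. The data, the function names, and the weights are all hypothetical; this is not the paper's framework or its learning procedure.

```python
# Hypothetical toy network: 3 authors, 5 papers, 2 venues (not from the paper).
AP = [  # author x paper adjacency
    [1, 1, 1, 0, 0],  # author 0
    [1, 0, 0, 0, 0],  # author 1 (co-author of author 0 on paper 0)
    [0, 0, 0, 1, 1],  # author 2 (no co-authorship with author 0)
]
PV = [  # paper x venue adjacency
    [1, 0],
    [0, 1],
    [0, 1],
    [0, 1],
    [0, 1],
]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def pathsim(commuting):
    """PathSim on a symmetric meta-path: 2*M[i][j] / (M[i][i] + M[j][j])."""
    n = len(commuting)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = commuting[i][i] + commuting[j][j]
            if denom:
                sim[i][j] = 2 * commuting[i][j] / denom
    return sim

def ensemble_rank(query, sims, weights):
    """Rank candidates by a weighted sum of per-meta-path similarities.

    The weights are an assumption standing in for user guidance.
    """
    n = len(next(iter(sims.values())))
    scores = {
        c: sum(weights[p] * sims[p][query][c] for p in sims)
        for c in range(n)
        if c != query
    }
    return sorted(scores, key=scores.get, reverse=True)

AV = matmul(AP, PV)  # author x venue path counts
sims = {
    "APA": pathsim(matmul(AP, transpose(AP))),    # shared papers
    "APVPA": pathsim(matmul(AV, transpose(AV))),  # shared venues
}

coauthor_heavy = ensemble_rank(0, sims, {"APA": 0.9, "APVPA": 0.1})
venue_heavy = ensemble_rank(0, sims, {"APA": 0.1, "APVPA": 0.9})
print(coauthor_heavy)  # [1, 2]
print(venue_heavy)     # [2, 1]
```

The same query entity (author 0) yields opposite rankings depending on which meta-path dominates, which is the kind of ambiguity that context or user guidance is meant to resolve.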
Index Terms
- User guided entity similarity search using meta-path selection in heterogeneous information networks