ABSTRACT
This paper addresses several key issues in the ArnetMiner system, which aims to extract and mine academic social networks. Specifically, the system focuses on: 1) extracting researcher profiles automatically from the Web; 2) integrating publication data from existing digital libraries into the network; 3) modeling the entire academic network; and 4) providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose a probabilistic framework to deal with the name ambiguity problem. Furthermore, we propose a unified modeling approach to simultaneously model the topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search are provided based on the modeling results. In this paper, we describe the architecture and main features of the system and present an empirical evaluation of the proposed methods.
Index Terms
- ArnetMiner: extraction and mining of academic social networks