research-article

ArnetMiner: extraction and mining of academic social networks

Authors:

Zhong Su

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 990 - 998

https://doi.org/10.1145/1401890.1402008

Published: 24 August 2008

Abstract

This paper addresses several key issues in the ArnetMiner system, which aims to extract and mine academic social networks. Specifically, the system focuses on: 1) extracting researcher profiles automatically from the Web; 2) integrating publication data from existing digital libraries into the network; 3) modeling the entire academic network; and 4) providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose a probabilistic framework to deal with the name ambiguity problem. Furthermore, we propose a unified modeling approach that simultaneously models the topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search are provided based on the modeling results. In this paper, we describe the architecture and main features of the system and present an empirical evaluation of the proposed methods.

Index Terms

ArnetMiner: extraction and mining of academic social networks
1. Information systems
  1. Information retrieval
  2. Information systems applications
    1. Data mining

Published In

KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

August 2008

1116 pages

ISBN: 9781605581934

DOI: 10.1145/1401890

General Chair: Ying Li (Microsoft adCenter Labs)

Program Chairs: Bing Liu (University of Illinois at Chicago), Sunita Sarawagi (Indian Institute of Technology, Bombay)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

KDD08: The 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 24 - 27, 2008

Las Vegas, Nevada, USA

Acceptance Rates

KDD '08 Paper Acceptance Rate 118 of 593 submissions, 20%

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

Total Citations: 1,542
Total Downloads: 6,871
Downloads (Last 12 months): 526
Downloads (Last 6 weeks): 54

Reflects downloads up to 18 Feb 2025

Cited By

Perin E, Souza M, Silva J, Matsubara E (2025). DynGraph-BERT: Combining BERT and GNN Using Dynamic Graphs for Inductive Semi-Supervised Text Classification. Informatics 12:1 (20). Online publication date: 17-Feb-2025
https://doi.org/10.3390/informatics12010020
Meng Y, Li R, Lin L, Li X, Wang G (2025). Topology-Preserving Graph Coarsening: An Elementary Collapse-Based Approach. Proceedings of the VLDB Endowment 17:13 (4760-4772). Online publication date: 18-Feb-2025
https://doi.org/10.14778/3704965.3704981
Chai L, Huang R (2025). Link prediction of heterogeneous complex networks based on an improved embedding learning algorithm. PLOS ONE 20:1 (e0315507). Online publication date: 7-Jan-2025
https://doi.org/10.1371/journal.pone.0315507
Zhou Y, Gao S, Guo D, Wei X, Rokne J, Wang H (2025). A Survey of Change Point Detection in Dynamic Graphs. IEEE Transactions on Knowledge and Data Engineering 37:3 (1030-1048). Online publication date: Mar-2025
https://doi.org/10.1109/TKDE.2024.3523857
Ma X, Liu F, Wu J, Yang J, Xue S, Sheng Q (2025). Rethinking Unsupervised Graph Anomaly Detection With Deep Learning: Residuals and Objectives. IEEE Transactions on Knowledge and Data Engineering 37:2 (881-895). Online publication date: Feb-2025
https://doi.org/10.1109/TKDE.2024.3501307
Liu B, Zheng C, Sun F, Wang X, Pan L (2025). CDCGAN. Neural Networks 183:C. Online publication date: 1-Mar-2025
https://dl.acm.org/doi/10.1016/j.neunet.2024.106933
Khan W, Ebrahim N (2025). ANOGAT-Sparse-TL: A hybrid framework combining sparsification and graph attention for anomaly detection in attributed networks using the optimized loss function incorporating the twersky loss for improved robustness. Knowledge-Based Systems (113144). Online publication date: Feb-2025
https://doi.org/10.1016/j.knosys.2025.113144
Kong X, Liu J, Li H, Zhang C, Du J, Guo D, Shen G (2025). Graph Anomaly Detection via Diffusion Enhanced Multi-View Contrastive Learning. Knowledge-Based Systems 311 (113093). Online publication date: Feb-2025
https://doi.org/10.1016/j.knosys.2025.113093
Bai Q, Nie C, Zhang H, Dou Z, Yuan X (2025). Disentangled hyperbolic representation learning for heterogeneous graphs. Knowledge-Based Systems 310 (112976). Online publication date: Feb-2025
https://doi.org/10.1016/j.knosys.2025.112976
Runhui L, Yalin L, Ze J, Qiqi X, Xiaoyu C (2025). Quantifying the degree of scientific innovation breakthrough. Information Processing and Management: an International Journal 62:1. Online publication date: 1-Jan-2025
https://dl.acm.org/doi/10.1016/j.ipm.2024.103933
