DOI: 10.1145/1401890.1402008
research-article

ArnetMiner: extraction and mining of academic social networks

Published: 24 August 2008

Abstract

This paper addresses several key issues in the ArnetMiner system, which aims at extracting and mining academic social networks. Specifically, the system focuses on: 1) Extracting researcher profiles automatically from the Web; 2) Integrating the publication data into the network from existing digital libraries; 3) Modeling the entire academic network; and 4) Providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose a probabilistic framework to deal with the name ambiguity problem. Furthermore, we propose a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search have been provided based on the modeling results. In this paper, we describe the architecture and main features of the system. We also present the empirical evaluation of the proposed methods.
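
The abstract names people association search as one of the services but does not say how it is computed. As a rough illustration only, the sketch below treats it as finding a shortest co-authorship chain between two researchers using breadth-first search. The function names (`build_coauthor_graph`, `association_path`) and the toy paper list are hypothetical, and plain BFS is an assumption made for this example, not the method used in ArnetMiner.

```python
from collections import defaultdict, deque


def build_coauthor_graph(papers):
    """Map each author to the set of people they have co-authored with."""
    graph = defaultdict(set)
    for authors in papers.values():
        for a in authors:
            graph[a]  # touch the key so solo authors still appear as nodes
            for b in authors:
                if a != b:
                    graph[a].add(b)
    return graph


def association_path(graph, source, target):
    """Return one shortest co-authorship chain from source to target, or None."""
    if source not in graph or target not in graph:
        return None
    parents = {source: None}  # doubles as the visited set
    queue = deque([source])
    while queue:
        person = queue.popleft()
        if person == target:
            # Walk the parent links back to the source, then reverse.
            path = []
            while person is not None:
                path.append(person)
                person = parents[person]
            return path[::-1]
        for neighbor in sorted(graph[person]):  # sorted for deterministic output
            if neighbor not in parents:
                parents[neighbor] = person
                queue.append(neighbor)
    return None


# Hypothetical toy data: paper id -> author list (not from the ArnetMiner dataset).
papers = {
    "p1": ["Ann", "Bob"],
    "p2": ["Bob", "Cho"],
    "p3": ["Cho", "Dee"],
    "p4": ["Eve"],
}
graph = build_coauthor_graph(papers)
print(association_path(graph, "Ann", "Dee"))
print(association_path(graph, "Ann", "Eve"))
```

Here the first call finds the chain through Bob and Cho, while the second returns `None` because Eve has no co-authors in the toy data. A real system would also need the name disambiguation step the abstract mentions before author names could be used as graph nodes.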

      Published In

      KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2008
      1116 pages
      ISBN: 9781605581934
      DOI: 10.1145/1401890
      General Chair: Ying Li
      Program Chairs: Bing Liu, Sunita Sarawagi

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. association search
      2. expertise search
      3. information extraction
      4. name disambiguation
      5. social network
      6. topic modeling

      Qualifiers

      • Research-article

      Conference

      KDD08

      Acceptance Rates

      KDD '08 Paper Acceptance Rate 118 of 593 submissions, 20%;
      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
