DOI: 10.1145/2684822.2685314
research-article

Will This Paper Increase Your h-index?: Scientific Impact Prediction

Published: 02 February 2015

Abstract

Scientific impact plays a central role in the evaluation of the output of scholars, departments, and institutions. A widely used measure of scientific impact is citations, and a growing body of literature focuses on predicting the number of citations obtained by any given publication. The effectiveness of such predictions, however, is fundamentally limited by the power-law distribution of citations, whereby publications with few citations are extremely common and publications with many citations are relatively rare. Given this limitation, in this work we instead address a related question asked by many academic researchers in the course of writing a paper, namely: "Will this paper increase my h-index?" Using a real academic dataset with over 1.7 million authors, 2 million papers, and 8 million citation relationships from the premier online academic service ArnetMiner, we formalize a novel scientific impact prediction problem to examine several factors that can drive a paper to increase the primary author's h-index. We find that the researcher's authority on the publication topic and the venue in which the paper is published are crucial factors in increasing the primary author's h-index, while the topic popularity and the co-authors' h-indices are of surprisingly little relevance. By leveraging the relevant factors, we find a potential predictability of greater than 87.5% for whether a paper will contribute to an author's h-index within five years. As a further experiment, we generate a self-prediction for this paper, estimating that there is a 76% probability that it will contribute to the h-index of the co-author with the highest current h-index in five years. We conclude that our findings on the quantification of scientific impact can help researchers to expand their influence and more effectively leverage their position of "standing on the shoulders of giants."
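To make the question concrete, here is a minimal sketch that computes the h-index as defined by Hirsch (2005), the largest h such that h of an author's papers have at least h citations each, and checks whether adding one more paper would raise it. This is an illustration under a simplifying assumption, not the paper's prediction method: the function names `h_index` and `would_increase_h_index` are hypothetical, and the existing papers' citation counts are held fixed, whereas the paper's five-year prediction setting lets all counts grow over time.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            # Counts are sorted descending, so no later paper can qualify.
            break
    return h


def would_increase_h_index(existing: Sequence[int], new_paper: int) -> bool:
    """Hypothetical helper: True if adding a paper with `new_paper` citations
    raises the h-index, assuming the existing counts stay fixed."""
    return h_index(list(existing) + [new_paper]) > h_index(existing)


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))                    # 4
    print(would_increase_h_index([10, 8, 5, 4, 3], 5))  # False
    print(would_increase_h_index([10, 8, 5, 5, 3], 6))  # True
```

The examples show why the question is subtle: a new paper can raise the h-index by at most one, and only if it receives at least h + 1 citations, but that condition is not sufficient. With existing counts [10, 8, 5, 4, 3] (h = 4), a new paper with 5 citations leaves h at 4, because only four papers then have five or more citations.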



Information

Published In

WSDM '15: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining
February 2015
482 pages
ISBN:9781450333177
DOI:10.1145/2684822
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. citation prediction
  2. popularity prediction
  3. science of science
  4. scientific impact

Qualifiers

  • Research-article

Funding Sources

  • National Science Foundation
  • Army Research Laboratory
  • U.S. Air Force Office of Scientific Research (AFOSR) and Defense Advanced Research Projects Agency (DARPA) grant

Conference

WSDM 2015

Acceptance Rates

  • WSDM '15 paper acceptance rate: 39 of 238 submissions (16%)
  • Overall acceptance rate: 498 of 2,863 submissions (17%)


Article Metrics

  • Downloads (last 12 months): 56
  • Downloads (last 6 weeks): 2
Reflects downloads up to 22 Feb 2025.


Cited By

  • (2025) Machine Learning for Financial Data Forecasting. In Machine Learning and Modeling Techniques in Financial Data Science, pp. 461-488. DOI: 10.4018/979-8-3693-8186-1.ch018. Online publication date: 31-Jan-2025.
  • (2024) Individual and gender inequality in computer science: A career study of cohorts from 1970 to 2000. Quantitative Science Studies, 5(1):128-152. DOI: 10.1162/qss_a_00283. Online publication date: 1-Mar-2024.
  • (2024) Predicting Scientific Impact Through Diffusion, Conformity, and Contribution Disentanglement. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, pp. 2764-2774. DOI: 10.1145/3627673.3679546. Online publication date: 21-Oct-2024.
  • (2024) An Early Evaluation of the Long-Term Influence of Academic Papers Based on Machine Learning Algorithms. IEEE Access, 12:41773-41786. DOI: 10.1109/ACCESS.2024.3378569. Online publication date: 2024.
  • (2024) Advancing sustainability in the steel industry: the key role of the triple helix sectors. Environmental Science and Pollution Research, 31(31):43591-43615. DOI: 10.1007/s11356-024-33983-7. Online publication date: 27-Jun-2024.
  • (2023) Geo-Awareness of Learnt Citations Prediction for Scientific Publications (Demo Paper). In Proceedings of the 7th ACM SIGSPATIAL Workshop on Location-based Recommendations, Geosocial Networks and Geoadvertising, pp. 29-32. DOI: 10.1145/3615896.3628341. Online publication date: 13-Nov-2023.
  • (2023) Counterfactual Learning on Heterogeneous Graphs with Greedy Perturbation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2988-2998. DOI: 10.1145/3580305.3599289. Online publication date: 4-Aug-2023.
  • (2023) Investigating the contribution of author- and publication-specific features to scholars’ h-index prediction. EPJ Data Science, 12(1). DOI: 10.1140/epjds/s13688-023-00421-6. Online publication date: 6-Oct-2023.
  • (2023) CasFlow: Exploring Hierarchical Structures and Propagation Uncertainty for Cascade Prediction. IEEE Transactions on Knowledge and Data Engineering, 35(4):3484-3499. DOI: 10.1109/TKDE.2021.3126475. Online publication date: 1-Apr-2023.
  • (2023) Data, measurement and empirical methods in the science of science. Nature Human Behaviour, 7(7):1046-1058. DOI: 10.1038/s41562-023-01562-4. Online publication date: 1-Jun-2023.