ABSTRACT
Scientific impact plays a central role in the evaluation of the output of scholars, departments, and institutions. A widely used measure of scientific impact is citations, with a growing body of literature focused on predicting the number of citations obtained by any given publication. The effectiveness of such predictions, however, is fundamentally limited by the power-law distribution of citations, whereby publications with few citations are extremely common and publications with many citations are relatively rare. Given this limitation, in this work we instead address a related question asked by many academic researchers in the course of writing a paper, namely: "Will this paper increase my h-index?" Using a real academic dataset with over 1.7 million authors, 2 million papers, and 8 million citation relationships from the premier online academic service ArnetMiner, we formalize a novel scientific impact prediction problem to examine several factors that can drive a paper to increase the primary author's h-index. We find that the researcher's authority on the publication topic and the venue in which the paper is published are crucial factors in increasing the primary author's h-index, while the topic popularity and the co-authors' h-indices are of surprisingly little relevance. By leveraging relevant factors, we find a greater than 87.5% potential predictability for whether a paper will contribute to an author's h-index within five years. As a further experiment, we generate a self-prediction for this paper, estimating that there is a 76% probability that it will contribute, within five years, to the h-index of the co-author with the highest current h-index. We conclude that our findings on the quantification of scientific impact can help researchers to expand their influence and more effectively leverage their position of "standing on the shoulders of giants."
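To make the prediction target concrete, the following is a minimal Python sketch of the standard h-index (the largest h such that h of an author's papers have at least h citations each) together with one plausible reading of "contributes to the h-index". The function names and the contribution criterion (a paper counts if its citation count is at least the author's h-index at the evaluation time) are assumptions made for illustration; the abstract does not spell out the exact formalization.

```python
from typing import Iterable


def h_index(citations: Iterable[int]) -> int:
    """Return the largest h such that at least h papers have >= h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


def contributes_to_h_index(paper_citations: int, all_citations: Iterable[int]) -> bool:
    """Assumed criterion (hypothetical, not taken from the paper):
    a paper contributes if its citation count is at least the author's
    h-index, computed over all of the author's papers including this one."""
    h = h_index(all_citations)
    return h > 0 and paper_citations >= h


if __name__ == "__main__":
    # Hypothetical citation counts for one author's papers at evaluation time.
    author_papers = [10, 8, 5, 4, 3]
    print(h_index(author_papers))
    print(contributes_to_h_index(4, author_papers))
    print(contributes_to_h_index(3, author_papers))
```

Under this reading, the prediction task is a binary classification: the label for each paper is the value `contributes_to_h_index` would return once citation counts five years after publication are known.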
Index Terms
- Will This Paper Increase Your h-index?: Scientific Impact Prediction