Research article
DOI: 10.1145/1401890.1402004

Anticipating annotations and emerging trends in biomedical literature

Published: 24 August 2008

Abstract

The BioJournalMonitor is a decision support system for the analysis of trends and topics in the biomedical literature. Its main goal is to identify potential diagnostic and therapeutic biomarkers for specific diseases. Several data sources are continuously integrated to provide the user with up-to-date information on current research in this field. State-of-the-art text mining technologies are deployed to add value on top of the original content, including named entity detection, relation extraction, classification, clustering, ranking, summarization, and visualization. We present two novel technologies for analyzing the temporal dynamics of text archives and their associated ontologies. Currently, the MeSH ontology is used to annotate the scientific articles entering the PubMed database with medical terms. Both the maintenance of the ontology and the annotation of new articles are performed largely manually. We describe how probabilistic topic models can be used to annotate recent articles with the most likely MeSH terms. This gives our users a competitive advantage: when they search by MeSH term, they find relevant articles long before those articles are manually annotated. We further present a study on how to predict the inclusion of new terms in the MeSH ontology. The results suggest that early prediction of emerging trends is possible. The trend ranking functions are deployed in our system to enable interactive searches for the hottest new trends relating to a disease.
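The abstract does not spell out the scoring rule used to suggest MeSH terms. One common way a topic model can rank candidate annotations, assumed here for illustration rather than taken from the paper, is to treat MeSH terms as part of each topic's vocabulary and score a term t for an article d with the mixture p(t | d) = sum over topics k of p(t | z = k) * p(z = k | d). The sketch below covers only that final ranking step; the helper name `rank_mesh_terms` and the toy topic distributions are hypothetical, and fitting the topic model itself (for example with LDA) is out of scope.

```python
from typing import Dict, List, Tuple


def rank_mesh_terms(
    doc_topics: List[float],
    topic_term_probs: List[Dict[str, float]],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Rank candidate MeSH terms for one article (hypothetical helper).

    doc_topics[k] is p(z = k | d), the article's topic proportions.
    topic_term_probs[k][t] is p(t | z = k), the probability of MeSH term t
    under topic k. Each term is scored by the mixture
    p(t | d) = sum_k p(t | z = k) * p(z = k | d).
    """
    if len(doc_topics) != len(topic_term_probs):
        raise ValueError("need exactly one term distribution per topic")
    scores: Dict[str, float] = {}
    for weight, term_probs in zip(doc_topics, topic_term_probs):
        for term, prob in term_probs.items():
            # Add this topic's contribution to the term's overall score.
            scores[term] = scores.get(term, 0.0) + weight * prob
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    # Toy numbers for illustration only; not data from the paper.
    topics_for_article = [0.7, 0.3]
    term_distributions = [
        {"Neoplasms": 0.5, "Biomarkers, Tumor": 0.4, "Apoptosis": 0.1},
        {"Apoptosis": 0.6, "Mitochondria": 0.4},
    ]
    for term, score in rank_mesh_terms(topics_for_article, term_distributions, top_n=3):
        print(f"{term}: {score:.2f}")
```

With these toy inputs the top three suggestions are Neoplasms (0.7 × 0.5 = 0.35), Biomarkers, Tumor (0.7 × 0.4 = 0.28) and Apoptosis (0.7 × 0.1 + 0.3 × 0.6 = 0.25). Apoptosis ranks above Mitochondria because both topics contribute to it, which is the point of scoring with the mixture rather than with the dominant topic alone.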




    Published In

    ACM Conferences
    KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2008
    1116 pages
    ISBN: 9781605581934
    DOI: 10.1145/1401890
    General Chair: Ying Li
    Program Chairs: Bing Liu, Sunita Sarawagi


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. LDA
    2. MeSH
    3. prediction PubMed
    4. text mining
    5. trends

    Qualifiers

    • Research-article

    Conference

    KDD '08

    Acceptance Rates

    KDD '08 Paper Acceptance Rate 118 of 593 submissions, 20%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%



    Article Metrics

    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 20 Feb 2025

    Cited By

    • (2022) Measuring the innovation of method knowledge elements in scientific literature. Scientometrics 127(5):2803-2827. DOI: 10.1007/s11192-022-04350-5. Online publication date: 25-Mar-2022
    • (2019) RETRACTED ARTICLE: In text mining: detection of topic and sub-topic using multiple spider hunting model. Journal of Ambient Intelligence and Humanized Computing 12(3):3571-3580. DOI: 10.1007/s12652-019-01588-5. Online publication date: 22-Nov-2019
    • (2017) Identifying prescription patterns with a topic model of diseases and medications. Journal of Biomedical Informatics 75(C):35-47. DOI: 10.1016/j.jbi.2017.09.003. Online publication date: 1-Nov-2017
    • (2016) Modeling and Analyzing of Research Topic Evolution Associated with Social Networks of Researchers. International Journal of Distributed Systems and Technologies 7(3):42-62. DOI: 10.4018/IJDST.2016070103. Online publication date: 1-Jul-2016
    • (2016) Topic discovery and future trend forecasting for texts. Journal of Big Data 3(1). DOI: 10.1186/s40537-016-0039-2. Online publication date: 14-Apr-2016
    • (2016) Generation of topic evolution trees from heterogeneous bibliographic networks. Journal of Informetrics 10(2):606-621. DOI: 10.1016/j.joi.2016.04.002. Online publication date: May-2016
    • (2016) Analyzing of research patterns based on a temporal tracking and assessing model. Personal and Ubiquitous Computing 20(6):933-946. DOI: 10.1007/s00779-016-0965-1. Online publication date: 1-Nov-2016
    • (2014) Semantic Breakthrough in Drug Discovery. Synthesis Lectures on the Semantic Web: Theory and Technology 4(2):1-142. DOI: 10.2200/S00600ED1V01Y201409WEB009. Online publication date: 23-Oct-2014
    • (2014) Discovering Health Topics in Social Media Using Topic Models. PLoS ONE 9(8):e103408. DOI: 10.1371/journal.pone.0103408. Online publication date: 1-Aug-2014
    • (2014) K-State automaton burst detection model based on KOS: Emerging trends in cancer field. Journal of Information Science 41(1):16-26. DOI: 10.1177/0165551514551500. Online publication date: 3-Oct-2014
