DOI: 10.1145/3132847.3132989
research-article

A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation

Published: 06 November 2017

Abstract

We consider the task of automatically annotating free texts describing clinical trials with concepts from a controlled, structured medical vocabulary. Specifically, we aim to build a model that infers distinct sets of (ontological) concepts describing complementary, clinically salient aspects of the underlying trials: the populations enrolled, the interventions administered and the outcomes measured, i.e., the PICO elements. This practically important problem poses several key challenges. First, the output space is vast, because the vocabulary comprises many unique concepts. Compounding this problem, annotated data in this domain is expensive to collect and hence sparse. Furthermore, the outputs (the sets of concepts for each PICO element) are correlated: specific populations (e.g., diabetics) render certain intervention concepts likely (insulin therapy) while effectively precluding others (radiation therapy). Such correlations should be exploited. We propose a novel neural model that addresses these challenges. We introduce a Candidate-Selector architecture in which the model considers sets of candidate concepts for PICO elements and assesses their plausibility conditioned on the input text to be annotated. This relies on a 'candidate set' generator, which may be either learned or based on heuristics. A conditional discriminative neural model then jointly selects candidate concepts, given the input text. We compare the predictive performance of our approach against strong baselines and show that it outperforms them. Finally, we perform a qualitative evaluation of the generated annotations by asking domain experts to assess their quality.
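To make the two-stage Candidate-Selector idea concrete, here is a minimal, hypothetical Python sketch. It is not the authors' model: the concept lexicon (`LEXICON`), the hand-set compatibility weights (`COMPAT`) and the trigger-word scoring are illustrative assumptions that stand in for the learned candidate generator and the conditional discriminative neural selector, and only the population and intervention elements are modelled for brevity. The point it illustrates is that scoring candidates jointly lets a population concept (diabetics) promote one intervention (insulin therapy) and suppress another (radiation therapy).

```python
"""Toy candidate-selector pipeline (hypothetical sketch, not the paper's model).

Stage 1: a heuristic candidate generator proposes concepts per PICO element.
Stage 2: a scorer jointly selects one population and one intervention concept,
using text evidence plus a pairwise compatibility term. Both are hand-set here;
a real system would learn them from annotated trials.
"""
from itertools import product

# Hypothetical concept lexicon: concept -> (PICO element, trigger words).
LEXICON = {
    "Diabetes Mellitus": ("population", {"diabetic", "diabetics", "diabetes"}),
    "Breast Cancer": ("population", {"breast", "cancer", "tumour"}),
    "Insulin Therapy": ("intervention", {"insulin"}),
    "Radiation Therapy": ("intervention", {"radiation", "radiotherapy"}),
}

# Hypothetical population/intervention compatibility weights (assumed, not learned).
COMPAT = {
    ("Diabetes Mellitus", "Insulin Therapy"): 2.0,
    ("Diabetes Mellitus", "Radiation Therapy"): -5.0,
    ("Breast Cancer", "Radiation Therapy"): 2.0,
    ("Breast Cancer", "Insulin Therapy"): -1.0,
}


def tokenize(text):
    """Lower-case whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(".,;:()").lower() for t in text.split()]


def generate_candidates(text):
    """Stage 1: propose every concept whose trigger words appear in the text."""
    tokens = set(tokenize(text))
    candidates = {"population": [], "intervention": []}
    for concept, (element, triggers) in LEXICON.items():
        if tokens & triggers:
            candidates[element].append(concept)
    return candidates


def text_score(concept, tokens):
    """Text evidence for a concept: number of tokens matching its triggers."""
    _, triggers = LEXICON[concept]
    return sum(1 for t in tokens if t in triggers)


def select(text):
    """Stage 2: return the (population, intervention) pair with the highest joint score."""
    tokens = tokenize(text)
    cands = generate_candidates(text)
    best, best_score = None, float("-inf")
    for pop, interv in product(cands["population"], cands["intervention"]):
        score = (text_score(pop, tokens) + text_score(interv, tokens)
                 + COMPAT.get((pop, interv), 0.0))
        if score > best_score:
            best, best_score = (pop, interv), score
    return best  # None when either element has no candidates


if __name__ == "__main__":
    text = ("Adults with type 2 diabetes were randomised to insulin or placebo; "
            "radiation exposure was not measured.")
    print(select(text))  # ('Diabetes Mellitus', 'Insulin Therapy')
```

In the usage example both intervention candidates match exactly one trigger word, so text evidence alone cannot separate them; the compatibility term between the selected population and each intervention breaks the tie, which is the kind of output correlation the abstract argues should be exploited.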

Information

Published In

ACM Conferences
CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN:9781450349185
DOI:10.1145/3132847
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. biomedical informatics
  2. deep learning
  3. text mining

Funding Sources

  • Cochrane 'Transform'
  • NLM
  • MRC (UK)

Conference

CIKM '17

Acceptance Rates

CIKM '17 Paper Acceptance Rate 171 of 855 submissions, 20%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 7
  • Downloads (last 6 weeks): 0
Reflects downloads up to 07 Mar 2025

Citations

  • (2023) Data extraction methods for systematic review (semi)automation: Update of a living systematic review. F1000Research 10 (401). DOI: 10.12688/f1000research.51117.2. Online publication date: 9-Oct-2023.
  • (2022) A survey on clinical natural language processing in the United Kingdom from 2007 to 2022. npj Digital Medicine 5:1. DOI: 10.1038/s41746-022-00730-6. Online publication date: 21-Dec-2022.
  • (2021) Data extraction methods for systematic review (semi)automation: A living systematic review. F1000Research 10 (401). DOI: 10.12688/f1000research.51117.1. Online publication date: 19-May-2021.
  • (2021) Data structuring of electronic health records: a systematic review. Health and Technology 11:6 (1219-1235). DOI: 10.1007/s12553-021-00607-w. Online publication date: 29-Oct-2021.
  • (2020) Constructing Artificial Data for Fine-Tuning for Low-Resource Biomedical Text Tagging with Applications in PICO Annotation. Explainable AI in Healthcare and Medicine (131-145). DOI: 10.1007/978-3-030-53352-6_12. Online publication date: 3-Nov-2020.
  • (2019) Improving reference prioritisation with PICO recognition. BMC Medical Informatics and Decision Making 19:1. DOI: 10.1186/s12911-019-0992-8. Online publication date: 5-Dec-2019.
