A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation

Authors:

Iain J. Marshall,

John Shawe-Taylor,

Byron C. Wallace

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

Pages 1519 - 1528

https://doi.org/10.1145/3132847.3132989

Published: 06 November 2017

Abstract

We consider the task of automatically annotating free texts describing clinical trials with concepts from a controlled, structured medical vocabulary. Specifically, we aim to build a model to infer distinct sets of (ontological) concepts describing complementary clinically salient aspects of the underlying trials: the populations enrolled, the interventions administered and the outcomes measured, i.e., the PICO elements. This important practical problem poses a few key challenges. One issue is that the output space is vast, because the vocabulary comprises many unique concepts. Compounding this problem, annotated data in this domain is expensive to collect and hence sparse. Furthermore, the outputs (sets of concepts for each PICO element) are correlated: specific populations (e.g., diabetics) will render certain intervention concepts likely (insulin therapy) while effectively precluding others (radiation therapy). Such correlations should be exploited. We propose a novel neural model that addresses these challenges. We introduce a Candidate-Selector architecture in which the model considers sets of candidate concepts for PICO elements, and assesses their plausibility conditioned on the input text to be annotated. This relies on a 'candidate set' generator, which may be learned or may rely on heuristics. A conditional discriminative neural model then jointly selects candidate concepts, given the input text. We compare the predictive performance of our approach to strong baselines, and show that it outperforms them. Finally, we perform a qualitative evaluation of the generated annotations by asking domain experts to assess their quality.
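The two-stage idea described in the abstract (generate candidate concepts, then select among them jointly given the text) can be illustrated with a small toy sketch. This is not the authors' neural model: the vocabulary `VOCAB`, the compatibility table `COMPAT`, and all function names are hypothetical, the scoring is simple word overlap rather than a learned network, and for brevity it picks one concept per element instead of a set.

```python
from itertools import product

# Hypothetical toy vocabulary (not from the paper): concept -> trigger words.
VOCAB = {
    "population": {
        "Diabetes Mellitus": {"diabetic", "diabetes"},
        "Neoplasms": {"cancer", "tumour"},
    },
    "intervention": {
        "Insulin": {"insulin"},
        "Radiotherapy": {"radiation", "radiotherapy"},
    },
}

# Hypothetical cross-element compatibility: positive values make a
# (population, intervention) pair more plausible, negative values less.
COMPAT = {
    ("Diabetes Mellitus", "Insulin"): 1.0,
    ("Diabetes Mellitus", "Radiotherapy"): -1.0,
    ("Neoplasms", "Radiotherapy"): 1.0,
}


def tokenize(text):
    """Lower-case the text and split it into a set of word tokens."""
    return set(text.lower().replace(",", " ").replace(".", " ").split())


def generate_candidates(tokens, element):
    """Heuristic candidate generator: keep concepts with a trigger word
    in the text, falling back to every concept when nothing matches."""
    hits = [c for c, triggers in VOCAB[element].items() if triggers & tokens]
    return hits or list(VOCAB[element])


def text_score(tokens, element, concept):
    """Stand-in for a learned text-conditioned score: trigger-word overlap."""
    return len(VOCAB[element][concept] & tokens)


def select(text):
    """Jointly pick the (population, intervention) pair with the best
    combined text score plus cross-element compatibility."""
    tokens = tokenize(text)
    populations = generate_candidates(tokens, "population")
    interventions = generate_candidates(tokens, "intervention")

    def joint_score(pair):
        pop, itv = pair
        return (
            text_score(tokens, "population", pop)
            + text_score(tokens, "intervention", itv)
            + COMPAT.get((pop, itv), 0.0)
        )

    return max(product(populations, interventions), key=joint_score)
```

In this sketch, a text that mentions diabetic patients but names no intervention still resolves to insulin rather than radiotherapy, because the compatibility term carries the population evidence across to the intervention choice. That is the kind of correlation between PICO elements the abstract argues should be exploited.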

Index Terms

A Neural Candidate-Selector Architecture for Automatic Structured Clinical Text Annotation
1. Applied computing
  1. Life and medical sciences
    1. Health informatics
2. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

November 2017

2604 pages

ISBN:9781450349185

DOI:10.1145/3132847

General Chairs:
Ee-Peng Lim (Singapore Management University, Singapore)
Marianne Winslett (University of Illinois at Urbana-Champaign, USA, and Advanced Digital Sciences Center, Singapore)

Program Chairs:
Mark Sanderson (RMIT, Australia)
Ada Fu (Chinese University of Hong Kong, Hong Kong)
Jimeng Sun (Georgia Tech, USA)
Shane Culpepper (RMIT, Australia)
Eric Lo (Chinese University of Hong Kong, Hong Kong)
Joyce Ho (Emory University, USA)
Debora Donato (Mix Tech, Inc., USA)
Rakesh Agrawal (Data Insights Laboratories, USA)
Yu Zheng (Microsoft Research Asia, China)
Carlos Castillo (Qatar Computing Research Institute, Qatar)
Aixin Sun (Nanyang Technological University, Singapore)
Vincent S. Tseng (National Cheng Kung University, Taiwan)
Chenliang Li (Wuhan University, China)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Cochrane 'Transform'
NLM
MRC (UK)

Conference

CIKM '17

CIKM '17: ACM Conference on Information and Knowledge Management

November 6 - 10, 2017

Singapore, Singapore

Acceptance Rates

CIKM '17 Paper Acceptance Rate 171 of 855 submissions, 20%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Cited By

Schmidt L, Finnerty Mutlu A, Elmore R, Olorisade B, Thomas J, Higgins J. (2023). Data extraction methods for systematic review (semi)automation: Update of a living systematic review. F1000Research 10 (401). Online publication date: 9-Oct-2023.
https://doi.org/10.12688/f1000research.51117.2

Wu H, Wang M, Wu J, Francis F, Chang Y, Shavick A, Dong H, Poon M, Fitzpatrick N, Levine A, Slater L, Handy A, Karwath A, Gkoutos G, Chelala C, Shah A, Stewart R, Collier N, Alex B, Whiteley W, Sudlow C, Roberts A, Dobson R. (2022). A survey on clinical natural language processing in the United Kingdom from 2007 to 2022. npj Digital Medicine 5:1. Online publication date: 21-Dec-2022.
https://doi.org/10.1038/s41746-022-00730-6

Schmidt L, Olorisade B, McGuinness L, Thomas J, Higgins J. (2021). Data extraction methods for systematic review (semi)automation: A living systematic review. F1000Research 10 (401). Online publication date: 19-May-2021.
https://doi.org/10.12688/f1000research.51117.1

de Oliveira J, da Costa C, Antunes R. (2021). Data structuring of electronic health records: a systematic review. Health and Technology 11:6 (1219-1235). Online publication date: 29-Oct-2021.
https://doi.org/10.1007/s12553-021-00607-w

Singh G, Sabet Z, Shawe-Taylor J, Thomas J. (2020). Constructing Artificial Data for Fine-Tuning for Low-Resource Biomedical Text Tagging with Applications in PICO Annotation. Explainable AI in Healthcare and Medicine (131-145). Online publication date: 3-Nov-2020.
https://doi.org/10.1007/978-3-030-53352-6_12

Brockmeier A, Ju M, Przybyła P, Ananiadou S. (2019). Improving reference prioritisation with PICO recognition. BMC Medical Informatics and Decision Making 19:1. Online publication date: 5-Dec-2019.
https://doi.org/10.1186/s12911-019-0992-8
