Two-phase learning for biological event extraction and verification

Authors:
Eunju Kim, Pohang University of Science and Technology, Pohang, Korea
Yu Song, Pohang University of Science and Technology, Pohang, Korea
Cheongjae Lee, Pohang University of Science and Technology, Pohang, Korea
Kyoungduk Kim, Pohang University of Science and Technology, Pohang, Korea
Gary Geunbae Lee, Pohang University of Science and Technology, Pohang, Korea
Byoung-Kee Yi, Pohang University of Science and Technology, Pohang, Korea
Jeongwon Cha, Changwon University, Korea

ACM Transactions on Asian Language Information Processing, Volume 5, Issue 1, pp. 61–73
https://doi.org/10.1145/1131348.1131353

Published: 01 March 2006

Abstract

Many previous biological event-extraction systems were based on hand-crafted rules tuned to one particular biological application domain. Constructing and tuning such rules by hand is time-consuming and makes the systems less portable. Supervised machine-learning methods were therefore developed to generate the extraction rules automatically, but the trade-off between precision and recall (high recall with low precision, and vice versa) remains a barrier to improving performance. To make matters worse, text in the biological domain is especially complex: a sentence often contains more than two biological events, and an event expressed in a noun chunk can serve as an entity of another event. As a result, there are as yet no systems that perform well in extracting events in biological domains by using supervised machine learning.

To overcome the limitations of previous systems and the complexity of biological texts, we present the following new ideas. First, we adopted a supervised machine-learning method to reduce the human effort needed to write extraction rules, in order to obtain a highly domain-portable system. Second, we overcame the classical trade-off between precision and recall by using an event component verification method. Machine learning thus occurs in two phases in our architecture. In the first phase, the system focuses on improving recall when extracting events between biological entities with rules learned by supervised machine learning. In the second phase, the system removes incorrect biological events by verifying the extracted event components with a maximum entropy (ME) classification method. In other words, the system aims for high recall in the first phase and tries to achieve high precision with a classifier in the second phase.

Finally, we improved a supervised machine-learning algorithm so that, for nested biological events, it learns rules at two separate levels: rules within a noun chunk and rules extending across a sentence.
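
The two-phase idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the learned rules or the verification features, so everything concrete here is a hypothetical stand-in: the trigger and entity lexicons, the "pair every entity left of a trigger with every entity right of it" rule used in place of automatically learned rules, and the two adjacency features. The verifier is a binary logistic regression, which is the two-class form of a maximum entropy classifier, trained on four hand-made examples.

```python
import math
import re
from collections import defaultdict

# Hypothetical lexicons, for illustration only (not taken from the paper).
TRIGGERS = {"binds", "inhibits", "activates"}
ENTITIES = {"RAF1", "MEK1", "TP53", "MDM2"}


def extract_candidates(sentence):
    """Phase 1 (recall-oriented): pair every entity left of a trigger with
    every entity right of it, accepting many false positives."""
    tokens = re.findall(r"\w+", sentence)
    candidates = []
    for t, token in enumerate(tokens):
        if token not in TRIGGERS:
            continue
        for a in range(t):
            for b in range(t + 1, len(tokens)):
                if tokens[a] in ENTITIES and tokens[b] in ENTITIES:
                    candidates.append((tokens, a, t, b))
    return candidates


def features(a, t, b):
    """Toy event-component features (an assumption): is each argument
    directly adjacent to the trigger word?"""
    return {
        "bias": 1.0,
        "agent_adjacent": float(t - a == 1),
        "target_adjacent": float(b - t == 1),
    }


def probability(weights, feats):
    """Binary maximum entropy (logistic) probability that an event is correct."""
    score = sum(weights[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, epochs=500, rate=0.5):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for feats, label in examples:
            error = label - probability(weights, feats)
            for name, value in feats.items():
                weights[name] += rate * error * value
    return weights


def verify(sentence, weights, threshold=0.5):
    """Phase 2 (precision-oriented): keep only candidates the classifier accepts."""
    events = []
    for tokens, a, t, b in extract_candidates(sentence):
        if probability(weights, features(a, t, b)) >= threshold:
            events.append((tokens[a], tokens[t], tokens[b]))
    return events


# Hand-made training set: only events with both arguments adjacent are correct.
TRAINING = [
    (features(0, 1, 2), 1),  # both arguments adjacent to the trigger
    (features(0, 2, 3), 0),  # agent not adjacent
    (features(1, 2, 4), 0),  # target not adjacent
    (features(0, 2, 4), 0),  # neither adjacent
]
WEIGHTS = train(TRAINING)
SENTENCE = "RAF1 binds MEK1 and TP53 inhibits MDM2"

if __name__ == "__main__":
    print(len(extract_candidates(SENTENCE)), "candidates from phase 1")
    print(verify(SENTENCE, WEIGHTS))
```

On the toy sentence, the permissive first phase proposes six candidate events, including spurious ones such as (RAF1, binds, MDM2), and the verifier keeps only the two with both arguments next to the trigger. The split mirrors the abstract's design choice: the extractor is allowed to over-generate because a separate classifier is responsible for precision.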

Index Terms

Computing methodologies > Artificial intelligence > Natural language processing > Language resources

Published in

ACM Transactions on Asian Language Information Processing Volume 5, Issue 1
March 2006
88 pages
ISSN:1530-0226
EISSN:1558-3430
DOI:10.1145/1131348

Copyright © 2006 ACM
Publisher
Association for Computing Machinery
New York, NY, United States

Author Tags
Biological event extraction
event component verification
two-level supervised machine learning