Two-phase learning for biological event extraction and verification
Source: ACM Transactions on Asian Language Information Processing (TALIP)
Volume 5, Issue 1 (March 2006)
Pages: 61-73
Year of Publication: 2006
ISSN:1530-0226
Authors
Eunju Kim  Pohang University of Science and Technology, Pohang, Korea
Yu Song  Pohang University of Science and Technology, Pohang, Korea
Cheongjae Lee  Pohang University of Science and Technology, Pohang, Korea
Kyoungduk Kim  Pohang University of Science and Technology, Pohang, Korea
Gary Geunbae Lee  Pohang University of Science and Technology, Pohang, Korea
Byoung-Kee Yi  Pohang University of Science and Technology, Pohang, Korea
Jeongwon Cha  Changwon University, Korea
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1131348.1131353

ABSTRACT

Many previous biological event-extraction systems were based on hand-crafted rules tuned to a specific biological application domain. Constructing and tuning such rules by hand is time-consuming and makes the systems less portable. Supervised machine-learning methods were therefore developed to generate the extraction rules automatically, but the trade-off between precision and recall (high recall with low precision, and vice versa) remains a barrier to improving performance. To make matters worse, text in the biological domain is more complex: a sentence often contains more than two biological events, and an event inside a noun chunk can be an entity of another event. As a result, no system yet extracts events well in biological domains using supervised machine learning.

To overcome the limitations of previous systems and the complexity of biological texts, we present the following new ideas. First, we adopted a supervised machine-learning method to reduce the human effort of writing extraction rules and so obtain a highly domain-portable system. Second, we overcame the classical trade-off between precision and recall by using an event component verification method. Machine learning thus occurs in two phases in our architecture. In the first phase, the system learns extraction rules in a supervised manner and focuses on improving recall in extracting events between biological entities. In the second phase, after the events have been extracted with the automatically learned rules, the system removes incorrect events by verifying the extracted event components with a maximum entropy (ME) classification method. In other words, the system targets high recall in the first phase and tries to achieve high precision with a classifier in the second phase. Finally, we improved a supervised machine-learning algorithm so that it learns rules at two different levels separately, one within a noun chunk and one extending throughout a sentence, to handle nested biological events.
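The two-phase idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy sentence, the entity and trigger lists, the loose pairing rule that stands in for the learned phase-1 rules, the two adjacency features, and the tiny labelled training set. The phase-2 verifier is a binary logistic regression, which is the two-class form of a maximum entropy classifier, trained here by plain batch gradient ascent.

```python
import math

def extract_candidates(tokens, entities, triggers):
    """Phase 1 stand-in (hypothetical): pair every entity before a trigger
    with every entity after it. Loose on purpose, so recall is high."""
    cands = []
    for t, tok in enumerate(tokens):
        if tok not in triggers:
            continue
        for a in range(t):
            if tokens[a] not in entities:
                continue
            for th in range(t + 1, len(tokens)):
                if tokens[th] in entities:
                    cands.append((a, t, th))  # (agent, trigger, theme) indices
    return cands

def features(cand):
    """Hypothetical features: bias, agent adjacent to trigger, theme adjacent."""
    a, t, th = cand
    return [1.0, float(t - a == 1), float(th - t == 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_verifier(examples, epochs=2000, lr=0.5):
    """Fit binary logistic regression (two-class maximum entropy) by
    batch gradient ascent on the log-likelihood."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in examples:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                grad[i] += (y - p) * xi
        w = [wi + lr * gi for wi, gi in zip(w, grad)]
    return w

def verify(cands, w, threshold=0.5):
    """Phase 2: keep only candidates the classifier scores above threshold."""
    kept = []
    for c in cands:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, features(c))))
        if p > threshold:
            kept.append(c)
    return kept

def precision_recall(predicted, gold):
    tp = len(set(predicted) & set(gold))
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Toy data, all hypothetical.
tokens = "RAF phosphorylates MEK and MEK activates ERK".split()
entities = {"RAF", "MEK", "ERK"}
triggers = {"phosphorylates", "activates"}
gold = [(0, 1, 2), (4, 5, 6)]
training = [([1.0, 1.0, 1.0], 1), ([1.0, 1.0, 0.0], 0),
            ([1.0, 0.0, 1.0], 0), ([1.0, 0.0, 0.0], 0)]

candidates = extract_candidates(tokens, entities, triggers)
weights = train_verifier(training)
verified = verify(candidates, weights)
print("phase 1 (precision, recall):", precision_recall(candidates, gold))
print("phase 2 (precision, recall):", precision_recall(verified, gold))
```

On this toy sentence the loose phase-1 rule proposes six candidate events, of which two are in the gold set, so recall is complete but precision is low; the verifier is then responsible for discarding the spurious pairings. A real system would replace the pairing rule with learned extraction rules and use far richer features than adjacency.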


