Optimizing syntax patterns for discovering protein-protein interactions
Source: Symposium on Applied Computing
Proceedings of the 2005 ACM Symposium on Applied Computing
Santa Fe, New Mexico
SESSION: Bioinformatics (BIO)
Pages: 195 - 201  
Year of Publication: 2005
ISBN:1-58113-964-0
Authors
Conrad Plake  Humboldt-Universität zu Berlin, Berlin, Germany
Jörg Hakenberg  Humboldt-Universität zu Berlin, Berlin, Germany
Ulf Leser  Humboldt-Universität zu Berlin, Berlin, Germany
Sponsor
SIGAPP: ACM Special Interest Group on Applied Computing
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1066677.1066722

ABSTRACT

We propose a method for the automated extraction of protein-protein interactions from scientific text. Our system matches sentences against syntax patterns that typically describe protein interactions. We define a set of 22 patterns, each a regular expression consisting of anchor positions and parameterizable constraints. This small set is then refined and optimized on a training set using a genetic algorithm. No heuristic definitions are necessary, and the final pattern set can be generated entirely without manual curation. Our method can be applied to any protein-protein interaction system based on syntax patterns and thus complements related work on building comprehensive sets of such patterns. Applying different fitness functions during evolution provides an easy way to tune the system toward precision, recall, or F-measure. We evaluate our system on two samples, one derived from the BioCreAtIvE corpus, the other from references in the Database of Interacting Proteins (DIP). Automatic refinement of the patterns adds up to 16% to the precision and 5% to the recall of our system. We additionally study the impact of proper protein name recognition, which could improve precision by about 17% and recall by 12%.
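
To make the idea in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a single hypothetical pattern with two anchors around the verb "binds", two integer gap constraints as the evolvable parameters, protein names already replaced by the placeholder PROT, and a five-sentence toy training set. The names TRAIN, build_pattern, scores, and evolve are all illustrative. The objective argument stands in for the choice of fitness function (precision, recall, or F-measure).

```python
import random
import re

# Hypothetical toy training set: (sentence, describes_interaction).
# PROT stands for an already recognized protein name.
TRAIN = [
    ("PROT binds PROT", True),
    ("PROT directly binds to PROT", True),
    ("PROT binds weakly but reproducibly to PROT", True),
    ("PROT binds DNA in cells that lack any detectable PROT", False),
    ("PROT was purified and PROT was sequenced", False),
]


def build_pattern(left_gap, right_gap):
    """Anchors: PROT, the verb 'binds', PROT.
    Constraints: maximum number of tokens allowed in each gap."""
    return re.compile(
        r"PROT (?:\w+ ){0,%d}binds (?:\w+ ){0,%d}PROT" % (left_gap, right_gap)
    )


def scores(genome, data):
    """Precision, recall, and F1 of one parameter setting on labeled data."""
    pattern = build_pattern(*genome)
    tp = fp = fn = 0
    for sentence, label in data:
        hit = pattern.search(sentence) is not None
        if hit and label:
            tp += 1
        elif hit and not label:
            fp += 1
        elif label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def evolve(data, objective="f1", generations=20, pop_size=8, seed=0):
    """Tiny genetic algorithm: keep the better half, refill the population
    with crossover plus a small mutation of one gap constraint."""
    rng = random.Random(seed)
    population = [(rng.randint(0, 8), rng.randint(0, 8)) for _ in range(pop_size)]

    def fitness(genome):
        return scores(genome, data)[objective]

    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [a[0], b[1]]                  # crossover of the two gaps
            i = rng.randrange(2)
            child[i] = max(0, child[i] + rng.choice((-1, 0, 1)))  # mutation
            children.append(tuple(child))
        population = parents + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve(TRAIN, objective="f1")
    print(best, scores(best, TRAIN))
```

In this toy setting, widening the right-hand gap raises recall until the pattern also starts matching the negative fourth sentence, at which point precision drops. Passing "precision" or "recall" instead of "f1" as the objective moves the search toward one side of that trade-off.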


