ABSTRACT
Expressions of negation in the biomedical literature often encode contrastive information that explains significant differences between the objects being contrasted. We show that such information gives additional insight into the structures and/or biological functions of these objects, yielding valuable knowledge for subcategorizing protein families by the properties that the proteins involved do not have in common. Based on the observation that expressions of negation mostly employ predictable syntactic structures, which can be characterized by subclausal coordination and clause-level parallelism, we present a system that extracts such contrastive information by identifying those structures with natural language processing techniques and additional linguistic resources for semantics. The implemented system achieves 85.7% precision and 61.5% recall, including 7.7% partial recall, or an F score of 76.6. We apply the system to the biological interactions extracted by our biomedical information-extraction system in order to enrich proteome databases with contrastive information.