ABSTRACT
The biomedical literature is growing rapidly, and the information it contains is growing with it. Information retrieval is therefore increasingly important for supporting biomedical research. However, a search for a target gene, protein, or relation often returns either too many documents or too few. The wide variety of synonyms for gene and protein names makes retrieval even more difficult. To overcome these difficulties, we propose a unified biomedical workbench for mining and probing the literature. The workbench consists of four components: searching and collecting, literature mining, relation probing, and statistical analysis. It searches and collects PubMed articles and USPTO patents. To extract biomedical relations, the collected documents are mined with text mining techniques such as named entity recognition, gene/protein name normalization, and relation extraction. Users can then probe their target relations using the extracted relation information, which is displayed as a relation network. Finally, the workbench provides statistics on literature metadata such as authors, organizations, and publication years. In short, the proposed workbench unifies literature-based functions from searching to probing, including text mining and statistical analysis.
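The flow described in the abstract (collect, mine, probe, summarize) can be illustrated with a small sketch. This is not the workbench's implementation: the synonym dictionary, the toy documents, and every function name below are hypothetical, dictionary lookup stands in for named entity recognition and name normalization, and sentence-level co-occurrence stands in for relation extraction.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical synonym dictionary: lowercased surface form -> canonical symbol.
SYNONYMS = {
    "p53": "TP53",
    "tp53": "TP53",
    "tumor protein 53": "TP53",
    "mdm2": "MDM2",
    "hdm2": "MDM2",
    "brca1": "BRCA1",
}

# Invented documents standing in for collected abstracts and their metadata.
DOCS = [
    {"id": "doc1", "year": 2005, "authors": ["A. Kim"],
     "sentences": ["HDM2 binds p53 in stressed cells."]},
    {"id": "doc2", "year": 2006, "authors": ["A. Kim", "B. Lee"],
     "sentences": ["Tumor protein 53 is degraded by MDM2.",
                   "BRCA1 was also measured."]},
]


def find_entities(sentence):
    """Crude substring lookup: a stand-in for NER plus name normalization."""
    text = sentence.lower()
    return sorted({symbol for name, symbol in SYNONYMS.items() if name in text})


def extract_relations(docs):
    """Assumption: two names in one sentence count as a candidate relation.

    Returns a mapping from a sorted symbol pair to the ids of supporting documents.
    """
    evidence = defaultdict(set)
    for doc in docs:
        for sentence in doc["sentences"]:
            for a, b in combinations(find_entities(sentence), 2):
                evidence[(a, b)].add(doc["id"])
    return evidence


def probe(evidence, target):
    """Return the neighbours of `target` in the relation network with their evidence."""
    hits = {}
    for (a, b), doc_ids in evidence.items():
        if target in (a, b):
            other = b if a == target else a
            hits[other] = sorted(doc_ids)
    return hits


def metadata_stats(docs):
    """Count documents per publication year and per author."""
    years = Counter(doc["year"] for doc in docs)
    authors = Counter(author for doc in docs for author in doc["authors"])
    return years, authors


evidence = extract_relations(DOCS)
neighbours = probe(evidence, "TP53")
years, authors = metadata_stats(DOCS)
print(neighbours)
print(years, authors)
```

Because synonyms are mapped to one canonical symbol before relations are counted, the two sentences that mention "HDM2 ... p53" and "Tumor protein 53 ... MDM2" support the same edge in the network, which is the kind of retrieval problem the abstract attributes to name variation.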
Index Terms
- BioProber2.0: a unified biomedical workbench with mining and probing literatures