Terminology-based knowledge mining for new knowledge discovery

Published: 01 March 2006

Abstract

In this article we present an integrated knowledge-mining system for the domain of biomedicine that combines automatic term recognition, term clustering, information retrieval, and visualization. The primary objective of the system is to facilitate knowledge acquisition from documents and to aid knowledge discovery through terminology-based similarity calculation and visualization of automatically structured knowledge. The system also supports the integration of different types of databases and the simultaneous retrieval of different types of knowledge. To accelerate knowledge discovery, we also propose a visualization method for generating similarity-based knowledge maps. The method is based on real-time terminology-based knowledge clustering and categorization, and it lets users view the knowledge maps graphically as they are generated. Lastly, we discuss experiments using the GENIA corpus to assess the practicality and applicability of the system.
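
The abstract does not specify how terminology-based similarity is computed, so the following is only an illustrative sketch of the general idea: each document is represented by the set of terms recognized in it, and two documents are considered similar when their term sets overlap. The Jaccard coefficient, the function names, and the sample term sets are all assumptions chosen for illustration; they are not taken from the article.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two term sets: |a ∩ b| / |a ∪ b| (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(doc_terms: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Pairwise similarity for every unordered pair of documents."""
    return {
        (x, y): jaccard(doc_terms[x], doc_terms[y])
        for x, y in combinations(sorted(doc_terms), 2)
    }


# Hypothetical output of a term recognizer: document id -> recognized terms.
doc_terms = {
    "doc1": {"nf-kappa b", "t cell", "gene expression"},
    "doc2": {"nf-kappa b", "gene expression", "monocyte"},
    "doc3": {"glucocorticoid receptor", "binding site"},
}

sims = similarity_matrix(doc_terms)
for pair, score in sims.items():
    print(pair, round(score, 2))
```

Pairwise scores of this kind could then feed a clustering step or a layout algorithm that places similar documents close together, which is one plausible reading of the "similarity-based knowledge maps" mentioned above.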


Published In

ACM Transactions on Asian Language Information Processing, Volume 5, Issue 1
March 2006
88 pages
ISSN:1530-0226
EISSN:1558-3430
DOI:10.1145/1131348

Publisher

Association for Computing Machinery

New York, NY, United States

Published in TALIP Volume 5, Issue 1

Author Tags

  1. Automatic term recognition
  2. biomedicine
  3. natural language processing
  4. structuring knowledge
  5. terminology
  6. visualization
