DOI: 10.1145/2837689.2837701
research-article

Tagging of temporal expressions and geological features in scientific articles

Published: 26 November 2015

ABSTRACT

We investigate tagging figure and table captions in scientific articles from geology to support visualization of research findings on maps and time-lines. Our proposed approach comprises identifying geological time expressions and geographic and geologic locations without requiring large pre-annotated data. Different tagging approaches are tested and evaluated on a corpus of captions extracted from scientific geological articles. Our baseline method builds on geologic timescale ontologies and GeoNames as gazetteers to facilitate lookup of times and location names. The baseline is evaluated on a development set of captions from 20 documents and the results are analyzed manually to identify causes for tagging errors. We found that the poor performance of the baseline approach is mainly due to i) lack of coverage in the gazetteers, ii) incorrect tagging of person names as location names, and iii) a simplistic gazetteer lookup for capitalized words. We augmented the baseline approach by extending the gazetteers, by adding reference identification to block person names being tagged as locations, by filtering trivial matches, and by augmenting the lookup by correcting capitalization using true casing of words. The different configurations of our extended approach were evaluated on a test set of 80 documents, achieving an improved precision and recall of more than 90%.
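The extended lookup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the gazetteer entries, the stop list of trivial matches, the reference pattern, and the names `tag_caption`, `lookup`, and `overlaps` are all assumptions made for illustration. Lower-casing before lookup is used here as a simple stand-in for true casing, and a small regular expression stands in for reference identification.

```python
import re

# Toy gazetteers, lower-cased. These entries are illustrative assumptions,
# standing in for a geologic timescale ontology and GeoNames.
TIME_GAZETTEER = {"jurassic", "late cretaceous"}
LOCATION_GAZETTEER = {"north sea", "paris", "well"}

# Hypothetical stop list of gazetteer entries treated as trivial matches.
TRIVIAL_MATCHES = {"well"}

# Author-year references such as "Paris et al., 2010" or "Smith (1999)".
REFERENCE = re.compile(r"[A-Z][a-z]+(?: et al\.)?,? \(?\d{4}\)?")
WORD = re.compile(r"[A-Za-z]+")


def overlaps(start, end, spans):
    """Return True if [start, end) intersects any span in spans."""
    return any(start < s_end and s_start < end for s_start, s_end in spans)


def lookup(surface):
    """Look up a phrase case-insensitively; return its tag or None."""
    key = " ".join(surface.lower().split())
    if key in TRIVIAL_MATCHES:
        return None
    if key in TIME_GAZETTEER:
        return "TIME"
    if key in LOCATION_GAZETTEER:
        return "LOCATION"
    return None


def tag_caption(caption, max_len=3):
    """Tag a caption by longest-match gazetteer lookup outside references."""
    # Character spans of references; names inside them are never looked up.
    blocked = [m.span() for m in REFERENCE.finditer(caption)]
    tokens = [m.span() for m in WORD.finditer(caption)]
    tags = []
    i = 0
    while i < len(tokens):
        step = 1
        # Try the longest phrase starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            if overlaps(start, end, blocked):
                continue
            label = lookup(caption[start:end])
            if label:
                tags.append((caption[start:end], label))
                step = n
                break
        i += step
    return tags


if __name__ == "__main__":
    caption = ("Fig. 2. MAP OF THE NORTH SEA showing Jurassic well data "
               "(Paris et al., 2010).")
    print(tag_caption(caption))
```

In this sketch the all-caps "NORTH SEA" is still found because lookup ignores case, "well" is dropped as a trivial match, and "Paris" is skipped because it falls inside a reference span. A real true caser would restore the most likely capitalization instead of discarding it.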

Published in

GIR '15: Proceedings of the 9th Workshop on Geographic Information Retrieval
November 2015
90 pages
ISBN: 9781450339377
DOI: 10.1145/2837689

Copyright © 2015 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 46 of 61 submissions, 75%
