DOI: 10.1145/2736277.2741626
research-article

GERBIL: General Entity Annotator Benchmarking Framework

Published: 18 May 2015

ABSTRACT

We present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. By these means, we aim to ensure that both tool developers and end users can derive meaningful insights into the extension, integration and use of annotation applications. In particular, GERBIL provides tool developers with comparable results, allowing them to easily discover the strengths and weaknesses of their implementations with respect to the state of the art. The permanent experiment URIs provided by our framework ensure the reproducibility and archiving of evaluation results. Moreover, the framework generates data in a machine-processable format, allowing for efficient querying and post-processing of evaluation results. Finally, the tool diagnostics provided by GERBIL reveal the areas in which tools should be further refined, enabling developers to create an informed agenda for extensions and end users to select the right tools for their purposes. GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable, objective evaluation results.

Published in

WWW '15: Proceedings of the 24th International Conference on World Wide Web
May 2015
1460 pages
ISBN: 9781450334693

© 2015. Copyright is held by the International World Wide Web Conference Committee (IW3C2).

Publisher: International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland

Acceptance Rates

WWW '15 Paper Acceptance Rate: 131 of 929 submissions, 14%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%