research-article

GERBIL: General Entity Annotator Benchmarking Framework

Authors:
Ricardo Usbeck
Leipzig University, IFI/AKSW, Unister GmbH, Leipzig, Germany

Michael Röder
Leipzig University, IFI/AKSW, Unister GmbH, Leipzig, Germany

Axel-Cyrille Ngonga Ngomo
Leipzig University, IFI/AKSW, Leipzig, Germany

Ciro Baron
Leipzig University, IFI/AKSW, Leipzig, Germany

Andreas Both
Unister GmbH, Leipzig, Germany

Martin Brümmer
Leipzig University, IFI/AKSW, Leipzig, Germany

Diego Ceccarelli
ISTI-CNR, Pisa, Italy

Marco Cornolti
University of Pisa, Pisa, Italy

Didier Cherix
Unister GmbH, Leipzig, Germany

Bernd Eickmann
Unister GmbH, Leipzig, Germany

Paolo Ferragina
University of Pisa, Pisa, Italy

Christiane Lemke
Unister GmbH, Leipzig, Germany

Andrea Moro
Sapienza University of Rome, Rome, Italy

Roberto Navigli
Sapienza University of Rome, Rome, Italy

Francesco Piccinno
University of Pisa, Pisa, Italy

Giuseppe Rizzo
Eurecom, Biot, France

Harald Sack
HPI Potsdam, Potsdam, Germany

René Speck
Leipzig University, IFI/AKSW, Leipzig, Germany

Raphaël Troncy
Eurecom, Biot, France

Jörg Waitelonis
HPI Potsdam, Potsdam, Germany

Lars Wesemann
Unister GmbH, Leipzig, Germany

WWW '15: Proceedings of the 24th International Conference on World Wide Web, May 2015, Pages 1133–1143
https://doi.org/10.1145/2736277.2741626

Published: 18 May 2015

ABSTRACT

We present GERBIL, an evaluation framework for semantic entity annotation. The rationale behind our framework is to provide developers, end users and researchers with easy-to-use interfaces that allow for the agile, fine-grained and uniform evaluation of annotation tools on multiple datasets. By these means, we aim to ensure that both tool developers and end users can derive meaningful insights pertaining to the extension, integration and use of annotation applications. In particular, GERBIL provides comparable results to tool developers so that they can easily discover the strengths and weaknesses of their implementations with respect to the state of the art. With the permanent experiment URIs provided by our framework, we ensure the reproducibility and archiving of evaluation results. Moreover, the framework generates data in a machine-processable format, allowing for the efficient querying and post-processing of evaluation results. Finally, the tool diagnostics provided by GERBIL allow deriving insights into the areas in which tools should be further refined, thus allowing developers to create an informed agenda for extensions and end users to detect the right tools for their purposes. GERBIL aims to become a focal point for the state of the art, driving the research agenda of the community by presenting comparable objective evaluation results.

Index Terms

Computing methodologies → Artificial intelligence → Natural language processing → Language resources

Published in
WWW '15: Proceedings of the 24th International Conference on World Wide Web
May 2015
1460 pages
ISBN: 9781450334693
General Chairs:
Aldo Gangemi (National Research Council, Italy & Paris 13 University-CNRS, France)
Stefano Leonardi (Sapienza University of Rome, Italy)
Alessandro Panconesi (Sapienza University of Rome, Italy)
Copyright © 2015 Copyright is held by the International World Wide Web Conference Committee (IW3C2)
Publisher
International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland
Author Tags
archivability
benchmarking framework
reusability
semantic entity annotation system
Acceptance Rates
WWW '15 Paper Acceptance Rate: 131 of 929 submissions, 14%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%