DOI: 10.1145/2611040.2611085
research-article

Configuring Named Entity Extraction through Real-Time Exploitation of Linked Data

Published: 02 June 2014

Abstract

Named Entity Extraction is the process of identifying entities (such as persons, locations, and organizations) in texts and linking them to related semantic resources. This task is useful in several applications, e.g. question answering, document annotation, and post-processing of search results. However, existing named entity extraction tools lack open or easy configuration, although this is very important for building domain-specific applications. For example, supporting a new category of entities, or updating an existing category with additional entities, is either impossible or very laborious. In this paper we show how semantic information (Linked Data) can be exploited in real time to configure a named entity extraction system easily. We also present X-Link, a fully configurable named entity extraction tool that realizes this approach. Unlike existing tools, X-Link allows the user to easily define the categories of entities that are of interest for the application at hand by exploiting one or more online semantic Knowledge Bases. The user can also update a category and specify how to semantically link and enrich the identified entities. This enhanced configurability allows X-Link to be adapted to different contexts for building domain-specific applications (e.g. identifying drugs in a medical search system, or annotating and exploring fish species in a marine-related web page). To test the approach, we conducted a task-based evaluation with users that demonstrates its usability, and a case study that demonstrates its feasibility.
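
The idea of defining an entity category through a query over an online Knowledge Base can be illustrated with a minimal Python sketch. It is not X-Link's actual implementation or API: the `Category` structure, the helper functions, the choice of the DBpedia endpoint and class, and the sample rows are all assumptions made for illustration, and the rows stand in for what a SPARQL endpoint would return at run time so that the example runs offline.

```python
import re
from dataclasses import dataclass


@dataclass
class Category:
    """A user-defined entity category (hypothetical structure, not X-Link's API)."""
    name: str
    endpoint: str  # SPARQL endpoint that would supply the entity names
    query: str     # SPARQL query returning ?uri and ?label pairs


def build_gazetteer(rows):
    """Map lower-cased labels to URIs from (uri, label) result rows."""
    return {label.lower(): uri for uri, label in rows}


def extract_entities(text, gazetteer):
    """Return (surface form, uri, offset) for every gazetteer label found in text."""
    if not gazetteer:
        return []
    # Longest labels first, so multi-word names win over shorter overlapping ones.
    labels = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, labels)) + r")\b", re.IGNORECASE
    )
    return [
        (m.group(0), gazetteer[m.group(0).lower()], m.start())
        for m in pattern.finditer(text)
    ]


# Assumed category definition: endpoint and class are illustrative choices.
fish = Category(
    name="Fish",
    endpoint="http://dbpedia.org/sparql",
    query=(
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
        "SELECT ?uri ?label WHERE { "
        "?uri a <http://dbpedia.org/ontology/Fish> ; rdfs:label ?label }"
    ),
)

# Made-up sample standing in for the rows the endpoint would return.
sample_rows = [
    ("http://dbpedia.org/resource/Yellowfin_tuna", "Yellowfin tuna"),
    ("http://dbpedia.org/resource/Swordfish", "Swordfish"),
]

gazetteer = build_gazetteer(sample_rows)
found = extract_entities("Catches of yellowfin tuna and swordfish rose.", gazetteer)
for surface, uri, offset in found:
    print(offset, surface, uri)
```

Under this sketch, supporting a new category or updating an existing one amounts to changing the query and rebuilding the gazetteer, and the returned URIs are the hooks for semantic linking and enrichment.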

Cited By

  • (2017) How Linked Data can Aid Machine Learning-Based Tasks. Research and Advanced Technology for Digital Libraries, pages 155-168. DOI: 10.1007/978-3-319-67008-9_13. Online publication date: 2-Sep-2017.
  • (2017) Stochastic reranking of biomedical search results based on extracted entities. Journal of the Association for Information Science and Technology, 68(11):2572-2586. DOI: 10.1002/asi.23877. Online publication date: 1-Nov-2017.
  • (2014) NaviSoc: A Socially Enhanced Real-Time Navigator. 2014 IEEE International Conference on Data Mining Workshop, pages 797-803. DOI: 10.1109/ICDMW.2014.77. Online publication date: Dec-2014.

      Published In

      WIMS '14: Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14)
      June 2014
      506 pages
      ISBN:9781450325387
      DOI:10.1145/2611040

      In-Cooperation

      • Aristotle University of Thessaloniki

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Entity Mining
      2. Linked Data
      3. Named Entity Extraction

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      WIMS '14

      Acceptance Rates

      WIMS '14 Paper Acceptance Rate 41 of 90 submissions, 46%;
      Overall Acceptance Rate 140 of 278 submissions, 50%
