
An experimental study of the impact of information extraction accuracy on semantic search performance

Published: 06 November 2007


Researchers have shown that various natural language processing techniques can be used in document analysis to impact search performance. For the most part, they examined how an analysis system with certain performance characteristics can be leveraged to improve document and/or passage search results. We have previously shown that semantic queries which utilize named entity and relation information extracted from the corpus can lead to significant improvement in search performance. In this paper, we extend our previous efforts and examine how search performance degrades in the face of imperfect named entity and relation extraction. Our study was carried out by developing gold standard annotated corpora and applying different error models to the gold standard annotations to simulate errors made by automatic recognizers. We identify automatic recognizer characteristics that make them more amenable to our search tasks, show that recognizer recall has more significant impact on semantic search performance than its precision, and demonstrate that significant improvement in both MAP and Exact Precision scores can be achieved by adopting automatic named entity and relation recognizers with near state-of-the-art performance.
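The abstract does not spell out the error models, so the following is only a minimal sketch of the general idea: degrade a gold standard annotation set to a chosen recall and precision, then score a ranked result list with average precision (the per-query quantity that MAP averages). The function names, the tuple layout of an annotation, and the uniform random sampling are all assumptions made for illustration, not the authors' method.

```python
import random


def degrade_annotations(gold, distractors, recall, precision, seed=0):
    """Simulate recognizer output at a target recall and precision.

    gold        -- set of correct annotations, e.g. hypothetical
                   (doc_id, position, type) tuples
    distractors -- pool of incorrect annotations used as false positives
    Sampling is uniform at random, which is an assumption of this sketch.
    """
    if not (0.0 <= recall <= 1.0 and 0.0 < precision <= 1.0):
        raise ValueError("recall must be in [0, 1] and precision in (0, 1]")
    rng = random.Random(seed)

    # Recall errors: keep only a fraction of the gold annotations.
    gold_list = sorted(gold)
    n_tp = round(recall * len(gold_list))
    kept = rng.sample(gold_list, n_tp)

    # Precision errors: precision = tp / (tp + fp), so
    # fp = tp * (1 - precision) / precision, capped by the pool size.
    pool = sorted(set(distractors) - set(gold))
    n_fp = min(round(n_tp * (1.0 - precision) / precision), len(pool))
    spurious = rng.sample(pool, n_fp)

    return set(kept) | set(spurious)


def precision_recall(predicted, gold):
    """Return (precision, recall) of a predicted annotation set."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r


def average_precision(ranked, relevant):
    """Average precision of one ranked list; MAP is its mean over queries."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    gold = {("d1", i, "PERSON") for i in range(100)}
    distractors = {("d1", i, "ORG") for i in range(100, 300)}
    noisy = degrade_annotations(gold, distractors, recall=0.8, precision=0.5)
    print(precision_recall(noisy, gold))
```

Holding one of the two parameters fixed while sweeping the other, re-indexing with the degraded annotations, and re-running the queries each time is one way to compare the effect of recall loss against precision loss on search scores.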





    Published In

    CIKM '07: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management
    November 2007
    1048 pages



    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. named entity recognition
    2. relationship recognition
    3. semantic search


    • Research-article



    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


    Cited By

    • (2023) AI Driving Game Changing Trends in Project Delivery and Enterprise Performance. Proceedings of World Conference on Artificial Intelligence: Advances and Applications, pages 35-49. DOI: 10.1007/978-981-99-5881-8_4. Online publication date: 2-Nov-2023.
    • (2015) Cost-Effective Conceptual Design for Information Extraction. ACM Transactions on Database Systems 40:2, pages 1-39. DOI: 10.1145/2716321. Online publication date: 30-Jun-2015.
    • (2014) Exploiting semantic annotations for entity-based information retrieval. Proceedings of the 2014 International Conference on Posters & Demonstrations Track - Volume 1272, pages 429-432. DOI: 10.5555/2878453.2878561. Online publication date: 21-Oct-2014.
    • (2014) Question Answering. Natural Language Processing of Semitic Languages, pages 335-370. DOI: 10.1007/978-3-642-45358-8_11. Online publication date: 25-Mar-2014.
    • (2014) Kuphi – an Investigation Tool for Searching for and via Semantic Relations. The Semantic Web: ESWC 2014 Satellite Events, pages 349-354. DOI: 10.1007/978-3-319-11955-7_47. Online publication date: 16-Oct-2014.
    • (2014) New Dimensions in Semantic Knowledge Management. Towards the Internet of Services: The THESEUS Research Program, pages 37-50. DOI: 10.1007/978-3-319-06755-1_4. Online publication date: 2-Jul-2014.
    • (2013) Repeatable and reliable semantic search evaluation. Web Semantics: Science, Services and Agents on the World Wide Web 21, pages 14-29. DOI: 10.1016/j.websem.2013.05.005. Online publication date: 1-Aug-2013.
    • (2013) Collaboratively built semi-structured content and Artificial Intelligence. Artificial Intelligence 194, pages 2-27. DOI: 10.1016/j.artint.2012.10.002. Online publication date: 1-Jan-2013.
    • (2012) On the Voice-Activated Question Answering. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 42:1, pages 75-85. DOI: 10.1109/TSMCC.2010.2089620. Online publication date: 1-Jan-2012.
    • (2011) Language modelization and categorization for voice-activated QA. Proceedings of the 16th Iberoamerican Congress conference on Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, pages 475-482. DOI: 10.1007/978-3-642-25085-9_56. Online publication date: 15-Nov-2011.
