research-article

Lightweight web-based fact repositories for textual question answering

Author:

Marius Paşca

CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management

Pages 87 - 96

https://doi.org/10.1145/1321440.1321455

Published: 06 November 2007

Abstract

Since answers to fact-seeking questions usually reside within small factual text nuggets, often "hidden" within full-length documents, the relevance of a nugget to a question is not necessarily correlated with the relevance of the full-length document to the question. Yet previous approaches to open-domain textual question answering from large document collections almost unanimously employ a document retrieval stage, in order to apply widely different, often expensive answer mining techniques to only a small subset of documents. Depending on the collection size, 95% or more of the documents in the collection (a far larger fraction in the case of the Web) are left out of the selected subset for any given query, and thus become invisible to the subsequent processing stages that perform the actual answer mining. This paper introduces a new model of answer retrieval for question answering. The collection is distilled offline into large repositories of facts. Each fact constitutes a potential direct answer to questions seeking a particular kind of entity or relation, such as questions asking for the date of a particular event. Question answering thus becomes equivalent to online fact retrieval, which greatly simplifies the de facto system architecture for fact-seeking question answering. Beyond this simplicity, experiments on a fact repository acquired from approximately a billion Web documents illustrate the impact of fact repositories on extracting accurate answers, both for a standard evaluation set of open-domain test questions and for additional sets of domain-specific questions.
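To make the contrast between the two architectures concrete, the Python sketch below illustrates the offline/online split for date questions only. It is a toy under stated assumptions, not the paper's extraction or ranking method: the year regular expression, the stopword list, the hypothetical `FactRepository` class, and its keyword-overlap scoring (summed over all matching facts, so that redundant facts reinforce the same date) are all illustrative choices. Offline, every document is distilled into (nugget, date) facts; online, a question is answered by retrieving facts directly, with no document retrieval stage that could discard most of the collection.

```python
import re
from collections import Counter, defaultdict

# Illustrative assumptions (not from the paper): a "date fact" is any sentence
# containing a four-digit year, and questions are matched by content words.
YEAR = re.compile(r"\b(1[5-9]\d\d|20\d\d)\b")
WORD = re.compile(r"[a-z]+")
STOPWORDS = frozenset({"when", "did", "was", "the", "in", "what", "year"})


def extract_date_facts(document):
    """Hypothetical offline extractor: return (nugget, year) pairs."""
    facts = []
    for sentence in re.split(r"(?<=[.!?])\s+", document):
        for year in YEAR.findall(sentence):
            facts.append((sentence, year))
    return facts


class FactRepository:
    """Toy in-memory repository of date facts with an inverted word index."""

    def __init__(self):
        self.facts = []                 # fact id -> (nugget, year)
        self.index = defaultdict(set)   # word -> ids of facts containing it

    def add_documents(self, documents):
        # Offline stage: every document is distilled, not a retrieved subset.
        for document in documents:
            for nugget, year in extract_date_facts(document):
                fact_id = len(self.facts)
                self.facts.append((nugget, year))
                for word in set(WORD.findall(nugget.lower())):
                    self.index[word].add(fact_id)

    def answer_date_question(self, question, top_k=3):
        # Online stage: pure fact retrieval, no document retrieval step.
        keywords = set(WORD.findall(question.lower())) - STOPWORDS
        overlap = Counter()             # fact id -> number of matched keywords
        for word in keywords:
            for fact_id in self.index.get(word, ()):
                overlap[fact_id] += 1
        votes = Counter()               # year -> summed overlap over facts
        for fact_id, hits in overlap.items():
            votes[self.facts[fact_id][1]] += hits
        return votes.most_common(top_k)


if __name__ == "__main__":
    repo = FactRepository()
    repo.add_documents([
        "The Berlin Wall fell in 1989. It had been built in 1961.",
        "Crowds celebrated when the Berlin Wall fell in 1989.",
    ])
    print(repo.answer_date_question("When did the Berlin Wall fall?"))
    # prints [('1989', 4)]
```

In this sketch the online step only touches facts that share a keyword with the question, never full documents; the trade-off is that only questions of a fact type extracted in advance (here, dates) can be answered at all.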

Index Terms

Lightweight web-based fact repositories for textual question answering
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Document representation

Published In

CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management

November 2007

1048 pages

ISBN:9781595938039

DOI:10.1145/1321440

Co-chair: Alberto H. F. Laender
Conference Chairs: André O. Falcão (Universidade de Lisboa, Portugal), Øystein Haug Olsen
General Chair: Mário J. Silva (Universidade de Lisboa, Portugal)
Program Chairs: Ricardo Baeza-Yates, Deborah L. McGuinness, Bjorn Olstad

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 November 2007

Qualifiers

Research-article

Conference

CIKM07: Conference on Information and Knowledge Management

November 6 - 10, 2007

Lisbon, Portugal

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

Total Citations: 6

Total Downloads: 695

Downloads (Last 12 months): 1

Downloads (Last 6 weeks): 0

Reflects downloads up to 02 Mar 2025

Cited By

Bakır D, Aktas M (2022). A Systematic Literature Review of Question Answering: Research Trends, Datasets, Methods. Computational Science and Its Applications – ICCSA 2022 Workshops, pp. 47-62. DOI: 10.1007/978-3-031-10536-4_4. Online publication date: 4-Jul-2022.
https://dl.acm.org/doi/10.1007/978-3-031-10536-4_4
(2010). References. The Handbook of Computational Linguistics and Natural Language Processing, pp. 655-741. DOI: 10.1002/9781444324044.refs. Online publication date: 29-Jun-2010.
https://doi.org/10.1002/9781444324044.refs
Pera M, Qumsiyeh R, Shaikh M, Ng Y, Chan C, Mitra P (2009). Retrieving good, better, and best answers to questions in advertisements. Proceedings of the eleventh international workshop on Web information and data management, pp. 3-6. DOI: 10.1145/1651587.1651590. Online publication date: 2-Nov-2009.
https://dl.acm.org/doi/10.1145/1651587.1651590
Jatowt A, Kanazawa K, Oyama S, Tanaka K, Heath F, Rice-Lively M, Furuta R (2009). Supporting analysis of future-related information in news archives and the web. Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries, pp. 115-124. DOI: 10.1145/1555400.1555420. Online publication date: 15-Jun-2009.
https://dl.acm.org/doi/10.1145/1555400.1555420
Wu Y, Kashioka H (2009). An Unsupervised Model of Exploiting the Web to Answer Definitional Questions. Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01, pp. 28-33. DOI: 10.1109/WI-IAT.2009.12. Online publication date: 15-Sep-2009.
https://dl.acm.org/doi/10.1109/WI-IAT.2009.12
Pasca M, Wainwright R, Haddad H (2008). Towards temporal web search. Proceedings of the 2008 ACM symposium on Applied computing, pp. 1117-1121. DOI: 10.1145/1363686.1363946. Online publication date: 16-Mar-2008.
https://dl.acm.org/doi/10.1145/1363686.1363946
