
Learning to find answers to questions on the Web

Authors:
Eugene Agichtein, Columbia University
Steve Lawrence, NEC Research Institute
Luis Gravano, Columbia University

ACM Transactions on Internet Technology, Volume 4, Issue 2, pp. 129–162. https://doi.org/10.1145/990301.990303

Published: 01 May 2004


Abstract

We introduce a method for learning to find documents on the Web that contain answers to a given natural language question. In our approach, questions are transformed into new queries aimed at maximizing the probability of retrieving answers from existing information retrieval systems. The method involves automatically learning phrase features for classifying questions into different types, automatically generating candidate query transformations from a training set of question/answer pairs, and automatically evaluating the candidate transformations on target information retrieval systems such as real-world, general-purpose search engines. At run-time, questions are transformed into a set of queries, and reranking is performed on the documents retrieved. We present a prototype search engine, Tritus, that applies the method to Web search engines. Blind evaluation on a set of real queries from a Web search engine log shows that the method significantly outperforms the underlying search engines, and outperforms a commercial search engine specializing in question answering. Our methodology cleanly supports combining documents retrieved from different search engines, resulting in additional improvement with a system that combines search results from multiple Web search engines.
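The run-time half of the pipeline described in the abstract (transform a question into several queries, then merge and rerank the documents that come back) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Tritus implementation: the rule table `TRANSFORMS`, the helpers `_words`, `classify`, `transform`, and `rerank`, and the example phrases are all hypothetical stand-ins for transformations that the described method would learn from question/answer pairs and evaluate per search engine. The merge step uses a plain reciprocal-rank sum, which is a substitute for, not a reproduction of, the paper's own reranking.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical rule table: question-type prefix -> phrases to append.
# In the described method such transformations are learned from
# question/answer pairs and scored per search engine; here they are fixed.
TRANSFORMS: Dict[str, List[str]] = {
    "what is a": ['"refers to"', '"is used to"'],
    "who was": ['"was born"'],
}


def _words(question: str) -> List[str]:
    """Lowercase, drop a trailing '?', and split on whitespace."""
    return question.lower().rstrip("?").split()


def classify(question: str) -> Optional[str]:
    """Return the longest rule prefix matching the question's leading words."""
    words = _words(question)
    matches = [p for p in TRANSFORMS if words[: len(p.split())] == p.split()]
    return max(matches, key=lambda p: len(p.split())) if matches else None


def transform(question: str) -> List[str]:
    """Rewrite a question into keyword queries, one per phrase for its type.

    Falls back to the normalized (lowercased) question when no rule matches.
    """
    words = _words(question)
    prefix = classify(question)
    if prefix is None:
        return [" ".join(words)]
    content = " ".join(words[len(prefix.split()):])
    return [f"{content} {phrase}" for phrase in TRANSFORMS[prefix]]


def rerank(results: Dict[str, List[str]]) -> List[str]:
    """Merge ranked lists (one per query or per engine) into one ranking.

    Each document scores the sum of 1/rank over every list it appears in;
    ties are broken alphabetically so the output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranked_docs in results.values():
        for rank, doc in enumerate(ranked_docs, start=1):
            scores[doc] += 1.0 / rank
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    for query in transform("What is a hydrant?"):
        print(query)
    # prints:
    # hydrant "refers to"
    # hydrant "is used to"
```

Because `rerank` only sees ranked lists keyed by an arbitrary label, the same function can merge results from different transformed queries or from different search engines, which loosely mirrors the multi-engine combination the abstract mentions.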


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems

Recommendations

Identifying popular search goals behind search queries to improve web search ranking
AIRS'11: Proceedings of the 7th Asia conference on Information Retrieval Technology

Web users usually have a certain search goal before they submit a search query. However, many laypersons can't transform their search goals into suitable queries. Thus, understanding original search goals behind a query is very important for search ...
Searching with context
WWW '06: Proceedings of the 15th international conference on World Wide Web

Contextual search refers to proactively capturing the information need of a user by automatically augmenting the user query with information extracted from the search context; for example, by using terms from the web page the user is currently browsing ...
Query routing for Web search engines: architecture and experiments
General-purpose search engines such as AltaVista and Lycos are notorious for returning irrelevant results in response to user queries. Consequently, thousands of specialized, topic-specific search engines (from VacationSpot.com to ...

Published in

ACM Transactions on Internet Technology Volume 4, Issue 2
May 2004
113 pages
ISSN:1533-5399
EISSN:1557-6051
DOI:10.1145/990301

Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Web search
information retrieval
meta-search
query expansion
question answering
Article Metrics
- Total citations: 55
- Total downloads: 2,863
- Downloads (last 12 months): 21
- Downloads (last 6 weeks): 0