
Learning to find answers to questions on the Web

Published: 01 May 2004

Abstract

We introduce a method for learning to find documents on the Web that contain answers to a given natural language question. In our approach, questions are transformed into new queries aimed at maximizing the probability of retrieving answers from existing information retrieval systems. The method involves automatically learning phrase features for classifying questions into different types, automatically generating candidate query transformations from a training set of question/answer pairs, and automatically evaluating the candidate transformations on target information retrieval systems such as real-world general-purpose search engines. At run-time, questions are transformed into a set of queries, and the retrieved documents are reranked. We present a prototype search engine, Tritus, that applies the method to Web search engines. Blind evaluation on a set of real queries from a Web search engine log shows that the method significantly outperforms the underlying search engines and outperforms a commercial search engine specializing in question answering. Our methodology cleanly supports combining documents retrieved from different search engines; a system that combines search results from multiple Web search engines yields additional improvement.
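The run-time step summarized above (classify a question, rewrite it into several weighted queries, then rerank the pooled results) can be sketched roughly as follows. This is a minimal, hypothetical illustration only: question types are assumed to be identified by a leading question phrase such as "what is a", and the phrase table, weights, quoting format, and fallback query are all invented for the example rather than taken from Tritus. The merge step uses weighted reciprocal-rank fusion as a plainly swapped-in stand-in; it is not the reranking function used in the article, and no search engine is actually queried.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Optional

# Hypothetical learned table: question phrase -> [(transformation phrase, weight)].
# Tritus learns such phrases and weights from question/answer pairs; these are made up.
TRANSFORMATIONS: dict[str, list[tuple[str, float]]] = {
    "what is a": [("refers to", 2.0), ("is a", 1.5), ("is used", 1.0)],
    "who was": [("was born", 1.8), ("died", 1.2)],
}


def classify(question: str, phrases: Iterable[str]) -> tuple[Optional[str], str]:
    """Return the longest known question phrase that starts the question, plus the rest."""
    q = question.lower().rstrip("?").strip()
    for phrase in sorted(phrases, key=len, reverse=True):
        if q == phrase or q.startswith(phrase + " "):
            return phrase, q[len(phrase):].strip()
    return None, q  # unknown question type: keep the whole question as the query


def transform(question: str, k: int = 2) -> list[tuple[str, float]]:
    """Rewrite a question into up to k weighted queries plus a fallback query."""
    phrase, rest = classify(question, TRANSFORMATIONS)
    queries: list[tuple[str, float]] = []
    if phrase is not None:
        ranked = sorted(TRANSFORMATIONS[phrase], key=lambda tw: tw[1], reverse=True)
        for text, weight in ranked[:k]:
            # Quote the transformation so a search engine treats it as a phrase.
            queries.append((f'"{text}" {rest}', weight))
    # Fallback: the question with its question phrase stripped (weight 1.0 is arbitrary).
    queries.append((rest, 1.0))
    return queries


def merge(result_lists: Iterable[tuple[float, list[str]]], c: int = 60) -> list[str]:
    """Pool ranked URL lists with weighted reciprocal-rank fusion (a stand-in reranker)."""
    scores: defaultdict[str, float] = defaultdict(float)
    for weight, urls in result_lists:
        for rank, url in enumerate(urls, start=1):
            scores[url] += weight / (c + rank)
    # Highest fused score first; break ties by URL so the order is deterministic.
    return sorted(scores, key=lambda u: (-scores[u], u))


if __name__ == "__main__":
    for query, weight in transform("What is a hydrometer?"):
        print(f"{weight:.1f}  {query}")
```

Because `merge` consumes only (weight, ranked URLs) pairs, lists returned by different engines can be pooled the same way as lists for different rewritten queries, which is the kind of multi-engine combination the abstract describes; how Tritus actually scores and reranks documents is not reproduced here.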




Published in

ACM Transactions on Internet Technology, Volume 4, Issue 2
May 2004
113 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/990301

Copyright © 2004 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 1 May 2004
Published in TOIT Volume 4, Issue 2
