Abstract
We introduce a method for learning to find documents on the Web that contain answers to a given natural language question. In our approach, questions are transformed into new queries aimed at maximizing the probability of retrieving answers from existing information retrieval systems. The method involves automatically learning phrase features for classifying questions into different types, automatically generating candidate query transformations from a training set of question/answer pairs, and automatically evaluating the candidate transformations on target information retrieval systems such as real-world general-purpose search engines. At run time, questions are transformed into a set of queries, and the retrieved documents are reranked. We present a prototype search engine, Tritus, that applies the method to Web search engines. Blind evaluation on a set of real queries from a Web search engine log shows that the method significantly outperforms the underlying search engines, and outperforms a commercial search engine specializing in question answering. Our methodology also cleanly supports combining documents retrieved from different search engines, and a system that merges results from multiple Web search engines yields additional improvement.
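The run-time flow described in the abstract (classify the question by its leading phrase, rewrite it into several queries, retrieve, rerank) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from Tritus: the `TRANSFORMS` table and its weights are invented rather than learned, `toy_search` stands in for a real search engine, and reranking by summed transform weight is only one plausible scheme.

```python
from collections import defaultdict

# Hypothetical transforms: question phrase -> [(rewrite template, weight)].
# In the described method these would be learned from question/answer pairs
# and evaluated per search engine; the values here are made up.
TRANSFORMS = {
    "what is a": [('"{rest} is a"', 2.0), ('"{rest} refers to"', 1.5), ("{rest}", 0.5)],
    "who is": [('"{rest} was born"', 1.0), ("{rest}", 0.5)],
}


def classify(question):
    """Return (question phrase, remaining terms), preferring the longest known phrase."""
    q = question.lower().rstrip("?").strip()
    for phrase in sorted(TRANSFORMS, key=len, reverse=True):
        if q.startswith(phrase + " "):
            return phrase, q[len(phrase):].strip()
    return None, q


def transform(question):
    """Rewrite a question into a list of (query, weight) pairs."""
    phrase, rest = classify(question)
    if phrase is None:
        # Unknown question type: fall back to the question terms themselves.
        return [(rest, 1.0)]
    return [(template.format(rest=rest), weight) for template, weight in TRANSFORMS[phrase]]


def toy_search(query, corpus):
    """Stand-in for a search engine: quoted queries match as substrings of the
    lowercased text, unquoted queries require every term to appear as a substring."""
    if query.startswith('"') and query.endswith('"'):
        needle = query.strip('"')
        return [doc_id for doc_id, text in corpus.items() if needle in text.lower()]
    terms = query.split()
    return [doc_id for doc_id, text in corpus.items() if all(t in text.lower() for t in terms)]


def answer(question, corpus):
    """Issue every transformed query and rerank documents by summed transform weight."""
    scores = defaultdict(float)
    for query, weight in transform(question):
        for doc_id in toy_search(query, corpus):
            scores[doc_id] += weight
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Under these assumptions, a document matched by several high-weight rewrites rises above one matched only by the bare keywords, which mirrors the intuition that answer-like phrasing is a better retrieval cue than the question terms alone.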