ABSTRACT
Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved documents with respect to the different aspects. However, such aspects themselves may represent information needs with rather distinct intents (e.g., informational or navigational). Hence, a diverse ranking could benefit from applying intent-aware retrieval models when estimating the relevance of documents to different aspects. In this paper, we propose to diversify the results retrieved for a given query, by learning the appropriateness of different retrieval models for each of the aspects underlying this query. Thorough experiments within the evaluation framework provided by the diversity task of the TREC 2009 and 2010 Web tracks show that the proposed approach can significantly improve state-of-the-art diversification approaches.
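As a rough illustration of the idea, the sketch below mixes the scores of intent-specific retrieval models for each aspect and then builds a diverse ranking greedily. The abstract does not state the ranking objective or how the appropriateness of each model is learned, so everything here is an assumption: the greedy coverage-and-novelty objective, the probabilistic mixture, and the names `aspect_relevance`, `diversify`, `intent_probs`, `model_scores`, and `lam` are all hypothetical. The learned appropriateness values are taken as given inputs rather than trained.

```python
from typing import Dict, List

Scores = Dict[str, float]  # document id -> score in [0, 1]


def aspect_relevance(
    doc: str,
    aspect: str,
    intent_probs: Dict[str, Dict[str, float]],
    model_scores: Dict[str, Dict[str, Scores]],
) -> float:
    """Relevance of doc to one aspect, mixing intent-specific model scores.

    intent_probs[aspect][intent] is the (assumed, pre-learned) appropriateness
    of each intent-specific model for this aspect; model_scores[intent][aspect]
    holds that model's document scores for the aspect.
    """
    return sum(
        weight * model_scores[intent][aspect].get(doc, 0.0)
        for intent, weight in intent_probs[aspect].items()
    )


def diversify(
    candidates: List[str],
    query_rel: Scores,
    aspect_weights: Dict[str, float],
    intent_probs: Dict[str, Dict[str, float]],
    model_scores: Dict[str, Dict[str, Scores]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedily pick k documents, trading off relevance against aspect coverage."""
    selected: List[str] = []
    remaining = list(candidates)
    # Probability that each aspect is still unsatisfied by the selected documents.
    not_covered = {aspect: 1.0 for aspect in aspect_weights}

    def gain(doc: str) -> float:
        diversity = sum(
            aspect_weights[a]
            * aspect_relevance(doc, a, intent_probs, model_scores)
            * not_covered[a]
            for a in aspect_weights
        )
        return (1.0 - lam) * query_rel.get(doc, 0.0) + lam * diversity

    while remaining and len(selected) < k:
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
        # Discount aspects that the newly selected document already covers.
        for a in aspect_weights:
            not_covered[a] *= 1.0 - aspect_relevance(
                best, a, intent_probs, model_scores
            )
    return selected
```

With `lam=0` the function reduces to ranking by `query_rel` alone; larger values of `lam` favour documents that cover aspects not yet satisfied by the documents already selected.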
Index Terms
- Intent-aware search result diversification