ABSTRACT
When a Web user's underlying information need is not clearly specified by the initial query, an effective approach is to diversify the results retrieved for this query. In this paper, we introduce a novel probabilistic framework for Web search result diversification, which explicitly accounts for the various aspects associated with an underspecified query. In particular, we diversify a document ranking by estimating how well a given document satisfies each uncovered aspect and the extent to which different aspects are satisfied by the ranking as a whole. We thoroughly evaluate our framework in the context of the diversity task of the TREC 2009 Web track. Moreover, we exploit query reformulations provided by three major Web search engines (WSEs) as a means to uncover different query aspects. The results attest to the effectiveness of our framework when compared to state-of-the-art diversification approaches in the literature. Additionally, by simulating an upper-bound query reformulation mechanism from official TREC data, we draw useful insights regarding the effectiveness of the query reformulations generated by the different WSEs in promoting diversity.
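The diversification idea summarised above can be illustrated with a minimal sketch of a greedy, aspect-aware re-ranker. This is an assumption-laden illustration, not the paper's exact formulation: the function name `diversify`, the mixing parameter `lam`, and the three input dictionaries (document relevance to the query, aspect importance, and per-aspect document coverage) are all hypothetical, and the probabilities are taken as given rather than estimated from query reformulations.

```python
from typing import Dict, List


def diversify(
    relevance: Dict[str, float],            # assumed: relevance of each doc to the query
    aspect_weight: Dict[str, float],        # assumed: importance of each query aspect
    coverage: Dict[str, Dict[str, float]],  # assumed: coverage[aspect][doc] in [0, 1]
    k: int,
    lam: float = 0.5,                       # hypothetical relevance/diversity trade-off
) -> List[str]:
    """Greedily pick up to k documents, favouring still-uncovered aspects."""
    selected: List[str] = []
    # Probability that each aspect is NOT yet satisfied by the selected documents.
    uncovered = {a: 1.0 for a in aspect_weight}
    candidates = set(relevance)

    def score(doc: str) -> float:
        # Diversity: how well doc satisfies each aspect, discounted by how
        # much the ranking built so far already satisfies that aspect.
        div = sum(
            aspect_weight[a] * coverage[a].get(doc, 0.0) * uncovered[a]
            for a in aspect_weight
        )
        return (1.0 - lam) * relevance[doc] + lam * div

    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        # Update how much of each aspect remains unsatisfied.
        for a in aspect_weight:
            uncovered[a] *= 1.0 - coverage[a].get(best, 0.0)
    return selected
```

With `lam = 0` the sketch reduces to ranking by relevance alone; larger values of `lam` increasingly favour documents that cover aspects the current ranking has not yet satisfied.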