DOI: 10.1145/2009916.2009997
research-article

Intent-aware search result diversification

Published: 24 July 2011

Abstract

Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved documents with respect to the different aspects. However, such aspects themselves may represent information needs with rather distinct intents (e.g., informational or navigational). Hence, a diverse ranking could benefit from applying intent-aware retrieval models when estimating the relevance of documents to different aspects. In this paper, we propose to diversify the results retrieved for a given query, by learning the appropriateness of different retrieval models for each of the aspects underlying this query. Thorough experiments within the evaluation framework provided by the diversity task of the TREC 2009 and 2010 Web tracks show that the proposed approach can significantly improve state-of-the-art diversification approaches.
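
The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a generic greedy explicit-diversification objective, and the names `Aspect`, `aspect_relevance`, `diversify`, the trade-off parameter `lam`, and all scores are hypothetical. Each aspect carries an estimated intent distribution (e.g., informational vs. navigational), and the relevance of a document to an aspect is a mixture of the scores produced by the retrieval model associated with each intent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Aspect:
    weight: float                              # assumed estimate of P(aspect | query)
    intent_probs: Dict[str, float]             # assumed estimate of P(intent | aspect)
    model_scores: Dict[str, Dict[str, float]]  # intent -> {doc_id: score in [0, 1]}


def aspect_relevance(doc: str, aspect: Aspect) -> float:
    """Mix the per-intent model scores by the aspect's intent distribution."""
    return sum(
        p * aspect.model_scores.get(intent, {}).get(doc, 0.0)
        for intent, p in aspect.intent_probs.items()
    )


def diversify(candidates: List[str], query_rel: Dict[str, float],
              aspects: Dict[str, Aspect], k: int, lam: float = 0.5) -> List[str]:
    """Greedily pick k documents, trading query relevance against aspect coverage."""
    selected: List[str] = []
    remaining = list(candidates)
    # uncovered[a]: how much of aspect a is still unsatisfied by the selected docs
    uncovered = {name: 1.0 for name in aspects}

    def gain(doc: str) -> float:
        diversity = sum(
            a.weight * aspect_relevance(doc, a) * uncovered[name]
            for name, a in aspects.items()
        )
        return (1.0 - lam) * query_rel.get(doc, 0.0) + lam * diversity

    while remaining and len(selected) < k:
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
        # Discount every aspect by how well the chosen document covers it.
        for name, a in aspects.items():
            uncovered[name] *= 1.0 - aspect_relevance(best, a)
    return selected


# Toy data (made up): an ambiguous query with a mostly navigational "car"
# aspect and a purely informational "animal" aspect.
query_rel = {"d1": 0.9, "d2": 0.8, "d3": 0.3}
aspects = {
    "car": Aspect(
        weight=0.5,
        intent_probs={"navigational": 0.8, "informational": 0.2},
        model_scores={
            "navigational": {"d1": 0.9, "d2": 0.8},
            "informational": {"d1": 0.5, "d2": 0.6},
        },
    ),
    "animal": Aspect(
        weight=0.5,
        intent_probs={"informational": 1.0},
        model_scores={"informational": {"d3": 0.9}},
    ),
}

ranking = diversify(["d1", "d2", "d3"], query_rel, aspects, k=3, lam=0.8)
print(ranking)
```

With `lam=0.8` the toy ranking promotes `d3`, the only document covering the "animal" aspect, above `d2`; with `lam=0.0` the function falls back to ordering by `query_rel` alone. In the approach the abstract describes, the per-aspect mixture weights would be learned rather than set by hand as they are here.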

    Published In

    SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
    July 2011
    1374 pages
    ISBN:9781450307574
    DOI:10.1145/2009916

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. diversity
    2. relevance
    3. web search

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
