Efficient diversity-aware search

Authors:

Nick Koudas

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Pages 781 - 792

https://doi.org/10.1145/1989323.1989405

Published: 12 June 2011

Abstract

Typical approaches to ranking information in response to a user's query return the most relevant results but ignore important factors contributing to user satisfaction; for instance, the contents of a result document may be redundant given the results already examined. Motivated by emerging applications, in this work we study the problem of Diversity-Aware Search, the essence of which is ranking search results based on both their relevance and their dissimilarity to the other results reported.
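To make the relevance-versus-dissimilarity trade-off concrete, here is a minimal sketch of a greedy, MMR-style selection. It is not the DIVGEN algorithm proposed in this paper, and it does not model data access costs. The function name `greedy_diverse_top_k`, the `trade_off` parameter, the Jaccard similarity, and the toy documents are all assumptions introduced purely for illustration.

```python
from typing import Callable, Hashable, List, Sequence


def greedy_diverse_top_k(
    candidates: Sequence[Hashable],
    relevance: Callable[[Hashable], float],
    similarity: Callable[[Hashable, Hashable], float],
    k: int,
    trade_off: float = 0.5,
) -> List[Hashable]:
    """Greedy MMR-style selection (illustrative only, not DIVGEN).

    Each round picks the candidate with the best weighted difference
    between its relevance and its redundancy with respect to the
    results already selected.
    """
    selected: List[Hashable] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:

        def marginal_score(doc: Hashable) -> float:
            # Redundancy: highest similarity to any already selected result
            # (0.0 while nothing has been selected yet).
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return trade_off * relevance(doc) - (1.0 - trade_off) * redundancy

        best = max(remaining, key=marginal_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy corpus: document id -> (set of terms, relevance score).
DOCS = {
    "a": ({"jaguar", "car"}, 0.90),
    "b": ({"jaguar", "car", "speed"}, 0.85),
    "c": ({"jaguar", "cat"}, 0.60),
}


def toy_relevance(doc_id: str) -> float:
    return DOCS[doc_id][1]


def jaccard(x: str, y: str) -> float:
    terms_x, terms_y = DOCS[x][0], DOCS[y][0]
    return len(terms_x & terms_y) / len(terms_x | terms_y)


diverse = greedy_diverse_top_k(list(DOCS), toy_relevance, jaccard, k=2)
relevance_only = greedy_diverse_top_k(
    list(DOCS), toy_relevance, jaccard, k=2, trade_off=1.0
)
print(diverse)         # ['a', 'c']
print(relevance_only)  # ['a', 'b']
```

With equal weighting, the second slot goes to the less relevant but less redundant document "c"; with `trade_off=1.0` the selection degenerates to plain relevance ranking and returns the near-duplicate "b" instead.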

Diversity-Aware Search is generally a hard problem, and even tractable instances thereof cannot be efficiently solved by adapting existing approaches. We propose DIVGEN, an efficient algorithm for diversity-aware search, which achieves significant performance improvements via novel data access primitives. Although selecting the optimal schedule of data accesses is a hard problem, we devise the first low-overhead data access prioritization scheme with theoretical quality guarantees, and good performance in practice. A comprehensive evaluation on real and synthetic large-scale corpora demonstrates the efficiency and effectiveness of our approach.

Index Terms

Efficient diversity-aware search
1. Information systems
  1. Information retrieval


Published In

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

June 2011

1364 pages

ISBN:9781450306614

DOI:10.1145/1989323

General Chair: Timos Sellis (IMIS/RC Athena)

Program Chair: Renée J. Miller (University of Toronto)

Publications Chairs: Anastasios Kementsietsidis (IBM T.J. Watson Research Center), Yannis Velegrakis (University of Trento)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

SIGMOD/PODS '11

Sponsor:

SIGMOD

SIGMOD/PODS '11: International Conference on Management of Data

June 12 - 16, 2011

Athens, Greece

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Bibliometrics

Total Citations: 85
Total Downloads: 733
Downloads (Last 12 months): 35
Downloads (Last 6 weeks): 2

Reflects downloads up to 08 Mar 2025
