Abstract
Query result diversification is a bi-criteria optimization problem for ranking query results. Given a database D, a query Q, and a positive integer k, it is to find a set of k tuples from Q(D) such that the tuples are as relevant as possible to the query, and at the same time, as diverse as possible to each other. Subsets of Q(D) are ranked by an objective function defined in terms of relevance and diversity. Query result diversification has found a variety of applications in databases, information retrieval, and operations research.
This article investigates the complexity of result diversification for relational queries. (1) We identify three problems in connection with query result diversification, to determine whether there exists a set of k tuples that is ranked above a bound with respect to relevance and diversity, to assess the rank of a given k-element set, and to count how many k-element sets are ranked above a given bound based on an objective function. (2) We study these problems for a variety of query languages and for the three objective functions proposed in Gollapudi and Sharma [2009]. We establish the upper and lower bounds of these problems, all matching, for both combined complexity and data complexity. (3) We also investigate several special settings of these problems, identifying tractable cases. Moreover, (4) we reinvestigate these problems in the presence of compatibility constraints commonly found in practice, and provide their complexity in all these settings.
Supplemental Material
Available for Download
Supplemental movie, appendix, image and software files for, On the Complexity of Query Result Diversification
- S. Abiteboul, R. Hull, and V. Vianu. 1995. Foundations of Databases. Addison-Wesley. Google ScholarDigital Library
- G. Adomavicius and A. Tuzhilin. 2005. Towards the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Engin. 17, 6, 734--749. Google ScholarDigital Library
- R. Agrawal, S. Gollapudi, A. Halverson, and S. Leong. 2009. Diversifying search results. In Proceedings of the International Conference on Web Search and Web Data Mining (WSDM'09). 5--14. Google ScholarDigital Library
- S. Amer-Yahia. 2011. Recommendation projects at yahoo! IEEE Data Engin. Bull. 34, 2, 69--77.Google Scholar
- S. Amer-Yahia, F. Bonchi, C. Castillo, E. Feuerstein, I. Mendez-Diaz, and P. Zabala. 2013. Complexity and algorithms for composite retrieval. In Proceedings of the International Conference on World Wide Web (WWW'13). 79--80. Google ScholarDigital Library
- G. Berbeglia and G. Hahn. 2010. Counting feasible solutions of the traveling salesman problem with pickups and deliveries is #P-complete. Discr. Appl. Math. 157, 11, 2541--2547. Google ScholarDigital Library
- A. Borodin, H. C. Lee, and Y. Ye. 2012. Max-sum diversification, monotone submodular functions and dynamic updates. In Proceedings of the ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (PODS'12). 155--166. Google ScholarDigital Library
- G. Capannini, F. M. Nardini, R. Perego, and F. Silvestri. 2011. Efficient diversification of web search results. Proc. VLDB Endow. 4, 7, 451--459. Google ScholarDigital Library
- Z. Chen and T. Li. 2007. Addressing diverse user preferences in SQL-query-resul navigation. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'07). 641--652. Google ScholarDigital Library
- E. Demidova, P. Fankhauser, X. Zhou, and W. Nejdl. 2010. DivQ: Diversification for keyword search over structured databases. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'10). 331--338. Google ScholarDigital Library
- T. Deng and W. Fan. 2013. On the complexity of query result diversification. Proc. VLDB Endow. 6, 8, 577--588. Google ScholarDigital Library
- T. Deng, W. Fan, and F. Geerts. 2012. On the complexity of package recommendation problems. In Proceedings of the ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (PODS'12). 261--272. Google ScholarDigital Library
- M. Drosou and E. Pitoura. 2009. Diversity over continuous data. IEEE Data Engin. Bull. 32, 4.Google Scholar
- M. Drosou and E. Pitoura. 2010. Search result diversification. SIGMOD Rec. 39, 1, 41--47. Google ScholarDigital Library
- A. Durand, M. Hermann, and P. G. Kolaitis. 2005. Subtractive reductions and complete problems for counting complexity classes. Theor. Comput. Sci. 340, 3, 496--513. Google ScholarDigital Library
- R. Fagin, A. Lotem, and M. Naor. 2003. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66, 4, 614--656. Google ScholarDigital Library
- E. Feuerstein, P. A. Heiber, J. Martinez-Viademonte, and R. A. Baeza-Yates. 2007. New stochastic algorithms for scheduling ads in sponsored search. In Proceedings of the 5th Latin American Web Congress (LA-WEB'07). 22--31. Google ScholarDigital Library
- P. Fraternali, D. Martinenghi, and M. Tagliasacchi. 2012. Top-k bounded diversification. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'12). 421--432. Google ScholarDigital Library
- S. Gollapudi and A. Sharma. 2009. An axiomatic approach for result diversification. In Proceedings of the 18th International Conference on World Wide Web (WWW'09). 381--390. Google ScholarDigital Library
- L. A. Hemaspaandra and H. Vollmer. 1995. The satanic notations: Counting classes beyond #P and other definitional adventures. SIGACT News 26, 1, 2--13. Google ScholarDigital Library
- I. F. Ilyas, G. Beskales, and M. A. Soliman. 2008. A survey of top-k query processing techniques in relational database systems. ACM Comput. Surv. 40, 4, 11:1--11:58. Google ScholarDigital Library
- W. Jin and J. M. Patel. 2011. Efficient and generic evaluation of ranked queries. In Proceedings of the ACM SIGMOD International Conference on Management of data (SIGMOD'11). 601--612. Google ScholarDigital Library
- G. Koutrika, B. Bercovitz, and H. Garcia-Molina. 2009. FlexRecs: Expressing and combining flexible recommendations. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'09). 745--758. Google ScholarDigital Library
- R. E. Ladner. 1989. Polynomial space counting problems. SIAM J. Comput. 18, 6, 1087--1097. Google ScholarDigital Library
- T. Lappas, K. Liu, and E. Terzi. 2009. Finding a team of experts in social networks. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'09). 467--476. Google ScholarDigital Library
- C. Li, M. A. Soliman, K. C.-C. Chang, and I. F. Ilyas. 2005. RankSQL: Supporting ranking queries in relational database management systems. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05). 1342--1345. Google ScholarDigital Library
- Z. Liu, P. Sun, and Y. Chen. 2009. Structured search result differentiation. Proc. VLDB Endow. 2, 1, 313--324. Google ScholarDigital Library
- E. Minack, G. Demartini, and W. Nejdl. 2009. Current approaches to search result diversification. In Proceedings of the 1st International Workshop on Living Web.Google Scholar
- C. H. Papadimitriou. 1994. Computational Complexity. Addison-Wesley.Google Scholar
- A. G. Parameswaran, H. Garcia-Molina, and J. D. Ullman. 2010. Evaluating, combining and generalizing recommendations with prerequisites. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM'10). 919--928. Google ScholarDigital Library
- A. G. Parameswaran, P. Venetis, and H. Garcia-Molina. 2011. Recommendation systems with complex constraints: A course recommendation perspective. ACM Trans. Inf. Syst. 29, 4. Google ScholarDigital Library
- O. A. Prokopyev, N. Kong, and D. L. Martinez-Torres. 2009. The equitable dispersion problem. Euro. J. Oper. Res. 197, 1, 59--67.Google ScholarCross Ref
- K. Schnaitter and N. Polyzotis. 2008. Evaluating rank joins with optimal cost. In Proceedings of the 27th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS'08). 43--52. Google ScholarDigital Library
- K. Stefanidis, M. Drosou, and E. Pitoura. 2010. Perk: Personalized keyword search in relational databases through preferences. In Proceedings of the 13th International Conference on Extending Database Technology (EDBT'10). 585--596 Google ScholarDigital Library
- L. Valiant. 1979. The complexity of computing the permanent. Theor. Comput. Sci. 8, 2, 189--201.Google ScholarCross Ref
- M. Y. Vardi. 1982. The complexity of relational query languages. In Proceedings of the 14th Annual ACM Symposium on Theory of Computing (STOC'82). 137--146. Google ScholarDigital Library
- E. Vee, U. Srivastava, J. Shanmugasundaram, P. Bhat, and S. A. Yahia. 2008. Efficient computation of diverse query results. In Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE'08). 228--236. Google ScholarDigital Library
- M. R. Vieira, H. L. Razente, M. C. N. Barioni, M. Hadjieleftheriou, D. Srivastava, Traina., and V. J. Tsotras. 2011. On query result diversification. In Proceedings of the 27th IEEE International Conference on Data Engineering (ICDE'11). 1163--1174. Google ScholarDigital Library
- M. Xie, L. V. S. Lakshmanan, and P. T. Wood. 2012. Composite recommendations: From items to packages. Frontiers Comput. Sci. 6, 3, 264--277.Google ScholarCross Ref
- C. Yu, L. Lakshmanan, and S. Amer-Yahia. 2009a. It takes variety to make a world: Diversification in recommender systems. In Proceedings of the 12th International Conference on Extending Database Technology (EDBT'09). 368--378. Google ScholarDigital Library
- C. Yu, L. V. Lakshmanan, and S. Amer-Yahia. 2009b. Recommendation diversification using explanations. In Proceedings of the IEEE International Conference on Data Engineering (ICDE'09). 1299--1302. Google ScholarDigital Library
- M. Zhang and N. Hurley. 2008. Avoiding monotony: Improving the diversity of recommendation lists. In Proceedings of the ACM Conference on Recommender Systems (RecSys'08). 123--130. Google ScholarDigital Library
- C.-N. Ziegler, S. M. Mcnee, J. A. Konstan, and G. Lausen. 2005. Improving recommendation lists through topic diversification. In Proceedings of the 14th International Conference on World Wide Web (WWW'05). 22--32. Google ScholarDigital Library
Index Terms
On the Complexity of Query Result Diversification
Recommendations
On the complexity of query result diversification
Query result diversification is a bi-criteria optimization problem for ranking query results. Given a database D, a query Q and a positive integer k, it is to find a set of k tuples from Q(D) such that the tuples are as relevant as possible to the query,...
Multidimensional search result diversification: diverse search results for diverse users
SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information RetrievalHundreds of millions of people today rely on Web based Search Engines to satisfy their information needs. In order to meet the expectations of this vast and diverse user population, the search engine should present a list of results such that the ...
Recency ranking by diversification of result set
CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge managementIn this paper, we propose a web search retrieval approach which automatically detects recency sensitive queries and increases the freshness of the ordinary document ranking by a degree proportional to the probability of the need in recent content. We ...
Comments