skip to main content
research-article

A survey of top-k query processing techniques in relational database systems

Published:15 October 2008Publication History
Skip Abstract Section

Abstract

Efficient processing of top-k queries is a crucial requirement in many interactive environments that involve massive amounts of data. In particular, efficient top-k processing in domains such as the Web, multimedia search, and distributed systems has shown a great impact on performance. In this survey, we describe and classify top-k processing techniques in relational databases. We discuss different design dimensions in the current techniques including query models, data access methods, implementation levels, data and query certainty, and supported scoring functions. We show the implications of each dimension on the design of the underlying techniques. We also discuss top-k queries in XML domain, and show their connections to relational approaches.

References

  1. Acharya, S., Gibbons, P. B., and Poosala, V. 2000. Congressional samples for approximate answering of group-by queries. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. 487--498. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Amato, G., Rabitti, F., Savino, P., and Zezula, P. 2003. Region proximity in metric spaces and its use for approximate similarity search. ACM Trans. Inform. Syst. 21, 2, 192--227. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Amer-Yahia, S., Koudas, N., Marian, A., Srivastava, D., and Toman, D. 2005. Structure and content scoring for xml. In Proceedings of the 31st International Conference on Very Large Data Bases. 361--372. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Aref, W. G., Catlin, A. C., Elmagarmid, A. K., Fan, J., Hammad, M. A., Ilyas, I. F., Marzouk, M., Prabhakar, S., and Zhu, X. 2004. VDBMS: A testbed facility for research in video database benchmarking. ACM Multimed. Syst. J. (Special Issue on Multimedia Document Management Systems) 9, 6, 575--585.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Arrow, K. 1951. Social Choice and Individual Values. Wiley, New York, NY.Google ScholarGoogle Scholar
  6. Babcock, B., Chaudhuri, S., and Das, G. 2003. Dynamic sample selection for approximate query processing. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. 539--550. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Barbará, D., Garcia-Molina, H., and Porter, D. 1992. The management of probabilistic data. IEEE Trans. Knowl. Data Eng. 4, 5, 487--502. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Black, D. 1958. The Theory of Committees and Elections. Cambridge University Press, London, U.K.Google ScholarGoogle Scholar
  9. Börzsönyi, S., Kossmann, D., and Stocker, K. 2001. The skyline operator. In Proceedings of the 17th International Conference on Data Engineering. 421. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Brams, S. J. and Fishburn, P. C. 1983. Approval Voting. Birkhauser, Boston, MA.Google ScholarGoogle Scholar
  11. Bruno, N., Chaudhuri, S., and Gravano, L. 2002a. Top-k selection queries over relational databases: Mapping strategies and performance evaluation. ACM Trans. Database Syst. 27, 2, 153--187. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Bruno, N., Gravano, L., and Marian, A. 2002b. Evaluating top-k queries over Web-accessible databases. In Proceedings of the 18th International Conference on Data Engineering. 369. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Chakrabarti, K., Garofalakis, M., Rastogi, R., and Shim, K. 2001. Approximate query processing using wavelets. VLDB J. 10, 2--3, 199--223. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Chang, K. C. and Hwang, S. 2002. Minimal probing: supporting expensive predicates for top-k queries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. 346--357. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Chang, Y., Bergman, L. D., Castelli, V., Li, C., Lo, M., and Smith, J. R. 2000. The Onion Technique: Indexing for linear optimization queries. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. 391--402. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Chaudhuri, S., Das, G., Datar, M., Motwani, R., and Narasayya, V. R. 2001a. Overcoming limitations of sampling for aggregation queries. In Proceedings of the 17th International Conference on Data Engineering. 534--542. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Chaudhuri, S., Das, G., and Narasayya, V. 2001b. A robust, optimization-based approach for approximate answering of aggregate queries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. 295--306. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Chaudhuri, S., Motwani, R., and Narasayya, V. 1998. Random sampling for histogram construction: How much is enough? In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. 436--447. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Copeland, A. H. 1951. A reasonable social welfare function. Mimeo. University of Michigan, Ann Arbor, MI.Google ScholarGoogle Scholar
  20. Cranor, L. 1996. Declared-strategy voting: An instrument for group decision-making. Ph.D. dissertation. Washington University, St. Louis, MO. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Das, G., Gunopulos, D., Koudas, N., and Tsirogiannis, D. 2006. Answering top-k queries using views. In Proceedings of the 32nd International Conference on Very Large Data Bases. 451--462. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Diaconis, P. 1998. Group Representation in Probability and Statistics. Institute of Mathematical Statistics. Web site: www.imstat.org.Google ScholarGoogle Scholar
  23. Diaconis, P. and Graham, R. 1977. Spearman's footrule as a measure of disarray. J. Roy. Statist. Soc. Series B 39, 2, 262--268.Google ScholarGoogle ScholarCross RefCross Ref
  24. Donjerkovic, D. and Ramakrishnan, R. 1999. Probabilistic optimization of top N queries. In Proceedings of the 25th International Conference on Very Large Data Bases. 411--422. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Dwork, C., Kumar, S. R., Naor, M., and Sivakumar, D. 2001. Rank aggregation methods for the Web. In Proceedings of the 10th International Conference on World Wide Web. 613--622. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Fagin, R., Kumar, R., Mahdian, M., Sivakumar, D., and Vee, E. 2004. Comparing and aggregating rankings with ties. In Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 47--58. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Fagin, R., Kumar, R., and Sivakumar, D. 2003. Comparing top k lists. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms. 28--36. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Fagin, R., Lotem, A., and Naor, M. 2001. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 1, 1, 614--656. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Fuhr, N. 1990. A probabilistic framework for vague queries and imprecise information in databases. In Proceedings of the 16th International Conference on Very Large Data Bases. 696--707. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Ganti, V., Lee, M., and Ramakrishnan, R. 2000. ICICLES: Self-tuning samples for approximate query answering. In Proceedings of the 26th International Conference on Very Large Data Bases. 176--187. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Getoor, L. and Diehl, C. P. 2005. Link mining: A survey. ACM SIGKDD Explor. Newsl. 7, 2, 3--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Güntzer, U., Balke, W., and Kießling, W. 2000. Optimizing multi-feature queries for image databases. In Proceedings of the 26th International Conference on Very Large Data Bases. 419--428. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Güntzer, U., Balke, W., and Kießling, W. 2001. Towards efficient multi-feature queries in heterogeneous environments. In Proceedings of the International Conference on Information Technology: Coding and Computing. 622. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Guo, L., Shao, F., Botev, C., and Shanmugasundaram, J. 2003. XRANK: Ranked keyword search over XML documents. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. 16--27. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Hellerstein, J. M., Haas, P. J., and Wang, H. J. 1997. Online aggregation. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. 171--182. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables. American Statistical Association Journal, 13--30.Google ScholarGoogle Scholar
  37. Hristidis, V., Koudas, N., and Papakonstantinou, Y. 2001. PREFER: A system for the efficient execution of multi-parametric ranked queries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. 259--270. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Hristidis, V. and Papakonstantinou, Y. 2004. Algorithms and applications for answering ranked queries using ranked views. VLDB J. 13, 1, 49--70. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Hwang, S. and Chang, K. C. 2007a. Optimizing top-k queries for middleware access: A unified cost-based approach. ACM Trans. Database Syst. 32, 1, 5. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Hwang, S. and Chang, K. C. 2007b. Probe minimization by schedule optimization: Supporting top-k queries with expensive predicates. IEEE Trans. Knowl. Data Eng. 19, 5, 646--662. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Ilyas, I. F., Aref, W. G., and Elmagarmid, A. K. 2002. Joining ranked inputs in practice. In Proceedings of the 28th International Conference on Very Large Data Bases. 950--961. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Ilyas, I. F., Aref, W. G., and Elmagarmid, A. K. 2004a. Supporting top-k join queries in relational databases. VLDB J. 13, 3, 207--221. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Ilyas, I. F., Aref, W. G., Elmagarmid, A. K., Elmongui, H. G., Shah, R., and Vitter, J. S. 2006. Adaptive rank-aware query optimization in relational databases. ACM Trans. Database Syst. 31, 4, 1257--1304. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Ilyas, I. F., Shah, R., Aref, W. G., Vitter, J. S., and Elmagarmid, A. K. 2004b. Rank-aware query optimization. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. 203--214. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Ilyas, F. I., Walid, G. A., and Elmagarmid, A. K. 2003. Supporting top-k join queries in relational databases. In Proceedings of the 29th International Conference on Very Large Databases. 754--765. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Imielinski, T. and Lipski Jr., W. 1984. Incomplete information in relational databases. J. ACM 31, 4, 761--791. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Kendall, M. G. 1945. The treatment of ties in ranking problems. Biometrika 33, 3, 239--251.Google ScholarGoogle ScholarCross RefCross Ref
  48. Klamler, C. 2003. A comparison of the dodgson method and the copeland rule. Econ. Bull. 4, 8, 1--7.Google ScholarGoogle Scholar
  49. Li, C., Chang, K. C., and Ilyas, I. F. 2006. Supporting ad-hoc ranking aggregates. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. 61--72. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Li, C., Chang, K. C., Ilyas, I. F., and Song, S. 2005. RankSQL: Query algebra and optimization for relational top-k queries. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. 131--142. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Mamoulis, N., Cheng, K. H., Yiu, M. L., and Cheung, D. W. 2006. Efficient aggregation of ranked inputs. In Proceedings of the 22rd International Conference on Data Engineering. 72. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Marian, A., Bruno, N., and Gravano, L. 2004. Evaluating top-k queries over Web-accessible databases. ACM Trans. Database Syst. 29, 2, 319--362. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Michel, S., Triantafillou, P., and Weikum, G. 2005. KLEE: A framework for distributed top-k query algorithms. In Proceedings of the 31st International Conference on Very Large Data Bases. 637--648. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Natsev, A., Chang, Y., Smith, J. R., Li, C., and Vitter, J. S. 2001. Supporting incremental join queries on ranked inputs. In Proceedings of the 27th International Conference on Very Large Data Bases. 281--290. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Nurmi, H. 1987. Comparing Voting Systems. D. Reidel Publishing Company, Dordrecht, Germany.Google ScholarGoogle Scholar
  56. Ré, C., Dalvi, N. N., and Suciu, D. 2007. Efficient top-k query evaluation on probabilistic data. In Proceedings of the 23rd International Conference on Data Engineering. 886--895.Google ScholarGoogle Scholar
  57. Robertson, S. E. and Walker, S. 1994. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 232--241. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Salton, G. and McGill, M. J. 1983. Introduction to Modern IR. McGraw-Hill, New York, NY. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Silberstein, A., Braynard, R., Ellis, C. S., Munagala, K., and Yang, J. 2006. A sampling-based approach to optimizing top-k queries in sensor networks. In Proceedings of the 22nd International Conference on Data Engineering. 68. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Soliman, M. A., Ilyas, I. F., and Chang, K. C.-C. 2007. Top-k query processing in uncertain databases. In Proceedings of the 23rd International Conference on Data Engineering. 896--905.Google ScholarGoogle Scholar
  61. Theobald, M., Schenkel, R., and Weikum, G. 2005. An efficient and versatile query engine for TopX search. In Proceedings of the 31st International Conference on Very Large Data Bases. 625--636. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Theobald, M., Weikum, G., and Schenkel, R. 2004. Top-k query evaluation with probabilistic guarantees. In Proceedings of the 30th International Conference on Very Large Data Bases. 648--659. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Truchon, M. 1998. An extension of the Condorcet criterion and Kemeny orders. Cahier 98-15. Centre de Recherche en Economie et Finance Appliquees, Université Laval, Québec, Canada.Google ScholarGoogle Scholar
  64. Tsaparas, P., Palpanas, T., Kotidis, Y., Koudas, N., and Srivastava, D. 2003. Ranked join indices. In Proceedings of the 19th International Conference on Data Engineering. 277.Google ScholarGoogle Scholar
  65. Vrbsky, S. V. and Liu, J. W. S. 1993. APPROXIMATE—a query processor that produces monotonically improving approximate answers. IEEE Trans. Knowl. Data Eng. 5, 6, 1056--1068. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Xin, D., Chen, C., and Han, J. 2006. Towards robust indexing for ranked queries. In Proceedings of the 32nd International Conference on Very Large Data Bases. 235--246. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Xin, D., Han, J., and Chang, K. C. 2007. Progressive and selective merge: computing top-k with ad-hoc ranking functions. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. 103--114. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Young, H. P. and Levenglick, A. 1978. A consistent extension of condorcet's election principle. SIAM J. Appl. Math. 35, 2, 285--300.Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Yuan, Y., Lin, X., Liu, Q., Wang, W., Yu, J. X., and Zhang, Q. 2005. Efficient computation of the skyline cube. In Proceedings of the 31st International Conference on Very Large Data Bases. 241--252. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Zhang, Z., Hwang, S., Chang, K. C., Wang, M., Lang, C. A., and Chang, Y. 2006. Boolean + ranking: Querying a database by k-constrained optimization. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. 359--370. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A survey of top-k query processing techniques in relational database systems

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM Computing Surveys
      ACM Computing Surveys  Volume 40, Issue 4
      October 2008
      145 pages
      ISSN:0360-0300
      EISSN:1557-7341
      DOI:10.1145/1391729
      Issue’s Table of Contents

      Copyright © 2008 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 15 October 2008
      • Accepted: 1 November 2007
      • Revised: 1 July 2007
      • Received: 1 January 2007
      Published in csur Volume 40, Issue 4

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article
      • Research
      • Refereed

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader