research-article

A survey of top-k query processing techniques in relational database systems

Authors:
Ihab F. Ilyas

University of Waterloo, Waterloo, ON, Canada

University of Waterloo, Waterloo, ON, Canada
View Profile

,
George Beskales

University of Waterloo, Waterloo, ON, Canada

University of Waterloo, Waterloo, ON, Canada
View Profile

,
Mohamed A. Soliman

University of Waterloo, Waterloo, ON, Canada

University of Waterloo, Waterloo, ON, Canada
View Profile

Authors Info & Claims

ACM Computing Surveys Volume 40 Issue 4Article No.: 11pp 1–58https://doi.org/10.1145/1391729.1391730

Published:15 October 2008Publication History

ACM Computing Surveys

Abstract

Efficient processing of top-k queries is a crucial requirement in many interactive environments that involve massive amounts of data. In particular, efficient top-k processing in domains such as the Web, multimedia search, and distributed systems has shown a great impact on performance. In this survey, we describe and classify top-k processing techniques in relational databases. We discuss different design dimensions in the current techniques including query models, data access methods, implementation levels, data and query certainty, and supported scoring functions. We show the implications of each dimension on the design of the underlying techniques. We also discuss top-k queries in XML domain, and show their connections to relational approaches.

References

Acharya, S., Gibbons, P. B., and Poosala, V. 2000. Congressional samples for approximate answering of group-by queries. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. 487--498. Google ScholarDigital Library
Amato, G., Rabitti, F., Savino, P., and Zezula, P. 2003. Region proximity in metric spaces and its use for approximate similarity search. ACM Trans. Inform. Syst. 21, 2, 192--227. Google ScholarDigital Library
Amer-Yahia, S., Koudas, N., Marian, A., Srivastava, D., and Toman, D. 2005. Structure and content scoring for xml. In Proceedings of the 31st International Conference on Very Large Data Bases. 361--372. Google ScholarDigital Library
Aref, W. G., Catlin, A. C., Elmagarmid, A. K., Fan, J., Hammad, M. A., Ilyas, I. F., Marzouk, M., Prabhakar, S., and Zhu, X. 2004. VDBMS: A testbed facility for research in video database benchmarking. ACM Multimed. Syst. J. (Special Issue on Multimedia Document Management Systems) 9, 6, 575--585.Google ScholarDigital Library
Arrow, K. 1951. Social Choice and Individual Values. Wiley, New York, NY.Google Scholar
Babcock, B., Chaudhuri, S., and Das, G. 2003. Dynamic sample selection for approximate query processing. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. 539--550. Google ScholarDigital Library
Barbará, D., Garcia-Molina, H., and Porter, D. 1992. The management of probabilistic data. IEEE Trans. Knowl. Data Eng. 4, 5, 487--502. Google ScholarDigital Library
Black, D. 1958. The Theory of Committees and Elections. Cambridge University Press, London, U.K.Google Scholar
Börzsönyi, S., Kossmann, D., and Stocker, K. 2001. The skyline operator. In Proceedings of the 17th International Conference on Data Engineering. 421. Google ScholarDigital Library
Brams, S. J. and Fishburn, P. C. 1983. Approval Voting. Birkhauser, Boston, MA.Google Scholar
Bruno, N., Chaudhuri, S., and Gravano, L. 2002a. Top-k selection queries over relational databases: Mapping strategies and performance evaluation. ACM Trans. Database Syst. 27, 2, 153--187. Google ScholarDigital Library
Bruno, N., Gravano, L., and Marian, A. 2002b. Evaluating top-k queries over Web-accessible databases. In Proceedings of the 18th International Conference on Data Engineering. 369. Google ScholarDigital Library
Chakrabarti, K., Garofalakis, M., Rastogi, R., and Shim, K. 2001. Approximate query processing using wavelets. VLDB J. 10, 2--3, 199--223. Google ScholarDigital Library
Chang, K. C. and Hwang, S. 2002. Minimal probing: supporting expensive predicates for top-k queries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. 346--357. Google ScholarDigital Library
Chang, Y., Bergman, L. D., Castelli, V., Li, C., Lo, M., and Smith, J. R. 2000. The Onion Technique: Indexing for linear optimization queries. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. 391--402. Google ScholarDigital Library
Chaudhuri, S., Das, G., Datar, M., Motwani, R., and Narasayya, V. R. 2001a. Overcoming limitations of sampling for aggregation queries. In Proceedings of the 17th International Conference on Data Engineering. 534--542. Google ScholarDigital Library
Chaudhuri, S., Das, G., and Narasayya, V. 2001b. A robust, optimization-based approach for approximate answering of aggregate queries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. 295--306. Google ScholarDigital Library
Chaudhuri, S., Motwani, R., and Narasayya, V. 1998. Random sampling for histogram construction: How much is enough&quest; In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. 436--447. Google ScholarDigital Library
Copeland, A. H. 1951. A reasonable social welfare function. Mimeo. University of Michigan, Ann Arbor, MI.Google Scholar
Cranor, L. 1996. Declared-strategy voting: An instrument for group decision-making. Ph.D. dissertation. Washington University, St. Louis, MO. Google ScholarDigital Library
Das, G., Gunopulos, D., Koudas, N., and Tsirogiannis, D. 2006. Answering top-k queries using views. In Proceedings of the 32nd International Conference on Very Large Data Bases. 451--462. Google ScholarDigital Library
Diaconis, P. 1998. Group Representation in Probability and Statistics. Institute of Mathematical Statistics. Web site: www.imstat.org.Google Scholar
Diaconis, P. and Graham, R. 1977. Spearman's footrule as a measure of disarray. J. Roy. Statist. Soc. Series B 39, 2, 262--268.Google ScholarCross Ref
Donjerkovic, D. and Ramakrishnan, R. 1999. Probabilistic optimization of top N queries. In Proceedings of the 25th International Conference on Very Large Data Bases. 411--422. Google ScholarDigital Library
Dwork, C., Kumar, S. R., Naor, M., and Sivakumar, D. 2001. Rank aggregation methods for the Web. In Proceedings of the 10th International Conference on World Wide Web. 613--622. Google ScholarDigital Library
Fagin, R., Kumar, R., Mahdian, M., Sivakumar, D., and Vee, E. 2004. Comparing and aggregating rankings with ties. In Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 47--58. Google ScholarDigital Library
Fagin, R., Kumar, R., and Sivakumar, D. 2003. Comparing top k lists. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms. 28--36. Google ScholarDigital Library
Fagin, R., Lotem, A., and Naor, M. 2001. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 1, 1, 614--656. Google ScholarDigital Library
Fuhr, N. 1990. A probabilistic framework for vague queries and imprecise information in databases. In Proceedings of the 16th International Conference on Very Large Data Bases. 696--707. Google ScholarDigital Library
Ganti, V., Lee, M., and Ramakrishnan, R. 2000. ICICLES: Self-tuning samples for approximate query answering. In Proceedings of the 26th International Conference on Very Large Data Bases. 176--187. Google ScholarDigital Library
Getoor, L. and Diehl, C. P. 2005. Link mining: A survey. ACM SIGKDD Explor. Newsl. 7, 2, 3--12. Google ScholarDigital Library
Güntzer, U., Balke, W., and Kießling, W. 2000. Optimizing multi-feature queries for image databases. In Proceedings of the 26th International Conference on Very Large Data Bases. 419--428. Google ScholarDigital Library
Güntzer, U., Balke, W., and Kießling, W. 2001. Towards efficient multi-feature queries in heterogeneous environments. In Proceedings of the International Conference on Information Technology: Coding and Computing. 622. Google ScholarDigital Library
Guo, L., Shao, F., Botev, C., and Shanmugasundaram, J. 2003. XRANK: Ranked keyword search over XML documents. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. 16--27. Google ScholarDigital Library
Hellerstein, J. M., Haas, P. J., and Wang, H. J. 1997. Online aggregation. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. 171--182. Google ScholarDigital Library
Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables. American Statistical Association Journal, 13--30.Google Scholar
Hristidis, V., Koudas, N., and Papakonstantinou, Y. 2001. PREFER: A system for the efficient execution of multi-parametric ranked queries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. 259--270. Google ScholarDigital Library
Hristidis, V. and Papakonstantinou, Y. 2004. Algorithms and applications for answering ranked queries using ranked views. VLDB J. 13, 1, 49--70. Google ScholarDigital Library
Hwang, S. and Chang, K. C. 2007a. Optimizing top-k queries for middleware access: A unified cost-based approach. ACM Trans. Database Syst. 32, 1, 5. Google ScholarDigital Library
Hwang, S. and Chang, K. C. 2007b. Probe minimization by schedule optimization: Supporting top-k queries with expensive predicates. IEEE Trans. Knowl. Data Eng. 19, 5, 646--662. Google ScholarDigital Library
Ilyas, I. F., Aref, W. G., and Elmagarmid, A. K. 2002. Joining ranked inputs in practice. In Proceedings of the 28th International Conference on Very Large Data Bases. 950--961. Google ScholarDigital Library
Ilyas, I. F., Aref, W. G., and Elmagarmid, A. K. 2004a. Supporting top-k join queries in relational databases. VLDB J. 13, 3, 207--221. Google ScholarDigital Library
Ilyas, I. F., Aref, W. G., Elmagarmid, A. K., Elmongui, H. G., Shah, R., and Vitter, J. S. 2006. Adaptive rank-aware query optimization in relational databases. ACM Trans. Database Syst. 31, 4, 1257--1304. Google ScholarDigital Library
Ilyas, I. F., Shah, R., Aref, W. G., Vitter, J. S., and Elmagarmid, A. K. 2004b. Rank-aware query optimization. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. 203--214. Google ScholarDigital Library
Ilyas, F. I., Walid, G. A., and Elmagarmid, A. K. 2003. Supporting top-k join queries in relational databases. In Proceedings of the 29th International Conference on Very Large Databases. 754--765. Google ScholarDigital Library
Imielinski, T. and Lipski Jr., W. 1984. Incomplete information in relational databases. J. ACM 31, 4, 761--791. Google ScholarDigital Library
Kendall, M. G. 1945. The treatment of ties in ranking problems. Biometrika 33, 3, 239--251.Google ScholarCross Ref
Klamler, C. 2003. A comparison of the dodgson method and the copeland rule. Econ. Bull. 4, 8, 1--7.Google Scholar
Li, C., Chang, K. C., and Ilyas, I. F. 2006. Supporting ad-hoc ranking aggregates. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. 61--72. Google ScholarDigital Library
Li, C., Chang, K. C., Ilyas, I. F., and Song, S. 2005. RankSQL: Query algebra and optimization for relational top-k queries. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. 131--142. Google ScholarDigital Library
Mamoulis, N., Cheng, K. H., Yiu, M. L., and Cheung, D. W. 2006. Efficient aggregation of ranked inputs. In Proceedings of the 22rd International Conference on Data Engineering. 72. Google ScholarDigital Library
Marian, A., Bruno, N., and Gravano, L. 2004. Evaluating top-k queries over Web-accessible databases. ACM Trans. Database Syst. 29, 2, 319--362. Google ScholarDigital Library
Michel, S., Triantafillou, P., and Weikum, G. 2005. KLEE: A framework for distributed top-k query algorithms. In Proceedings of the 31st International Conference on Very Large Data Bases. 637--648. Google ScholarDigital Library
Natsev, A., Chang, Y., Smith, J. R., Li, C., and Vitter, J. S. 2001. Supporting incremental join queries on ranked inputs. In Proceedings of the 27th International Conference on Very Large Data Bases. 281--290. Google ScholarDigital Library
Nurmi, H. 1987. Comparing Voting Systems. D. Reidel Publishing Company, Dordrecht, Germany.Google Scholar
Ré, C., Dalvi, N. N., and Suciu, D. 2007. Efficient top-k query evaluation on probabilistic data. In Proceedings of the 23rd International Conference on Data Engineering. 886--895.Google Scholar
Robertson, S. E. and Walker, S. 1994. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 232--241. Google ScholarDigital Library
Salton, G. and McGill, M. J. 1983. Introduction to Modern IR. McGraw-Hill, New York, NY. Google ScholarDigital Library
Silberstein, A., Braynard, R., Ellis, C. S., Munagala, K., and Yang, J. 2006. A sampling-based approach to optimizing top-k queries in sensor networks. In Proceedings of the 22nd International Conference on Data Engineering. 68. Google ScholarDigital Library
Soliman, M. A., Ilyas, I. F., and Chang, K. C.-C. 2007. Top-k query processing in uncertain databases. In Proceedings of the 23rd International Conference on Data Engineering. 896--905.Google Scholar
Theobald, M., Schenkel, R., and Weikum, G. 2005. An efficient and versatile query engine for TopX search. In Proceedings of the 31st International Conference on Very Large Data Bases. 625--636. Google ScholarDigital Library
Theobald, M., Weikum, G., and Schenkel, R. 2004. Top-k query evaluation with probabilistic guarantees. In Proceedings of the 30th International Conference on Very Large Data Bases. 648--659. Google ScholarDigital Library
Truchon, M. 1998. An extension of the Condorcet criterion and Kemeny orders. Cahier 98-15. Centre de Recherche en Economie et Finance Appliquees, Université Laval, Québec, Canada.Google Scholar
Tsaparas, P., Palpanas, T., Kotidis, Y., Koudas, N., and Srivastava, D. 2003. Ranked join indices. In Proceedings of the 19th International Conference on Data Engineering. 277.Google Scholar
Vrbsky, S. V. and Liu, J. W. S. 1993. APPROXIMATE—a query processor that produces monotonically improving approximate answers. IEEE Trans. Knowl. Data Eng. 5, 6, 1056--1068. Google ScholarDigital Library
Xin, D., Chen, C., and Han, J. 2006. Towards robust indexing for ranked queries. In Proceedings of the 32nd International Conference on Very Large Data Bases. 235--246. Google ScholarDigital Library
Xin, D., Han, J., and Chang, K. C. 2007. Progressive and selective merge: computing top-k with ad-hoc ranking functions. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. 103--114. Google ScholarDigital Library
Young, H. P. and Levenglick, A. 1978. A consistent extension of condorcet's election principle. SIAM J. Appl. Math. 35, 2, 285--300.Google ScholarDigital Library
Yuan, Y., Lin, X., Liu, Q., Wang, W., Yu, J. X., and Zhang, Q. 2005. Efficient computation of the skyline cube. In Proceedings of the 31st International Conference on Very Large Data Bases. 241--252. Google ScholarDigital Library
Zhang, Z., Hwang, S., Chang, K. C., Wang, M., Lang, C. A., and Chang, Y. 2006. Boolean + ranking: Querying a database by k-constrained optimization. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. 359--370. Google ScholarDigital Library

Index Terms

A survey of top-k query processing techniques in relational database systems
1. Information systems
  1. Data management systems
    1. Database management system engines

Recommendations

Adaptive rank-aware query optimization in relational databases

Rank-aware query processing has emerged as a key requirement in modern applications. In these applications, efficient and adaptive evaluation of top-k queries is an integral part of the application semantics. In this article, we introduce a rank-aware ...
Read More
A Novel Top-k Query Scheme in Unstructured P2P Networks
CIT '09: Proceedings of the 2009 Ninth IEEE International Conference on Computer and Information Technology - Volume 02

There're two major problems in the unstructured p2p systems, one is their heavy network traffic; the other is the problem of query effectiveness, which is caused mainly by high numbers of query answers, many of which are irrelevant for users. A ...
Read More
gTop: An Efficient SPARQL Query Engine
Web and Big Data
Abstract
In this demonstration, we present gTop, a top-k query engine based on gStore which supports SPARQL queries over RDF databases. gTop can answer top-k queries with high efficiency and scalability. We use the DP-B algorithm for top-k queries and the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Computing Surveys Volume 40, Issue 4
October 2008
145 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/1391729
Issue’s Table of Contents

Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 15 October 2008
- Accepted: 1 November 2007
- Revised: 1 July 2007
- Received: 1 January 2007
Published in csur Volume 40, Issue 4

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Top-k
rank aggregation
rank-aware processing
voting
Qualifiers
- research-article
- Research
- Refereed
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 640
  Total Citations
  View Citations
- 6,063
  Total Downloads
- Downloads (Last 12 months)184
- Downloads (Last 6 weeks)25
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

A survey of top-k query processing techniques in relational database systems

ACM Computing Surveys

Abstract

References

Cited By

Index Terms

Recommendations

Adaptive rank-aware query optimization in relational databases

A Novel Top-k Query Scheme in Unstructured P2P Networks

gTop: An Efficient SPARQL Query Engine

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

A survey of top-k query processing techniques in relational database systems

ACM Computing Surveys

Abstract

References

Cited By

Index Terms

Recommendations

Adaptive rank-aware query optimization in relational databases

A Novel Top-k Query Scheme in Unstructured P2P Networks

gTop: An Efficient SPARQL Query Engine

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media