Abstract
Efficient processing of top-k queries is a crucial requirement in many interactive environments that involve massive amounts of data. In particular, efficient top-k processing in domains such as the Web, multimedia search, and distributed systems has shown a great impact on performance. In this survey, we describe and classify top-k processing techniques in relational databases. We discuss different design dimensions in the current techniques including query models, data access methods, implementation levels, data and query certainty, and supported scoring functions. We show the implications of each dimension on the design of the underlying techniques. We also discuss top-k queries in XML domain, and show their connections to relational approaches.
- Acharya, S., Gibbons, P. B., and Poosala, V. 2000. Congressional samples for approximate answering of group-by queries. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. 487--498. Google ScholarDigital Library
- Amato, G., Rabitti, F., Savino, P., and Zezula, P. 2003. Region proximity in metric spaces and its use for approximate similarity search. ACM Trans. Inform. Syst. 21, 2, 192--227. Google ScholarDigital Library
- Amer-Yahia, S., Koudas, N., Marian, A., Srivastava, D., and Toman, D. 2005. Structure and content scoring for xml. In Proceedings of the 31st International Conference on Very Large Data Bases. 361--372. Google ScholarDigital Library
- Aref, W. G., Catlin, A. C., Elmagarmid, A. K., Fan, J., Hammad, M. A., Ilyas, I. F., Marzouk, M., Prabhakar, S., and Zhu, X. 2004. VDBMS: A testbed facility for research in video database benchmarking. ACM Multimed. Syst. J. (Special Issue on Multimedia Document Management Systems) 9, 6, 575--585.Google ScholarDigital Library
- Arrow, K. 1951. Social Choice and Individual Values. Wiley, New York, NY.Google Scholar
- Babcock, B., Chaudhuri, S., and Das, G. 2003. Dynamic sample selection for approximate query processing. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. 539--550. Google ScholarDigital Library
- Barbará, D., Garcia-Molina, H., and Porter, D. 1992. The management of probabilistic data. IEEE Trans. Knowl. Data Eng. 4, 5, 487--502. Google ScholarDigital Library
- Black, D. 1958. The Theory of Committees and Elections. Cambridge University Press, London, U.K.Google Scholar
- Börzsönyi, S., Kossmann, D., and Stocker, K. 2001. The skyline operator. In Proceedings of the 17th International Conference on Data Engineering. 421. Google ScholarDigital Library
- Brams, S. J. and Fishburn, P. C. 1983. Approval Voting. Birkhauser, Boston, MA.Google Scholar
- Bruno, N., Chaudhuri, S., and Gravano, L. 2002a. Top-k selection queries over relational databases: Mapping strategies and performance evaluation. ACM Trans. Database Syst. 27, 2, 153--187. Google ScholarDigital Library
- Bruno, N., Gravano, L., and Marian, A. 2002b. Evaluating top-k queries over Web-accessible databases. In Proceedings of the 18th International Conference on Data Engineering. 369. Google ScholarDigital Library
- Chakrabarti, K., Garofalakis, M., Rastogi, R., and Shim, K. 2001. Approximate query processing using wavelets. VLDB J. 10, 2--3, 199--223. Google ScholarDigital Library
- Chang, K. C. and Hwang, S. 2002. Minimal probing: supporting expensive predicates for top-k queries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. 346--357. Google ScholarDigital Library
- Chang, Y., Bergman, L. D., Castelli, V., Li, C., Lo, M., and Smith, J. R. 2000. The Onion Technique: Indexing for linear optimization queries. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data. 391--402. Google ScholarDigital Library
- Chaudhuri, S., Das, G., Datar, M., Motwani, R., and Narasayya, V. R. 2001a. Overcoming limitations of sampling for aggregation queries. In Proceedings of the 17th International Conference on Data Engineering. 534--542. Google ScholarDigital Library
- Chaudhuri, S., Das, G., and Narasayya, V. 2001b. A robust, optimization-based approach for approximate answering of aggregate queries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. 295--306. Google ScholarDigital Library
- Chaudhuri, S., Motwani, R., and Narasayya, V. 1998. Random sampling for histogram construction: How much is enough? In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. 436--447. Google ScholarDigital Library
- Copeland, A. H. 1951. A reasonable social welfare function. Mimeo. University of Michigan, Ann Arbor, MI.Google Scholar
- Cranor, L. 1996. Declared-strategy voting: An instrument for group decision-making. Ph.D. dissertation. Washington University, St. Louis, MO. Google ScholarDigital Library
- Das, G., Gunopulos, D., Koudas, N., and Tsirogiannis, D. 2006. Answering top-k queries using views. In Proceedings of the 32nd International Conference on Very Large Data Bases. 451--462. Google ScholarDigital Library
- Diaconis, P. 1998. Group Representation in Probability and Statistics. Institute of Mathematical Statistics. Web site: www.imstat.org.Google Scholar
- Diaconis, P. and Graham, R. 1977. Spearman's footrule as a measure of disarray. J. Roy. Statist. Soc. Series B 39, 2, 262--268.Google ScholarCross Ref
- Donjerkovic, D. and Ramakrishnan, R. 1999. Probabilistic optimization of top N queries. In Proceedings of the 25th International Conference on Very Large Data Bases. 411--422. Google ScholarDigital Library
- Dwork, C., Kumar, S. R., Naor, M., and Sivakumar, D. 2001. Rank aggregation methods for the Web. In Proceedings of the 10th International Conference on World Wide Web. 613--622. Google ScholarDigital Library
- Fagin, R., Kumar, R., Mahdian, M., Sivakumar, D., and Vee, E. 2004. Comparing and aggregating rankings with ties. In Proceedings of the Twenty-Third ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. 47--58. Google ScholarDigital Library
- Fagin, R., Kumar, R., and Sivakumar, D. 2003. Comparing top k lists. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms. 28--36. Google ScholarDigital Library
- Fagin, R., Lotem, A., and Naor, M. 2001. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 1, 1, 614--656. Google ScholarDigital Library
- Fuhr, N. 1990. A probabilistic framework for vague queries and imprecise information in databases. In Proceedings of the 16th International Conference on Very Large Data Bases. 696--707. Google ScholarDigital Library
- Ganti, V., Lee, M., and Ramakrishnan, R. 2000. ICICLES: Self-tuning samples for approximate query answering. In Proceedings of the 26th International Conference on Very Large Data Bases. 176--187. Google ScholarDigital Library
- Getoor, L. and Diehl, C. P. 2005. Link mining: A survey. ACM SIGKDD Explor. Newsl. 7, 2, 3--12. Google ScholarDigital Library
- Güntzer, U., Balke, W., and Kießling, W. 2000. Optimizing multi-feature queries for image databases. In Proceedings of the 26th International Conference on Very Large Data Bases. 419--428. Google ScholarDigital Library
- Güntzer, U., Balke, W., and Kießling, W. 2001. Towards efficient multi-feature queries in heterogeneous environments. In Proceedings of the International Conference on Information Technology: Coding and Computing. 622. Google ScholarDigital Library
- Guo, L., Shao, F., Botev, C., and Shanmugasundaram, J. 2003. XRANK: Ranked keyword search over XML documents. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data. 16--27. Google ScholarDigital Library
- Hellerstein, J. M., Haas, P. J., and Wang, H. J. 1997. Online aggregation. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data. 171--182. Google ScholarDigital Library
- Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables. American Statistical Association Journal, 13--30.Google Scholar
- Hristidis, V., Koudas, N., and Papakonstantinou, Y. 2001. PREFER: A system for the efficient execution of multi-parametric ranked queries. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data. 259--270. Google ScholarDigital Library
- Hristidis, V. and Papakonstantinou, Y. 2004. Algorithms and applications for answering ranked queries using ranked views. VLDB J. 13, 1, 49--70. Google ScholarDigital Library
- Hwang, S. and Chang, K. C. 2007a. Optimizing top-k queries for middleware access: A unified cost-based approach. ACM Trans. Database Syst. 32, 1, 5. Google ScholarDigital Library
- Hwang, S. and Chang, K. C. 2007b. Probe minimization by schedule optimization: Supporting top-k queries with expensive predicates. IEEE Trans. Knowl. Data Eng. 19, 5, 646--662. Google ScholarDigital Library
- Ilyas, I. F., Aref, W. G., and Elmagarmid, A. K. 2002. Joining ranked inputs in practice. In Proceedings of the 28th International Conference on Very Large Data Bases. 950--961. Google ScholarDigital Library
- Ilyas, I. F., Aref, W. G., and Elmagarmid, A. K. 2004a. Supporting top-k join queries in relational databases. VLDB J. 13, 3, 207--221. Google ScholarDigital Library
- Ilyas, I. F., Aref, W. G., Elmagarmid, A. K., Elmongui, H. G., Shah, R., and Vitter, J. S. 2006. Adaptive rank-aware query optimization in relational databases. ACM Trans. Database Syst. 31, 4, 1257--1304. Google ScholarDigital Library
- Ilyas, I. F., Shah, R., Aref, W. G., Vitter, J. S., and Elmagarmid, A. K. 2004b. Rank-aware query optimization. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data. 203--214. Google ScholarDigital Library
- Ilyas, F. I., Walid, G. A., and Elmagarmid, A. K. 2003. Supporting top-k join queries in relational databases. In Proceedings of the 29th International Conference on Very Large Databases. 754--765. Google ScholarDigital Library
- Imielinski, T. and Lipski Jr., W. 1984. Incomplete information in relational databases. J. ACM 31, 4, 761--791. Google ScholarDigital Library
- Kendall, M. G. 1945. The treatment of ties in ranking problems. Biometrika 33, 3, 239--251.Google ScholarCross Ref
- Klamler, C. 2003. A comparison of the dodgson method and the copeland rule. Econ. Bull. 4, 8, 1--7.Google Scholar
- Li, C., Chang, K. C., and Ilyas, I. F. 2006. Supporting ad-hoc ranking aggregates. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. 61--72. Google ScholarDigital Library
- Li, C., Chang, K. C., Ilyas, I. F., and Song, S. 2005. RankSQL: Query algebra and optimization for relational top-k queries. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data. 131--142. Google ScholarDigital Library
- Mamoulis, N., Cheng, K. H., Yiu, M. L., and Cheung, D. W. 2006. Efficient aggregation of ranked inputs. In Proceedings of the 22rd International Conference on Data Engineering. 72. Google ScholarDigital Library
- Marian, A., Bruno, N., and Gravano, L. 2004. Evaluating top-k queries over Web-accessible databases. ACM Trans. Database Syst. 29, 2, 319--362. Google ScholarDigital Library
- Michel, S., Triantafillou, P., and Weikum, G. 2005. KLEE: A framework for distributed top-k query algorithms. In Proceedings of the 31st International Conference on Very Large Data Bases. 637--648. Google ScholarDigital Library
- Natsev, A., Chang, Y., Smith, J. R., Li, C., and Vitter, J. S. 2001. Supporting incremental join queries on ranked inputs. In Proceedings of the 27th International Conference on Very Large Data Bases. 281--290. Google ScholarDigital Library
- Nurmi, H. 1987. Comparing Voting Systems. D. Reidel Publishing Company, Dordrecht, Germany.Google Scholar
- Ré, C., Dalvi, N. N., and Suciu, D. 2007. Efficient top-k query evaluation on probabilistic data. In Proceedings of the 23rd International Conference on Data Engineering. 886--895.Google Scholar
- Robertson, S. E. and Walker, S. 1994. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 232--241. Google ScholarDigital Library
- Salton, G. and McGill, M. J. 1983. Introduction to Modern IR. McGraw-Hill, New York, NY. Google ScholarDigital Library
- Silberstein, A., Braynard, R., Ellis, C. S., Munagala, K., and Yang, J. 2006. A sampling-based approach to optimizing top-k queries in sensor networks. In Proceedings of the 22nd International Conference on Data Engineering. 68. Google ScholarDigital Library
- Soliman, M. A., Ilyas, I. F., and Chang, K. C.-C. 2007. Top-k query processing in uncertain databases. In Proceedings of the 23rd International Conference on Data Engineering. 896--905.Google Scholar
- Theobald, M., Schenkel, R., and Weikum, G. 2005. An efficient and versatile query engine for TopX search. In Proceedings of the 31st International Conference on Very Large Data Bases. 625--636. Google ScholarDigital Library
- Theobald, M., Weikum, G., and Schenkel, R. 2004. Top-k query evaluation with probabilistic guarantees. In Proceedings of the 30th International Conference on Very Large Data Bases. 648--659. Google ScholarDigital Library
- Truchon, M. 1998. An extension of the Condorcet criterion and Kemeny orders. Cahier 98-15. Centre de Recherche en Economie et Finance Appliquees, Université Laval, Québec, Canada.Google Scholar
- Tsaparas, P., Palpanas, T., Kotidis, Y., Koudas, N., and Srivastava, D. 2003. Ranked join indices. In Proceedings of the 19th International Conference on Data Engineering. 277.Google Scholar
- Vrbsky, S. V. and Liu, J. W. S. 1993. APPROXIMATE—a query processor that produces monotonically improving approximate answers. IEEE Trans. Knowl. Data Eng. 5, 6, 1056--1068. Google ScholarDigital Library
- Xin, D., Chen, C., and Han, J. 2006. Towards robust indexing for ranked queries. In Proceedings of the 32nd International Conference on Very Large Data Bases. 235--246. Google ScholarDigital Library
- Xin, D., Han, J., and Chang, K. C. 2007. Progressive and selective merge: computing top-k with ad-hoc ranking functions. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data. 103--114. Google ScholarDigital Library
- Young, H. P. and Levenglick, A. 1978. A consistent extension of condorcet's election principle. SIAM J. Appl. Math. 35, 2, 285--300.Google ScholarDigital Library
- Yuan, Y., Lin, X., Liu, Q., Wang, W., Yu, J. X., and Zhang, Q. 2005. Efficient computation of the skyline cube. In Proceedings of the 31st International Conference on Very Large Data Bases. 241--252. Google ScholarDigital Library
- Zhang, Z., Hwang, S., Chang, K. C., Wang, M., Lang, C. A., and Chang, Y. 2006. Boolean + ranking: Querying a database by k-constrained optimization. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data. 359--370. Google ScholarDigital Library
Index Terms
- A survey of top-k query processing techniques in relational database systems
Recommendations
Adaptive rank-aware query optimization in relational databases
Rank-aware query processing has emerged as a key requirement in modern applications. In these applications, efficient and adaptive evaluation of top-k queries is an integral part of the application semantics. In this article, we introduce a rank-aware ...
A Novel Top-k Query Scheme in Unstructured P2P Networks
CIT '09: Proceedings of the 2009 Ninth IEEE International Conference on Computer and Information Technology - Volume 02There're two major problems in the unstructured p2p systems, one is their heavy network traffic; the other is the problem of query effectiveness, which is caused mainly by high numbers of query answers, many of which are irrelevant for users. A ...
gTop: An Efficient SPARQL Query Engine
Web and Big DataAbstractIn this demonstration, we present gTop, a top-k query engine based on gStore which supports SPARQL queries over RDF databases. gTop can answer top-k queries with high efficiency and scalability. We use the DP-B algorithm for top-k queries and the ...
Comments