Abstract
A query to a web search engine usually consists of a list of keywords, to which the search engine responds with the best or "top" k pages for the query. This top-k query model is prevalent over multimedia collections in general, but also over plain relational data for certain applications. For example, consider a relation with information on available restaurants, including their location, price range for one diner, and overall food rating. A user who queries such a relation might simply specify the user's location and target price range, and expect in return the best 10 restaurants in terms of some combination of proximity to the user, closeness of match to the target price range, and overall food rating. Processing top-k queries efficiently is challenging for a number of reasons. One critical such reason is that, in many web applications, the relation attributes might not be available other than through external web-accessible form interfaces, which we will have to query repeatedly for a potentially large set of candidate objects. In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but observe that any sequential top-k query processing strategy is bound to require unnecessarily long query processing times, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. We adapt our sequential query processing technique and introduce an efficient algorithm that maximizes source-access parallelism to minimize query response time, while satisfying source-access constraints. We evaluate our techniques experimentally using both synthetic and real web-accessible data and show that parallel algorithms can be significantly more efficient than their sequential counterparts.
- Avnur, R. and Hellerstein, J. M. 2000. Eddies: Continuously adaptive query processing. In Proceedings of the 2000 ACM International Conference on Management of Data (SIGMOD'00). ACM, New York.]] Google ScholarDigital Library
- Blake, C. and Merz, C. 1998. UCI repository of machine learning databases. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/covtype.]]Google Scholar
- Bruno, N., Chaudhuri, S., and Gravano, L. 2002a. Top-k selection queries over relational databases: Mapping strategies and performance evaluation. ACM Trans. Datab. Syst. 27, 2.]] Google ScholarDigital Library
- Bruno, N., Gravano, L., and Marian, A. 2002b. Evaluating top-k queries over web-accessible databases. In Proceedings of the 2002 International Conference on Data Engineering (ICDE'02).]] Google ScholarDigital Library
- Carey, M. J. and Kossmann, D. 1997. On saying "Enough Already!" in SQL. In Proceedings of the 1997 ACM International Conference on Management of Data (SIGMOD'97) ACM, New York.]] Google ScholarDigital Library
- Carey, M. J. and Kossmann, D. 1998. Reducing the braking distance of an SQL query engine. In Proceedings of the 24th International Conference on Very Large Databases (VLDB'98).]] Google ScholarDigital Library
- Chang, K. C.-C. and Hwang, S.-W. 2002. Minimal probing: Supporting expensive predicates for top-k queries. In Proceedings of the 2002 ACM International Conference on Management of Data (SIGMOD'02). ACM, New York.]] Google ScholarDigital Library
- Chaudhuri, S. and Gravano, L. 1996. Optimizing queries over multimedia repositories. In Proceedings of the 1996 ACM International Conference on Management of Data (SIGMOD'96). ACM, New York.]] Google ScholarDigital Library
- Chaudhuri, S., Gravano, L., and Marian, A. 2004. Optimizing top-k selection queries over multimedia repositories. IEEE Trans. Knowl. Data Eng. (TKDE).]] Google ScholarDigital Library
- Chen, C.-M. and Ling, Y. 2002. A sampling-based estimator for top-k query. In Proceedings of the 2002 International Conference on Data Engineering (ICDE'02).]] Google ScholarDigital Library
- Donjerkovic, D. and Ramakrishnan, R. 1999. Probabilistic optimization of top n queries. In Proceedings of the 25th International Conference on Very Large Databases (VLDB'99).]] Google ScholarDigital Library
- Fagin, R. 1996. Combining fuzzy information from multiple systems. In Proceedings of the 15th ACM Symposium on Principles of Database Systems (PODS'96). ACM, New York.]] Google ScholarDigital Library
- Fagin, R., Lotem, A., and Naor, M. 2001. Optimal aggregation algorithms for middleware. In Proceedings of the 20th ACM Symposium on Principles of Database Systems (PODS'01). ACM, New York.]] Google ScholarDigital Library
- Fagin, R., Lotem, A., and Naor, M. 2003. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66, 4.]] Google ScholarDigital Library
- Freedman, D., Pisani, R., and Purves, R. 1997. Statistics. W.W. Norton & Company; 3rd edition.]]Google Scholar
- Goldman, R. and Widom, J. 2000. WSQ/DSQ: A practical approach for combined querying of databases and the web. In Proceedings of the 2000 ACM International Conference on Management of Data (SIGMOD'00). ACM, New York.]] Google ScholarDigital Library
- Gravano, L., Marian, A., and Bruno, N. 2002. Evaluating top-k queries over web-accessible databases. Tech. Rep., Columbia Univ., New York.]]Google Scholar
- Güntzer, U., Balke, W.-T., and Kieβling, W. 2000. Optimizing multi-feature queries for image databases. In Proceedings of the 26th International Conference on Very Large Databases (VLDB'00).]] Google ScholarDigital Library
- Hellerstein, J. M. and Stonebraker, M. 1993. Predicate migration: Optimizing queries with expensive predicates. In Proceedings of the 1993 ACM International Conference on Management of Data (SIGMOD'93). ACM, New York.]] Google ScholarDigital Library
- Hristidis, V., Koudas, N., and Papakonstantinou, Y. 2001. PREFER: A system for the efficient execution of multi-parametric ranked queries. In Proceedings of the 2001 ACM International Conference on Management of Data (SIGMOD'01). ACM, New York.]] Google ScholarDigital Library
- Kemper, A., Moerkotte, G., Peithner, K., and Steinbrunn, M. 1994. Optimizing disjunctive queries with expensive predicates. In Proceedings of the 1994 ACM International Conference on Management of Data (SIGMOD'94). ACM, New York.]] Google ScholarDigital Library
- Mills, D. 1983. Internet delay experiments; RFC 889. In ARPANET Working Group Requests for Comments. Number 889. SRI International, Menlo Park, Calif.]] Google Scholar
- Natsev, A., Chang, Y.-C., Smith, J. R., Li, C.-S., and Vitter, J. S. 2001. Supporting incremental join queries on ranked inputs. In Proceedings of the 27th International Conference on Very Large Databases (VLDB'01).]] Google ScholarDigital Library
- Nepal, S. and Ramakrishna, M. V. 1999. Query processing issues in image (multimedia) databases. In Proceedings of the 1999 International Conference on Data Engineering (ICDE'99).]] Google ScholarDigital Library
- Ortega, M., Rui, Y., Chakrabarti, K., Porkaew, K., Mehrotra, S., and Huang, T. S. 1998. Supporting ranked Boolean similarity queries in MARS. IEEE Trans. Know. Data Eng. (TKDE) 10, 6, 905--925.]] Google ScholarDigital Library
- Williams, S. A., Press, H., Flannery, B. P., and Vetterling, W. T. 1993. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press.]] Google ScholarDigital Library
Index Terms
- Evaluating top-k queries over web-accessible databases
Recommendations
Top-k selection queries over relational databases: Mapping strategies and performance evaluation
In many applications, users specify target values for certain attributes, without requiring exact matches to these values in return. Instead, the result to such queries is typically a rank of the "top k" tuples that best match the given attribute ...
Efficient Top-k Query Answering through its Top-N Rewritings Using Views
PIKM '15: Proceedings of the 8th Workshop on Ph.D. Workshop in Information and Knowledge ManagementRecently, various algorithms were proposed to speed up top-k query answering by using multiple materialized query results. Nevertheless, for most of the proposed algorithms, a potentially costly view selection operation is required. In fact, the ...
Equivalence and minimization of conjunctive queries under combined semantics
ICDT '12: Proceedings of the 15th International Conference on Database TheoryThe problems of query containment, equivalence, and minimization are fundamental problems in the context of query processing and optimization. In their classic work [2] published in 1977, Chandra and Merlin solved the three problems for the language of ...
Comments