article

Evaluating top-k queries over web-accessible databases

Authors:
Amélie Marian

Columbia University, New York, NY

Columbia University, New York, NY
View Profile

,
Nicolas Bruno

Microsoft Research, Redmond, Washington, WA

Microsoft Research, Redmond, Washington, WA
View Profile

,
Luis Gravano

Columbia University, New York, NY

Columbia University, New York, NY
View Profile

Authors Info & Claims

ACM Transactions on Database Systems Volume 29 Issue 2pp 319–362https://doi.org/10.1145/1005566.1005569

Published:01 June 2004Publication History

ACM Transactions on Database Systems

Abstract

A query to a web search engine usually consists of a list of keywords, to which the search engine responds with the best or "top" k pages for the query. This top-k query model is prevalent over multimedia collections in general, but also over plain relational data for certain applications. For example, consider a relation with information on available restaurants, including their location, price range for one diner, and overall food rating. A user who queries such a relation might simply specify the user's location and target price range, and expect in return the best 10 restaurants in terms of some combination of proximity to the user, closeness of match to the target price range, and overall food rating. Processing top-k queries efficiently is challenging for a number of reasons. One critical such reason is that, in many web applications, the relation attributes might not be available other than through external web-accessible form interfaces, which we will have to query repeatedly for a potentially large set of candidate objects. In this article, we study how to process top-k queries efficiently in this setting, where the attributes for which users specify target values might be handled by external, autonomous sources with a variety of access interfaces. We present a sequential algorithm for processing such queries, but observe that any sequential top-k query processing strategy is bound to require unnecessarily long query processing times, since web accesses exhibit high and variable latency. Fortunately, web sources can be probed in parallel, and each source can typically process concurrent requests, although sources may impose some restrictions on the type and number of probes that they are willing to accept. We adapt our sequential query processing technique and introduce an efficient algorithm that maximizes source-access parallelism to minimize query response time, while satisfying source-access constraints. We evaluate our techniques experimentally using both synthetic and real web-accessible data and show that parallel algorithms can be significantly more efficient than their sequential counterparts.

References

Avnur, R. and Hellerstein, J. M. 2000. Eddies: Continuously adaptive query processing. In Proceedings of the 2000 ACM International Conference on Management of Data (SIGMOD'00). ACM, New York.]] Google ScholarDigital Library
Blake, C. and Merz, C. 1998. UCI repository of machine learning databases. ftp://ftp.ics.uci.edu/pub/machine-learning-databases/covtype.]]Google Scholar
Bruno, N., Chaudhuri, S., and Gravano, L. 2002a. Top-k selection queries over relational databases: Mapping strategies and performance evaluation. ACM Trans. Datab. Syst. 27, 2.]] Google ScholarDigital Library
Bruno, N., Gravano, L., and Marian, A. 2002b. Evaluating top-k queries over web-accessible databases. In Proceedings of the 2002 International Conference on Data Engineering (ICDE'02).]] Google ScholarDigital Library
Carey, M. J. and Kossmann, D. 1997. On saying "Enough Already!" in SQL. In Proceedings of the 1997 ACM International Conference on Management of Data (SIGMOD'97) ACM, New York.]] Google ScholarDigital Library
Carey, M. J. and Kossmann, D. 1998. Reducing the braking distance of an SQL query engine. In Proceedings of the 24th International Conference on Very Large Databases (VLDB'98).]] Google ScholarDigital Library
Chang, K. C.-C. and Hwang, S.-W. 2002. Minimal probing: Supporting expensive predicates for top-k queries. In Proceedings of the 2002 ACM International Conference on Management of Data (SIGMOD'02). ACM, New York.]] Google ScholarDigital Library
Chaudhuri, S. and Gravano, L. 1996. Optimizing queries over multimedia repositories. In Proceedings of the 1996 ACM International Conference on Management of Data (SIGMOD'96). ACM, New York.]] Google ScholarDigital Library
Chaudhuri, S., Gravano, L., and Marian, A. 2004. Optimizing top-k selection queries over multimedia repositories. IEEE Trans. Knowl. Data Eng. (TKDE).]] Google ScholarDigital Library
Chen, C.-M. and Ling, Y. 2002. A sampling-based estimator for top-k query. In Proceedings of the 2002 International Conference on Data Engineering (ICDE'02).]] Google ScholarDigital Library
Donjerkovic, D. and Ramakrishnan, R. 1999. Probabilistic optimization of top n queries. In Proceedings of the 25th International Conference on Very Large Databases (VLDB'99).]] Google ScholarDigital Library
Fagin, R. 1996. Combining fuzzy information from multiple systems. In Proceedings of the 15th ACM Symposium on Principles of Database Systems (PODS'96). ACM, New York.]] Google ScholarDigital Library
Fagin, R., Lotem, A., and Naor, M. 2001. Optimal aggregation algorithms for middleware. In Proceedings of the 20th ACM Symposium on Principles of Database Systems (PODS'01). ACM, New York.]] Google ScholarDigital Library
Fagin, R., Lotem, A., and Naor, M. 2003. Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66, 4.]] Google ScholarDigital Library
Freedman, D., Pisani, R., and Purves, R. 1997. Statistics. W.W. Norton & Company; 3rd edition.]]Google Scholar
Goldman, R. and Widom, J. 2000. WSQ/DSQ: A practical approach for combined querying of databases and the web. In Proceedings of the 2000 ACM International Conference on Management of Data (SIGMOD'00). ACM, New York.]] Google ScholarDigital Library
Gravano, L., Marian, A., and Bruno, N. 2002. Evaluating top-k queries over web-accessible databases. Tech. Rep., Columbia Univ., New York.]]Google Scholar
Güntzer, U., Balke, W.-T., and Kieβling, W. 2000. Optimizing multi-feature queries for image databases. In Proceedings of the 26th International Conference on Very Large Databases (VLDB'00).]] Google ScholarDigital Library
Hellerstein, J. M. and Stonebraker, M. 1993. Predicate migration: Optimizing queries with expensive predicates. In Proceedings of the 1993 ACM International Conference on Management of Data (SIGMOD'93). ACM, New York.]] Google ScholarDigital Library
Hristidis, V., Koudas, N., and Papakonstantinou, Y. 2001. PREFER: A system for the efficient execution of multi-parametric ranked queries. In Proceedings of the 2001 ACM International Conference on Management of Data (SIGMOD'01). ACM, New York.]] Google ScholarDigital Library
Kemper, A., Moerkotte, G., Peithner, K., and Steinbrunn, M. 1994. Optimizing disjunctive queries with expensive predicates. In Proceedings of the 1994 ACM International Conference on Management of Data (SIGMOD'94). ACM, New York.]] Google ScholarDigital Library
Mills, D. 1983. Internet delay experiments; RFC 889. In ARPANET Working Group Requests for Comments. Number 889. SRI International, Menlo Park, Calif.]] Google Scholar
Natsev, A., Chang, Y.-C., Smith, J. R., Li, C.-S., and Vitter, J. S. 2001. Supporting incremental join queries on ranked inputs. In Proceedings of the 27th International Conference on Very Large Databases (VLDB'01).]] Google ScholarDigital Library
Nepal, S. and Ramakrishna, M. V. 1999. Query processing issues in image (multimedia) databases. In Proceedings of the 1999 International Conference on Data Engineering (ICDE'99).]] Google ScholarDigital Library
Ortega, M., Rui, Y., Chakrabarti, K., Porkaew, K., Mehrotra, S., and Huang, T. S. 1998. Supporting ranked Boolean similarity queries in MARS. IEEE Trans. Know. Data Eng. (TKDE) 10, 6, 905--925.]] Google ScholarDigital Library
Williams, S. A., Press, H., Flannery, B. P., and Vetterling, W. T. 1993. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press.]] Google ScholarDigital Library

Index Terms

Evaluating top-k queries over web-accessible databases
1. Information systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Recommendations

Top-k selection queries over relational databases: Mapping strategies and performance evaluation

In many applications, users specify target values for certain attributes, without requiring exact matches to these values in return. Instead, the result to such queries is typically a rank of the "top k" tuples that best match the given attribute ...
Read More
Efficient Top-k Query Answering through its Top-N Rewritings Using Views
PIKM '15: Proceedings of the 8th Workshop on Ph.D. Workshop in Information and Knowledge Management

Recently, various algorithms were proposed to speed up top-k query answering by using multiple materialized query results. Nevertheless, for most of the proposed algorithms, a potentially costly view selection operation is required. In fact, the ...
Read More
Equivalence and minimization of conjunctive queries under combined semantics
ICDT '12: Proceedings of the 15th International Conference on Database Theory

The problems of query containment, equivalence, and minimization are fundamental problems in the context of query processing and optimization. In their classic work [2] published in 1977, Chandra and Merlin solved the three problems for the language of ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Database Systems Volume 29, Issue 2
June 2004
211 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1005566
Issue’s Table of Contents

Copyright © 2004 ACM
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 2004
Published in tods Volume 29, Issue 2

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Parallel query processing
query optimization
top-k query processing
web databases.
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 219
  Total Citations
  View Citations
- 1,507
  Total Downloads
- Downloads (Last 12 months)15
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Evaluating top-k queries over web-accessible databases

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Top-k selection queries over relational databases: Mapping strategies and performance evaluation

Efficient Top-k Query Answering through its Top-N Rewritings Using Views

Equivalence and minimization of conjunctive queries under combined semantics

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Evaluating top-k queries over web-accessible databases

ACM Transactions on Database Systems

Abstract

References

Cited By

Index Terms

Recommendations

Top-k selection queries over relational databases: Mapping strategies and performance evaluation

Efficient Top-k Query Answering through its Top-N Rewritings Using Views

Equivalence and minimization of conjunctive queries under combined semantics

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media