ACM Home Page
Please provide us with feedback. Feedback
Accurate estimation of the number of tuples satisfying a condition
Full text PdfPdf (1.27 MB)
Source International Conference on Management of Data archive
Proceedings of the 1984 ACM SIGMOD international conference on Management of data table of contents
Boston, Massachusetts
SESSION: Accessing strategies table of contents
Pages: 256 - 276  
Year of Publication: 1984
ISBN:0-89791-128-8
Also published in ...
Authors
Gregory Piatetsky-Shapiro  New York University and Advanced Database Systems Division, Strategic Information, Burlington, Mass
Charles Connell  Boston University and Advanced Database Systems Division, Strategic Information
Sponsor
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 86,   Citation Count: 99
Additional Information:

abstract   references   cited by   collaborative colleagues   peer to peer  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/602259.602294
What is a DOI?

ABSTRACT

We present a new method for estimating the number of tuples satisfying a condition of the type attribute rel constant, where rel is one of "=", ">", "<", "≥", "≤". Our method gives highly accurate, yet easy to compute, estimates. We store information about attribute values as a list of distribution steps (histograms where buckets, instead of having equal width, have equal height). These distribution steps provide an upper bound on the error when estimating the number of tuples satisfying a condition. The estimation error can be arbitrarily reduced by increasing the number of steps. We analyze desirable conditions that such estimates should satisfy. Based on the distribution steps, we derive a set of estimation formulas which minimize the worst-case error. We also present another set of formulas which reduce the average-case error. Finally, we show how to use sampling to compute a close approximation of the distribution steps very quickly. The major applications of our method are in query optimization and in answering statistical queries.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
{Blasgen 77} Blasgen, M W., and Eswaran, K. P. Storage and access in relational databases IBM Syst. J 16(4), 1977.
 
2
3
 
4
{Demolombe 80} Demolombe, R. Estimation of the number of tuples satisying a query expressed in predicate calculus language In Proc of 6th Int. Conf. on Very Large Data Bases, pages 55--63. Montreal, Canada, 1980.
 
5
{Dixon 79} Dixon, W. J., and Massey, F. J. Introduction to statistical analysis McGraw-Hill Book Company, 1979.
6
 
7
8
 
9
{Samson 83} Samson, W. B. and Bendell, A. Rank order distributions and secondary key indexing, extended abstract. In Proc of 2nd Int Conf. on Databases. Cambridge, England, September, 1983.
10
11
12
13
14
 
15
{Zipf 49} Zipf, G. K. Human Behaviour and the Principle of Least Effort. Addison-Wesley, Cambridge, Mass., 1949.

CITED BY  99
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
Collaborative Colleagues:
Gregory Piatetsky-Shapiro: colleagues
Charles Connell: colleagues

Peer to Peer - Readers of this Article have also read: