|
ABSTRACT
We present a new method for estimating the number of tuples satisfying a condition of the type attribute rel constant, where rel is one of "=", ">", "<", "≥", "≤". Our method gives highly accurate, yet easy to compute, estimates. We store information about attribute values as a list of distribution steps (histograms where buckets, instead of having equal width, have equal height). These distribution steps provide an upper bound on the error when estimating the number of tuples satisfying a condition. The estimation error can be arbitrarily reduced by increasing the number of steps. We analyze desirable conditions that such estimates should satisfy. Based on the distribution steps, we derive a set of estimation formulas which minimize the worst-case error. We also present another set of formulas which reduce the average-case error. Finally, we show how to use sampling to compute a close approximation of the distribution steps very quickly. The major applications of our method are in query optimization and in answering statistical queries.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
{Blasgen 77} Blasgen, M W., and Eswaran, K. P. Storage and access in relational databases IBM Syst. J 16(4), 1977.
|
| |
2
|
|
 |
3
|
|
| |
4
|
{Demolombe 80} Demolombe, R. Estimation of the number of tuples satisying a query expressed in predicate calculus language In Proc of 6th Int. Conf. on Very Large Data Bases, pages 55--63. Montreal, Canada, 1980.
|
| |
5
|
{Dixon 79} Dixon, W. J., and Massey, F. J. Introduction to statistical analysis McGraw-Hill Book Company, 1979.
|
 |
6
|
|
| |
7
|
|
 |
8
|
|
| |
9
|
{Samson 83} Samson, W. B. and Bendell, A. Rank order distributions and secondary key indexing, extended abstract. In Proc of 2nd Int Conf. on Databases. Cambridge, England, September, 1983.
|
 |
10
|
P. Griffiths Selinger , M. M. Astrahan , D. D. Chamberlin , R. A. Lorie , T. G. Price, Access path selection in a relational database management system, Proceedings of the 1979 ACM SIGMOD international conference on Management of data, May 30-June 01, 1979, Boston, Massachusetts
[doi> 10.1145/582095.582099]
|
 |
11
|
|
 |
12
|
|
 |
13
|
|
 |
14
|
|
| |
15
|
{Zipf 49} Zipf, G. K. Human Behaviour and the Principle of Least Effort. Addison-Wesley, Cambridge, Mass., 1949.
|
CITED BY 99
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Kazuhiro Satoh , Masashi Tsuchida , Fumio Nakamura , Kazuhiko Oomachi, Local and global query optimization mechanisms for relational databases, Proceedings of the 11th international conference on Very Large Data Bases, p.405-417, August 21-23, 1985, Stockholm, Sweden
|
|
|
|
|
|
|
|
|
|
|
|
Markus Stocker , Andy Seaborne , Abraham Bernstein , Christoph Kiefer , Dave Reynolds, SPARQL basic graph pattern optimization using selectivity estimation, Proceeding of the 17th international conference on World Wide Web, April 21-25, 2008, Beijing, China
|
|
|
|
|
|
|
|
|
Anna C. Gilbert , Yannis Kotidis , S. Muthukrishnan , Marin J. Strauss, Optimal and approximate computation of summary statistics for range aggregates, Proceedings of the twentieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.227-236, May 2001, Santa Barbara, California, United States
|
|
|
|
|
Dimuthu Makawita , Kian-Lee Tan , Huan Liu, Sampling from databases using B+-trees, Proceedings of the ninth international conference on Information and knowledge management, p.158-164, November 06-11, 2000, McLean, Virginia, United States
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel, Optimal Histograms with Quality Guarantees, Proceedings of the 24rd International Conference on Very Large Data Bases, p.275-286, August 24-27, 1998
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Juliana Freire , Jayant R. Haritsa , Maya Ramanath , Prasan Roy , Jérôme Siméon, StatiX: making XML count, Proceedings of the 2002 ACM SIGMOD international conference on Management of data, June 03-06, 2002, Madison, Wisconsin
|
|
Masaru Kitsuregawa , Miyuki Nakano , Lilian Harada , Mikio Takagi, Performance evaluation of functional disk system with nonuniform data distribution, Proceedings of the second international symposium on Databases in parallel and distributed systems, p.80-89, July 02-04, 1990, Dublin, Ireland
|
|
|
|
|
Qiang Zhu , Brian Dunkel , Nandit Soparkar , Suyun Chen , Berni Schiefer , Tony Lai, A piggyback method to collect statistics for query optimization in database management systems, Proceedings of the 1998 conference of the Centre for Advanced Studies on Collaborative research, p.25, November 30-December 03, 1998, Toronto, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Brian Dunkel , Qiang Zhu , Wing Lau , Suyun Chen, Multiple-granularity interleaving for piggyback query processing, Proceedings of the 1999 conference of the Centre for Advanced Studies on Collaborative research, p.2, November 08-11, 1999, Mississauga, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Nisheeth Shrivastava , Chiranjeeb Buragohain , Divyakant Agrawal , Subhash Suri, Medians and beyond: new aggregation techniques for sensor networks, Proceedings of the 2nd international conference on Embedded networked sensor systems, November 03-05, 2004, Baltimore, MD, USA
|
|
|
|
|
|
|
|
|
Zina Ben Miled , Jin Liu , Omran Bukhres , Huian Li , Jesse Martin , Chavali Balagopalakrishna , Robert Oppelt, Use and Maintenance of Histograms for Large Scientific Database Access Planning: A Case Study of a Pharmaceutical Data Repository, Journal of Intelligent Information Systems, v.23 n.2, p.145-178, September 2004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
V. Markl , P. J. Haas , M. Kutsch , N. Megiddo , U. Srivastava , T. M. Tran, Consistent selectivity estimation via maximum entropy, The VLDB Journal — The International Journal on Very Large Data Bases, v.16 n.1, p.55-76, January 2007
|
|
Weifeng Su , Jiying Wang , Qiong Huang , Fred Lochovsky, Query result ranking over e-commerce web databases, Proceedings of the 15th ACM international conference on Information and knowledge management, November 06-11, 2006, Arlington, Virginia, USA
|
|
|
V. Markl , N. Megiddo , M. Kutsch , T. M. Tran , P. Haas , U. Srivastava, Consistently estimating the selectivity of conjuncts of predicates, Proceedings of the 31st international conference on Very large data bases, August 30-September 02, 2005, Trondheim, Norway
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|