|
ABSTRACT
Histograms of the distributions of individual attributes are currently used in leading database management systems (e.g., IBM DB2, Oracle Database, and Microsoft SQL server). Because attribute pairs in databases are seldom independent, however, the use of the distributions of individual attributes with the attribute independence assumption often leads to poor estimates. More accurate answers can be obtained by using multi-dimensional histograms to characterize the joint distribution of two or more attributes. When moving from one-dimensional to two-dimensional histograms, several new issues relating to histogram structure arise: (1) Which attribute should take priority over the other with respect to data partitioning?; (2) Into how many partitions should each dimension be split to obtain a desired number of histogram buckets?; and (3) How many most frequent values should be isolated and stored in singleton buckets? In the context of real data, we experimentally show that our proposed methods for dealing with histogram structure choices lead to good quality histograms for a variety of histogram partitioning techniques and various types of data distributions.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
|
 |
2
|
Swarup Acharya , Viswanath Poosala , Sridhar Ramaswamy, Selectivity estimation in spatial databases, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.13-24, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
| |
3
|
{3} D. Barbara, W. DuMouchel, C. Faloutsos, P.J. Haas, J.M. Hellerstein, Y. Ioannidis, H.V. Jagadish, T. Johnson, R. Ng, V. Poosala, K.A. Ross, K.C. Sevcik. The New Jersey data reduction report. Data Engineering Bulletin, vol. 20, no. 4, pages 3-45, 1997.
|
 |
4
|
Nicolas Bruno , Surajit Chaudhuri , Luis Gravano, STHoles: a multidimensional workload-aware histogram, Proceedings of the 2001 ACM SIGMOD international conference on Management of data, p.211-222, May 21-24, 2001, Santa Barbara, California, United States
|
| |
5
|
{5} Census database repository, U. S. Census Bureau. Available at http://www.census.gov.
|
| |
6
|
|
 |
7
|
Dimitrios Gunopulos , George Kollios , Vassilis J. Tsotras , Carlotta Domeniconi, Approximating multi-dimensional aggregate range queries over real attributes, Proceedings of the 2000 ACM SIGMOD international conference on Management of data, p.463-474, May 15-18, 2000, Dallas, Texas, United States
|
| |
8
|
|
 |
9
|
|
 |
10
|
|
| |
11
|
|
| |
12
|
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel, Optimal Histograms with Quality Guarantees, Proceedings of the 24rd International Conference on Very Large Data Bases, p.275-286, August 24-27, 1998
|
| |
13
|
|
 |
14
|
Ju-Hong Lee , Deok-Hwan Kim , Chin-Wan Chung, Multi-dimensional selectivity estimation using compressed histogram information, Proceedings of the 1999 ACM SIGMOD international conference on Management of data, p.205-214, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
|
 |
15
|
Yossi Matias , Jeffrey Scott Vitter , Min Wang, Wavelet-based histograms for selectivity estimation, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.448-459, June 01-04, 1998, Seattle, Washington, United States
|
 |
16
|
|
| |
17
|
|
| |
18
|
{18} Hang T. A. Pham. Accurate two-dimensional histograms for fast approximate answers to queries on real data. M.S. thesis. University of Toronto, March 2004. Available at http://www.cs.toronto.edu/~hangp/thesis.html
|
 |
19
|
|
| |
20
|
{20} V. Poosala, V. Ganti, Y. Ioannidis. Approximate Query answering using histograms. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 22, no. 4, pages 5-14, 1999.
|
 |
21
|
Viswanath Poosala , Peter J. Haas , Yannis E. Ioannidis , Eugene J. Shekita, Improved histograms for selectivity estimation of range predicates, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.294-305, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
22
|
|
| |
23
|
|
| |
24
|
{24} UCI KDD database repository. Available at http://kdd.ics.uci.edu.
|
| |
25
|
|
| |
26
|
|
|