|
ABSTRACT
The database query optimizer requires the estimation of the query selectivity to find the most efficient access plan. For queries referencing multiple attributes from the same relation, we need a multi-dimensional selectivity estimation technique when the attributes are dependent each other because the selectivity is determined by the joint data distribution of the attributes. Additionally, for multimedia databases, there are intrinsic requirements for the multi-dimensional selectivity estimation because feature vectors are stored in multi-dimensional indexing trees. In the 1-dimensional case, a histogram is practically the most preferable. In the multi-dimensional case, however, a histogram is not adequate because of high storage overhead and high error rates.
In this paper, we propose a novel approach for the multi-dimensional selectivity estimation. Compressed information from a large number of small-sized histogram buckets is maintained using the discrete cosine transform. This enables low error rates and low storage overheads even in high dimensions. In addition, this approach has the advantage of supporting dynamic data updates by eliminating the overhead for periodical reconstructions of the compressed information. Extensive experimental results show advantages of the proposed approach.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
AFS93
|
|
 |
BBK98
|
Stefan Berchtold , Christian Böhm , Hans-Peter Kriegal, The pyramid-technique: towards breaking the curse of dimensionality, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.142-153, June 01-04, 1998, Seattle, Washington, United States
|
| |
BKK96
|
|
| |
BF95
|
|
 |
CR94
|
|
 |
CSZS97
|
Wendy Chang , Gholamhosein Sheikholeslami , Aidong Zhang , Tanveer F. Syeda-Mahmood, Efficient resource selection in distributed visual information systems, Proceedings of the fifth ACM international conference on Multimedia, p.203-213, November 09-13, 1997, Seattle, Washington, United States
[doi> 10.1145/266180.266367]
|
 |
CG96
|
|
| |
Chri83
|
S. Christodoulakis. Estimating record selectivities. Information Systems Journal, 8(2), pp 105-115, 1983.
|
| |
EKSWX98
|
|
| |
EKSX96
|
M. Ester, H. Kriegel, J. Sander, X. Xu. A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proc. 2nd Int. Conf. on Knowledge Discovering and Data Mining, 1996.
|
 |
Fa96
|
|
 |
Fa98
|
|
 |
GRS98
|
Sudipto Guha , Rajeev Rastogi , Kyuseok Shim, CURE: an efficient clustering algorithm for large databases, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.73-84, June 01-04, 1998, Seattle, Washington, United States
|
| |
HNSS95
|
|
| |
Io93
|
|
 |
IP95
|
|
| |
JKMPSS98
|
H. V. Jagadish , Nick Koudas , S. Muthukrishnan , Viswanath Poosala , Kenneth C. Sevcik , Torsten Suel, Optimal Histograms with Quality Guarantees, Proceedings of the 24rd International Conference on Very Large Data Bases, p.275-286, August 24-27, 1998
|
| |
Lim90
|
|
 |
MCS98
|
|
| |
NH94
|
|
 |
PSTW93
|
Bernd-Uwe Pagel , Hans-Werner Six , Heinrich Toben , Peter Widmayer, Towards an analysis of range query performance in spatial data structures, Proceedings of the twelfth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.214-221, May 25-28, 1993, Washington, D.C., United States
[doi> 10.1145/153850.153878]
|
 |
PIHS96
|
Viswanath Poosala , Peter J. Haas , Yannis E. Ioannidis , Eugene J. Shekita, Improved histograms for selectivity estimation of range predicates, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.294-305, June 04-06, 1996, Montreal, Quebec, Canada
|
| |
PI97
|
|
| |
RY90
|
|
 |
SCZ98
|
|
 |
SLRD93
|
Wei Sun , Yibei Ling , Naphtali Rishe , Yi Deng, An instant and accurate size estimation method for joins and selections in a retrieval-intensive environment, Proceedings of the 1993 ACM SIGMOD international conference on Management of data, p.79-88, May 25-28, 1993, Washington, D.C., United States
|
| |
WKW94
|
|
 |
ZRL96
|
Tian Zhang , Raghu Ramakrishnan , Miron Livny, BIRCH: an efficient data clustering method for very large databases, Proceedings of the 1996 ACM SIGMOD international conference on Management of data, p.103-114, June 04-06, 1996, Montreal, Quebec, Canada
|
CITED BY 40
|
|
|
|
|
|
|
|
|
|
|
|
|
Frank Ramsak , Volker Markl , Robert Fenk , Martin Zirkel , Klaus Elhardt , Rudolf Bayer, Integrating the UB-Tree into a Database System Kernel, Proceedings of the 26th International Conference on Very Large Data Bases, p.263-272, September 10-14, 2000
|
|
|
|
|
|
Anna C. Gilbert , Sudipto Guha , Piotr Indyk , Yannis Kotidis , S. Muthukrishnan , Martin J. Strauss, Fast, small-space algorithms for approximate histogram maintenance, Proceedings of the thiry-fourth annual ACM symposium on Theory of computing, May 19-21, 2002, Montreal, Quebec, Canada
|
|
|
|
|
|
Kanda Runapongsa , Thomas P. Nadeau , Toby J. Teorey, Storage estimation for multidimensional aggregates in OLAP, Proceedings of the 1999 conference of the Centre for Advanced Studies on Collaborative research, p.10, November 08-11, 1999, Mississauga, Ontario, Canada
|
|
|
|
|
|
|
|
|
|
|
|
Michael Böhlen , Linas Bukauskas , Poul Svante Eriksen , Steffen Lilholt Lauritzen , Arturas Mažeika , Peter Musaeus , Peer Mylov, 3D visual data mining: goals and experiences, Computational Statistics & Data Analysis, v.43 n.4, p.445-469, 28 August 2003
|
|
|
|
|
|
|
|
|
|
Feng Yan , Wen-Chi Hou , Zhewei Jiang , Cheng Luo , Qiang Zhu, Selectivity estimation of range queries based on data density approximation via cosine series, Data & Knowledge Engineering, v.63 n.3, p.855-878, December, 2007
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|