skip to main content
article

Wavelet synopses for general error metrics

Published: 01 December 2005 Publication History

Abstract

Several studies have demonstrated the effectiveness of the wavelet decomposition as a tool for reducing large amounts of data down to compact wavelet synopses that can be used to obtain fast, accurate approximate query answers. Conventional wavelet synopses that greedily minimize the overall root-mean-squared (i.e., L2-norm) error in the data approximation can suffer from important problems, including severe bias and wide variance in the quality of the data reconstruction, and lack of nontrivial guarantees for individual approximate answers. Thus, probabilistic thresholding schemes have been recently proposed as a means of building wavelet synopses that try to probabilistically control maximum approximation-error metrics (e.g., maximum relative error).A key open problem is whether it is possible to design efficient deterministic wavelet-thresholding algorithms for minimizing general, non-L2 error metrics that are relevant to approximate query processing systems, such as maximum relative or maximum absolute error. Obviously, such algorithms can guarantee better maximum-error wavelet synopses and avoid the pitfalls of probabilistic techniques (e.g., “bad” coin-flip sequences) leading to poor solutions; in addition, they can be used to directly optimize the synopsis construction process for other useful error metrics, such as the mean relative error in data-value reconstruction. In this article, we propose novel, computationally efficient schemes for deterministic wavelet thresholding with the objective of optimizing general approximation-error metrics. We first consider the problem of constructing wavelet synopses optimized for maximum error, and introduce an optimal low polynomial-time algorithm for one-dimensional wavelet thresholding---our algorithm is based on a new Dynamic-Programming (DP) formulation, and can be employed to minimize the maximum relative or absolute error in the data reconstruction. Unfortunately, directly extending our one-dimensional DP algorithm to multidimensional wavelets results in a super-exponential increase in time complexity with the data dimensionality. Thus, we also introduce novel, polynomial-time approximation schemes (with tunable approximation guarantees) for deterministic wavelet thresholding in multiple dimensions. We then demonstrate how our optimal and approximate thresholding algorithms for maximum error can be extended to handle a broad, natural class of distributive error metrics, which includes several important error measures, such as mean weighted relative error and weighted Lp-norm error. Experimental results on real-world and synthetic data sets evaluate our novel optimization algorithms, and demonstrate their effectiveness against earlier wavelet-thresholding schemes.

References

[1]
Acharya, S., Gibbons, P. B., Poosala, V., and Ramaswamy, S. 1999. Join synopses for approximate query answering. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data (Philadelphia, PA). ACM, New York, 275--286.]]
[2]
Amsaleg, L., Bonnet, P., Franklin, M. J., Tomasic, A., and Urhan, T. 1997. Improving responsiveness for wide-area data access. IEEE Data Eng. Bull. 20, 3 (Sept.), 3--11 (Special Issue on Improving Query Responsiveness).]]
[3]
Cai, Y. and Ng, R. 2004. Indexing spatio-temporal trajectories with Chebyshev polynomials. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (Paris, France). ACM, New York.]]
[4]
Chakrabarti, K., Garofalakis, M., Rastogi, R., and Shim, K. 2000. Approximate query processing using wavelets. In Proceedings of the 26th International Conference on Very Large Data Bases (Cairo, Egypt), 111--122.]]
[5]
Chakrabarti, K., Garofalakis, M. N., Rastogi, R., and Shim, K. 2001. Approximate query processing using wavelets. The VLDB Journal 10, 2-3 (Sept.), 199--223 (“Best of VLDB'2000” Special Issue).]]
[6]
Deligiannakis, A. and Roussopoulos, N. 2003. Extended wavelets for multiple measures. In Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (San Diego, CA). ACM, New York.]]
[7]
Deshpande, A., Garofalakis, M., and Rastogi, R. 2001. Independence is Good: Dependency-Based Histogram Synopses for High-Dimensional Data. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data (Santa Barbara, CA). ACM, New York.]]
[8]
DeVore, R. A. 1998. Nonlinear Approximation. Acta Num. 7, 51--150.]]
[9]
Garofalakis, M. and Gibbons, P. B. 2001. Approximate query processing: Taming the terabytes. Tutorial in Proceedings of the 27th International Confrence on Very Large Data Bases, Roma, Italy.]]
[10]
Garofalakis, M. and Gibbons, P. B. 2002. Wavelet synopses with error guarantees. In Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data (Madison, WI). ACM, New York, 476--487.]]
[11]
Garofalakis, M. and Gibbons, P. B. 2004. Probabilistic wavelet synopses. ACM Trans. Database Systems 29, 1 (Mar.) (SIGMOD/PODS Special Issue).]]
[12]
Garofalakis, M. and Kumar, A. 2004. Deterministic wavelet thresholding for maximum-error metrics. In Proceedings of the 23rd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Paris, France). ACM, New York.]]
[13]
Gilbert, A. C., Kotidis, Y., Muthukrishnan, S., and Strauss, M. J. 2001. Surfing wavelets on streams: One-pass summaries for approximate aggregate queries. In Proceedings of the 27th International Conference on Very Large Data Bases (Roma, Italy).]]
[14]
Guha, S. 2004. A note on wavelet optimization. (Manuscript available from: http://www.cis.upenn.edu/~sudipto/note.html.)]]
[15]
Gunopulos, D., Kollios, G., Tsotras, V. J., and Domeniconi, C. 2000. Approximating multi-dimensional aggregate range queries over real attributes. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (Dallas, Texas). ACM, New York.]]
[16]
Haas, P. J. and Swami, A. N. 1992. Sequential sampling procedures for query size estimation. In Proceedings of the 1992 ACM SIGMOD International Conference on Management of Data (San Diego, CA). ACM, New York, 341--350.]]
[17]
Hellerstein, J. M., Haas, P. J., and Wang, H. J. 1997. Online aggregation. In Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data (Tucson, AZ). ACM, New York.]]
[18]
Ioannidis, Y. 2003a. Approximations in database systems. In Proceedings of the 9th International Conference on Database Theory (ICDT'2003) (Siena, Italy).]]
[19]
Ioannidis, Y. 2003b. The history of histograms (abridged). In Proceedings of the 29th International Conference on Very Large Data Bases (Berlin, Germany).]]
[20]
Jawerth, B. and Sweldens, W. 1994. An overview of wavelet based multiresolution analyses. SIAM Rev. 36, 3, 377--412.]]
[21]
Matias, Y., Vitter, J. S., and Wang, M. 1998. Wavelet-based histograms for selectivity estimation. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (Seattle, WA). ACM, New York, 448--459.]]
[22]
Matias, Y., Vitter, J. S., and Wang, M. 2000. Dynamic maintenance of wavelet-based histograms. In Proceedings of the 26th International Conference on Very Large Data Bases (Cairo, Egypt).]]
[23]
Motwani, R. and Raghavan, P. 1995. Randomized Algorithms. Cambridge University Press, Cambridge, MA.]]
[24]
Muthukrishnan, S. 2004. Nonuniform sparse approximation with Haar wavelet basis. Tech. Rep. 2004-42, DIMACS. Sept.]]
[25]
Muthukrishnan, S., Poosala, V., and Suel, T. 1999. On rectangular partitionings in two dimensions: Algorithms, complexity, and applications. In Proceedings of the 7th International Conference on Database Theory (ICDT'99) (Jerusalem, Israel).]]
[26]
Natsev, A., Rastogi, R., and Shim, K. 1999. WALRUS: A similarity retrieval algorithm for image databases. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data (Philadelphia, PA). ACM, New York.]]
[27]
Stollnitz, E. J., DeRose, T. D., and Salesin, D. H. 1996. Wavelets for Computer Graphics---Theory and Applications. Morgan Kaufmann, San Francisco, CA.]]
[28]
Vitter, J. S. and Wang, M. 1999. Approximate computation of multidimensional aggregates of sparse data using wavelets. In Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data (Philadelphia, PA). ACM, New York.]]

Cited By

View all
  • (2023)Optimization of Privacy Budget Allocation In Differential Privacy-Based Public Transit Trajectory Data Publishing for Smart Mobility ApplicationsIEEE Transactions on Intelligent Transportation Systems10.1109/TITS.2023.330978324:12(15158-15168)Online publication date: 1-Dec-2023
  • (2022)B-DP: Dynamic Collection and Publishing of Continuous Check-In Data with Best-Effort Differential PrivacyEntropy10.3390/e2403040424:3(404)Online publication date: 14-Mar-2022
  • (2022)Turbo-charging SPJ query plans with learned physical join operator selectionsProceedings of the VLDB Endowment10.14778/3551793.355182515:11(2706-2718)Online publication date: 29-Sep-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Database Systems
ACM Transactions on Database Systems  Volume 30, Issue 4
Special Issue: SIGMOD/PODS 2004
December 2005
241 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1114244
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 December 2005
Published in TODS Volume 30, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Data synopses
  2. Haar wavelets
  3. approximate query processing

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)1
Reflects downloads up to 03 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Optimization of Privacy Budget Allocation In Differential Privacy-Based Public Transit Trajectory Data Publishing for Smart Mobility ApplicationsIEEE Transactions on Intelligent Transportation Systems10.1109/TITS.2023.330978324:12(15158-15168)Online publication date: 1-Dec-2023
  • (2022)B-DP: Dynamic Collection and Publishing of Continuous Check-In Data with Best-Effort Differential PrivacyEntropy10.3390/e2403040424:3(404)Online publication date: 14-Mar-2022
  • (2022)Turbo-charging SPJ query plans with learned physical join operator selectionsProceedings of the VLDB Endowment10.14778/3551793.355182515:11(2706-2718)Online publication date: 29-Sep-2022
  • (2022)An Optimal Online Semi-Connected PLA Algorithm With Maximum Error BoundIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.298131934:1(164-177)Online publication date: 1-Jan-2022
  • (2022)Efficient two-dimensional Haar$$^+$$ synopsis construction for the maximum absolute error measureThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-019-00551-228:5(675-701)Online publication date: 11-Mar-2022
  • (2021)A Tailored Regression for Learned Indexes: Logarithmic Error RegressionProceedings of the Fourth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management10.1145/3464509.3464891(9-15)Online publication date: 20-Jun-2021
  • (2020)An efficient multidimensional $L_{\infty }$ wavelet method and its application to approximate query processingWorld Wide Web10.1007/s11280-020-00834-7Online publication date: 10-Oct-2020
  • (2019)Two-dimensional wavelet synopses with maximum error bound and its application in parallel compressionJournal of Intelligent & Fuzzy Systems10.3233/JIFS-17915437:3(3499-3511)Online publication date: 9-Oct-2019
  • (2019)Scaling the Construction of Wavelet Synopses for Maximum Error MetricsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2018.286718531:9(1794-1808)Online publication date: 5-Aug-2019
  • (2019)Data compression approach for the home energy management systemApplied Energy10.1016/j.apenergy.2019.04.078247(643-656)Online publication date: Aug-2019
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media