skip to main content
article
Free Access

Improved histograms for selectivity estimation of range predicates

Published:01 June 1996Publication History
Skip Abstract Section

Abstract

Many commercial database systems maintain histograms to summarize the contents of relations and permit efficient estimation of query result sizes and access plan costs. Although several types of histograms have been proposed in the past, there has never been a systematic study of all histogram aspects, the available choices for each aspect, and the impact of such choices on histogram effectiveness. In this paper, we provide a taxonomy of histograms that captures all previously proposed histogram types and indicates many new possibilities. We introduce novel choices for several of the taxonomy dimensions, and derive new histogram types by combining choices in effective ways. We also show how sampling techniques can be used to reduce the cost of histogram construction. Finally, we present results from an empirical study of the proposed histogram types used in selectivity estimation of range predicates and identify the histogram types that have the best overall performance.

References

  1. CR94 Cr M Chen and N. Roussopoulos. Adaptive selectivity estimation using query feedback. Proc. ofACM SIG- MOD Conf, pages 16 I-I 72, May 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. dB78 C. de Boor. A practical guide to splines. Springer- Verlag, New York, 1978.Google ScholarGoogle ScholarCross RefCross Ref
  3. dB95 C. de Boor. Private communication, I995.Google ScholarGoogle Scholar
  4. GS91 A.P. Gurajada andJ. Srivastava. Equidepth partitioning of a data set based on finding its medians. Proc. Applied Computing Symposium, pages 92-101, 1991.Google ScholarGoogle ScholarCross RefCross Ref
  5. HNSS95 P. J. Haas, J. E Naughton, S. Seshadri, and L. Stokes. Sampling-based estimation of the number of distinct values of an attribute. Proc. of the 21st Int. Conf on Very Large Databases, pages 311-322, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Hoe63 W. Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer Statist. Assoco, 58:13-30, 1963.Google ScholarGoogle ScholarCross RefCross Ref
  7. HS95 P.J. Haas and A. N. Swami. Sampling-based selectivity estimation for joins using augmented frequent value statistics. Proc. of IEEE Conf. on Data Engineering, pages 522-531, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. IC91 Y. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. Proc. of A CM SIGMOD Conf, pages 268-277, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. IC93 Y. Ioannidis and S. Christodoulakis. Optimal histograms for limiting worst-case error propagation in the size of join results. ACM TODS, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. IK90 Y. Ioannidis and Y. Kang. Randomized algorithms for optimizing large join queries. In Proc. of the 1990 A CM-SIGMOD Conference on the Management of Data, pages 312-321, Atlantic City, NJ, May 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Ioa93 Y. loannidis. Universality of serial histograms. Proc. of the 19th Int. Conf. on Vet3, Large Databases, pages 256-267, December 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. IP95 Y. loannidis and V. Poosala. Balancing histogram optimality and practicality for query result size estimation. Proc. of A CM SIGMOD Conf, pages 233-244, May 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. JC85 R. Jain and I. Chlamtac. The p2 algorithm for dynamic calculation of quantiles and histograms without sorting observations. Communications of the A CM, pages 1076-1085. Oct 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Kol41 A.N. Kolmogorov. Confidence limits for an unknown distribution function. Ann. Math. Statist., 12:461--463, 1941.Google ScholarGoogle ScholarCross RefCross Ref
  15. Koo80 R.P. Kooi. The optimization of queries in relational databases. PhD thesis, Case Western Reserver University, Sept 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. LNS90 R.J. Lipton, J. E Naughton, and D. A. Schneider. Practical selectivity estimation through adaptive sampling. Proc. of ACM SIGMOD Conf, pages 1-11, May 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Mas51 E J. Massey. The Kolmogorov-Smirnov test for goodness-of-fit. J. Amer. Statist. Assoc., 46:68-78, 1951.Google ScholarGoogle ScholarCross RefCross Ref
  18. MCS88 M.V. Mannino, E Chu, and T. Sager. Statistical profile estimation in database systems. ACM Computing Surveys, 20(3): 192-221, Sept 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. MD88 M. Muralikrishna and David J Dewltt. Equi-depth histograms for estimating selectivity factors for multidimensional queries. Proc. of ACM SIGMOD Con f~ pages 28-36, 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. PSC84 G. Piatetsky-Shapiro and C. ConneI1. Accurate estimation of the number of tuples satisfying a condition, Proc. of ACM SIGMOD Conf, pages 256-276, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Raa87 K.E.E. Raatikainen. Simultaneous estimation of several percentiles. Simulation. 49:159-164, October 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. SAC+79 P.G. Selinger, M.M. Astrahan, D.D. Chamberlin, R A. Lorie, and T.T. Price. Access path selection ~n a relational database management system. Proc. of ACM SIGMOD Conf, pages 23-34, 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. SG88 A. Swami and A. Gupta. Optimization of large join queries. In Proc. of the 1988 ACM-SIGMOD Conference on the Management of Data, pages 8-17, Chicago, IL, June 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. SLRD93 W. Sun, Y. Ling, N. Rishe, and Y. Deng. An instant and accurate size estimation method for joins and selections in a retrieval-intensive environment. Proc. of ACM SIGMOD Conf, pages 79-88, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Vit85 J.S. Vitter. Random sampling with a reservoir A CM Trans. Math. Software, 11:37-57, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Zip49 G.K. Zipf. Human behavtour and the principle of teast effort. Addison-Wesley, Reading, MA, 1949.Google ScholarGoogle Scholar

Index Terms

  1. Improved histograms for selectivity estimation of range predicates

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM SIGMOD Record
        ACM SIGMOD Record  Volume 25, Issue 2
        June 1996
        557 pages
        ISSN:0163-5808
        DOI:10.1145/235968
        Issue’s Table of Contents
        • cover image ACM Conferences
          SIGMOD '96: Proceedings of the 1996 ACM SIGMOD international conference on Management of data
          June 1996
          560 pages
          ISBN:0897917944
          DOI:10.1145/233269

        Copyright © 1996 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 1 June 1996

        Check for updates

        Qualifiers

        • article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader