DOI: 10.1145/1066677.1066831
Article

An information theoretic histogram for single dimensional selectivity estimation

Published: 13 March 2005

Abstract

We study the problem of one-dimensional selectivity estimation in relational databases. We introduce a new type of histogram based on information theory and compare it against a large number of other techniques on a wide array of datasets. Our histogram has the best overall accuracy on the real datasets. We also observe that the accuracy ranking of all methods varies significantly across datasets. As a result, some of our findings are inconsistent with several conclusions drawn in past literature, which suggests a gap in how the accuracy of these methods has previously been characterized.
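The abstract does not spell out how a histogram answers a selectivity query. As a point of reference only, here is a minimal sketch of the standard way a generic one-dimensional bucket histogram estimates the selectivity of a range predicate `lo <= x < hi`, assuming values are spread uniformly within each bucket. The class `Histogram`, its fields, and the example buckets are hypothetical. Per the abstract, the paper's contribution is a histogram built using information theory; this sketch does not attempt that construction and simply takes the bucket boundaries and counts as given.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Histogram:
    """Hypothetical 1-D histogram: bucket i covers [bounds[i], bounds[i+1])."""
    bounds: List[float]  # sorted bucket boundaries, one more than the buckets
    counts: List[int]    # number of tuples that fall in each bucket

    def __post_init__(self) -> None:
        if len(self.bounds) != len(self.counts) + 1:
            raise ValueError("need exactly len(counts) + 1 boundaries")
        if any(b >= c for b, c in zip(self.bounds, self.bounds[1:])):
            raise ValueError("boundaries must be strictly increasing")

    def estimate_range(self, lo: float, hi: float) -> float:
        """Estimated number of tuples with lo <= x < hi.

        Assumes values are spread uniformly within each bucket;
        this is NOT the paper's information-theoretic construction.
        """
        total = 0.0
        for i, count in enumerate(self.counts):
            b_lo, b_hi = self.bounds[i], self.bounds[i + 1]
            # Width of the part of this bucket covered by the query range.
            overlap = min(hi, b_hi) - max(lo, b_lo)
            if overlap > 0:
                total += count * overlap / (b_hi - b_lo)
        return total

    def selectivity(self, lo: float, hi: float) -> float:
        """Estimated fraction of all tuples with lo <= x < hi."""
        n = sum(self.counts)
        return self.estimate_range(lo, hi) / n if n else 0.0


if __name__ == "__main__":
    # Made-up buckets for illustration: [0,10) holds 100 tuples,
    # [10,20) holds 50, and [20,40) holds 200.
    h = Histogram(bounds=[0.0, 10.0, 20.0, 40.0], counts=[100, 50, 200])
    print(h.estimate_range(5, 30))  # 50 + 50 + 100 = 200.0
    print(h.selectivity(5, 30))     # 200 / 350, about 0.571
```

With these made-up buckets, the query `5 <= x < 30` covers half of the first bucket, all of the second, and half of the third, so the estimate is 50 + 50 + 100 = 200 tuples out of 350.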


Cited By

  • (2021) Parallel batch k-means for Big data clustering. Computers & Industrial Engineering 152, 107023. DOI: 10.1016/j.cie.2020.107023. Online publication date: Feb-2021.
  • (2018) Applying Machine Learning Models to Identify Forest Cover. 2018 9th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 471-474. DOI: 10.1109/UEMCON.2018.8796830. Online publication date: Nov-2018.


    Published In

    SAC '05: Proceedings of the 2005 ACM symposium on Applied computing
    March 2005
    1814 pages
    ISBN:1581139640
    DOI:10.1145/1066677
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. entropy
    2. histograms
    3. selectivity estimation

    Qualifiers

    • Article

    Conference

    SAC05: The 2005 ACM Symposium on Applied Computing
    March 13-17, 2005
    Santa Fe, New Mexico

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%


