skip to main content
10.1145/3299869.3319861acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations

Published:25 June 2019Publication History

ABSTRACT

Database administrators construct secondary indexes on data tables to accelerate query processing in relational database management systems (RDBMSs). These indexes are built on top of the most frequently queried columns according to the data statistics. Unfortunately, maintaining multiple secondary indexes in the same database can be extremely space consuming, causing significant performance degradation due to the potential exhaustion of memory space. In this paper, we demonstrate that there exist many opportunities to exploit column correlations for accelerating data access. We propose HERMIT, a succinct secondary indexing mechanism for modern RDBMSs. HERMIT judiciously leverages the rich soft functional dependencies hidden among columns to prune out redundant structures for indexed key access. Instead of building a complete index that stores every single entry in the key columns, HERMIT navigates any incoming key access queries to an existing index built on the correlated columns. This is achieved through the Tiered Regression Search Tree (TRS-Tree), a succinct, ML-enhanced data structure that performs fast curve fitting to adaptively and dynamically capture both column correlations and outliers. Our extensive experimental study in two different RDBMSs have confirmed that HERMIT can significantly reduce space consumption with limited performance overhead, especially when supporting complex range queries.

References

  1. libpqxx. http://pqxx.org/development/libpqxx/.Google ScholarGoogle Scholar
  2. PostgreSQL. http://www.postgresql.org.Google ScholarGoogle Scholar
  3. Simple linear regression. https://en.wikipedia.org/wiki/Simple_linear_regression.Google ScholarGoogle Scholar
  4. S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated Selection of Materialized Views and Indexes for SQL Databases. In VLDB, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Athanassoulis and A. Ailamaki. BF-Tree: Approximate Tree Indexing. PVLDB, 7(14), 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Athanassoulis, M. S. Kester, L. M. Maas, R. Stoica, S. Idreos, A. Ailamaki, and M. Callaghan. Designing Access Methods: The RUM Conjecture. In EDBT, 2016.Google ScholarGoogle Scholar
  7. P. G. Brown and P. J. Hass. BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data. In VLDB, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. L. Caruccio, V. Deufemia, and G. Polese. Relaxed Functional Dependencies - A Survey of Approaches. IEEE TKDE, 28(1), 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Chaudhuri and V. R. Narasayya. An Efficient, Cost-Driven Index Selection Tool for Microsoft SQL Server. In VLDB, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. D. Comer. The Ubiquitous B-Tree. CSUR, 11(2), 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. P. Godfrey, J. Gryz, and C. Zuzarte. Exploiting Constraint-Like Data Characterizations in Query Optimization. In SIGMOD, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing Relations and Indexes. In ICDE, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. G. Graefe et al. Modern B-Tree Techniques. Foundations and Trends® in Databases, 3(4):203--402, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. G. Graefe, H. Volos, H. Kimura, H. A. Kuno, J. Tucek, M. Lillibridge, and A. C. Veitch. In-Memory Performance for Big Data. PVLDB, 8(1), 2014.Google ScholarGoogle Scholar
  15. J. Gryz, B. Schiefer, J. Zheng, and C. Zuzarte. Discovery and Application of Check Constraints in DB2. In ICDE, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. B. Hentschel, M. S. Kester, and S. Idreos. Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation. In SIGMOD, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Idreos, M. L. Kersten, and S. Manegold. Self-Organizing Tuple Reconstruction in Column-Stores. In SIGMOD, 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Idreos, S. Manegold, H. Kuno, and G. Graefe. Merging What's Cracked, Cracking What's Merged: Adaptive Indexing in Main-Memory Column-Store. PVLDB, 4(9), 2011.Google ScholarGoogle Scholar
  19. I. F. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies. In SIGMOD, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. H. Kimura, G. Huo, A. Rasin, S. Madden, and S. B. Zdonik. Correlation Maps: A Compressed Access Method for Exploiting Soft Functional Dependencies. VLDB, 2(1), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. H. Kimura, G. Huo, A. Rasin, S. Madden, and S. B. Zdonik. CORADD: Correlation Aware Database Designer for Materialized Views and Indexes. PVLDB, 3(1--2), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. Kipf, T. Kipf, B. Radke, V. Leis, P. A. Boncz, and A. Kemper. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In CIDR, 2019.Google ScholarGoogle Scholar
  23. J. Kivinen and H. Mannila. Approximate Inference of Functional Dependencies from Relations. Theoretical Computer Science, 149(1), 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The Case for Learned Index Structures. In SIGMOD, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S. Kruse and F. Naumann. Efficient Discovery of Approximate Dependencies. PVLDB, 11(7), 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. P. Larson, W. Lehner, J. Zhou, and P. Zabback. Cardinality Estimation Using Sample Views with Quality Assurance. In SIGMOD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. T. J. Lehman and M. J. Carey. A Study of Index Structures for Main Memory Database Management Systems. In VLDB, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. H. Liu, M. Xu, Z. Yu, V. Corvinelli, and C. Zuzarte. Cardinality Estimation Using Neural Networks. In CASCON, 2015.Google ScholarGoogle Scholar
  29. T. Malik, R. C. Burns, and N. V. Chawla. A Black-Box Approach to Query Cardinality Estimation. In CIDR, 2007.Google ScholarGoogle Scholar
  30. T. Papenbrock, J. Ehrlich, J. Marten, T. Neubert, J. Rudolph, M. Schö nberg, J. Zwiener, and F. Naumann. Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms. PVLDB, 8(10), 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. C. Mowry, M. Perron, I. Quah, et al. Self-Driving Database Management Systems. In CIDR, 2017.Google ScholarGoogle Scholar
  32. J. Rao and K. A. Ross. Making BGoogle ScholarGoogle Scholar
  33. -Trees Cache Conscious in Main Memory. In SIGMOD, 2000.Google ScholarGoogle Scholar
  34. L. Sidirourgos and M. L. Kersten. Column Imprints: A Secondary Index Structure. In SIGMOD, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. M. Stonebraker. The Case for Partial Indexes. Sigmod Record, 18(4), 1989.Google ScholarGoogle Scholar
  36. G. Valentin, M. Zuliani, D. C. Zilio, G. Lohman, and A. Skelley. DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. In ICDE, 2000.Google ScholarGoogle ScholarCross RefCross Ref
  37. A. D. Well and J. L. Myers. Research design & statistical analysis. Psychology Press, 2003.Google ScholarGoogle Scholar
  38. Y. Wu, J. Arulraj, J. Lin, R. Xian, and A. Pavlo. An Empirical Evaluation of In-Memory Multi-Version Concurrency Control. PVLDB, 10(7), 2017.Google ScholarGoogle Scholar
  39. J. Yu and M. Sarwat. Two Birds, One Stone: A Fast, yet Lightweight, Indexing Scheme for Modern Database Systems. PVLDB, 10(4), 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. H. Zhang, D. G. Andersen, A. Pavlo, M. Kaminsky, L. Ma, and R. Shen. Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes. In SIGMOD, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. H. Zhang, I. F. Ilyas, and K. Salem. PSALM: Cardinality Estimation in the Presence of Fine-Grained Access Controls. In ICDE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. H. Zhang, H. Lim, V. Leis, D. G. Andersen, M. Kaminsky, K. Keeton, and A. Pavlo. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In SIGMOD, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. M. Zukowski, S. Hé man, N. Nes, and P. A. Boncz. Super-Scalar RAM-CPU Cache Compression. In ICDE, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
          June 2019
          2106 pages
          ISBN:9781450356435
          DOI:10.1145/3299869

          Copyright © 2019 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 25 June 2019

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          SIGMOD '19 Paper Acceptance Rate88of430submissions,20%Overall Acceptance Rate785of4,003submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader