ABSTRACT
Database administrators construct secondary indexes on data tables to accelerate query processing in relational database management systems (RDBMSs). These indexes are built on top of the most frequently queried columns according to the data statistics. Unfortunately, maintaining multiple secondary indexes in the same database can be extremely space consuming, causing significant performance degradation due to the potential exhaustion of memory space. In this paper, we demonstrate that there exist many opportunities to exploit column correlations for accelerating data access. We propose HERMIT, a succinct secondary indexing mechanism for modern RDBMSs. HERMIT judiciously leverages the rich soft functional dependencies hidden among columns to prune out redundant structures for indexed key access. Instead of building a complete index that stores every single entry in the key columns, HERMIT navigates any incoming key access queries to an existing index built on the correlated columns. This is achieved through the Tiered Regression Search Tree (TRS-Tree), a succinct, ML-enhanced data structure that performs fast curve fitting to adaptively and dynamically capture both column correlations and outliers. Our extensive experimental study in two different RDBMSs have confirmed that HERMIT can significantly reduce space consumption with limited performance overhead, especially when supporting complex range queries.
- libpqxx. http://pqxx.org/development/libpqxx/.Google Scholar
- PostgreSQL. http://www.postgresql.org.Google Scholar
- Simple linear regression. https://en.wikipedia.org/wiki/Simple_linear_regression.Google Scholar
- S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated Selection of Materialized Views and Indexes for SQL Databases. In VLDB, 2000. Google ScholarDigital Library
- M. Athanassoulis and A. Ailamaki. BF-Tree: Approximate Tree Indexing. PVLDB, 7(14), 2014. Google ScholarDigital Library
- M. Athanassoulis, M. S. Kester, L. M. Maas, R. Stoica, S. Idreos, A. Ailamaki, and M. Callaghan. Designing Access Methods: The RUM Conjecture. In EDBT, 2016.Google Scholar
- P. G. Brown and P. J. Hass. BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data. In VLDB, 2003. Google ScholarDigital Library
- L. Caruccio, V. Deufemia, and G. Polese. Relaxed Functional Dependencies - A Survey of Approaches. IEEE TKDE, 28(1), 2016. Google ScholarDigital Library
- S. Chaudhuri and V. R. Narasayya. An Efficient, Cost-Driven Index Selection Tool for Microsoft SQL Server. In VLDB, 1997. Google ScholarDigital Library
- D. Comer. The Ubiquitous B-Tree. CSUR, 11(2), 1979. Google ScholarDigital Library
- P. Godfrey, J. Gryz, and C. Zuzarte. Exploiting Constraint-Like Data Characterizations in Query Optimization. In SIGMOD, 2001. Google ScholarDigital Library
- J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing Relations and Indexes. In ICDE, 1998. Google ScholarDigital Library
- G. Graefe et al. Modern B-Tree Techniques. Foundations and Trends® in Databases, 3(4):203--402, 2011. Google ScholarDigital Library
- G. Graefe, H. Volos, H. Kimura, H. A. Kuno, J. Tucek, M. Lillibridge, and A. C. Veitch. In-Memory Performance for Big Data. PVLDB, 8(1), 2014.Google Scholar
- J. Gryz, B. Schiefer, J. Zheng, and C. Zuzarte. Discovery and Application of Check Constraints in DB2. In ICDE, 2001. Google ScholarDigital Library
- B. Hentschel, M. S. Kester, and S. Idreos. Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation. In SIGMOD, 2018. Google ScholarDigital Library
- S. Idreos, M. L. Kersten, and S. Manegold. Self-Organizing Tuple Reconstruction in Column-Stores. In SIGMOD, 2009.Google ScholarDigital Library
- S. Idreos, S. Manegold, H. Kuno, and G. Graefe. Merging What's Cracked, Cracking What's Merged: Adaptive Indexing in Main-Memory Column-Store. PVLDB, 4(9), 2011.Google Scholar
- I. F. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies. In SIGMOD, 2004. Google ScholarDigital Library
- H. Kimura, G. Huo, A. Rasin, S. Madden, and S. B. Zdonik. Correlation Maps: A Compressed Access Method for Exploiting Soft Functional Dependencies. VLDB, 2(1), 2009. Google ScholarDigital Library
- H. Kimura, G. Huo, A. Rasin, S. Madden, and S. B. Zdonik. CORADD: Correlation Aware Database Designer for Materialized Views and Indexes. PVLDB, 3(1--2), 2010. Google ScholarDigital Library
- A. Kipf, T. Kipf, B. Radke, V. Leis, P. A. Boncz, and A. Kemper. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In CIDR, 2019.Google Scholar
- J. Kivinen and H. Mannila. Approximate Inference of Functional Dependencies from Relations. Theoretical Computer Science, 149(1), 1995. Google ScholarDigital Library
- T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The Case for Learned Index Structures. In SIGMOD, 2018. Google ScholarDigital Library
- S. Kruse and F. Naumann. Efficient Discovery of Approximate Dependencies. PVLDB, 11(7), 2018. Google ScholarDigital Library
- P. Larson, W. Lehner, J. Zhou, and P. Zabback. Cardinality Estimation Using Sample Views with Quality Assurance. In SIGMOD, 2007. Google ScholarDigital Library
- T. J. Lehman and M. J. Carey. A Study of Index Structures for Main Memory Database Management Systems. In VLDB, 1986. Google ScholarDigital Library
- H. Liu, M. Xu, Z. Yu, V. Corvinelli, and C. Zuzarte. Cardinality Estimation Using Neural Networks. In CASCON, 2015.Google Scholar
- T. Malik, R. C. Burns, and N. V. Chawla. A Black-Box Approach to Query Cardinality Estimation. In CIDR, 2007.Google Scholar
- T. Papenbrock, J. Ehrlich, J. Marten, T. Neubert, J. Rudolph, M. Schö nberg, J. Zwiener, and F. Naumann. Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms. PVLDB, 8(10), 2015. Google ScholarDigital Library
- A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. C. Mowry, M. Perron, I. Quah, et al. Self-Driving Database Management Systems. In CIDR, 2017.Google Scholar
- J. Rao and K. A. Ross. Making BGoogle Scholar
- -Trees Cache Conscious in Main Memory. In SIGMOD, 2000.Google Scholar
- L. Sidirourgos and M. L. Kersten. Column Imprints: A Secondary Index Structure. In SIGMOD, 2013. Google ScholarDigital Library
- M. Stonebraker. The Case for Partial Indexes. Sigmod Record, 18(4), 1989.Google Scholar
- G. Valentin, M. Zuliani, D. C. Zilio, G. Lohman, and A. Skelley. DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. In ICDE, 2000.Google ScholarCross Ref
- A. D. Well and J. L. Myers. Research design & statistical analysis. Psychology Press, 2003.Google Scholar
- Y. Wu, J. Arulraj, J. Lin, R. Xian, and A. Pavlo. An Empirical Evaluation of In-Memory Multi-Version Concurrency Control. PVLDB, 10(7), 2017.Google Scholar
- J. Yu and M. Sarwat. Two Birds, One Stone: A Fast, yet Lightweight, Indexing Scheme for Modern Database Systems. PVLDB, 10(4), 2016. Google ScholarDigital Library
- H. Zhang, D. G. Andersen, A. Pavlo, M. Kaminsky, L. Ma, and R. Shen. Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes. In SIGMOD, 2016. Google ScholarDigital Library
- H. Zhang, I. F. Ilyas, and K. Salem. PSALM: Cardinality Estimation in the Presence of Fine-Grained Access Controls. In ICDE, 2009. Google ScholarDigital Library
- H. Zhang, H. Lim, V. Leis, D. G. Andersen, M. Kaminsky, K. Keeton, and A. Pavlo. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In SIGMOD, 2018. Google ScholarDigital Library
- M. Zukowski, S. Hé man, N. Nes, and P. A. Boncz. Super-Scalar RAM-CPU Cache Compression. In ICDE, 2006. Google ScholarDigital Library
Index Terms
- Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations
Recommendations
Indexing in-network trajectory flows
Indexing moving objects (MO) is a hot topic in the field of moving objects databases since many years. An impressive number of access methods have been proposed to optimize the processing of MO-related queries. Several methods have focused on spatio-...
Indexing moving objects for directions and velocities queries
Moving object databases are required to support different types of queries with a large number of moving objects. New types of queries namely directions and velocity queries (DV queries), are to be supported and covered. The TPR-tree and its successors ...
Temporally enhanced network-constrained (TENC) R-tree
MobiGIS '16: Proceedings of the 5th ACM SIGSPATIAL International Workshop on Mobile Geographic Information SystemsThis paper describes a new Network-constrained Moving objects indexing structure, which extends the state-of-the-art for this kind of data. The indexing structure we propose is called Temporally Enhanced Network-Constrained R-tree (TENC R-tree), which ...
Comments