research-article

Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations

Authors:
Yingjun Wu

IBM Research - Almaden, San Jose, CA, USA

IBM Research - Almaden, San Jose, CA, USA
View Profile

,
Jia Yu

Arizona State University, Tempe, AZ, USA

Arizona State University, Tempe, AZ, USA
View Profile

,
Yuanyuan Tian

IBM Research - Almaden, San Jose, CA, USA

IBM Research - Almaden, San Jose, CA, USA
View Profile

,
Richard Sidle

IBM, Ottawa, ON, Canada

IBM, Ottawa, ON, Canada
View Profile

,
Ronald Barber

IBM Research - Almaden, San Jose, CA, USA

IBM Research - Almaden, San Jose, CA, USA
View Profile

SIGMOD '19: Proceedings of the 2019 International Conference on Management of DataJune 2019Pages 1223–1240https://doi.org/10.1145/3299869.3319861

Published:25 June 2019Publication History

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

Pages 1223–1240

ABSTRACT

Database administrators construct secondary indexes on data tables to accelerate query processing in relational database management systems (RDBMSs). These indexes are built on top of the most frequently queried columns according to the data statistics. Unfortunately, maintaining multiple secondary indexes in the same database can be extremely space consuming, causing significant performance degradation due to the potential exhaustion of memory space. In this paper, we demonstrate that there exist many opportunities to exploit column correlations for accelerating data access. We propose HERMIT, a succinct secondary indexing mechanism for modern RDBMSs. HERMIT judiciously leverages the rich soft functional dependencies hidden among columns to prune out redundant structures for indexed key access. Instead of building a complete index that stores every single entry in the key columns, HERMIT navigates any incoming key access queries to an existing index built on the correlated columns. This is achieved through the Tiered Regression Search Tree (TRS-Tree), a succinct, ML-enhanced data structure that performs fast curve fitting to adaptively and dynamically capture both column correlations and outliers. Our extensive experimental study in two different RDBMSs have confirmed that HERMIT can significantly reduce space consumption with limited performance overhead, especially when supporting complex range queries.

References

libpqxx. http://pqxx.org/development/libpqxx/.Google Scholar
PostgreSQL. http://www.postgresql.org.Google Scholar
Simple linear regression. https://en.wikipedia.org/wiki/Simple_linear_regression.Google Scholar
S. Agrawal, S. Chaudhuri, and V. R. Narasayya. Automated Selection of Materialized Views and Indexes for SQL Databases. In VLDB, 2000. Google ScholarDigital Library
M. Athanassoulis and A. Ailamaki. BF-Tree: Approximate Tree Indexing. PVLDB, 7(14), 2014. Google ScholarDigital Library
M. Athanassoulis, M. S. Kester, L. M. Maas, R. Stoica, S. Idreos, A. Ailamaki, and M. Callaghan. Designing Access Methods: The RUM Conjecture. In EDBT, 2016.Google Scholar
P. G. Brown and P. J. Hass. BHUNT: Automatic Discovery of Fuzzy Algebraic Constraints in Relational Data. In VLDB, 2003. Google ScholarDigital Library
L. Caruccio, V. Deufemia, and G. Polese. Relaxed Functional Dependencies - A Survey of Approaches. IEEE TKDE, 28(1), 2016. Google ScholarDigital Library
S. Chaudhuri and V. R. Narasayya. An Efficient, Cost-Driven Index Selection Tool for Microsoft SQL Server. In VLDB, 1997. Google ScholarDigital Library
D. Comer. The Ubiquitous B-Tree. CSUR, 11(2), 1979. Google ScholarDigital Library
P. Godfrey, J. Gryz, and C. Zuzarte. Exploiting Constraint-Like Data Characterizations in Query Optimization. In SIGMOD, 2001. Google ScholarDigital Library
J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing Relations and Indexes. In ICDE, 1998. Google ScholarDigital Library
G. Graefe et al. Modern B-Tree Techniques. Foundations and Trends® in Databases, 3(4):203--402, 2011. Google ScholarDigital Library
G. Graefe, H. Volos, H. Kimura, H. A. Kuno, J. Tucek, M. Lillibridge, and A. C. Veitch. In-Memory Performance for Big Data. PVLDB, 8(1), 2014.Google Scholar
J. Gryz, B. Schiefer, J. Zheng, and C. Zuzarte. Discovery and Application of Check Constraints in DB2. In ICDE, 2001. Google ScholarDigital Library
B. Hentschel, M. S. Kester, and S. Idreos. Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation. In SIGMOD, 2018. Google ScholarDigital Library
S. Idreos, M. L. Kersten, and S. Manegold. Self-Organizing Tuple Reconstruction in Column-Stores. In SIGMOD, 2009.Google ScholarDigital Library
S. Idreos, S. Manegold, H. Kuno, and G. Graefe. Merging What's Cracked, Cracking What's Merged: Adaptive Indexing in Main-Memory Column-Store. PVLDB, 4(9), 2011.Google Scholar
I. F. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies. In SIGMOD, 2004. Google ScholarDigital Library
H. Kimura, G. Huo, A. Rasin, S. Madden, and S. B. Zdonik. Correlation Maps: A Compressed Access Method for Exploiting Soft Functional Dependencies. VLDB, 2(1), 2009. Google ScholarDigital Library
H. Kimura, G. Huo, A. Rasin, S. Madden, and S. B. Zdonik. CORADD: Correlation Aware Database Designer for Materialized Views and Indexes. PVLDB, 3(1--2), 2010. Google ScholarDigital Library
A. Kipf, T. Kipf, B. Radke, V. Leis, P. A. Boncz, and A. Kemper. Learned Cardinalities: Estimating Correlated Joins with Deep Learning. In CIDR, 2019.Google Scholar
J. Kivinen and H. Mannila. Approximate Inference of Functional Dependencies from Relations. Theoretical Computer Science, 149(1), 1995. Google ScholarDigital Library
T. Kraska, A. Beutel, E. H. Chi, J. Dean, and N. Polyzotis. The Case for Learned Index Structures. In SIGMOD, 2018. Google ScholarDigital Library
S. Kruse and F. Naumann. Efficient Discovery of Approximate Dependencies. PVLDB, 11(7), 2018. Google ScholarDigital Library
P. Larson, W. Lehner, J. Zhou, and P. Zabback. Cardinality Estimation Using Sample Views with Quality Assurance. In SIGMOD, 2007. Google ScholarDigital Library
T. J. Lehman and M. J. Carey. A Study of Index Structures for Main Memory Database Management Systems. In VLDB, 1986. Google ScholarDigital Library
H. Liu, M. Xu, Z. Yu, V. Corvinelli, and C. Zuzarte. Cardinality Estimation Using Neural Networks. In CASCON, 2015.Google Scholar
T. Malik, R. C. Burns, and N. V. Chawla. A Black-Box Approach to Query Cardinality Estimation. In CIDR, 2007.Google Scholar
T. Papenbrock, J. Ehrlich, J. Marten, T. Neubert, J. Rudolph, M. Schö nberg, J. Zwiener, and F. Naumann. Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms. PVLDB, 8(10), 2015. Google ScholarDigital Library
A. Pavlo, G. Angulo, J. Arulraj, H. Lin, J. Lin, L. Ma, P. Menon, T. C. Mowry, M. Perron, I. Quah, et al. Self-Driving Database Management Systems. In CIDR, 2017.Google Scholar
J. Rao and K. A. Ross. Making BGoogle Scholar
-Trees Cache Conscious in Main Memory. In SIGMOD, 2000.Google Scholar
L. Sidirourgos and M. L. Kersten. Column Imprints: A Secondary Index Structure. In SIGMOD, 2013. Google ScholarDigital Library
M. Stonebraker. The Case for Partial Indexes. Sigmod Record, 18(4), 1989.Google Scholar
G. Valentin, M. Zuliani, D. C. Zilio, G. Lohman, and A. Skelley. DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. In ICDE, 2000.Google ScholarCross Ref
A. D. Well and J. L. Myers. Research design & statistical analysis. Psychology Press, 2003.Google Scholar
Y. Wu, J. Arulraj, J. Lin, R. Xian, and A. Pavlo. An Empirical Evaluation of In-Memory Multi-Version Concurrency Control. PVLDB, 10(7), 2017.Google Scholar
J. Yu and M. Sarwat. Two Birds, One Stone: A Fast, yet Lightweight, Indexing Scheme for Modern Database Systems. PVLDB, 10(4), 2016. Google ScholarDigital Library
H. Zhang, D. G. Andersen, A. Pavlo, M. Kaminsky, L. Ma, and R. Shen. Reducing the Storage Overhead of Main-Memory OLTP Databases with Hybrid Indexes. In SIGMOD, 2016. Google ScholarDigital Library
H. Zhang, I. F. Ilyas, and K. Salem. PSALM: Cardinality Estimation in the Presence of Fine-Grained Access Controls. In ICDE, 2009. Google ScholarDigital Library
H. Zhang, H. Lim, V. Leis, D. G. Andersen, M. Kaminsky, K. Keeton, and A. Pavlo. SuRF: Practical Range Query Filtering with Fast Succinct Tries. In SIGMOD, 2018. Google ScholarDigital Library
M. Zukowski, S. Hé man, N. Nes, and P. A. Boncz. Super-Scalar RAM-CPU Cache Compression. In ICDE, 2006. Google ScholarDigital Library

Index Terms

Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations

Recommendations

Indexing in-network trajectory flows

Indexing moving objects (MO) is a hot topic in the field of moving objects databases since many years. An impressive number of access methods have been proposed to optimize the processing of MO-related queries. Several methods have focused on spatio-...
Read More
Indexing moving objects for directions and velocities queries

Moving object databases are required to support different types of queries with a large number of moving objects. New types of queries namely directions and velocity queries (DV queries), are to be supported and covered. The TPR-tree and its successors ...
Read More
Temporally enhanced network-constrained (TENC) R-tree
MobiGIS '16: Proceedings of the 5th ACM SIGSPATIAL International Workshop on Mobile Geographic Information Systems

This paper describes a new Network-constrained Moving objects indexing structure, which extends the state-of-the-art for this kind of data. The indexing structure we propose is called Temporally Enhanced Network-Constrained R-tree (TENC R-tree), which ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
June 2019
2106 pages
ISBN:9781450356435
DOI:10.1145/3299869
General Chairs:
Peter Boncz
CWI & Vrije Universiteit Amsterdam, The Netherlands
,
Stefan Manegold
CWI & Universiteit Leiden, The Netherlands
,
Program Chairs:
Anastasia Ailamaki
EPFL, Switzerland
,
Amol Deshpande
University of Maryland, USA
,
Tim Kraska
MIT, USA
Copyright © 2019 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 25 June 2019
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
access method
applied ai
index structure
succinct data structure
Qualifiers
- research-article
Conference

Acceptance Rates
SIGMOD '19 Paper Acceptance Rate88of430submissions,20%Overall Acceptance Rate785of4,003submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 36
  Total Citations
  View Citations
- 688
  Total Downloads
- Downloads (Last 12 months)81
- Downloads (Last 6 weeks)18
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Designing Succinct Secondary Indexing Mechanism by Exploiting Column Correlations

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Indexing in-network trajectory flows

Indexing moving objects for directions and velocities queries

Temporally enhanced network-constrained (TENC) R-tree