An architecture for recycling intermediates in a column-store

Authors:
Milena G. Ivanova (Centrum Wiskunde en Informatica, Amsterdam, Netherlands)
Martin L. Kersten (Centrum Wiskunde en Informatica, Amsterdam, Netherlands)
Niels J. Nes (Centrum Wiskunde en Informatica, Amsterdam, Netherlands)
Romulo A.P. Gonçalves (Centrum Wiskunde en Informatica, Amsterdam, Netherlands)

SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, June 2009, Pages 309–320
https://doi.org/10.1145/1559845.1559879

Published: 29 June 2009

ABSTRACT

Automatically recycling intermediate results to improve both query response time and throughput is a grand challenge for state-of-the-art databases. Tuples are loaded and streamed through a tuple-at-a-time processing pipeline, avoiding materialization of intermediates as much as possible. This limits the opportunities for reuse of overlapping computations to DBA-defined materialized views and function/result cache tuning.

In contrast, the operator-at-a-time execution paradigm produces fully materialized results in each step of the query plan. To avoid resource contention, these intermediates are evicted as soon as possible.
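
The contrast between the two execution paradigms can be illustrated with a small Python sketch. It is not taken from the paper or from any real database kernel: the function names, the toy data, and the use of generators and lists are all assumptions made for illustration. The pipelined variant streams rows and keeps no intermediate, while the operator-at-a-time variant returns a complete list of qualifying positions that a later step could, in principle, reuse.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# --- Tuple-at-a-time (pipelined): each operator pulls one row at a time. ---
def select_pipelined(rows: Iterable[Dict], pred: Callable[[Dict], bool]) -> Iterator[Dict]:
    """Yield qualifying rows one by one; no intermediate result is stored."""
    for row in rows:
        if pred(row):
            yield row

def project_pipelined(rows: Iterable[Dict], column: str) -> Iterator:
    """Yield a single attribute of each incoming row."""
    for row in rows:
        yield row[column]

# --- Operator-at-a-time (column-wise): each operator returns a full result. ---
def select_materialized(column: List[int], pred: Callable[[int], bool]) -> List[int]:
    """Return the complete list of qualifying positions (a materialized intermediate)."""
    return [i for i, v in enumerate(column) if pred(v)]

def fetch_materialized(column: List, positions: List[int]) -> List:
    """Return the values of `column` at the given positions."""
    return [column[i] for i in positions]

# Toy data (hypothetical), once as rows and once as columns.
rows = [{"qty": 5, "price": 10.0}, {"qty": 30, "price": 2.5}, {"qty": 40, "price": 7.0}]
qty = [5, 30, 40]
price = [10.0, 2.5, 7.0]

# Pipelined plan: the selection result only ever exists one row at a time.
pipelined = list(project_pipelined(select_pipelined(rows, lambda r: r["qty"] > 20), "price"))

# Operator-at-a-time plan: `positions` is a by-product that outlives the operator.
positions = select_materialized(qty, lambda v: v > 20)
materialized = fetch_materialized(price, positions)
```

Both plans compute the same answer; the difference is that `positions` exists as a whole object in the second plan, which is the kind of by-product a reuse mechanism could keep instead of discarding.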

In this paper we study an architecture that harvests the by-products of the operator-at-a-time paradigm in a column store system using a lightweight mechanism, the recycler. The key challenge then becomes selection of the policies to admit intermediates to the resource pool, their retention period, and the eviction strategy when facing resource limitations.
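
The three policy questions named above (admission, retention, eviction) can be made concrete with a minimal Python sketch of a bounded pool of intermediates. This is an illustration under stated assumptions, not the recycler described in the paper: the class and function names, the cost threshold used for admission, and the least-recently-used eviction rule are all hypothetical stand-ins for whatever policies the authors evaluate.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable, Optional

@dataclass
class Entry:
    value: Any
    size: int           # footprint in abstract units (assumption)
    cost: float         # estimated cost to recompute (assumption)
    reuses: int = 0
    last_used: int = 0

class Recycler:
    """Hypothetical pool of intermediates with a fixed capacity."""

    def __init__(self, capacity: int, min_cost: float = 0.0) -> None:
        self.capacity = capacity
        self.min_cost = min_cost
        self.pool: Dict[Hashable, Entry] = {}
        self.used = 0
        self.clock = 0   # logical time, advanced on every lookup

    def lookup(self, key: Hashable) -> Optional[Any]:
        """Return a stored intermediate and record the reuse, or None on a miss."""
        self.clock += 1
        entry = self.pool.get(key)
        if entry is None:
            return None
        entry.reuses += 1
        entry.last_used = self.clock
        return entry.value

    def admit(self, key: Hashable, value: Any, size: int, cost: float) -> bool:
        """Admission: skip results that are too cheap or larger than the pool."""
        if cost < self.min_cost or size > self.capacity:
            return False
        if key in self.pool:
            self.used -= self.pool.pop(key).size
        # Eviction: drop least recently used entries until the new one fits.
        while self.used + size > self.capacity:
            victim = min(self.pool, key=lambda k: self.pool[k].last_used)
            self.used -= self.pool.pop(victim).size
        self.pool[key] = Entry(value, size, cost, last_used=self.clock)
        self.used += size
        return True

def run(recycler: Recycler, name: str, fn: Callable, *args: Hashable, cost: float = 1.0):
    """Execute an operator, reusing a pooled result when the same call was seen before."""
    key = (name, args)
    cached = recycler.lookup(key)
    if cached is not None:
        return cached
    result = fn(*args)
    recycler.admit(key, result, size=len(result), cost=cost)
    return result

# Usage with a toy selection operator (hypothetical).
calls = []
def select_gt(column, bound):
    calls.append(bound)  # records every real execution
    return [i for i, v in enumerate(column) if v > bound]

recycler = Recycler(capacity=4)
qty = (5, 30, 40)
first = run(recycler, "select_gt", select_gt, qty, 20)    # computed and admitted
second = run(recycler, "select_gt", select_gt, qty, 20)   # served from the pool
third = run(recycler, "select_gt", select_gt, qty, 1)     # forces an eviction
```

Keying entries on the operator name and its arguments only detects exact repeats; matching overlapping or subsumed computations would need a richer comparison, which this sketch does not attempt.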

The proposed recycling architecture has been implemented in an open-source system. An experimental analysis against the TPC-H ad-hoc decision support benchmark and a complex, real-world application (SkyServer) demonstrates its effectiveness in terms of self-organizing behavior and its significant performance gains. The results indicate the potential of recycling intermediates and chart a route for further development of database kernels.

Index Terms

1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Published in
SIGMOD '09: Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
June 2009
1168 pages
ISBN: 9781605585512
DOI: 10.1145/1559845
Editors: Carsten Binnig, Benoit Dageville
General Chairs: Uğur Çetintemel (Brown University, USA), Stan Zdonik (Brown University, USA)
Program Chair: Donald Kossmann (ETH Zurich, Switzerland)
Copyright © 2009 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
caching
column-stores
database kernels
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%