ABSTRACT
Two trends are converging to make the CPU cost of a table scan a more important component of database performance. First, table scans are becoming a larger fraction of the query-processing workload; second, large memories and compression are making table scans CPU-bound rather than disk-bandwidth-bound. Data warehouse systems have found that they can avoid the unpredictability of joins and indexing, and still achieve good performance, by using massively parallel processing to perform scans over compressed vertical partitions of a denormalized schema.
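To make the idea of scanning a compressed vertical partition concrete, here is a minimal sketch in C (not the paper's generated code) of a predicate evaluated directly on a dictionary-coded column. The column, dictionary, and predicate are all hypothetical; the point is that with an order-preserving dictionary, the comparison is translated into code space once, so the scan loop never touches the decompressed strings.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical dictionary-coded column: each 8-bit code indexes into a
 * sorted (order-preserving) dictionary of city names.  Because the
 * dictionary is sorted, a predicate such as  city < bound  can be
 * rewritten as a comparison on the codes themselves. */
static const char *dict[] = { "Athens", "Berlin", "Lima", "Oslo", "Tokyo" };
enum { DICT_SIZE = 5 };

/* Count rows whose decoded value is lexicographically below `bound`.
 * The bound is translated to a code once, before the tight loop. */
static int count_below(const uint8_t *codes, int n, const char *bound) {
    uint8_t code_bound = 0;
    while (code_bound < DICT_SIZE && strcmp(dict[code_bound], bound) < 0)
        code_bound++;                      /* first code whose value >= bound */
    int count = 0;
    for (int i = 0; i < n; i++)
        count += (codes[i] < code_bound);  /* branch-free compare on codes */
    return count;
}
```

The inner loop compares one-byte codes rather than strings, which is both the compression win (narrow codes) and the CPU win (cheap, branch-free comparisons) that make such scans bandwidth- and superscalar-friendly.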
In this paper we present a study of how to make such scans faster through a scan code generator that produces code tuned to the database schema, the compression dictionaries, the queries being evaluated, and the target CPU architecture. We investigate a variety of compression formats and propose two novel optimizations, tuple length quantization and a field length lookup table, for efficiently processing variable-length fields and tuples. We present a detailed experimental study of the performance of generated scans against these compression formats, and use it to explore the trade-off between compression quality and scan speed. We also introduce new strategies for removing instruction-level dependencies and increasing instruction-level parallelism, allowing greater exploitation of multi-issue processors.
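The field length lookup table can be illustrated with a small sketch in C. The encoding below is an assumption for illustration (the two low bits of a field's first byte select one of four lengths), not the paper's actual format; the idea shown is that a table built once per scan replaces per-field bit manipulation and branching in the hot loop with a single indexed load.

```c
#include <stdint.h>

/* Hypothetical variable-length encoding: the two low bits of a field's
 * first byte select one of four total field lengths (1, 2, 4, or 8
 * bytes, header byte included).  Rather than decode those bits with
 * shifts and branches inside the scan loop, a 256-entry table built
 * once per scan maps any first byte directly to the field's length. */
static uint8_t field_len[256];

static void build_length_table(void) {
    static const uint8_t len_of[4] = { 1, 2, 4, 8 };
    for (int b = 0; b < 256; b++)
        field_len[b] = len_of[b & 3];
}

/* Walk a packed buffer of variable-length fields and count them.
 * The per-field work is one load of the header byte plus one table
 * lookup to advance the cursor. */
static int count_fields(const uint8_t *buf, int nbytes) {
    int i = 0, count = 0;
    while (i < nbytes) {
        i += field_len[buf[i]];
        count++;
    }
    return count;
}
```

Tuple length quantization plays a complementary role under the same logic: if tuple lengths are rounded up to a few quantized sizes, the set of possible advance distances shrinks, keeping such tables small and the cursor arithmetic predictable.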
How to barter bits for chronons: compression and bandwidth trade offs for database scans