ABSTRACT
Two trends are converging to make the CPU cost of a table scan a more important component of database performance. First, table scans are becoming a larger fraction of the query-processing workload; second, large memories and compression are making table scans CPU-bound rather than disk-bandwidth-bound. Data warehouse systems have found that they can avoid the unpredictability of joins and indexing, and still achieve good performance, by using massively parallel processing to perform scans over compressed vertical partitions of a denormalized schema.
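To make the idea of scanning a compressed vertical partition concrete, here is a minimal sketch in C (not the paper's generated code) of a predicate evaluated directly on a dictionary-coded column. The column, dictionary, and predicate are all hypothetical; the point is that with an order-preserving dictionary, the comparison is translated into code space once, so the scan loop never touches the decompressed strings.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical dictionary-coded column: each 8-bit code indexes into a
 * sorted (order-preserving) dictionary of city names.  Because the
 * dictionary is sorted, a predicate such as  city < bound  can be
 * rewritten as a comparison on the codes themselves. */
static const char *dict[] = { "Athens", "Berlin", "Lima", "Oslo", "Tokyo" };
enum { DICT_SIZE = 5 };

/* Count rows whose decoded value is lexicographically below `bound`.
 * The bound is translated to a code once, before the tight loop. */
static int count_below(const uint8_t *codes, int n, const char *bound) {
    uint8_t code_bound = 0;
    while (code_bound < DICT_SIZE && strcmp(dict[code_bound], bound) < 0)
        code_bound++;                      /* first code whose value >= bound */
    int count = 0;
    for (int i = 0; i < n; i++)
        count += (codes[i] < code_bound);  /* branch-free compare on codes */
    return count;
}
```

The inner loop compares one-byte codes rather than strings, which is both the compression win (narrow codes) and the CPU win (cheap, branch-free comparisons) that make such scans bandwidth- and superscalar-friendly.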
In this paper we present a study of how to make such scans faster through a scan code generator that produces code tuned to the database schema, the compression dictionaries, the queries being evaluated, and the target CPU architecture. We investigate a variety of compression formats and propose two novel optimizations, tuple length quantization and a field length lookup table, for efficiently processing variable-length fields and tuples. We present a detailed experimental study of the performance of generated scans against these compression formats, and use it to explore the trade-off between compression quality and scan speed. We also introduce new strategies for removing instruction-level dependencies and increasing instruction-level parallelism, allowing greater exploitation of multi-issue processors.
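The field length lookup table can be illustrated with a small sketch in C. The encoding below is an assumption for illustration (the two low bits of a field's first byte select one of four lengths), not the paper's actual format; the idea shown is that a table built once per scan replaces per-field bit manipulation and branching in the hot loop with a single indexed load.

```c
#include <stdint.h>

/* Hypothetical variable-length encoding: the two low bits of a field's
 * first byte select one of four total field lengths (1, 2, 4, or 8
 * bytes, header byte included).  Rather than decode those bits with
 * shifts and branches inside the scan loop, a 256-entry table built
 * once per scan maps any first byte directly to the field's length. */
static uint8_t field_len[256];

static void build_length_table(void) {
    static const uint8_t len_of[4] = { 1, 2, 4, 8 };
    for (int b = 0; b < 256; b++)
        field_len[b] = len_of[b & 3];
}

/* Walk a packed buffer of variable-length fields and count them.
 * The per-field work is one load of the header byte plus one table
 * lookup to advance the cursor. */
static int count_fields(const uint8_t *buf, int nbytes) {
    int i = 0, count = 0;
    while (i < nbytes) {
        i += field_len[buf[i]];
        count++;
    }
    return count;
}
```

Tuple length quantization plays a complementary role under the same logic: if tuple lengths are rounded up to a few quantized sizes, the set of possible advance distances shrinks, keeping such tables small and the cursor arithmetic predictable.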
How to barter bits for chronons: compression and bandwidth trade offs for database scans