skip to main content
10.1145/1247480.1247525acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

How to barter bits for chronons: compression and bandwidth trade offs for database scans

Published:11 June 2007Publication History

ABSTRACT

Two trends are converging to make the CPU cost of a table scan a more important component of database performance. First, table scans are becoming a larger fraction of the query processing workload, and second, large memories and compression are making table scans CPU, rather than disk bandwidth, bound. Data warehouse systems have found that they can avoid the unpredictability of joins and indexing and achieve good performance by using massive parallel processing to perform scans over compressed vertical partitions of a denormalized schema.

In this paper we present a study of how to make such scans faster by the use of a scan code generator that produces code tuned to the database schema, the compression dictionaries, the queries being evaluated and the target CPU architecture. We investigate a variety of compression formats and propose two novel optimizations: tuple length quantization and a field length lookup table, for efficiently processing variable length fields and tuples. We present a detailed experimental study of the performance of generated scans against these compression formats, and use this to explore the trade off between compression quality and scan speed. We also introduce new strategies for removing instruction-level dependencies and increasing instruction-level parallelism, allowing for greater exploitation of multi-issue processors.

References

  1. D. Abadi, S. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. G. Antoshenkov, D. Lomet, and J. Murray. Order preserving string compression. In ICDE, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Boncz, M. Zukowski, and N. Nes. MonetDB/X100: Hyper-pipelining query execution. In CIDR, 2005.Google ScholarGoogle Scholar
  4. Z. Chen, J. Gehrke, and F. Korn. Query optimization in compressed database systems. In SIGMOD, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. G. Cormack. Data compresson on a database system. CACM, 28(12), 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing relations and indexes. In ICDE, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. G. Graefe and L. D. Shapiro. Data compression and database performance. In Proc. ACM/IEEE-CS Symp. on Applied Computing, 1991.Google ScholarGoogle ScholarCross RefCross Ref
  8. S. Harizopoulos, V. Liang, D. Abadi, and S. Madden. Performance tradeoffs in read-optimized databases. In VLDB, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. D. Huffman. A method for the construction of minimum-redundancy codes. In Proceedings of the I.R.E., pages 1098--1102, 1952.Google ScholarGoogle ScholarCross RefCross Ref
  10. B. R. Iyer and D. Wilhite. Data compression support in databases. In VLDB, 1994.Google ScholarGoogle Scholar
  11. T. Lehman and M. Carey. Query processing in main memory database management systems. In SIGMOD, 1986.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. MacNicol and B. French. Sybase IQ Multiplex-Designed for analytics. In VLDB, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. J. O'Connell and N. Winterbottom. Performing joins without decompression in a compressed database system. SIGMOD Rec., 32(1), 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. MPöss and D. Potapov. Data compression in Oracle. In VLDB, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. V. Raman and G. Swart. Entropy compression of relations and querying of compressed relations. In VLDB, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. G. Ray, J. Haritsa, and S. Seshadri. Database compression: A performance enhancement tool. In COMAD, 1995.Google ScholarGoogle Scholar
  17. M. Stonebraker et al. C-store: a column-oriented DBMS. In VLDB, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. M. Stonebraker, E. Wong, P. Kreps, and G. Held. The design and implementation of INGRES. ACM Transactions on Database Systems, 1(3):189--222, 1976. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. T. Westmann, D. Kossmann, S. Helmer, and G. Moerkotte. The implementation and performance of compressed databases. SIGMOD Record, 29(3):55--67, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Zandi, B. Iyer, and G. Langdon. Sort order preserving data compression for extended alphabets. In Data Compression Conference, 1993.Google ScholarGoogle Scholar
  21. M. Zukowski, S. Heman, N. Nes, and P. A. Boncz. Super-Scalar RAM-CPU Cache Compression. In ICDE, April 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. How to barter bits for chronons: compression and bandwidth trade offs for database scans

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '07: Proceedings of the 2007 ACM SIGMOD international conference on Management of data
      June 2007
      1210 pages
      ISBN:9781595936868
      DOI:10.1145/1247480
      • General Chairs:
      • Lizhu Zhou,
      • Tok Wang Ling,
      • Program Chair:
      • Beng Chin Ooi

      Copyright © 2007 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 11 June 2007

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader