Could Compression Be of General Use? Evaluating Memory Compression across Domains

Published: 05 December 2017

Abstract

Recent proposals present compression as a cost-effective technique to increase cache and memory capacity and bandwidth. While these proposals show the potential of compression, several questions remain open before they can be adopted in real systems, including the following: (1) Do these techniques work for real-world workloads that run for a long time? (2) Which application domains would benefit the most from compression? (3) At which level of the memory hierarchy should compression be applied: caches, main memory, or both?
In this article, our goal is to shed light on some of the main questions about the applicability of compression. We evaluate compression in the memory hierarchy for selected examples from different application classes. We analyze real applications with real data and complete runs of several benchmarks. While simulators provide a fairly accurate framework for studying the potential performance and energy impact of ideas, they mostly limit us to a small range of workloads with short runtimes. To enable the study of real workloads, we introduce a fast and simple methodology for sampling the memory and cache contents of a real machine (a desktop or a server). Compared to a cycle-accurate simulator, our methodology allows us to study real workloads as well as benchmarks. Our toolset is not a replacement for simulators; it mostly complements them. A simulator can measure the performance and energy impact of a particular compression proposal, whereas our methodology lets us study the potential of compression with long-running workloads in the early stages of design.
Using our toolset, we evaluate a collection of workloads from different domains, such as a web server of the CS department of UW-Madison running for 24h, Google Chrome (watching a 1h-long movie on YouTube), and Linux games (played for about an hour). We also use several benchmarks from different domains, including SPEC, mobile, and big data. We run these benchmarks to completion.
Using these workloads and our toolset, we analyze different compression properties for both real applications and benchmarks. We focus on eight main hypotheses on compression, derived from previous work. These properties (Table 2) act as the foundation of several compression proposals, so the performance of those proposals depends heavily on them.
Overall, our results suggest that compression could be of general use in both main memory and caches. On average, the compression ratio is ≥2 for 64% of workloads for memory data and for 54% of workloads for cache data. Our evaluation indicates significant potential for both cache and memory compression, with higher compressibility in memory due to the abundance of zero blocks. Among the application domains we studied, servers show the highest compressibility on average, while our mobile benchmarks show the lowest.
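As a rough illustration of the two quantities mentioned above (compression ratio and the share of zero blocks), the following sketch computes both over a byte snapshot. It is not the toolset described in the article: the 64-byte block size, the use of zlib as a stand-in compressor, and the names `split_blocks`, `compressed_size`, and `summarize` are all assumptions made for this example.

```python
import zlib

BLOCK_SIZE = 64  # assumed block size in bytes; not taken from the article


def split_blocks(data, block_size=BLOCK_SIZE):
    """Split a snapshot into fixed-size blocks, dropping any partial tail."""
    usable = len(data) - len(data) % block_size
    return [data[i:i + block_size] for i in range(0, usable, block_size)]


def compressed_size(block):
    """Compressed size of one block, capped at the uncompressed size.

    zlib is only a stand-in here; hardware schemes use other algorithms.
    """
    return min(len(block), len(zlib.compress(block)))


def summarize(data):
    """Return block count, fraction of all-zero blocks, and compression ratio."""
    blocks = split_blocks(data)
    if not blocks:
        return {"blocks": 0, "zero_fraction": 0.0, "compression_ratio": None}
    original = len(blocks) * BLOCK_SIZE
    compressed = sum(compressed_size(b) for b in blocks)
    zero_blocks = sum(1 for b in blocks if not any(b))
    return {
        "blocks": len(blocks),
        "zero_fraction": zero_blocks / len(blocks),
        # ratio = uncompressed bytes / compressed bytes, so higher is better
        "compression_ratio": original / compressed,
    }
```

Under these assumptions, a snapshot with many all-zero blocks yields a higher ratio than one filled with high-entropy data, which matches the direction of the memory-versus-cache observation above.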
Comparing benchmarks with real workloads, we show that (1) it is critical to run benchmarks to completion, or at least for considerably long runtimes, to avoid biased conclusions, and (2) SPEC benchmarks are good representatives of real desktop applications in terms of the compressibility of their datasets. However, this does not hold for all compression properties. For example, SPEC benchmarks have much better compression locality (i.e., neighboring blocks have similar compressibility) than real workloads. Thus, it is critical for designers to consider a wider range of workloads, including real applications, when evaluating their compression techniques.
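One possible way to put a number on compression locality is sketched below: the fraction of adjacent block pairs whose compressed sizes fall into the same size bucket. This metric is an assumption for illustration and is not claimed to be the definition used in the article; the 64-byte block size, the 16-byte bucket width, zlib as the compressor, and the function names are likewise hypothetical.

```python
import zlib

BLOCK_SIZE = 64  # assumed block size in bytes
BIN_SIZE = 16    # hypothetical bucket width: same bucket counts as "similar"


def compressed_size(block):
    """Compressed size of one block (zlib as a stand-in), capped at its length."""
    return min(len(block), len(zlib.compress(block)))


def compression_locality(data, block_size=BLOCK_SIZE, bin_size=BIN_SIZE):
    """Fraction of neighboring block pairs that land in the same size bucket."""
    usable = len(data) - len(data) % block_size
    sizes = [compressed_size(data[i:i + block_size])
             for i in range(0, usable, block_size)]
    if len(sizes) < 2:
        return 0.0  # no neighboring pairs to compare
    buckets = [(s - 1) // bin_size for s in sizes]
    same = sum(1 for a, b in zip(buckets, buckets[1:]) if a == b)
    return same / (len(buckets) - 1)
```

A value near 1 means neighbors compress to similar sizes; a value near 0 means compressibility changes from block to block, which is the situation the abstract attributes more often to real workloads than to SPEC.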

Supplementary Material

TACO1404-44 (taco1404-44.pdf)
Slide deck associated with this paper

    Published In

    ACM Transactions on Architecture and Code Optimization  Volume 14, Issue 4
    December 2017
    600 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/3154814
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 December 2017
    Accepted: 01 September 2017
    Revised: 01 July 2017
    Received: 01 June 2016
    Published in TACO Volume 14, Issue 4

    Author Tags

    1. Compression
    2. cache and memory design
    3. energy efficiency
    4. multi-core systems
    5. performance

    Qualifiers

    • Research-article
    • Research
    • Refereed
