DOI: 10.1145/2048066.2048092
research-article

Why nothing matters: the impact of zeroing

Published: 22 October 2011

ABSTRACT

Memory safety defends against inadvertent and malicious misuse of memory that may compromise program correctness and security. A critical element of memory safety is zero initialization. The direct cost of zero initialization is surprisingly high: up to 12.7%, with average costs ranging from 2.7 to 4.5% on a high performance virtual machine on IA32 architectures. Zero initialization also incurs indirect costs due to its memory bandwidth demands and cache displacement effects. Existing virtual machines either: a) minimize direct costs by zeroing in large blocks, or b) minimize indirect costs by zeroing in the allocation sequence, which reduces cache displacement and bandwidth. This paper evaluates the two widely used zero initialization designs, showing that they make different tradeoffs to achieve very similar performance. Our analysis inspires three better designs: (1) bulk zeroing with cache-bypassing (non-temporal) instructions to reduce the direct and indirect zeroing costs simultaneously, (2) concurrent non-temporal bulk zeroing that exploits parallel hardware to move work off the application's critical path, and (3) adaptive zeroing, which dynamically chooses between (1) and (2) based on available hardware parallelism. The new software strategies offer speedups sometimes greater than the direct overhead, improving total performance by 3% on average. Our findings invite additional optimizations and microarchitectural support.
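
The two existing designs named in the abstract can be contrasted in a minimal C sketch. This is an illustration under stated assumptions, not the paper's implementation: the `Region` type and all function names are hypothetical, the allocator is a simple bump pointer, and plain `memset` stands in for whatever zeroing routine a real virtual machine would use. The non-temporal and concurrent variants proposed in the abstract are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bump-pointer allocation region (names are illustrative). */
typedef struct {
    uint8_t *cursor; /* next free byte */
    uint8_t *limit;  /* one past the last byte of the region */
} Region;

/* Design (a): bulk zeroing. The whole block is zeroed once, up front. */
static void region_init_bulk(Region *r, uint8_t *block, size_t size) {
    memset(block, 0, size);
    r->cursor = block;
    r->limit = block + size;
}

/* Design (b): no up-front zeroing; each object is zeroed when allocated. */
static void region_init_lazy(Region *r, uint8_t *block, size_t size) {
    r->cursor = block;
    r->limit = block + size;
}

/* Bump allocation that relies on the region having been zeroed already. */
static void *alloc_prezeroed(Region *r, size_t bytes) {
    assert(r->cursor <= r->limit);
    if ((size_t)(r->limit - r->cursor) < bytes) {
        return NULL; /* region exhausted */
    }
    void *obj = r->cursor;
    r->cursor += bytes;
    return obj;
}

/* Bump allocation that zeroes only the bytes of the new object. */
static void *alloc_zero_on_alloc(Region *r, size_t bytes) {
    void *obj = alloc_prezeroed(r, bytes);
    if (obj != NULL) {
        memset(obj, 0, bytes);
    }
    return obj;
}

/* Helper for checking results: returns 1 if all n bytes at p are zero. */
static int all_zero(const void *p, size_t n) {
    const uint8_t *b = (const uint8_t *)p;
    for (size_t i = 0; i < n; i++) {
        if (b[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

In this sketch, design (a) touches the whole block before any object is used, while design (b) touches each object's memory immediately before the program does. That difference is one way to picture the cache displacement and bandwidth tradeoff the abstract describes.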

Published in

OOPSLA '11: Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
October 2011
1104 pages
ISBN: 9781450309400
DOI: 10.1145/2048066

ACM SIGPLAN Notices, Volume 46, Issue 10
OOPSLA '11
October 2011
1063 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2076021

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 268 of 1,244 submissions, 22%
