ABSTRACT
Memory safety defends against inadvertent and malicious misuse of memory that may compromise program correctness and security. A critical element of memory safety is zero initialization. The direct cost of zero initialization is surprisingly high: up to 12.7%, with average costs ranging from 2.7% to 4.5%, on a high-performance virtual machine on IA32 architectures. Zero initialization also incurs indirect costs due to its memory bandwidth demands and cache displacement effects. Existing virtual machines either (a) minimize direct costs by zeroing in large blocks, or (b) minimize indirect costs by zeroing in the allocation sequence, which reduces cache displacement and bandwidth. This paper evaluates these two widely used zero initialization designs, showing that they make different tradeoffs to achieve very similar performance. Our analysis inspires three better designs: (1) bulk zeroing with cache-bypassing (non-temporal) instructions to reduce the direct and indirect zeroing costs simultaneously, (2) concurrent non-temporal bulk zeroing that exploits parallel hardware to move work off the application's critical path, and (3) adaptive zeroing, which dynamically chooses between (1) and (2) based on available hardware parallelism. The new software strategies offer speedups sometimes greater than the direct overhead, improving total performance by 3% on average. Our findings invite additional optimizations and microarchitectural support.
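To make the contrast between the designs concrete, the following is a minimal sketch in C of a bump-pointer allocator that can zero a block in bulk, zero each object in the allocation sequence, or zero in bulk with non-temporal stores. It is an illustration under stated assumptions, not the paper's implementation: the names `bump_allocator`, `zero_policy`, `allocator_init`, `allocate`, and `zero_nontemporal`, the 32 KB block size, and the 16-byte object alignment are all hypothetical. The non-temporal path uses the SSE2 intrinsic `_mm_stream_si128` where the compiler targets SSE2 and falls back to `memset` elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_setzero_si128, _mm_stream_si128, _mm_sfence */
#endif

enum { BLOCK_SIZE = 32 * 1024 }; /* hypothetical allocation-block size */

/* Hypothetical policy names for the designs discussed in the abstract. */
typedef enum {
    ZERO_BULK,     /* zero the whole block up front with ordinary stores */
    ZERO_AT_ALLOC, /* zero each object in the allocation sequence */
    ZERO_BULK_NT   /* zero the whole block with cache-bypassing stores */
} zero_policy;

typedef struct {
    unsigned char *cursor; /* next free byte in the block */
    unsigned char *limit;  /* one past the last byte of the block */
    zero_policy policy;
} bump_allocator;

/* Zero n bytes at p using non-temporal stores when SSE2 is available.
   Assumes p is 16-byte aligned and n is a multiple of 16. */
static void zero_nontemporal(void *p, size_t n) {
    assert((uintptr_t)p % 16 == 0 && n % 16 == 0);
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    unsigned char *c = p;
    for (size_t i = 0; i < n; i += 16)
        _mm_stream_si128((__m128i *)(c + i), zero);
    _mm_sfence(); /* order the streamed stores before the block is handed out */
#else
    memset(p, 0, n); /* portable fallback: ordinary cached stores */
#endif
}

/* Attach the allocator to a fresh block; bulk policies pay the zeroing cost here. */
static void allocator_init(bump_allocator *a, void *block, size_t size,
                           zero_policy policy) {
    a->cursor = block;
    a->limit = a->cursor + size;
    a->policy = policy;
    if (policy == ZERO_BULK)
        memset(block, 0, size);
    else if (policy == ZERO_BULK_NT)
        zero_nontemporal(block, size);
}

/* Bump-allocate a zeroed object, or return NULL if the block is exhausted. */
static void *allocate(bump_allocator *a, size_t bytes) {
    bytes = (bytes + 15) & ~(size_t)15; /* round up to 16-byte granules */
    if ((size_t)(a->limit - a->cursor) < bytes)
        return NULL;
    void *obj = a->cursor;
    a->cursor += bytes;
    if (a->policy == ZERO_AT_ALLOC)
        memset(obj, 0, bytes); /* zero just before the object is first used */
    return obj;
}
```

Under all three policies the caller receives zeroed memory; what differs is when the stores happen and whether they pass through the cache. A concurrent or adaptive variant in the spirit of designs (2) and (3) would move the `ZERO_BULK_NT` work to a helper thread or pick a policy at run time, which this sketch does not attempt.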
Why nothing matters: the impact of zeroing (OOPSLA '11)