ABSTRACT
Recent work shows that dynamic memory allocation consumes nearly 7% of all cycles in Google datacenters. With the trend towards increased specialization of hardware, we propose Mallacc, an in-core hardware accelerator designed for broad use across a number of high-performance, modern memory allocators. The design of Mallacc is quite different from traditional throughput-oriented hardware accelerators. Because memory allocation requests tend to be very frequent, fast, and interspersed inside other application code, accelerators must be optimized for latency rather than throughput and area overheads must be kept to a bare minimum. Mallacc accelerates the three primary operations of a typical memory allocation request: size class computation, retrieval of a free memory block, and sampling of memory usage. Our results show that malloc latency can be reduced by up to 50% with a hardware cost of less than 1500 µm² of silicon area, less than 0.006% of a typical high-performance processor core.
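The three operations named in the abstract can be pictured with a minimal software sketch of an allocator fast path. It assumes a thread-local cache with one singly linked free list per size class, power-of-two size classes, and a byte-countdown sampler. Every name and constant here (`ThreadCache`, `NUM_CLASSES`, `SAMPLE_INTERVAL`, and so on) is hypothetical and chosen for illustration; the sketch shows the kind of software work involved, not Mallacc's hardware design or any particular allocator's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants, chosen only for illustration. */
#define MIN_BLOCK 16        /* smallest block size in bytes (assumed) */
#define NUM_CLASSES 8       /* classes of 16, 32, ..., 2048 bytes (assumed) */
#define SAMPLE_INTERVAL 4096 /* bytes allocated between samples (assumed) */

typedef struct FreeBlock {
    struct FreeBlock *next;
} FreeBlock;

typedef struct {
    FreeBlock *free_list[NUM_CLASSES]; /* one LIFO free list per size class */
    int64_t bytes_until_sample;        /* countdown to the next sample */
} ThreadCache;

/* Operation 1: size class computation.
 * Returns the smallest class whose block size fits the request,
 * or -1 when the request is too large for the fast path. */
int size_class(size_t size) {
    size_t cap = MIN_BLOCK;
    for (int cls = 0; cls < NUM_CLASSES; cls++, cap <<= 1) {
        if (size <= cap) return cls;
    }
    return -1;
}

/* Block size in bytes for a valid class index. */
size_t class_size(int cls) {
    return (size_t)MIN_BLOCK << cls;
}

/* Operation 2: retrieval of a free block (pop the list head).
 * Returns NULL when the list is empty. */
void *pop_free_block(ThreadCache *tc, int cls) {
    FreeBlock *head = tc->free_list[cls];
    if (head != NULL) tc->free_list[cls] = head->next;
    return head;
}

/* Counterpart used by free(): push a block back onto its list. */
void push_free_block(ThreadCache *tc, int cls, void *block) {
    FreeBlock *b = (FreeBlock *)block;
    b->next = tc->free_list[cls];
    tc->free_list[cls] = b;
}

/* Operation 3: sampling of memory usage.
 * Returns 1 when this allocation should be recorded, then resets
 * the countdown; returns 0 otherwise. */
int should_sample(ThreadCache *tc, size_t bytes) {
    tc->bytes_until_sample -= (int64_t)bytes;
    if (tc->bytes_until_sample > 0) return 0;
    tc->bytes_until_sample = SAMPLE_INTERVAL;
    return 1;
}

/* Fast path combining the three operations. Returns NULL when the
 * request must take a slow path, which is not modeled here. */
void *fast_malloc(ThreadCache *tc, size_t size, int *sampled) {
    int cls = size_class(size);
    if (cls < 0) return NULL;
    void *block = pop_free_block(tc, cls);
    if (block == NULL) return NULL;
    *sampled = should_sample(tc, class_size(cls));
    return block;
}
```

Each step is only a handful of instructions, which fits the abstract's point that allocation calls are short and frequent, so an accelerator for them has to target latency rather than throughput.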
Mallacc: Accelerating Memory Allocation. ASPLOS '17.