DOI: 10.1145/3037697.3037736
research-article

Mallacc: Accelerating Memory Allocation

Published: 04 April 2017

ABSTRACT

Recent work shows that dynamic memory allocation consumes nearly 7% of all cycles in Google datacenters. With the trend towards increased specialization of hardware, we propose Mallacc, an in-core hardware accelerator designed for broad use across a number of high-performance, modern memory allocators. The design of Mallacc is quite different from traditional throughput-oriented hardware accelerators. Because memory allocation requests tend to be very frequent, fast, and interspersed inside other application code, accelerators must be optimized for latency rather than throughput and area overheads must be kept to a bare minimum. Mallacc accelerates the three primary operations of a typical memory allocation request: size class computation, retrieval of a free memory block, and sampling of memory usage. Our results show that malloc latency can be reduced by up to 50% with a hardware cost of less than 1500 µm² of silicon area, less than 0.006% of a typical high-performance processor core.
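To make the three operations named in the abstract concrete, here is a minimal software sketch in C of what a thread-cached allocator's fast path might look like. It is an illustration only, not Mallacc's hardware design and not the code of any real allocator: the names (`ThreadCache`, `size_class`, `pop_free_block`, `should_sample`, `fast_malloc`), the fixed 16-byte class spacing, and the 4 KiB sampling interval are all assumptions chosen for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical parameters: 16 size classes of 16, 32, ..., 256 bytes. */
#define NUM_CLASSES     16
#define CLASS_STEP      16
#define MAX_SMALL       (NUM_CLASSES * CLASS_STEP)
#define SAMPLE_INTERVAL 4096  /* assumed: sample about once per 4 KiB allocated */

/* A free block stores the link to the next free block in its own memory. */
typedef struct FreeBlock { struct FreeBlock *next; } FreeBlock;

/* Per-thread state: one free list per size class plus a sampling countdown. */
typedef struct {
    FreeBlock *heads[NUM_CLASSES];
    long bytes_until_sample;
    unsigned long samples_taken;
} ThreadCache;

/* Operation 1: size class computation.
 * Rounds the request up to the next multiple of CLASS_STEP.
 * Returns -1 for zero-sized or too-large requests (slow path, not shown). */
static int size_class(size_t size) {
    if (size == 0 || size > MAX_SMALL) return -1;
    return (int)((size + CLASS_STEP - 1) / CLASS_STEP) - 1;
}

/* Bytes actually handed out for a given class index. */
static size_t class_size(int cls) { return (size_t)(cls + 1) * CLASS_STEP; }

/* Operation 2: retrieval of a free block (pop the head of a linked list).
 * Returns NULL when the list for this class is empty. */
static void *pop_free_block(ThreadCache *tc, int cls) {
    FreeBlock *b = tc->heads[cls];
    if (b != NULL) tc->heads[cls] = b->next;
    return b;
}

/* Returns a block to the front of its class list (the free() side). */
static void push_free_block(ThreadCache *tc, int cls, void *p) {
    assert(p != NULL && cls >= 0 && cls < NUM_CLASSES);
    FreeBlock *b = p;
    b->next = tc->heads[cls];
    tc->heads[cls] = b;
}

/* Operation 3: sampling of memory usage.
 * Counts down allocated bytes; when the counter goes negative, records
 * a sample and resets the countdown. Returns 1 if this call was sampled. */
static int should_sample(ThreadCache *tc, size_t size) {
    tc->bytes_until_sample -= (long)size;
    if (tc->bytes_until_sample >= 0) return 0;
    tc->bytes_until_sample = SAMPLE_INTERVAL;
    tc->samples_taken++;
    return 1;
}

/* Fast path combining the three operations. Returns NULL wherever a real
 * allocator would fall back to a slower path (large sizes, empty lists). */
static void *fast_malloc(ThreadCache *tc, size_t size) {
    int cls = size_class(size);
    if (cls < 0) return NULL;
    void *p = pop_free_block(tc, cls);
    if (p == NULL) return NULL;
    (void)should_sample(tc, class_size(cls));
    return p;
}
```

Each step in this sketch is only a few instructions, but the steps depend on one another (the class index selects the list, and the list head must be loaded before it can be popped). That dependency chain is one plausible reading of why the abstract stresses latency rather than throughput.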

Published in

ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
April 2017
856 pages
ISBN: 9781450344654
DOI: 10.1145/3037697

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

ASPLOS '17 paper acceptance rate: 53 of 320 submissions (17%). Overall acceptance rate: 535 of 2,713 submissions (20%).
