skip to main content
article
Free Access

Improving the cache locality of memory allocation

Published:01 June 1993Publication History
Skip Abstract Section

Abstract

The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. This paper presents a performance evaluation of the reference locality of dynamic storage allocation algorithms based on trace-driven simualtion of five large allocation-intensive C programs. In this paper, we show how the design of a memory allocator can significantly affect the reference locality for various applications. Our measurements show that poor locality in sequential-fit allocation algorithms reduces program performance, both by increasing paging and cache miss rates. While increased paging can be debilitating on any architecture, cache misses rates are also important for modern computer architectures. We show that algorithms attempting to be space-efficient by coalescing adjacent free objects show poor reference locality, possibly negating the benefits of space efficiency. At the other extreme, algorithms can expend considerable effort to increase reference locality yet gain little in total execution performance. Our measurements suggest an allocator design that is both very fast and has good locality of reference.

References

  1. 1 Thomas Ball and James R. Larus. Optimally profiling and tracing programs. In Conference Record of the Nineteenth ACM Symposium on Principles of Programming Languages, pages 59-70, January 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2 David Barrett and Benjamin Zorn. Using lifetime predictors to improve memory allocation performance. In $IGPLAN'93 Conference on Programming Language Design and Implementation, Albuquerque, June 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3 Gerald Bozman. The software lookasize buffer reduces search overhead with linked lists. Communications of the ACM, 27(3):222-227, March 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4 David Callahan, Ken Kennefy, and AUan Porterfield. Software prefetching. In Fourth Intl. Conf. on Arch. Support for Programming Languages and Operating Systems, pages 40-52, April 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5 John DeTreville. Heap usage in the Topaz environment. Technical Report 63, Digital Equipment Corporation System Research Center, Palo Alto, CA, August 1990.Google ScholarGoogle Scholar
  6. 6 Digital Equipment Corporation. Unix ManualPagefor PIXIE, ULTRIX V4.2 (rev 96) edition, September 1991.Google ScholarGoogle Scholar
  7. 7 A. J. Goldberg and J. Hennessy. Performance debugging shared memory multiprocessor programs with MTOOL. In Proceedings Supercomputing '91, pages 481-491,1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8 Dirk Grunwald and Benjamin Zorn. CUSTOMALI. OC:Efficient Synthesized Memory Allocators. Technical Report CS-CS- 602-92, Department of Computer Science, University of Colorado, Boulder, Boulder, CO, July 1992.Google ScholarGoogle Scholar
  9. 9 Mike Haertel. Description of GNU malloc implementation. Personal communication, August 1991.Google ScholarGoogle Scholar
  10. 10 Mark D. Hill. TYCHO. University of Wisconsin, Madison, WI. Unix manual page.Google ScholarGoogle Scholar
  11. 11 Norman P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In Proceedings of the Seventeenth Annual International Symposium on Computer Architecture, pages 364-373, Seattle, WA, May 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. 12 Chris Kingsley. Description of a very fast storage allocator. Documentation of 4.2 BSD Unix malloc implementation, February 1982.Google ScholarGoogle Scholar
  13. 13 Donald E. Knuth. FundamentaIAlgorithms, volume 1 of The Art of Computer Programming, chapter 2, pages 435--451. Addison Wesley, Reading, MA, 2nd edition, 1973.Google ScholarGoogle Scholar
  14. 14 David G. Korn and Kiem-Phong Vo. In search of a better malloc. InProceedings of the Summer 1985 USENiX Conference, pages 489-506,1985.Google ScholarGoogle Scholar
  15. 15 M.S. Lain, P. W. Wilson, and T. G.Moher. Object type directed garbage collection to improve locality. In Proceedings of the International Workshop on Memory Management, St. Malo, FRANCE, September 1992. Springer Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. 16 Doug Lea. An efficient first-fit memory allocator. (From comments in source and personal communication).Google ScholarGoogle Scholar
  17. 17 Alvin R. Lebeck and David A. Wood. CPROF: A cache performance profiler. Technical report, Computer Sciences Dept., Univ. of Wisconsin---Madison, Iuly 1992.Google ScholarGoogle Scholar
  18. 18 Henry Lieberman and Carl Hewitt. A real-time garbage collector based on the lifetimes of objects. Communications of the ACM, 26(6):419-429, June 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. 19 Jeffrey C. Mogul and Anita Borg. The effect of context switches on cache performance. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-1V), pages 75-84, Santa Clara, CA, April 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. 20 David A. Moon. Garbage collection in a large Lisp system. In Conference Record of the 1984 ACM Symposium on LISP and Functional Programming, pages 235-246, Austin, Texas, August 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. 21 AJ. Smith. Line (block) size choice for CPU cache memories. IEEE Transactions on Computers, 36(9):1063-1075, September 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. 22 Thomas Standish. Data Structures Techniques. Addison- Wesley Publishing Company, 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. 23 David Ungar. Generation scavenging: A non-disruptive high performance storage reclamation algorithm. In SIG. SOFT/SIGPLAN Practical Programming Environments Con. ference, pages 157-167, April 1984. Google ScholarGoogle Scholar
  24. 24 Charles B. Weinstock and William A. Wulf. Quickfit: An efficient algorithm for heap storage aIiocation.ACMSIGPLAN Notices, 23(10):141-144, October 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. 25 Paul R. Wilson, Michael S. Lain, and Thomas G. Moher. Caching considerations for generation garbage collection. In Proceedings of the 1992 ACM Conference on LISP and Functional Programming, pages 32-42, San Francisco, CA, June 1992. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. 26 Paul R. Wilson, M.S. Lain, and T.G. Moher. Effective 'static graph' reorganization to improve locality in garbage-coUected systems. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, volume 26, pages 177-191, Toronto, Ontario, Canada, June 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. 27 Benjamin Zorn. Comparative Performance Evaluation of Garbage Collection Algorithms. PhD thesis, University of California at Berkeley, Berkeley, CA, November 1989. Also appears as tech report UCB/CSD 89/544. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. 28 Benjamin Zorn. The effect of garbage collection on cache performance. Technical Report CU-CS-528-91, Department of Computer Science, University of Colorado, Boulder, Boulder, CO, May 1991.Google ScholarGoogle ScholarCross RefCross Ref
  29. 29 Benjamin Zorn and Dirk Grunwald. Empirical measurements of six allocation-intensive C programs. SIGPLAN Notices, 27(12):71-80, December 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. 30 Benjamin Zorn and Dirk Grunwald. Evaluating models of memory allocation. TechnicalReport CU-CS-603-92, Department of Computer Science, University of Colorado, Boulder, Boulder, CO, July 1992.Google ScholarGoogle Scholar

Index Terms

  1. Improving the cache locality of memory allocation

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM SIGPLAN Notices
              ACM SIGPLAN Notices  Volume 28, Issue 6
              June 1993
              313 pages
              ISSN:0362-1340
              EISSN:1558-1160
              DOI:10.1145/173262
              Issue’s Table of Contents
              • cover image ACM Conferences
                PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
                August 1993
                313 pages
                ISBN:0897915984
                DOI:10.1145/155090

              Copyright © 1993 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 1 June 1993

              Check for updates

              Qualifiers

              • article

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader