Abstract
The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. This paper presents a performance evaluation of the reference locality of dynamic storage allocation algorithms based on trace-driven simualtion of five large allocation-intensive C programs. In this paper, we show how the design of a memory allocator can significantly affect the reference locality for various applications. Our measurements show that poor locality in sequential-fit allocation algorithms reduces program performance, both by increasing paging and cache miss rates. While increased paging can be debilitating on any architecture, cache misses rates are also important for modern computer architectures. We show that algorithms attempting to be space-efficient by coalescing adjacent free objects show poor reference locality, possibly negating the benefits of space efficiency. At the other extreme, algorithms can expend considerable effort to increase reference locality yet gain little in total execution performance. Our measurements suggest an allocator design that is both very fast and has good locality of reference.
- 1 Thomas Ball and James R. Larus. Optimally profiling and tracing programs. In Conference Record of the Nineteenth ACM Symposium on Principles of Programming Languages, pages 59-70, January 1992. Google ScholarDigital Library
- 2 David Barrett and Benjamin Zorn. Using lifetime predictors to improve memory allocation performance. In $IGPLAN'93 Conference on Programming Language Design and Implementation, Albuquerque, June 1993. Google ScholarDigital Library
- 3 Gerald Bozman. The software lookasize buffer reduces search overhead with linked lists. Communications of the ACM, 27(3):222-227, March 1984. Google ScholarDigital Library
- 4 David Callahan, Ken Kennefy, and AUan Porterfield. Software prefetching. In Fourth Intl. Conf. on Arch. Support for Programming Languages and Operating Systems, pages 40-52, April 1991. Google ScholarDigital Library
- 5 John DeTreville. Heap usage in the Topaz environment. Technical Report 63, Digital Equipment Corporation System Research Center, Palo Alto, CA, August 1990.Google Scholar
- 6 Digital Equipment Corporation. Unix ManualPagefor PIXIE, ULTRIX V4.2 (rev 96) edition, September 1991.Google Scholar
- 7 A. J. Goldberg and J. Hennessy. Performance debugging shared memory multiprocessor programs with MTOOL. In Proceedings Supercomputing '91, pages 481-491,1991. Google ScholarDigital Library
- 8 Dirk Grunwald and Benjamin Zorn. CUSTOMALI. OC:Efficient Synthesized Memory Allocators. Technical Report CS-CS- 602-92, Department of Computer Science, University of Colorado, Boulder, Boulder, CO, July 1992.Google Scholar
- 9 Mike Haertel. Description of GNU malloc implementation. Personal communication, August 1991.Google Scholar
- 10 Mark D. Hill. TYCHO. University of Wisconsin, Madison, WI. Unix manual page.Google Scholar
- 11 Norman P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In Proceedings of the Seventeenth Annual International Symposium on Computer Architecture, pages 364-373, Seattle, WA, May 1990. Google ScholarDigital Library
- 12 Chris Kingsley. Description of a very fast storage allocator. Documentation of 4.2 BSD Unix malloc implementation, February 1982.Google Scholar
- 13 Donald E. Knuth. FundamentaIAlgorithms, volume 1 of The Art of Computer Programming, chapter 2, pages 435--451. Addison Wesley, Reading, MA, 2nd edition, 1973.Google Scholar
- 14 David G. Korn and Kiem-Phong Vo. In search of a better malloc. InProceedings of the Summer 1985 USENiX Conference, pages 489-506,1985.Google Scholar
- 15 M.S. Lain, P. W. Wilson, and T. G.Moher. Object type directed garbage collection to improve locality. In Proceedings of the International Workshop on Memory Management, St. Malo, FRANCE, September 1992. Springer Verlag. Google ScholarDigital Library
- 16 Doug Lea. An efficient first-fit memory allocator. (From comments in source and personal communication).Google Scholar
- 17 Alvin R. Lebeck and David A. Wood. CPROF: A cache performance profiler. Technical report, Computer Sciences Dept., Univ. of Wisconsin---Madison, Iuly 1992.Google Scholar
- 18 Henry Lieberman and Carl Hewitt. A real-time garbage collector based on the lifetimes of objects. Communications of the ACM, 26(6):419-429, June 1983. Google ScholarDigital Library
- 19 Jeffrey C. Mogul and Anita Borg. The effect of context switches on cache performance. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-1V), pages 75-84, Santa Clara, CA, April 1991. Google ScholarDigital Library
- 20 David A. Moon. Garbage collection in a large Lisp system. In Conference Record of the 1984 ACM Symposium on LISP and Functional Programming, pages 235-246, Austin, Texas, August 1984. Google ScholarDigital Library
- 21 AJ. Smith. Line (block) size choice for CPU cache memories. IEEE Transactions on Computers, 36(9):1063-1075, September 1987. Google ScholarDigital Library
- 22 Thomas Standish. Data Structures Techniques. Addison- Wesley Publishing Company, 1980. Google ScholarDigital Library
- 23 David Ungar. Generation scavenging: A non-disruptive high performance storage reclamation algorithm. In SIG. SOFT/SIGPLAN Practical Programming Environments Con. ference, pages 157-167, April 1984. Google Scholar
- 24 Charles B. Weinstock and William A. Wulf. Quickfit: An efficient algorithm for heap storage aIiocation.ACMSIGPLAN Notices, 23(10):141-144, October 1988. Google ScholarDigital Library
- 25 Paul R. Wilson, Michael S. Lain, and Thomas G. Moher. Caching considerations for generation garbage collection. In Proceedings of the 1992 ACM Conference on LISP and Functional Programming, pages 32-42, San Francisco, CA, June 1992. ACM. Google ScholarDigital Library
- 26 Paul R. Wilson, M.S. Lain, and T.G. Moher. Effective 'static graph' reorganization to improve locality in garbage-coUected systems. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, volume 26, pages 177-191, Toronto, Ontario, Canada, June 1991. Google ScholarDigital Library
- 27 Benjamin Zorn. Comparative Performance Evaluation of Garbage Collection Algorithms. PhD thesis, University of California at Berkeley, Berkeley, CA, November 1989. Also appears as tech report UCB/CSD 89/544. Google ScholarDigital Library
- 28 Benjamin Zorn. The effect of garbage collection on cache performance. Technical Report CU-CS-528-91, Department of Computer Science, University of Colorado, Boulder, Boulder, CO, May 1991.Google ScholarCross Ref
- 29 Benjamin Zorn and Dirk Grunwald. Empirical measurements of six allocation-intensive C programs. SIGPLAN Notices, 27(12):71-80, December 1992. Google ScholarDigital Library
- 30 Benjamin Zorn and Dirk Grunwald. Evaluating models of memory allocation. TechnicalReport CU-CS-603-92, Department of Computer Science, University of Colorado, Boulder, Boulder, CO, July 1992.Google Scholar
Index Terms
- Improving the cache locality of memory allocation
Recommendations
Improving the cache locality of memory allocation
PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementationThe allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. This paper presents a ...
Improving Performance of Large Physically Indexed Caches by Decoupling Memory Addresses from Cache Addresses
Modern CPUs often use large physically indexed caches that are direct-mapped or have low associativities. Such caches do not interact well with virtual memory systems. An improperly placed physical page will end up in a wrong place in the cache, causing ...
Comments