Improving the cache locality of memory allocation

Authors Info & Claims

ACM SIGPLAN Notices Volume 28 Issue 6June 1993pp 177–186https://doi.org/10.1145/173262.155107

Published:01 June 1993Publication History

ACM SIGPLAN Notices

Abstract

The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. This paper presents a performance evaluation of the reference locality of dynamic storage allocation algorithms based on trace-driven simualtion of five large allocation-intensive C programs. In this paper, we show how the design of a memory allocator can significantly affect the reference locality for various applications. Our measurements show that poor locality in sequential-fit allocation algorithms reduces program performance, both by increasing paging and cache miss rates. While increased paging can be debilitating on any architecture, cache misses rates are also important for modern computer architectures. We show that algorithms attempting to be space-efficient by coalescing adjacent free objects show poor reference locality, possibly negating the benefits of space efficiency. At the other extreme, algorithms can expend considerable effort to increase reference locality yet gain little in total execution performance. Our measurements suggest an allocator design that is both very fast and has good locality of reference.

References

1 Thomas Ball and James R. Larus. Optimally profiling and tracing programs. In Conference Record of the Nineteenth ACM Symposium on Principles of Programming Languages, pages 59-70, January 1992. Google ScholarDigital Library
2 David Barrett and Benjamin Zorn. Using lifetime predictors to improve memory allocation performance. In $IGPLAN'93 Conference on Programming Language Design and Implementation, Albuquerque, June 1993. Google ScholarDigital Library
3 Gerald Bozman. The software lookasize buffer reduces search overhead with linked lists. Communications of the ACM, 27(3):222-227, March 1984. Google ScholarDigital Library
4 David Callahan, Ken Kennefy, and AUan Porterfield. Software prefetching. In Fourth Intl. Conf. on Arch. Support for Programming Languages and Operating Systems, pages 40-52, April 1991. Google ScholarDigital Library
5 John DeTreville. Heap usage in the Topaz environment. Technical Report 63, Digital Equipment Corporation System Research Center, Palo Alto, CA, August 1990.Google Scholar
6 Digital Equipment Corporation. Unix ManualPagefor PIXIE, ULTRIX V4.2 (rev 96) edition, September 1991.Google Scholar
7 A. J. Goldberg and J. Hennessy. Performance debugging shared memory multiprocessor programs with MTOOL. In Proceedings Supercomputing '91, pages 481-491,1991. Google ScholarDigital Library
8 Dirk Grunwald and Benjamin Zorn. CUSTOMALI. OC:Efficient Synthesized Memory Allocators. Technical Report CS-CS- 602-92, Department of Computer Science, University of Colorado, Boulder, Boulder, CO, July 1992.Google Scholar
9 Mike Haertel. Description of GNU malloc implementation. Personal communication, August 1991.Google Scholar
10 Mark D. Hill. TYCHO. University of Wisconsin, Madison, WI. Unix manual page.Google Scholar
11 Norman P. Jouppi. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In Proceedings of the Seventeenth Annual International Symposium on Computer Architecture, pages 364-373, Seattle, WA, May 1990. Google ScholarDigital Library
12 Chris Kingsley. Description of a very fast storage allocator. Documentation of 4.2 BSD Unix malloc implementation, February 1982.Google Scholar
13 Donald E. Knuth. FundamentaIAlgorithms, volume 1 of The Art of Computer Programming, chapter 2, pages 435--451. Addison Wesley, Reading, MA, 2nd edition, 1973.Google Scholar
14 David G. Korn and Kiem-Phong Vo. In search of a better malloc. InProceedings of the Summer 1985 USENiX Conference, pages 489-506,1985.Google Scholar
15 M.S. Lain, P. W. Wilson, and T. G.Moher. Object type directed garbage collection to improve locality. In Proceedings of the International Workshop on Memory Management, St. Malo, FRANCE, September 1992. Springer Verlag. Google ScholarDigital Library
16 Doug Lea. An efficient first-fit memory allocator. (From comments in source and personal communication).Google Scholar
17 Alvin R. Lebeck and David A. Wood. CPROF: A cache performance profiler. Technical report, Computer Sciences Dept., Univ. of Wisconsin---Madison, Iuly 1992.Google Scholar
18 Henry Lieberman and Carl Hewitt. A real-time garbage collector based on the lifetimes of objects. Communications of the ACM, 26(6):419-429, June 1983. Google ScholarDigital Library
19 Jeffrey C. Mogul and Anita Borg. The effect of context switches on cache performance. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-1V), pages 75-84, Santa Clara, CA, April 1991. Google ScholarDigital Library
20 David A. Moon. Garbage collection in a large Lisp system. In Conference Record of the 1984 ACM Symposium on LISP and Functional Programming, pages 235-246, Austin, Texas, August 1984. Google ScholarDigital Library
21 AJ. Smith. Line (block) size choice for CPU cache memories. IEEE Transactions on Computers, 36(9):1063-1075, September 1987. Google ScholarDigital Library
22 Thomas Standish. Data Structures Techniques. Addison- Wesley Publishing Company, 1980. Google ScholarDigital Library
23 David Ungar. Generation scavenging: A non-disruptive high performance storage reclamation algorithm. In SIG. SOFT/SIGPLAN Practical Programming Environments Con. ference, pages 157-167, April 1984. Google Scholar
24 Charles B. Weinstock and William A. Wulf. Quickfit: An efficient algorithm for heap storage aIiocation.ACMSIGPLAN Notices, 23(10):141-144, October 1988. Google ScholarDigital Library
25 Paul R. Wilson, Michael S. Lain, and Thomas G. Moher. Caching considerations for generation garbage collection. In Proceedings of the 1992 ACM Conference on LISP and Functional Programming, pages 32-42, San Francisco, CA, June 1992. ACM. Google ScholarDigital Library
26 Paul R. Wilson, M.S. Lain, and T.G. Moher. Effective 'static graph' reorganization to improve locality in garbage-coUected systems. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, volume 26, pages 177-191, Toronto, Ontario, Canada, June 1991. Google ScholarDigital Library
27 Benjamin Zorn. Comparative Performance Evaluation of Garbage Collection Algorithms. PhD thesis, University of California at Berkeley, Berkeley, CA, November 1989. Also appears as tech report UCB/CSD 89/544. Google ScholarDigital Library
28 Benjamin Zorn. The effect of garbage collection on cache performance. Technical Report CU-CS-528-91, Department of Computer Science, University of Colorado, Boulder, Boulder, CO, May 1991.Google ScholarCross Ref
29 Benjamin Zorn and Dirk Grunwald. Empirical measurements of six allocation-intensive C programs. SIGPLAN Notices, 27(12):71-80, December 1992. Google ScholarDigital Library
30 Benjamin Zorn and Dirk Grunwald. Evaluating models of memory allocation. TechnicalReport CU-CS-603-92, Department of Computer Science, University of Colorado, Boulder, Boulder, CO, July 1992.Google Scholar

Index Terms

Improving the cache locality of memory allocation

Recommendations

Improving the cache locality of memory allocation
PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation

The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. This paper presents a ...
Read More
Improving memory hierarchy performance with hardware prefetching and cache replacement
Read More
Improving Performance of Large Physically Indexed Caches by Decoupling Memory Addresses from Cache Addresses

Modern CPUs often use large physically indexed caches that are direct-mapped or have low associativities. Such caches do not interact well with virtual memory systems. An improperly placed physical page will end up in a wrong place in the cache, causing ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGPLAN Notices Volume 28, Issue 6
June 1993
313 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/173262
Editor:
Richard Wexelblat
IDA/CSED, Alexandria, VA
Issue’s Table of Contents
PLDI '93: Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
August 1993
313 pages
ISBN:0897915984
DOI:10.1145/155090
Editor:
Robert Cartwright
Rice Univ., Houston, TX
Copyright © 1993 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 1993
Check for updates
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 97
  Total Citations
  View Citations
- 1,292
  Total Downloads
- Downloads (Last 12 months)138
- Downloads (Last 6 weeks)21
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Improving the cache locality of memory allocation

ACM SIGPLAN Notices

Abstract

References

Cited By

Index Terms

Recommendations

Improving the cache locality of memory allocation

Improving memory hierarchy performance with hardware prefetching and cache replacement

Improving Performance of Large Physically Indexed Caches by Decoupling Memory Addresses from Cache Addresses