skip to main content
article

Cache-conscious coallocation of hot data streams

Published: 11 June 2006 Publication History

Abstract

The memory system performance of many programs can be improved by coallocating contemporaneously accessed heap objects in the same cache block. We present a novel profile-based analysis for producing such a layout. The analysis achieves cacheconscious coallocation of a hot data stream H (i.e., a regular data access pattern that frequently repeats) by isolating and combining allocation sites of object instances that appear in H such that intervening allocations coming from other sites are separated. The coallocation solution produced by the analysis is enforced by an automatic tool, cminstr, that redirects a program's heap allocations to a run-time coallocation library comalloc. We also extend the analysis to coallocation at object field granularity. The resulting field coallocation solution generalizes common data restructuring techniques, such as field reordering, object splitting, and object merging, and allows their combination. Furthermore, it provides insight into object restructuring by breaking down the coallocation benefit on a per-technique basis, which provides the opportunity to pick the "sweet spot" for each program. Experimental results using a set of memory-performance-limited benchmarks, including a few SPECInt2000 programs, and Microsoft VisualFoxPro, indicate that programs possess significant coallocation opportunities. Automatic object coallocation improves execution time by 13% on average in the presence of hardware prefetching. Hand-implemented field coallocation solutions for two of the benchmarks produced additional improvements (12% and 22%) but the effort involved suggests implementing an automated version for type-safe languages, such as Java and C#.

References

[1]
E. Berger, B. Zorn, and K. McKinley. "Composing high-performance memory allocators." In ACM SIGPLAN'01 Conference on Programming Language Design and Implementation (PLDI), June 2001.
[2]
B. Calder et. al. "Cache-conscious data placement." In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Oct. 1998.
[3]
T.M. Chilimbi. "Efficient Representations and Abstractions for Quantifying and Exploiting Data Reference Locality." In ACM SIGPLAN'01 Conference on Programming Language Design and Implementation (PLDI), June 2001.
[4]
T.M. Chilimbi. "On the stability of temporal data reference profiles." In International Conference on Parallel Architectures and Compilation Techniques (PACT), Aug. 2001.
[5]
T.M. Chilimbi, J.R. Larus, and M.D. Hill. "Cache-conscious structure layout." In ACM SIGPLAN'99 Conference on Programming Language Design and Implementation (PLDI), May 1999.
[6]
T.M. Chilimbi, J.R. Larus, and B. Davidson. "Cache-conscious structure definition." In ACM SIGPLAN'99 Conference on Programming Language Design and Implementation (PLDI), May 1999.
[7]
P. Crescenzi et al. "A compendium of NP optimization problems." ww.nada.kth.se/~viggo/problemlist/compendium.html
[8]
N. Gloy et. al "Procedure placement using temporal ordering information." In Proceedings of the 30th Annual ACM/IEEE International Symposium on Microarchitecture (MICRO), Dec. 1997.
[9]
R. Gonzalez et al. "Energy dissipation in general-purpose microprocessors." In IEEE Journal of Solid State Circuits, 31(9), Sept. 1996.
[10]
S. Guyer and K. McKinley. "Finding your cronies: static analysis for dynamic object colocation." In Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Oct. 2004.
[11]
X. Huang et al. "The garbage collection advantage: Improving program locality." In Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Oct. 2004.
[12]
M. M. Halldorsson. "Approximations of weighted independent set and hereditary subset problems." In Journal of Graph Algorithms and Applications, Vol. 4, 2000.
[13]
M. Hirzel and T.M. Chilimbi. "Bursty Tracing: A Framework for Low-Overhead Temporal Profiling." In Workshop on Feedback Directed and Dynamic Optimizations (FDDO), Dec. 2001.
[14]
T. Kistler and M. Franz. "Automated data-member layout of heap objects to improve memory-hierarchy performance." In Transactions on Programming Languages and Systems (TOPLAS),volume 22, 2000.
[15]
E. Petrank and D. Rawitz. "The hardness of cache-conscious data placement." In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), Jan 2002.
[16]
S. Rubin, R. Bodik, and T.M. Chilimbi. "An Efficient Profile-Analysis Framework for Data-Layout Optimizations." In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), Jan 2002.
[17]
M. Seidl and B. Zorn. "Segregating heap objects by reference behavior and lifetime." In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Oct 1998.
[18]
A. Srivastava, A. Edwards, and H. Vo. Vulcan: Binary transformation in a distributed environment. In MSR-TR-2001-50, 2001.
[19]
D. Truong, F. Bodin, and A. Seznec. "Improving cache behavior of dynamically allocated data structures." In International Conference on Parallel Architectures and Compilation Techniques (PACT), 1998.
[20]
Y. Zhong et al. "Array regrouping and structure splitting using wholeprogram reference affinity." In ACM SIGPLAN'04 Conference on Programming Language Design and Implementation (PLDI), 2004.
[21]
C. Lattner and V. Adve. "Automatic pool allocation: Improving performance by controlling data structure layout in the heap." In ACM SIGPLAN'05 Conference on Programming Language Design and Implementation (PLDI), 2005.

Cited By

View all
  • (2022)Data layout optimization based on the spatio-temporal model of field access2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE)10.1109/AEMCSE55572.2022.00055(238-244)Online publication date: Apr-2022
  • (2020)Improved Basic Block ReorderingIEEE Transactions on Computers10.1109/TC.2020.298288869:12(1784-1794)Online publication date: 1-Dec-2020
  • (2011)Making STMs Cache Friendly with Compiler TransformationsProceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques10.1109/PACT.2011.55(232-242)Online publication date: 10-Oct-2011
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 41, Issue 6
Proceedings of the 2006 PLDI Conference
June 2006
426 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1133255
Issue’s Table of Contents
  • cover image ACM Conferences
    PLDI '06: Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation
    June 2006
    438 pages
    ISBN:1595933204
    DOI:10.1145/1133981
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2006
Published in SIGPLAN Volume 41, Issue 6

Check for updates

Author Tags

  1. cache optimization
  2. data locality
  3. data profiling
  4. dynamic allocation
  5. hot data streams
  6. memory layout

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)18
  • Downloads (Last 6 weeks)2
Reflects downloads up to 03 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2022)Data layout optimization based on the spatio-temporal model of field access2022 5th International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE)10.1109/AEMCSE55572.2022.00055(238-244)Online publication date: Apr-2022
  • (2020)Improved Basic Block ReorderingIEEE Transactions on Computers10.1109/TC.2020.298288869:12(1784-1794)Online publication date: 1-Dec-2020
  • (2011)Making STMs Cache Friendly with Compiler TransformationsProceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques10.1109/PACT.2011.55(232-242)Online publication date: 10-Oct-2011
  • (2010)AjaxScopeACM Transactions on the Web10.1145/1841909.18419104:4(1-52)Online publication date: 1-Sep-2010
  • (2010)Engineering scalable, cache and space efficient tries for stringsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-010-0183-919:5(633-660)Online publication date: 1-Oct-2010
  • (2009)Region Based Structure Layout Optimization by Selective Data CopyingProceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques10.1109/PACT.2009.43(338-347)Online publication date: 12-Sep-2009
  • (2007)AjaxScopeACM SIGOPS Operating Systems Review10.1145/1323293.129426441:6(17-30)Online publication date: 14-Oct-2007
  • (2007)AjaxScopeProceedings of twenty-first ACM SIGOPS symposium on Operating systems principles10.1145/1294261.1294264(17-30)Online publication date: 14-Oct-2007
  • (2007)Structure Layout Optimization for Multithreaded ProgramsProceedings of the International Symposium on Code Generation and Optimization10.1109/CGO.2007.36(271-282)Online publication date: 11-Mar-2007
  • (2022)Online Application Guidance for Heterogeneous Memory SystemsACM Transactions on Architecture and Code Optimization10.1145/353385519:3(1-27)Online publication date: 6-Jul-2022
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media