article

Analyzing data reuse for cache reconfiguration

Authors:

N. Vijaykrishnan,

M. J. IrwinAuthors Info & Claims

ACM Transactions on Embedded Computing Systems (TECS), Volume 4, Issue 4

Pages 851 - 876

https://doi.org/10.1145/1113830.1113836

Published: 01 November 2005 Publication History

Abstract

Classical compiler optimizations assume a fixed cache architecture and modify the program to take best advantage of it. In some cases, this may not be the best strategy because each nest might work best with a different cache configuration and transforming a nest for a given fixed cache configuration may not be possible due to data and control dependences. Working with a fixed cache configuration can also increase energy consumption in loops where the best required configuration is smaller than the default (fixed) one. In this paper, we take an alternate approach and modify the cache configuration for each nest, depending on the access pattern exhibited by the nest. We call this technique compiler-directed cache polymorphism (CDCP). More specifically, in this paper, we make the following contributions. First, we present an approach for analyzing data reuse properties of loop nests. Second, we give algorithms to simulate the footprints of array references in their reuse space. Third, based on our reuse analysis, we present an optimization algorithm to compute the cache configurations for each loop nest. Our experimental results show that CDCP is very effective in finding the near-optimal data cache configurations for different nests in array-intensive applications.

References

[1]

Ahmed, N., Mateev, N., and Pingali, K. 2000. Synthesizing transformations for locality enhancement of imperfectly-nested loop nests. In Proceedings of the 2000 International Conference on Supercomputing, Santa Fe, New Mexico. 141--152.

Digital Library

[2]

Albonesi, D. H. 1999. Selective cache ways: On-demand cache resource allocation. In Proc. of the 32nd Annual International Conference on Microarchitecture.

Digital Library

[3]

Bacon, D. F., Graham, S. L., and Sharp, O. J. 1994. Compiler transformations for high-performance computing. ACM Computing Surveys 26, 4, 345--420.

Digital Library

[4]

Burger, D., Kagi, A., and Hrishikesh, M. S. 2000. Memory hierarchy extensions to simplescalar 3.0. Tech. Rep. TR99-25, Department of Computer Sciences, The University of Texas at Austin, Austin, TX.

[5]

Chame, J. 1997. Compiler analysis of cache interference and its applications to compiler optimizations. Ph.D. thesis, Dept. of Computer Engineering, University of Southern California, Los Angeles, CA.

Digital Library

[6]

Cmelik, B. and Keppel, D. 1994. Shade: a fast instruction-set simulator for execution profiling. In Proc. of the 1994 ACM SIGMETRICES Conf. on the Measurement and Modeling of Computer Systems. 128--137.

Digital Library

[7]

Gannon, D., Jalby, W., and Gallivan, K. 1988. Strategies for cache and local memory management by global program transformation. Journal of Parallel and Distributed Computing 5, 5 (Oct.), 587--616.

Digital Library

[8]

Ghosh, S., Martonosi, M., and Malik, S. 1997. Cache miss equations: An analytical representation of cache misses. In Proc. of the 11th International Conference on Supercomputing (ICS-97).

Digital Library

[9]

Ghosh, S., Martonosi, M., and Malik, S. 1998. Precise miss analysis for program transformations with caches of arbitrary associativity. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA. 228--239.

Digital Library

[10]

Givargis, T., Henkel, J., and Vahid, F. 1999. Interface and cache power exploration for core-based embedded systems. In Proceedings of International Conference on Computer Aided Design (ICCAD). 270--273.

Digital Library

[11]

Grunwald, D., Zorn, B. G., and Henderson, R. 1993. Improving the cache locality of memory allocation. In Proceedings of the ACM SIGPLAN'93 Conference on Programming Language Design and Implementation (PLDI), Albuquerque, New Mexico. 177--186.

Digital Library

[12]

Kadayif, I., Kandemir, M., Vijaykrishnan, N., Irwin, M. J., and Ramanujam, J. 2001. Morphable cache architectures: potential benefits. In ACM Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES'01).

Digital Library

[13]

Manjikian, N. and Abdelrahman, T. S. 1995. Fusion of loops for parallelism and locality. In Proceedings of the 24th International Conference on Parallel Processing (ICPP'95). Oconomowoc, Wisconsin. II:19--28.

[14]

McKinley, K. S., Carr, S., and Tseng, C.-W. 1996. Improving data locality with loop transformations. ACM Transactions on Programming Lanaguages and Systems 18, 4 (July), 424--453.

Digital Library

[15]

Ranganathan, P., Adve, S., and Jouppi, N. P. 2000. Reconfigurable caches and their application to media processing. In Proc. of the 27th Annual International Symposium on Computer Architecture. 214--224.

Digital Library

[16]

Reinman, G. and Jouppi, N. 1999. An integrated cache timing and power model. Cacti 2.0 technical report, COMPAQ Western Research Lab.

[17]

Rivera, G. and Tseng, C.-W. 1998. Eliminating conflict misses for high performance architectures. In Proceedings of the 1998 International Conference on Supercomputing, Melbourne, Australia. 353--360.

Digital Library

[18]

Simunic, T., Micheli, G. D., and Benini, L. 1999. Energy-efficient design of battery-powered embedded systems. In Proceedings of International Symposium on Low Power Electronics and Design. 212--217.

Digital Library

[19]

Song, Y. and Li, Z. 1999. New tiling techniques to improve cache temporal locality. In Proceedings of the SIGPLAN '99 Conference on Programming Language Design and Implementation, Atlanta, GA.

Digital Library

[20]

Stanford Compiler Group. 1994. The SUIF Library, version 1.0 edition. Stanford Compiler Group.

[21]

Temam, O., Fricker, C., and Jalby, W. 1994. Cache interference phenomena. In Proc. of ACM SIGMETRICS Conference on Measurement & Modeling Computer Systems.

Digital Library

[22]

Tiwari, V., Malik, S., Wolfe, A., and Lee, M. 1996. Instruction level power analysis and optimization of software. Journal of VLSI Signal Processing 13, 2, 1--18.

Digital Library

[23]

Wolf, M. and Lam, M. 1991. A data locality optimizing algorithm. In Proc. of SIGPLAN'91 conf. Programming Language Design and Implementation. 30--44.

Digital Library

[24]

Yi, Q., Adve, V., and Kennedy, K. 2000. Transforming loops to recursion for multi-level memory hierarchies. In Proceedings of the SIGPLAN '00 Conference on Programming Language Design and Implementation, Vancouver, Canada.

Digital Library

Cited By

Totoni ETorrellas JKale LDamkroger TDongarra J(2014)Using an adaptive HPC runtime system to reconfigure the cache hierarchyProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC.2014.90(1047-1058)Online publication date: 16-Nov-2014
https://dl.acm.org/doi/10.1109/SC.2014.90
Wang YJia ZChen RWang MLiu DShao Z(2014)Loop scheduling with memory access reduction subject to register constraints for DSP applicationsSoftware—Practice & Experience10.1002/spe.218644:8(999-1026)Online publication date: 1-Aug-2014
https://dl.acm.org/doi/10.1002/spe.2186
Liu LLiu MWang C(2013)An Optimized GP-GPU Warp Scheduling Algorithm for Sparse Matrix-Vector MultiplicationProceedings of the 2013 IEEE Eighth International Conference on Networking, Architecture and Storage10.1109/NAS.2013.35(222-231)Online publication date: 17-Jul-2013
https://dl.acm.org/doi/10.1109/NAS.2013.35
Show More Cited By

Index Terms

Analyzing data reuse for cache reconfiguration
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
2. Software and its engineering
  1. Software notations and tools
    1. Compilers

Recommendations

Compiler-directed cache polymorphism

Classical compiler optimizations assume a fixed cache architecture and modify the program to take best advantage of it. In some cases, this may not be the best strategy because each loop nest might work best with a different cache configuration and ...
Compiler-directed cache polymorphism
LCTES/SCOPES '02: Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems

Classical compiler optimizations assume a fixed cache architecture and modify the program to take best advantage of it. In some cases, this may not be the best strategy because each loop nest might work best with a different cache configuration and ...
A Layout-Conscious Iteration Space Transformation Technique

Exploiting locality of references has become extremely important in realizing the potential performance of modern machines with deep memory hierarchies. The data access patterns of programs and the memory layouts of the accessed data sets play a ...

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Embedded Computing Systems

ACM Transactions on Embedded Computing Systems Volume 4, Issue 4

November 2005

259 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/1113830

Issue’s Table of Contents

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 01 November 2005

Published in TECS Volume 4, Issue 4

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

9
Total Citations
View Citations
546
Total Downloads

Downloads (Last 12 months)5
Downloads (Last 6 weeks)1

Reflects downloads up to 02 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Totoni ETorrellas JKale LDamkroger TDongarra J(2014)Using an adaptive HPC runtime system to reconfigure the cache hierarchyProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC.2014.90(1047-1058)Online publication date: 16-Nov-2014
https://dl.acm.org/doi/10.1109/SC.2014.90
Wang YJia ZChen RWang MLiu DShao Z(2014)Loop scheduling with memory access reduction subject to register constraints for DSP applicationsSoftware—Practice & Experience10.1002/spe.218644:8(999-1026)Online publication date: 1-Aug-2014
https://dl.acm.org/doi/10.1002/spe.2186
Liu LLiu MWang C(2013)An Optimized GP-GPU Warp Scheduling Algorithm for Sparse Matrix-Vector MultiplicationProceedings of the 2013 IEEE Eighth International Conference on Networking, Architecture and Storage10.1109/NAS.2013.35(222-231)Online publication date: 17-Jul-2013
https://dl.acm.org/doi/10.1109/NAS.2013.35
Paul MPetrov P(2011)Dynamically Adaptive I-Cache Partitioning for Energy-Efficient Embedded MultitaskingIEEE Transactions on Very Large Scale Integration (VLSI) Systems10.1109/TVLSI.2010.206699519:11(2067-2080)Online publication date: 1-Nov-2011
https://dl.acm.org/doi/10.1109/TVLSI.2010.2066995
Reddy RPetrov P(2010)Cache partitioning for energy-efficient and interference-free embedded multitaskingACM Transactions on Embedded Computing Systems10.1145/1698772.16987749:3(1-35)Online publication date: 5-Mar-2010
https://dl.acm.org/doi/10.1145/1698772.1698774
Tao JHillenbrand DWang LMarten H(2010)Studying Filter Cache Bypassing on Embedded SystemsProceedings of the 2010 10th IEEE International Conference on Computer and Information Technology10.1109/CIT.2010.296(1679-1686)Online publication date: 29-Jun-2010
https://dl.acm.org/doi/10.1109/CIT.2010.296
Tao JHillenbrand DMarten H(2009)Instruction Hints for Super Efficient Data CachesProceedings of the 9th International Conference on Computational Science10.1007/978-3-642-01973-9_76(677-685)Online publication date: 25-May-2009
https://dl.acm.org/doi/10.1007/978-3-642-01973-9_76
Baradaran NDiniz P(2008)A compiler approach to managing storage and memory bandwidth in configurable architecturesACM Transactions on Design Automation of Electronic Systems10.1145/1391962.139196913:4(1-26)Online publication date: 3-Oct-2008
https://dl.acm.org/doi/10.1145/1391962.1391969
Reddy RPetrov PKim TSainrat PLumetta SNavarro N(2007)Eliminating inter-process cache interference through cache reconfigurability for real-time and low-power embedded multi-tasking systemsProceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems10.1145/1289881.1289917(198-207)Online publication date: 30-Sep-2007
https://dl.acm.org/doi/10.1145/1289881.1289917

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents