- 1.D. Bacon, J.-H. Chow, D.-C. Ju, K. Muthukumar, and V. Sarkar. A compiler framework for restructuring data declarations to enhance cache and TLB effectiveness. In Proceedings of CASCON'94, Toronto, Canada, October 1994. Google ScholarDigital Library
- 2.S. Carr and K. Kennedy. Compiler blockability of numerical algorithms. In Proceedings of Supercomputing '92, Minneapolis, MN, November 1992. Google ScholarDigital Library
- 3.J. Chame and S. Moon. A tile selection algorithm for data locality and cache interference. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999. Google ScholarDigital Library
- 4.S. Chatterjee, V. Jain, A. Lebeck, S. Mundhra, and M. Thottethodi. Nonlinear array layouts for hierarchical memory systems. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999. Google ScholarDigital Library
- 5.M. Cierniak and W. Li. Unifying data and control transformations for distributed shared-memory machines. In Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation, La Jolla, CA, June 1995. Google ScholarDigital Library
- 6.S. Coleman and K. S. McKinley. Tile size selection using cache organization and data layout. In Proceedings of the SIGPLAN '95 Conference on Programming Language Design and Implementation, La Jolla, CA, June 1995. Google ScholarDigital Library
- 7.J. Ferrante, V. Sarkar, and W. Thrash. On estimating and enhancing cache effectiveness. In U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, editors, Languages and Compilers for Parallel Computing, Fourth International Workshop, Santa Clara, CA, August 1991. Springer-Verlag. Google ScholarDigital Library
- 8.D. Gannon, W. Jalby, and K. Gallivan. Strategies for cache and local memory management by global program transformation. Journal of Parallel and Distributed Computing, 5(5):587-616, October 1988. Google ScholarDigital Library
- 9.G. Gao, R. Olsen, V. Sarkar, and R. Thekkath. Collective loop fusion for array contraction. In Proceedings of the Fifth Workshop on Languages and Compilers for Parallel Computing, New Haven, CT, August 1992. Google ScholarDigital Library
- 10.S. Ghosh, M. Martonosi, and S. Malik. Cache miss equations: An analytical representation of cache misses. In Proceedings of the 1997 ACM International Conference on Supercomputing, Vienna, Austria, July 1997. Google ScholarDigital Library
- 11.F. Irigoin and R. Triolet. Supernode partitioning. In Proceedings of the Fifteenth Annual ACM Symposium on the Principles of Programming Languages, San Diego, CA, January 1988. Google ScholarDigital Library
- 12.M. Kandemir, A. Choudhary, J. Ramanujam, and P. Banerjee. Improving locality using loop and data transformations in an integrated framework. In Proceedings of the 31th IEEE/ACM International Symposium on Microarchitecture, Dallas, TX, November 1998. Google ScholarDigital Library
- 13.M. Kandemir, J. Ramanujam, and A. Choudhary. A compiler algorithm for optimizing locality in loop nests. In Proceedings of the 1997 ACM International Conference on Supercomputing, Vienna, Austria, July 1997. Google ScholarDigital Library
- 14.K. Kennedy and K. S. McKinley. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, Portland, OR, August 1993. Google ScholarDigital Library
- 15.I. Kodukula and K. Pingali. An experimental evaluation of tiling and shacking for memory hierarchy management. In Proceedings of the 1999 ACM International Conference on Supercomputing, Rhodes, Greece, June 1999. Google ScholarDigital Library
- 16.M. Lam, E. Rothberg, and M. E. Wolf. The cache performance and optimizations of blocked algorithms. In Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IV), Santa Clara, CA, April 1991. Google ScholarDigital Library
- 17.N. Manjikian and T. Abdelrahman. Fusion of loops for parallelism and locality. IEEE Transactions on Parallel and Distributed Systems, 8(2):193-209, February 1997. Google ScholarDigital Library
- 18.K. S. McKinley, S. Carr, and C.-W. Tseng. Improving data locality with loop transformations. ACM Transactions on Programming Languages and Systems, 18(4):424-453, July 1996. Google ScholarDigital Library
- 19.N. Mitchell, L. Carter, J. Ferrante, and K. Hogstedt. Quantifying the multi-level nature of tiling interactions. In Proceedings of the Tenth Workshop on Languages and Compilers for Parallel Computing, Minneapolis, MN, August 1997. Google ScholarDigital Library
- 20.G. Rivera and C.-W. Tseng. Data transformations for eliminating conflict misses. In Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation, Montreal, Canada, June 1998. Google ScholarDigital Library
- 21.G. Rivera and C.-W. Tseng. Eliminating conflict misses for high performance architectures. In Proceedings of the 1998 ACM International Conference on Supercomputing, Melbourne, Australia, July 1998. Google ScholarDigital Library
- 22.G. Rivera and C.-W. Tseng. A comparison of compiler tiling algorithms. In Proceedings of the 8th International Conference on Compiler Construction (CC'99), Amsterdam, The Netherlands, March 1999. Google ScholarDigital Library
- 23.V. Sarkar. Automatic selection of higher order transformations in the IBM XL Fortran compilers. IBM Journal of Research and Development, 41(3):233- 264, May 1997. Google ScholarDigital Library
- 24.S. Singhai and K. S. McKinley. A parameterized loop fusion algorithm for improving parallelism and cache locality. The Computer Journal, 40(6):340- 355, 1997.Google ScholarCross Ref
- 25.Y. Song and Z. Li. New tiling techniques to improve cache temporal locality. In Proceedings of the SIG- PLAN '99 Conference on Programming Language Design and Implementation, Atlanta, GA, May 1999. Google ScholarDigital Library
- 26.O. Temam, C. Fricker, and W. Jalby. Cache interference phenomena. In Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement & Modeling Computer Systems, Santa Clara, CA, May 1994. Google ScholarDigital Library
- 27.R. Wilson et al. SUIF: An infrastructure for research on parallelizing and optimizing compilers. ACM SIGPLAN Notices, 29(12):31-37, December 1994. Google ScholarDigital Library
- 28.M. Wolf, D. Maydan, and D.-K. Chen. Combining loop transformations considering caches and scheduling. In Proceedings of the 29th IEEE/ACM International Symposium on Microarchitecture, Paris, France, December 1996. Google ScholarDigital Library
- 29.M. E. Wolf and M. Lam. A data locality optimizing algorithm. In Proceedings of the SIGPLAN '91 Conference on Programming Language Design and Implementation, Toronto, Canada, June 1991. Google ScholarDigital Library
- 30.M. E. Wolf and M. Lam. A loop transformation theory and an algorithm to maximize parallelism. IEEE Transactions on Parallel and Distributed Systems, 2(4):452-471, October 1991. Google ScholarDigital Library
- 31.M. J. Wolfe. More iteration space tiling. In Proceedingsof Supercomputing '89, Reno, NV, November 1989. Google ScholarDigital Library
Index Terms
Locality optimizations for multi-level caches
Recommendations
Exploiting reuse locality on inclusive shared last-level caches
Special Issue on High-Performance Embedded Architectures and CompilersOptimization of the replacement policy used for Shared Last-Level Cache (SLLC) management in a Chip-MultiProcessor (CMP) is critical for avoiding off-chip accesses. Temporal locality, while being exploited by first levels of private cache memories, is ...
Exploiting spatial locality in data caches using spatial footprints
Special Issue: Proceedings of the 25th annual international symposium on Computer architecture (ISCA '98)Modern cache designs exploit spatial locality by fetching large blocks of data called cache lines on a cache miss. Subsequent references to words within the same cache line result in cache hits. Although this approach benefits from spatial locality, ...
Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniquesThe replacement policies for the last-level caches (LLCs) are usually designed based on the access information available locally at the LLC. These policies are inherently sub-optimal due to lack of information about the activities in the inner-levels of ...
Comments