
A survey on cache tuning from a power/energy perspective

Published: 03 July 2013

Abstract

Low power and/or energy consumption is a requirement not only in embedded systems that run on batteries or have limited cooling capabilities, but also in desktops and mainframes, where chips require costly cooling techniques. Since the cache subsystem is typically the most power/energy-consuming subsystem, caches are good candidates for power/energy optimizations, and therefore, cache tuning techniques are widely researched. This survey focuses on state-of-the-art offline static and online dynamic cache tuning techniques and summarizes the techniques' attributes, major challenges, and potential research trends to inspire novel ideas and future research avenues.

  122. Ramaswamy, S. and Yalamanchili, S. 2007. Improving cache efficiency via resizing + remapping. In Proceedings of the 25th International Conference on Computer Design. IEEE, Washington, DC, 47--54.Google ScholarGoogle Scholar
  123. Ranganathan, P., Adve, S., and Jouppi, N. P. 2000. Reconfigurable caches and their application to media processing. In Proceedings of the 27th Annual International Symposium on Computer Architecture. ACM, New York, NY, 214--224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  124. Rawlins, M. and Gordon-Ross, A. 2011. CPACT -- the conditional parameter adjustment cache tuner for dual-core architectures. In Proceedings of the IEEE International Conference of Computer Design (ICCD). IEEE, Los Alamitos, CA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  125. Rawlins, M. and Gordon-Ross, A. 2012. An application classification guided cache tuning heuristic for multi-core architectures. In Proceedings of the Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, Piscataway, NJ.Google ScholarGoogle Scholar
  126. Renau, J., Fraguela, B., Tuck, J., Liu, W., Prvulovic, M., Ceze, L., Strauss, K., Sarangi, S., Sack, P., and Montesinos, P. 2005. SESC Simulator. http://sesc.sourceforge.net.Google ScholarGoogle Scholar
  127. Rico, A., Duran, A., Cabarcas, F., Etsion, Y., Ramirez, A., and Valero, M. 2011. Trace-driven simulation of multithreaded applications. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software. IEEE, Piscataway, New Jersey, 87--96. Google ScholarGoogle ScholarDigital LibraryDigital Library
  128. Rosenblum, M., Bugnion, E., Devine, S., and Herrod S.A. 1997. Using the SimOS machine simulator to study complex computer systems. ACM Trans. Model. Comput. Simul. 7, 1.78--103. Google ScholarGoogle ScholarDigital LibraryDigital Library
  129. Sanchez, H., Kuttanna, B., Olson, T., Alexander, M., Gerosa, G., Philip, R., and Alvarez, J. 1997. Thermal management system for high performance PowerPC#8482; microprocessors. In Proceedings of the 42nd IEEE International Computer Conference. IEEE, Washington, DC, 325--330. Google ScholarGoogle ScholarDigital LibraryDigital Library
  130. Segars, S. 2001. Low power design techniques for microprocessors. In Proceedings of the International Solid State Circuit Conference.Google ScholarGoogle Scholar
  131. Shen, X., Zhong, Y., and Ding, C. 2004. Locality phase prediction. In Proceedings of the 11th International Conference on Architectural Support for Programming Languages and Operating Systens. ACM, New York, NY, 165--176. Google ScholarGoogle ScholarDigital LibraryDigital Library
  132. Shen, X., Zhong, Y., and Ding, C. 2005. Phase-based miss rate prediction across program inputs. In Proceedings of the 17th International Workshop on Languages and Compilers for High Performance Computing, Springer, Berlin, Heidelberg, Germany, 42--55. Google ScholarGoogle ScholarDigital LibraryDigital Library
  133. Sherwood, T., Perelman, E., and Calder, B. 2001. Basic block distribution analysis to find periodic behavior and simulation points in applications. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques., IEEE, Washington, DC, 3--14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  134. Sherwood, T., Sair, S., and Calder, B. 2003. Phase tracking and prediction. In Proceedings of the 30th Annual International Symposium on Computer Architecture. ACM, New York, NY, 336--349. Google ScholarGoogle ScholarDigital LibraryDigital Library
  135. Sherwood, T., Perelman, E., Hamerly, G., Sair, S., and Calder, B. 2003. Discovering and exploiting program phases. IEEE Micro, IEEE, Los Alamitos, CA, 23, 6, 84--93. Google ScholarGoogle ScholarDigital LibraryDigital Library
  136. Shi, X., Su, F., Peir, J., Xia, Y., and Yang, Z. 2009. Modeling and stack simulation of CMP cache capacity and accessibility. IEEE Trans. Parallel Distrib. Syst. 20, 12, 1752--1763. Google ScholarGoogle ScholarDigital LibraryDigital Library
  137. Shiue, W. and Chakrabarti, C. 2001. Memory design and exploration for low power, embedded systems. The J. VLSI Signal Process. Syst. 29, 3, 167--178. Google ScholarGoogle ScholarDigital LibraryDigital Library
  138. Srikantaiah, S., Kandemir, M., and Irwin, M. 2008. Adaptive set pinning: Managing shared caches in chip multiprocessors. In Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, New York, NY, 135--144. Google ScholarGoogle ScholarDigital LibraryDigital Library
  139. Srikantaiah, S., Kultursay, E., Zhang, T., Kandemir, M., Irwin, M., and Xie, Y. 2011. MorphCache: A reconfigurable adaptive multi-level cache hierarchy for CMPs. In Proceedings of the 17th International Symposium on High Performance Computer Architecture (HPCA). IEEE, Washington, DC, 231--242. Google ScholarGoogle ScholarDigital LibraryDigital Library
  140. Srivastava, A. and Eustace, A. 1994. ATOM: A system for building customized program analysis tools. Tech. rep. 94/2, Western Research Lab, Compaq.Google ScholarGoogle ScholarDigital LibraryDigital Library
  141. Suh, G. E., Rudolph, L., and Devadas, S. 2004. Dynamic partitioning of shared cache memory. J. Supercompu. 28, 1, 7--26. Google ScholarGoogle ScholarDigital LibraryDigital Library
  142. Sugumar, R. and Abraham, S. 1991. Efficient simulation of multiple cache configurations using binomial trees. Tech. rep. CSE-TR-111-91.Google ScholarGoogle Scholar
  143. Sugumar, R. A. 1993. Multi-reconfiguration simulation algorithms for the evaluation of computer architecture designs. Ph.D. Thesis, University of Michigan, Ann Arbor, MI. Google ScholarGoogle ScholarDigital LibraryDigital Library
  144. Tarjan, D., Thoziyoor, S., and Jouppi, N. P. 2006. CACTI 4.0, Hewlett-Packard Laboratories Technical Report # HPL-2006-86.Google ScholarGoogle Scholar
  145. Thompson, J. G., and Smith, A. J. 1989. Efficient (stack) algorithms for analysis of write-back and sector memories. ACM Transactions on Computer Systems, 7, 1, 78--117. Google ScholarGoogle ScholarDigital LibraryDigital Library
  146. Ishihara, T. and Fallah, F. 2005. A non-uniform cache architecture for low power system design. IN Proceedings of the International Symposium on Low Power Electronics and Design. ACM, New York, NY, 363--368. Google ScholarGoogle ScholarDigital LibraryDigital Library
  147. Uhlig, R. A. and Mudge, T.N. 1997. Trace-driven memory simulation: A survey. ACM Comput. Surv. 29, 2, 128--170. Google ScholarGoogle ScholarDigital LibraryDigital Library
  148. Varadarajan, K., Nandy, S., Sharda, V., Bharadwaj, A., Iyer, R., Makineni, S., and Newell, D. 2006. Molecular caches: A caching structure for dynamic creation of application-specific heterogeneous cache regions. In Proceedings of the (MICRO), IEEE, Los Alamitos, CA, 433--442. Google ScholarGoogle ScholarDigital LibraryDigital Library
  149. Veidenbaum, A., Tang, W., Gupta, R., Nicolau, A., and Ji. X. 1999. Adapting cache line size to application behavior. In Proceedings of the International Conference on Supercomputing. ACM, New York, NY, 145--154. Google ScholarGoogle ScholarDigital LibraryDigital Library
  150. Venkatachalam, V. and Franz, M. 2005. Power reduction techniques for microprocessor systems. ACM Comput. Surv. 37, 3, 195--237. Google ScholarGoogle ScholarDigital LibraryDigital Library
  151. Vera, X., Bermudo, N., Llosa, J., and Gonzalez, A. 2004. A fast and accurate framework to analyze and optimize cache memory behavior. ACM Trans. Program. Lang. Syst. 26, 2, 263--300. Google ScholarGoogle ScholarDigital LibraryDigital Library
  152. Viana, P., Gordon-Ross, A., Keogh, E., Barros, E., and Vahid, F. 2006. Configurable cache subsetting for fast cache tuning. In Proceedings of the ACM Design Automation Conference. ACM, New York, NY, 695--900. Google ScholarGoogle ScholarDigital LibraryDigital Library
  153. Viana, P., Gordon-Ross, A., Baros, E., and Vahid, F. 2008. A table-based method for single-Pass cache optimization. In Proceedings of the 18th ACM Great Lakes Symposium on VLSI. ACM, New York, NY, 71--76. Google ScholarGoogle ScholarDigital LibraryDigital Library
  154. Vivekanandarajah, K., Sirkanthan, T., and Clarke, C. T. 2006. Profile directed instruction cache tuning for embedded systems. In Proceedings of the IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures. IEEE, Washington, DC, 227. Google ScholarGoogle ScholarDigital LibraryDigital Library
  155. Wan, H., Gao, X., Long, X., and Wang, Z. 2009. GCSim: A GPU-based trace-driven simulator for multi-level cache. Advan. Parallel Process. Technol. 177--190. Google ScholarGoogle ScholarDigital LibraryDigital Library
  156. Wenisch, T. F., Wunderlich, R. E., Ferdman, M., Ailamaki, A., Falsafi, B., and Hoe, J. C. 2006. SimFlex:Statistical sampling of computer system simulation. IEEE Micro 26, 4, 18--31. Google ScholarGoogle ScholarDigital LibraryDigital Library
  157. Witchell, E. and Rosenblum, M. 1996. Embra: Fast and flexible machine simulation. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. ACM, New York, NY, 68--79. Google ScholarGoogle ScholarDigital LibraryDigital Library
  158. Wunderlich, R. E., Wenisch, T. F., Falsafi, B. and Hoe, J. C. 2003. SMARTS: Accelerating microarchitecture simulation via rigorous statistical sampling. In Proceedings of the Annual International Symposium on Computer Architecture (ISCA), IEEE, Washington, DC, 84--95. Google ScholarGoogle ScholarDigital LibraryDigital Library
  159. Xiang, X., Bao, B., Bai, T., Ding, C., and Chilimbi, T. 2011. All-window profiling and composable models of cache sharing. In Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming. ACM New York, NY, 91--102. Google ScholarGoogle ScholarDigital LibraryDigital Library
  160. Xie, Y. and Loh, G. 2009. PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches. ACM SIGARCH Comput. Architec. News 37, 3, ACM New York, NY, 174--183. Google ScholarGoogle ScholarDigital LibraryDigital Library
  161. Xu, C., Chen, X. Dick, R. P., and Mao, Z. M. 2010. Cache contention and application performance prediction for multi-core systems. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS). IEEE, Piscataway, New Jersey, 76--86.Google ScholarGoogle Scholar
  162. Yeh, T. and Reinman, G. 2005. Fast and fair: Data-stream quality of service. In Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems (CASES). ACM New York, NY, 237--248. Google ScholarGoogle ScholarDigital LibraryDigital Library
  163. Yourst, M. T. 2007. PTLsim: A cycle accurate full system x86-64 microarchitectural simulator. In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS), IEEE, Piscataway, NJ, 23--34.Google ScholarGoogle ScholarCross RefCross Ref
  164. Zang, W. and Gordon-Ross, A. 2011. T-SPaCS - a two-level single-pass cache simulation methodology. In Proceedings of the 16th Asia and South Pacific Design Automation Conference. IEEE, Piscataway, NJ, 419--424. Google ScholarGoogle ScholarDigital LibraryDigital Library
  165. Zhang, W., Hu, J. S., Degalahal, V., Kandemir, M., Vijaykrishnan, N., and Irwin, M. J. 2002. Compiler-directed instruction cache leakage optimization. In Proceedings of the 35th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-35). IEEE, Los Alamitos, CA, 208--218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  166. Zhang, Y., Parikh, D., Sankaranarayanan, K. Skadron, K., and Stan, M. 2003. HotLeakage: A temperature-aware model of subthreshold and gate leakage for architects. Tech. rep. CS-2003-05, Department of Computer Science, University of Virginia, Charlottesville, VA.Google ScholarGoogle Scholar
  167. Zhang, C., Vahid, F., and Lysecky, R. 2004. A self-tuning cache architecture for embedded systems. Special issue on Dynamically Adaptable Embedded System. ACM Trans. Embed. Comput. Syst. 3, 2, 1--19. Google ScholarGoogle ScholarDigital LibraryDigital Library
  168. Zhou, H., Toburen, M. C., Rotenberg, E., and Conte, T. 2003. Adaptive mode control: A static-power-efficient cache design. ACM Trans. Embed. Comput. Syst. 2, 3, 347--372. Google ScholarGoogle ScholarDigital LibraryDigital Library
  169. Zhong, Y., Dropsho, S., and Ding, C. 2003. Miss rate prediction across all program inputs. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques. IEEE, Washington, DC, 91--101. Google ScholarGoogle ScholarDigital LibraryDigital Library

        • Published in

          ACM Computing Surveys  Volume 45, Issue 3
          June 2013
          575 pages
          ISSN:0360-0300
          EISSN:1557-7341
          DOI:10.1145/2480741
          Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 3 July 2013
          • Accepted: 1 February 2012
          • Revised: 1 November 2011
          • Received: 1 May 2011
          Published in csur Volume 45, Issue 3

          Qualifiers

          • research-article
          • Research
          • Refereed
