DOI: 10.1145/3132402.3132414
research-article

Efficient STT-RAM last-level-cache architecture to replace DRAM cache

Published: 02 October 2017

ABSTRACT

Recent research has proposed a die-stacked Last-Level Cache (LLC) to overcome the Memory Wall. Lately, Spin-Transfer-Torque Random Access Memory (STT-RAM) caches have been recommended because they provide better energy efficiency than DRAM caches. However, the recently proposed STT-RAM cache architecture dissipates energy unnecessarily by fetching unneeded cache lines into the row buffer. In this paper, we propose a Selective Read Policy for STT-RAM. This policy fetches only those cache lines into the row buffer that are likely to be reused. This reduces the number of cache line reads and thereby the energy consumption. Further, we propose two key performance optimizations, namely the Row Buffer Tags Bypass Policy and the LLC Data Cache. Both optimizations reduce the LLC access latency and therefore improve overall performance. For evaluation, we implement our proposed architecture in the Zesto simulator and run different combinations of SPEC2006 benchmarks on an 8-core system. We show that our synergetic policies reduce the average LLC dynamic energy consumption by 72.6% and improve system performance by 1.3% compared to the recently proposed STT-RAM LLC. Compared to the state-of-the-art DRAM LLC, our architecture reduces the LLC dynamic energy consumption by 90.6% and improves system performance by 1.4%.
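
The idea behind the Selective Read Policy can be illustrated with a toy model. The sketch below is not the paper's implementation: the abstract does not say how reuse is predicted, so the `predicted_reused` set, the `RowAccess` type, and both function names are hypothetical, and the model counts cache line reads only rather than energy.

```python
from dataclasses import dataclass


@dataclass
class RowAccess:
    """One LLC request that opens a row (hypothetical toy model)."""
    lines_in_row: int      # cache lines stored in the opened row
    predicted_reused: set  # line indices an assumed predictor expects to be reused
    requested: int         # index of the line the request actually needs


def lines_read_fetch_all(access: RowAccess) -> int:
    """Baseline: every cache line of the row is read into the row buffer."""
    return access.lines_in_row


def lines_read_selective(access: RowAccess) -> int:
    """Selective read: only the requested line plus the lines predicted
    to be reused are read into the row buffer."""
    return len(access.predicted_reused | {access.requested})


# Example: a 32-line row where the assumed predictor marks lines 3 and 4.
access = RowAccess(lines_in_row=32, predicted_reused={3, 4}, requested=3)
print(lines_read_fetch_all(access))   # 32
print(lines_read_selective(access))   # 2
```

In this model, fewer line reads per row access stand in for the lower dynamic energy the abstract reports. The actual savings depend on the predictor and on the energy parameters evaluated in the paper.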

    Published in

      MEMSYS '17: Proceedings of the International Symposium on Memory Systems
      October 2017
      409 pages
      ISBN: 9781450353359
      DOI: 10.1145/3132402

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
