skip to main content
research-article
Free Access

Benzene: An Energy-Efficient Distributed Hybrid Cache Architecture for Manycore Systems

Published:22 March 2018Publication History
Skip Abstract Section

Abstract

This article proposes Benzene, an energy-efficient distributed SRAM/STT-RAM hybrid cache for manycore systems running multiple applications. It is based on the observation that a naïve application of hybrid cache techniques to distributed caches in a manycore architecture suffers from limited energy reduction due to uneven utilization of scarce SRAM. We propose two-level optimization techniques: intra-bank and inter-bank. Intra-bank optimization leverages highly associative cache design, achieving more uniform distribution of writes within a bank. Inter-bank optimization evenly balances the amount of write-intensive data across the banks. Our evaluation results show that Benzene significantly reduces energy consumption of distributed hybrid caches.

References

  1. Junwhan Ahn, Sungjoo Yoo, and Kiyoung Choi. 2014. DASCA: Dead write prediction assisted STT-RAM cache architecture. In Proceedings of the International Symposium on High Performance Computer Architecture.Google ScholarGoogle ScholarCross RefCross Ref
  2. Junwhan Ahn, Sungjoo Yoo, and Kiyoung Choi. 2016. Prediction hybrid cache: An energy-efficient STT-RAM cache architecture. IEEE Trans. Comput. 65, 3 (2016), 940--951. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Jorge Albericio, Pablo Ibáñez, Víctor Viñals, and José M. Llabería. 2013. The reuse cache: Downsizing the shared last-level cache. In Proceedings of the International Symposium on Microarchitecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Nathan Beckmann and Daniel Sanchez. 2013. Jigsaw: Scalable software-defined caches. In Proceedings of the International Conference on Parallel Architectures and Compilation Techniques. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Nathan Beckmann, Po-An Tsai, and Daniel Sanchez. 2015. Scaling distributed cache hierarchies through computation and data co-scheduling. In Proceedings of International Symposium in High Performance Computer Architecture.Google ScholarGoogle ScholarCross RefCross Ref
  6. Shane Bell, Bruce Edwards, John Amann, Rich Conlin, Kevin Joyce, Vince Leung, John MacKay, Mike Reif, Liewei Bao, John Brown, Matthew Mattina, Chyi-Chang Miao, Carl Ramey, David Wentzlaff, Walker Anderson, Ethan Berger, Nat Fairbanks, Durlov Khan, Froilan Montenegro, Jay Stickney, and John Zook. 2008. TILE64-processor: A 64-core SoC with mesh interconnect. In International Solid-State Circuits Conference Digest of Technical Papers.Google ScholarGoogle ScholarCross RefCross Ref
  7. Xiuyuan Bi, Zhenyu Sun, Hai Li, and Wenqing Wu. 2012. Probabilistic design methodology to improve run-time stability and performance of STT-RAM caches. In Proceedings of the International Conference on Computer-Aided Design. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Yu-Ting Chen, Jason Cong, Hui Huang, Chunyue Liu, Raghu Prabhakar, and Glenn Reinman. 2012. Static and dynamic co-optimizations for blocks mapping in hybrid caches. In Proceedings of the International Symposium on Low Power Electronics and Design. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Hsiang-Yun Cheng, Jishen Zhao, Jack Sampson, Mary Jane Irwin, Aamer Jaleel, Yu Lu, and Yuan Xie. 2016. LAP: Loop-block aware inclusion properties for energy-efficient asymmetric last level caches. In Proceedings of the International Symposium on Computer Architecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Derek Chiou, Prabhat Jain, Srinivas Devadas, and Larry Rudolph. 2000. Dynamic cache partitioning via columnization. In Proceedings of Design Automation Conference.Google ScholarGoogle Scholar
  11. Zeshan Chishti, Michael D. Powell, and T. N. Vijaykumar. 2003. Distance associativity for high-performance energy-efficient non-uniform cache architectures. In Proceedings of the International Symposium on Microarchitecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Zeshan Chishti, Michael D. Powell, and T. N. Vijaykumar. 2005. Optimizing replication, communication, and capacity allocation in CMPs. In Proceedings of the International Symposium on Computer Architecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. George Chrysos. 2012. Intel® Xeon Phi coprocessor (codename Knights Corner). In IEEE Hot Chips Symposium.Google ScholarGoogle ScholarCross RefCross Ref
  14. Xiangyu Dong, Xiaoxia Wu, Guangyu Sun, Yuan Xie, Hai Li, and Yiran Chen. 2008. Circuit and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a universal memory replacement. In Proceedings of the Design Automation Conference. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Xiangyu Dong, Cong Xu, Yuan Xie, and Norman P. Jouppi. 2012. NVSim: A circuit-level performance, energy, and area model for emerging nonvolatile memory. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 31, 7 (2012), 994--1007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. John L. Henning. 2006. SPEC CPU2006 benchmark descriptions. ACM SIGARCH Comput. Arch. News 34, 4 (2006), 1--17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Adwait Jog, Asit K. Mishra, Cong Xu, Yuan Xie, Vijaykrishnan Narayanan, Ravishankar Iyer, and Chita R. Das. 2012. Cache revive: Architecting volatile STT-RAM caches for enhanced performance in CMPs. In Proceedings of the Design Automation Conference. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Georgios Keramidas, Pavlos Petoumenos, and Stefanos Kaxiras. 2007. Cache replacement based on reuse-distance prediction. In Proceedings of the International Conference on Computer Design.Google ScholarGoogle ScholarCross RefCross Ref
  19. Samira M. Khan, Yingying Tian, and Daniel A. Jimenez. 2010. Sampling dead block prediction for last-level caches. In Proceedings of the International Symposium on Microarchitecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Hyunjin Lee, Sangyeun Cho, and Bruce R. Childers. 2011. CloudCache: Expanding and shrinking private caches. In Proceedings of the International Symposium on High Performance Computer Architecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Jianhua Li, Liang Shi, Chun Jason Xue, Chengmo Yang, and Yinlong Xu. 2011. Exploiting set-level write non-uniformity for energy-efficient NVM-based hybrid cache. In Proceedings of the Symposium on Embedded Systems for Real-Time Multimedia.Google ScholarGoogle ScholarCross RefCross Ref
  22. Qingan Li, Jianhua Li, Liang Shi, Chun Jason Xue, and Yanxiang He. 2012. MAC: Migration-aware compilation for STT-RAM based hybrid cache in embedded systems. In Proceedings of the International Symposium on Low Power Electronics and Design. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Qingan Li, Mengying Zhao, Chun Jason Xue, and Yanxiang He. 2012. Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache. In Proceedings of the International Conference on Languages, Compilers, Tools and Theory for Embedded Systems. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Sheng Li, Jung Ho Ahn, Richard D. Strong, Jay B. Brockman, Dean M. Tullsen, and Norman P. Jouppi. 2013. The McPAT framework formulticore and manycore architectures: Simultaneously modeling power, area, and timing. ACM Trans. Arch. Code Optim. 10, 1 (2013), 5:1--5:29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Asit K. Mishra, Xiangyu Dong, Guangyu Sun, Yuan Xie, Vijaykrishnan Narayanan, and Chita R. Das. 2011. Architecting on-chip interconnects for stacked 3D STT-RAM caches in CMPs. In Proceedings of International Symposium in Computer Architecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Naveen Muralimanohar, Rajeev Balasubramonian, and Norman P. Jouppi. 2009. CACTI 6.0: A Tool to Model Large Caches. Technical Report HPL-2009-85. HP Laboratories.Google ScholarGoogle Scholar
  27. Rasmus Pagh and Flemming Friche Rodler. 2001. Cuckoo hashing. In Proceedings of the European Symposium on Algorithms. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Moinuddin K. Qureshi, Daniel N. Lynch, Onur Mutlu, and Yale N. Patt. 2006. A case for MLP-aware cache replacement. In Proceedings of International Symposium in Computer Architecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Moinuddin K. Qureshi and Yale N. Patt. 2006. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In Proceedings of the International Symposium on Microarchitecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Moinuddin K. Qureshi, David Thompson, and Yale N. Patt. 2005. The V-Way cache: Demand-based associativity via global replacement. In Proceedings of the International Symposium on Computer Architecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Daniel Sanchez and Christos Kozyrakis. 2010. The ZCache: Decoupling ways and associativity. In Proceedings of the International Symposium on Microarchitecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Daniel Sanchez and Christos Kozyrakis. 2011. Vantage: Scalable and efficient fine-grain cache partitioning. In Proceedings of International Symposium in Computer Architecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Daniel Sanchez and Christos Kozyrakis. 2013. ZSim: Fast and accurate microarchitectural simulation of thousand-core systems. In Proceedings of International Symposium in Computer Architecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. André Seznec. 1993. A case for two-way skewed-associative caches. In Proceedings of International Symposium in Computer Architecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Clinton W. Smullen IV, Vidyabhushan Mohan, Anurag Nigam, Sudhanva Gurumurthi, and Mircea R. Stan. 2011. Relaxing non-volatility for fast and energy-efficient STT-RAM caches. In Proceedings of the International Symposium on High Performance Computer Architecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Chen Sun, Chia-Hsin Owen Chen, George Kurian, Lan Wei, Jason Miller, Anant Agarwal, Li-Shiuan Peh, and Vladimir Stojanovic. 2012. DSENT-A tool connecting emerging photonics with electronics for opto-electronic networks-on-chip modeling. In Proceedings of the International Symposium on Networks on Chip. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Guangyu Sun, Xiangyu Dong, Yuan Xie, Jian Li, and Yiran Chen. 2009. A novel architecture of the 3D stacked MRAM L2 cache for CMPs. In Proceedings of the International Symposium on High Performance Computer Architecture.Google ScholarGoogle ScholarCross RefCross Ref
  38. Zhenyu Sun, Xiuyuan Bi, Hai Li, Weng-Fai Wong, Zhong-Liang Ong, Xiaochun Zhu, and Wenqing Wu. 2011. Multi retention level STT-RAM cache designs with a dynamic refresh scheme. In Proceedings of the International Symposium on Microarchitecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Jue Wang, Xiangyu Dong, and Yuan Xie. 2013. OAP: An obstruction-aware cache management policy for STT-RAM last-level caches. In Proceedings of the Design, Automation and Test in Europe. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Zhe Wang, Daniel A. Jimenez, Cong Xu, Guangyu Sun, and Yuan Xie. 2013. Adaptive placement and migration policy for an STT-RAM-based hybrid cache. In Proceedings of the International Symposium on High Performance Computer Architecture.Google ScholarGoogle Scholar
  41. Xiaoxia Wu, Jian Li, Lixin Zhang, Evan Speight, Ram Rajamony, and Yuan Xie. 2009. Hybrid cache architecture with disparate memory technologies. In Proceedings of the International Symposium on Computer Architecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Xiaoxia Wu, Jian Li, Lixin Zhang, Evan Speight, and Yuan Xie. 2011. Power and performance of read-write aware hybrid caches with non-volatile memories. In Proceedings of the Design, Automation and Test in Europe. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Yuejian Xie and Gabriel H. Loh. 2009. PIPP: Promotion/insertion pseudo-partitioning of multi-core shared caches. In Proceedings of the International Symposium on Computer Architecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Tianhao Zheng, Jaeyoung Park, Michael Orshansky, and Mattan Erez. 2013. Variable-energy write STT-RAM architecture with bit-wise write-completion monitoring. In Proceedings of the International Symposium on Low Power Electronics and Design. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Ping Zhou, Bo Zhao, Jun Yang, and Youtao Zhang. 2009. Energy reduction for STT-RAM using early write termination. In Proceedings of the International Conference on Computer-Aided Design. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Benzene: An Energy-Efficient Distributed Hybrid Cache Architecture for Manycore Systems

            Recommendations

            Comments

            Login options

            Check if you have access through your login credentials or your institution to get full access on this article.

            Sign in

            Full Access

            • Published in

              cover image ACM Transactions on Architecture and Code Optimization
              ACM Transactions on Architecture and Code Optimization  Volume 15, Issue 1
              March 2018
              401 pages
              ISSN:1544-3566
              EISSN:1544-3973
              DOI:10.1145/3199680
              Issue’s Table of Contents

              Copyright © 2018 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 22 March 2018
              • Accepted: 1 December 2017
              • Revised: 1 November 2017
              • Received: 1 May 2017
              Published in taco Volume 15, Issue 1

              Permissions

              Request permissions about this article.

              Request Permissions

              Check for updates

              Qualifiers

              • research-article
              • Research
              • Refereed

            PDF Format

            View or Download as a PDF file.

            PDF

            eReader

            View online with eReader.

            eReader