ABSTRACT
Driven by the demand for ever-increasing computing performance, a steadily widening performance gap between memory and processor architectures has emerged. In attempts to mitigate its effects on processing systems that already face the exascale barrier and beyond, energy-efficient computing has been identified as the critical topic for further scaling. Memory architectures, long known to be slow, energy-hungry, and cost-intensive, require novel approaches to increase both energy efficiency and bandwidth. A quick fix for the performance aspect appears to be the 3D stacking of such planar memories, which is available in the form of High Bandwidth Memory (HBM) and the Hybrid Memory Cube (HMC). Because the latter allows custom logic to be embedded, novel non-von Neumann architectures become feasible, overcoming the performance gap while opening a new path for scaling computing performance. Given the broad spectrum of custom logic that could be integrated into a mesh of HMCs, comprehensive modeling tools are required that enable holistic design-space exploration of computing systems in breadth and depth. To meet this demand, an HMC modeling tool was implemented that provides rapid simulation of multiple interconnected HMCs and can run in either a functional or a bandwidth-accurate mode. Since flexibility is key for subsequent studies, the HMC modeling tool is parameterizable, and its internal components can be adjusted.
Index Terms
- A bandwidth accurate, flexible and rapid simulating multi-HMC modeling tool