research-article

A bandwidth accurate, flexible and rapid simulating multi-HMC modeling tool

Authors:
Patrick Siegl, TU Braunschweig, Braunschweig, Germany
Rainer Buchty, TU Braunschweig, Braunschweig, Germany
Mladen Berekovic, TU Braunschweig, Braunschweig, Germany

MEMSYS '17: Proceedings of the International Symposium on Memory Systems, October 2017, Pages 71–82, https://doi.org/10.1145/3132402.3132403

Published: 02 October 2017

ABSTRACT

Driven by the demand for ever-increasing computing performance, a steadily widening performance gap between memory and processor architectures has emerged. In attempting to mitigate its effects for processing systems that already face the exascale barrier and beyond, energy-efficient computing has been identified as the critical enabler of further scaling. Memory architectures, long known to be slow, energy-hungry and cost-intensive, require novel approaches that increase both energy efficiency and bandwidth. A quick fix for the performance aspect appears to be 3D stacking of such planar memories, which is available in the form of High Bandwidth Memory (HBM) and the Hybrid Memory Cube (HMC). Because the latter allows custom logic to be embedded, novel non-von Neumann architectures become possible, overcoming the performance gap and opening a new path for scaling computing performance. Given the broad spectrum of custom logic that could be integrated into a mesh of HMCs, comprehensive modeling tools are required that enable holistic design-space exploration of computing systems in both breadth and depth. To meet this demand, an HMC-modeling tool was implemented that rapidly simulates multiple interconnected HMCs in either a functional or a bandwidth-accurate mode. Since flexibility is key for subsequent studies, the tool is parameterizable and its internal components can be adjusted.
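To illustrate the difference between the two simulation modes named above, here is a minimal, hypothetical sketch (not the authors' implementation, whose interfaces are not described here). It assumes that "bandwidth-accurate" means charging each packet the time needed to serialize its 16-byte FLITs over a link, with one FLIT of header/tail overhead per packet, while "functional" mode only moves data. The names `Mode`, `Link`, `packet_flits` and `CubeSketch`, as well as the link parameters (16 lanes at 15 Gbit/s per direction), are illustrative assumptions; DRAM/vault timing is ignored entirely.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple

FLIT_BYTES = 16  # HMC packets are transferred as 16-byte FLITs


class Mode(Enum):
    FUNCTIONAL = "functional"                  # move data, report no timing
    BANDWIDTH_ACCURATE = "bandwidth-accurate"  # also charge link serialization time


@dataclass
class Link:
    """One direction of a serial link; lane count and lane rate are assumed values."""
    lanes: int = 16
    gbps_per_lane: float = 15.0
    busy_until_ns: float = 0.0  # time at which this link direction becomes idle

    def flit_time_ns(self) -> float:
        # 1 Gbit/s equals 1 bit/ns, so bits / aggregate Gbit/s yields nanoseconds
        return FLIT_BYTES * 8 / (self.lanes * self.gbps_per_lane)

    def transmit(self, ready_ns: float, flits: int) -> float:
        # A packet waits until the link is idle, then occupies it for flits * FLIT time
        start = max(ready_ns, self.busy_until_ns)
        self.busy_until_ns = start + flits * self.flit_time_ns()
        return self.busy_until_ns


def packet_flits(payload_bytes: int) -> int:
    # One FLIT of header/tail overhead plus the payload rounded up to whole FLITs
    return 1 + -(-payload_bytes // FLIT_BYTES)


class CubeSketch:
    """Hypothetical single-cube model with one request and one response link."""

    def __init__(self, mode: Mode) -> None:
        self.mode = mode
        self.request_link = Link()   # host -> cube
        self.response_link = Link()  # cube -> host
        self.memory: Dict[int, bytes] = {}  # keyed by start address, no overlap handling

    def read(self, now_ns: float, addr: int, size: int) -> Tuple[bytes, float]:
        data = self.memory.get(addr, bytes(size))
        if self.mode is Mode.FUNCTIONAL:
            return data, now_ns
        # The read request carries no payload; the response carries the data
        arrived = self.request_link.transmit(now_ns, packet_flits(0))
        done = self.response_link.transmit(arrived, packet_flits(size))
        return data, done

    def write(self, now_ns: float, addr: int, data: bytes) -> float:
        self.memory[addr] = data
        if self.mode is Mode.FUNCTIONAL:
            return now_ns
        # The write request carries the data; the response is a bare acknowledgement
        arrived = self.request_link.transmit(now_ns, packet_flits(len(data)))
        return self.response_link.transmit(arrived, packet_flits(0))
```

With these assumed parameters one FLIT takes about 0.53 ns. A 64-byte read issued at time 0 (1 request FLIT plus 5 response FLITs) therefore completes at 3.2 ns, and a second 64-byte read issued at the same time queues behind the first response and completes at about 5.87 ns. In functional mode both calls return immediately. Keeping the timing logic in the link object rather than in the cube is one way to let the same functional model be reused when only the accuracy level changes.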

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
2. Hardware
  1. Emerging technologies

Published in
MEMSYS '17: Proceedings of the International Symposium on Memory Systems
October 2017
409 pages
ISBN:9781450353359
DOI:10.1145/3132402
General Chair:
Bruce Jacob
University of Maryland
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
bandwidth wall
memory architectures
memory wall
modeling
processing-in-memory
simulation
Article Metrics
- Total Citations: 2
- Total Downloads: 186
- Downloads (Last 12 months): 14
- Downloads (Last 6 weeks): 1