DOI: 10.1145/3180665.3180672

NVMain Extension for Multi-Level Cache Systems

Published: 22 January 2018

Abstract

In this paper, we present an extension of the NVMain memory simulator. The objective is to help computer architects model complex memory designs for future computing systems in an accurate simulation framework. The simulator supports commodity memory models for DRAM as well as emerging non-volatile memory technologies such as STT-RAM, ReRAM, and PCRAM, and hybrid models. The current publicly available version of NVMain, NVMain 2.0, supports main memory (using DRAM and NVM technologies) and a die-stacked DRAM cache. We extend the cache model of the simulator by introducing an SRAM cache model and its supporting modules. With this addition, designers can model hybrid multi-level cache hierarchies that combine the die-stacked DRAM cache with SRAM caches. We provide a reference implementation of an optimized cache organization scheme for the die-stacked DRAM cache along with a tag-cache unit; together, they reduce cache miss latency. To integrate the new features into the existing memory hierarchy, we make the necessary changes to the memory controller. We provide functional verification of the new modules and put forward our approach for timing and power verification. We run random mixes of the SPEC2006 benchmarks and observe a ±10% difference in simulation results.

Published In

RAPIDO '18: Proceedings of the Rapido'18 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools
January 2018
51 pages
ISBN: 9781450364171
DOI: 10.1145/3180665
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

  • HiPEAC: HiPEAC Network of Excellence

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Cache Organization
  2. Memory Simulator
  3. Row Buffer
  4. SRAM Cache
  5. Tag-cache

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

RAPIDO '18
RAPIDO '18: Methods and Tools
January 22 - 24, 2018
Manchester, United Kingdom

Acceptance Rates

Overall Acceptance Rate 14 of 28 submissions, 50%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 17
  • Downloads (last 6 weeks): 0
Reflects downloads up to 06 Jan 2025.

Cited By

  • (2022) HyCSim: A rapid design space exploration tool for emerging hybrid last-level caches. System Engineering for constrained embedded systems, pages 53-58. DOI: 10.1145/3522784.3522801. Online publication date: 17-Jan-2022
  • (2022) Design and Simulation of Content-Aware Hybrid DRAM-PCM Memory System. IEEE Transactions on Parallel and Distributed Systems, 33(7), pages 1666-1677. DOI: 10.1109/TPDS.2021.3123539. Online publication date: 1-Jul-2022
  • (2021) Improving the Performance of Block-based DRAM Caches Via Tag-Data Decoupling. IEEE Transactions on Computers, 70(11), pages 1914-1927. DOI: 10.1109/TC.2020.3029615. Online publication date: 1-Nov-2021
  • (2020) Prefetching in hybrid main memory systems. Proceedings of the 12th USENIX Conference on Hot Topics in Storage and File Systems, page 11. DOI: 10.5555/3488733.3488744. Online publication date: 13-Jul-2020
  • (2020) System simulation with PULP virtual platform and SystemC. Proceedings of the Conference on Rapid Simulation and Performance Evaluation: Methods and Tools, pages 1-7. DOI: 10.1145/3375246.3375256. Online publication date: 21-Jan-2020
  • (2019) RTSim: A Cycle-Accurate Simulator for Racetrack Memories. IEEE Computer Architecture Letters, 18(1), pages 43-46. DOI: 10.1109/LCA.2019.2899306. Online publication date: 1-Jan-2019