Abstract
Recent DRAM specifications exhibit increasing refresh latencies. A refresh command blocks an entire rank, which significantly reduces the parallelism available in the memory subsystem and thus degrades performance. Fine Granularity Refresh (FGR) is a feature recently announced as part of JEDEC's DDR4 DRAM specification that attempts to tackle this problem by offering a range of refresh options that trade off refresh latency against refresh frequency.
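The latency/frequency trade-off can be made concrete with a little arithmetic. The sketch below computes, for each FGR mode, the per-command refresh latency and the fraction of time a rank is blocked by refresh. The timing values (tREFI and tRFC for the 1x, 2x, and 4x modes of an 8 Gb device) are illustrative assumptions that do not appear in the abstract, and the halving of tREFI at extended temperature is likewise an assumption of this sketch.

```python
# Illustrative timing values (ASSUMPTIONS, not taken from the abstract):
# refresh interval tREFI and refresh latency tRFC per FGR mode, in ns.
FGR_MODES = {
    "1x": {"trefi_ns": 7800.0, "trfc_ns": 350.0},
    "2x": {"trefi_ns": 3900.0, "trfc_ns": 260.0},
    "4x": {"trefi_ns": 1950.0, "trfc_ns": 160.0},
}


def blocked_fraction(trefi_ns, trfc_ns):
    """Fraction of time a rank is unavailable because it is refreshing."""
    return trfc_ns / trefi_ns


def summarize(modes=FGR_MODES, extended_temperature=False):
    """Return {mode: (per-command latency in ns, blocked fraction)}.

    Assumption: at extended temperature, refresh commands are issued
    twice as often, i.e. tREFI is halved.
    """
    scale = 0.5 if extended_temperature else 1.0
    return {
        name: (t["trfc_ns"],
               blocked_fraction(t["trefi_ns"] * scale, t["trfc_ns"]))
        for name, t in modes.items()
    }


if __name__ == "__main__":
    for name, (latency, frac) in summarize().items():
        print(f"{name}: tRFC = {latency:.0f} ns, "
              f"rank blocked {frac:.1%} of the time")
```

Under these assumed values, the finer-grained modes shorten each individual refresh but increase the total time spent refreshing, which is why no single mode is obviously best.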
In this paper, we first conduct an analysis of DDR4 DRAM's FGR feature, and show that there is no one-size-fits-all option across a variety of applications. We then present Adaptive Refresh (AR), a simple yet effective mechanism that dynamically chooses the best FGR mode for each application and phase within the application.
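As a rough illustration of per-phase mode selection, the sketch below samples each FGR mode briefly at the start of a phase and keeps the cheapest one. This is a hypothetical sampling policy written for illustration, not necessarily the selection logic that AR uses; the `sample_cost` callback, the phase names, and the numbers in `TOY_COSTS` are arbitrary placeholders, not measurements.

```python
def choose_mode(sample_cost, modes=("1x", "2x", "4x")):
    """Try each mode for one short sampling epoch and keep the best.

    sample_cost(mode) is a HYPOTHETICAL callback that would run the
    memory controller in `mode` briefly and return a cost (for example,
    mean read latency); lower is better.
    """
    costs = {mode: sample_cost(mode) for mode in modes}
    return min(costs, key=costs.get)


def select_per_phase(phases, phase_cost):
    """Re-select the mode at the start of every phase."""
    return [choose_mode(lambda mode, p=phase: phase_cost(p, mode))
            for phase in phases]


# Placeholder costs (arbitrary numbers, NOT measurements) standing in
# for what a real controller would observe during sampling.
TOY_COSTS = {
    "phase_a": {"1x": 120, "2x": 110, "4x": 100},
    "phase_b": {"1x": 80, "2x": 85, "4x": 90},
}

if __name__ == "__main__":
    picks = select_per_phase(["phase_a", "phase_b"],
                             lambda p, m: TOY_COSTS[p][m])
    print(picks)
```

The point of the sketch is only that the choice is re-evaluated as behavior changes, so different phases of one application can end up in different modes.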
Looking at the refresh problem more closely, we identify a phenomenon in high-density DRAM systems that we call command queue seizure: the memory controller's command queue seizes up temporarily because it is full of commands to a rank that is being refreshed. To attack this problem, we propose two complementary mechanisms called Delayed Command Expansion (DCE) and Preemptive Command Drain (PCD).
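Command queue seizure can be reproduced in a toy model. The sketch below is a deliberately simplified simulation, assuming one arriving request and at most one issued command per cycle, a single shared queue, and stalled requests that are simply counted rather than retried. The function name and parameters are hypothetical, and the model does not implement DCE or PCD.

```python
def simulate(arrivals, capacity, refreshing_rank, refresh_cycles):
    """Toy model of a shared memory-controller command queue.

    arrivals: list of rank ids, one request arriving per cycle.
    Each cycle, the arriving request is enqueued if there is room
    (otherwise it is counted as stalled), then the oldest queued command
    whose rank is not being refreshed is issued.
    Returns (stalled_requests, stalled_requests_to_other_ranks).
    """
    queue = []
    stalled = 0
    stalled_other = 0
    for cycle, rank in enumerate(arrivals):
        if len(queue) < capacity:
            queue.append(rank)
        else:
            stalled += 1
            if rank != refreshing_rank:
                stalled_other += 1
        refreshing = cycle < refresh_cycles
        # Issue the oldest command that is not blocked by the refresh.
        for i, queued_rank in enumerate(queue):
            if not (refreshing and queued_rank == refreshing_rank):
                del queue[i]
                break
    return stalled, stalled_other


if __name__ == "__main__":
    alternating = [0, 1] * 5  # requests alternate between rank 0 and rank 1
    print(simulate(alternating, capacity=4, refreshing_rank=0,
                   refresh_cycles=20))
    print(simulate(alternating, capacity=4, refreshing_rank=0,
                   refresh_cycles=0))
```

In the first call, commands to the refreshing rank accumulate until they occupy every queue entry, after which requests to the idle rank stall as well; in the second call, with no refresh in progress, nothing stalls.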
Our results show that AR does exploit DDR4's FGR effectively. However, once our proposed DCE and PCD mechanisms are added, DDR4's FGR becomes redundant in most cases, except in a few highly memory-sensitive applications, where the use of AR does provide some additional benefit. In all, our simulations show that the proposed mechanisms yield 8% (14%) mean speedup with respect to traditional refresh, at normal (extended) DRAM operating temperatures, for a set of diverse parallel applications.
Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems
ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture