Abstract
DRAM has been a de facto standard for main memory, and advances in process technology have led to a rapid increase in its capacity and bandwidth. In contrast, its random access latency has remained relatively stagnant and is still around 100 CPU clock cycles. Modern computer systems rely on caches or other latency tolerance techniques to lower the average access latency. However, not all applications have ample parallelism or locality that would help hide or reduce the latency. Moreover, applications' demands for memory space continue to grow, while the capacity gap between last-level caches and main memory is unlikely to shrink. Consequently, reducing the main-memory latency is important for application performance. Unfortunately, previous proposals have not adequately addressed this problem: they have either focused only on improving bandwidth and capacity, or reduced the latency at the cost of significant area overhead.
We propose asymmetric DRAM bank organizations to reduce the average main-memory access latency. We first analyze the access and cycle times of a modern DRAM device to identify the key delay components for latency reduction. We then reorganize a subset of the DRAM banks to reduce their access and cycle times by half with low area overhead. By synergistically combining these reorganized DRAM banks with support for non-uniform bank accesses, we introduce a novel DRAM bank organization with center high-aspect-ratio mats, called CHARM. Experiments on a simulated chip-multiprocessor system show that CHARM improves the instructions per cycle by up to 21% and the system-wide energy-delay product by up to 32%, with only a 3% increase in die area.
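As a rough illustration of why halving the latency of only a subset of banks can lower the average latency, the following minimal sketch computes a weighted average over fast and regular banks. The function name, the parameter names, and the example numbers are hypothetical and are not taken from the paper; the only figure drawn from the abstract is the assumption that the reorganized banks have half the access time. The sketch ignores queuing, bank conflicts, and the non-uniform bank access support that CHARM also relies on.

```python
def average_access_latency(base_latency, fast_fraction, fast_ratio=0.5):
    """Weighted-average access latency for an asymmetric bank organization.

    base_latency  -- latency of a regular bank (any unit, e.g. CPU cycles)
    fast_fraction -- fraction of accesses served by the reorganized banks
                     (hypothetical workload parameter, between 0 and 1)
    fast_ratio    -- latency of a reorganized bank relative to a regular one;
                     0.5 reflects the "by half" reduction stated in the abstract
    """
    if not 0.0 <= fast_fraction <= 1.0:
        raise ValueError("fast_fraction must be between 0 and 1")
    fast_latency = base_latency * fast_ratio
    # Accesses to fast banks pay the reduced latency; the rest pay the base latency.
    return fast_fraction * fast_latency + (1.0 - fast_fraction) * base_latency


if __name__ == "__main__":
    # Illustrative values only: 100-cycle base latency, varying access share.
    for share in (0.0, 0.25, 0.5, 1.0):
        print(share, average_access_latency(100.0, share))
```

Under this simple model the benefit depends entirely on how many accesses land in the fast banks, which suggests why data placement matters when only a subset of banks is reorganized.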
Reducing memory access latency with asymmetric DRAM bank organizations
ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture