research-article

Reducing memory access latency with asymmetric DRAM bank organizations

Authors:
Young Hoon Son, Seoul National University, Seoul, Korea
O. Seongil, Seoul National University, Seoul, Korea
Yuhwan Ro, Seoul National University, Seoul, Korea
Jae W. Lee, Sungkyunkwan University, Suwon, Korea
Jung Ho Ahn, Seoul National University, Seoul, Korea

ACM SIGARCH Computer Architecture News, Volume 41, Issue 3, June 2013, pp. 380–391, https://doi.org/10.1145/2508148.2485955

Published: 23 June 2013

Abstract

DRAM has been a de facto standard for main memory, and advances in process technology have led to a rapid increase in its capacity and bandwidth. In contrast, its random access latency has remained relatively stagnant, as it is still around 100 CPU clock cycles. Modern computer systems rely on caches or other latency tolerance techniques to lower the average access latency. However, not all applications have ample parallelism or locality that would help hide or reduce the latency. Moreover, applications' demands for memory space continue to grow, while the capacity gap between last-level caches and main memory is unlikely to shrink. Consequently, reducing the main-memory latency is important for application performance. Unfortunately, previous proposals have not adequately addressed this problem, as they have focused only on improving the bandwidth and capacity or reduced the latency at the cost of significant area overhead.

We propose asymmetric DRAM bank organizations to reduce the average main-memory access latency. We first analyze the access and cycle times of a modern DRAM device to identify the key delay components for latency reduction. We then reorganize a subset of DRAM banks to halve their access and cycle times with low area overhead. By synergistically combining these reorganized DRAM banks with support for non-uniform bank accesses, we introduce CHARM, a novel DRAM bank organization with center high-aspect-ratio mats. Experiments on a simulated chip-multiprocessor system show that CHARM improves the instructions per cycle and the system-wide energy-delay product by up to 21% and 32%, respectively, with only a 3% increase in die area.
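
As a rough illustration of why speeding up only a subset of banks can still lower the average latency, the toy model below weights a fast and a slow bank latency by the share of accesses each receives. It is a sketch and not taken from the paper: the function name, the parameter names, and the sample numbers are hypothetical, and the model ignores queuing, row-buffer hits, and every latency component outside the DRAM banks. The default ratio of 0.5 mirrors the abstract's statement that the reorganized banks halve their access time.

```python
def average_access_latency(base_latency_ns: float,
                           fast_fraction: float,
                           fast_ratio: float = 0.5) -> float:
    """Average latency when a share of accesses hits faster banks.

    base_latency_ns: latency of an unmodified bank (hypothetical value).
    fast_fraction:   share of accesses served by the reorganized banks, in [0, 1].
    fast_ratio:      fast-bank latency relative to the baseline (0.5 = halved).
    """
    if not 0.0 <= fast_fraction <= 1.0:
        raise ValueError("fast_fraction must lie in [0, 1]")
    if fast_ratio <= 0.0:
        raise ValueError("fast_ratio must be positive")
    fast_latency_ns = base_latency_ns * fast_ratio
    # Weighted mean of the two latency classes.
    return (fast_fraction * fast_latency_ns
            + (1.0 - fast_fraction) * base_latency_ns)


# Illustrative numbers only: a 40 ns baseline with half of the
# accesses steered to the faster banks.
print(average_access_latency(40.0, 0.5))  # 30.0
```

In this simplified view the benefit grows linearly with the share of accesses that land in the faster banks, which suggests why placing frequently accessed data there matters.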

Index Terms

1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published in
ACM SIGARCH Computer Architecture News Volume 41, Issue 3
ISCA '13
June 2013
666 pages
ISSN:0163-5964
DOI:10.1145/2508148
ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013
686 pages
ISBN:9781450320795
DOI:10.1145/2485922
General Chair:
Avi Mendelson
Technion
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
DRAM
asymmetric bank organizations
high-aspect-ratio mats
microarchitecture