research-article

Reducing memory access latency with asymmetric DRAM bank organizations

Published: 23 June 2013

Abstract

DRAM has been a de facto standard for main memory, and advances in process technology have led to a rapid increase in its capacity and bandwidth. In contrast, its random access latency has remained relatively stagnant, as it is still around 100 CPU clock cycles. Modern computer systems rely on caches or other latency tolerance techniques to lower the average access latency. However, not all applications have ample parallelism or locality that would help hide or reduce the latency. Moreover, applications' demands for memory space continue to grow, while the capacity gap between last-level caches and main memory is unlikely to shrink. Consequently, reducing the main-memory latency is important for application performance. Unfortunately, previous proposals have not adequately addressed this problem, as they have focused only on improving the bandwidth and capacity or reduced the latency at the cost of significant area overhead.

We propose asymmetric DRAM bank organizations to reduce the average main-memory access latency. We first analyze the access and cycle times of a modern DRAM device to identify key delay components for latency reduction. Then we reorganize a subset of DRAM banks to reduce their access and cycle times by half with low area overhead. By synergistically combining these reorganized DRAM banks with support for non-uniform bank accesses, we introduce a novel DRAM bank organization with center high-aspect-ratio mats called CHARM. Experiments on a simulated chip-multiprocessor system show that CHARM improves the instructions per cycle and the system-wide energy-delay product by up to 21% and 32%, respectively, with only a 3% increase in die area.
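
To make the averaging argument concrete, the following is a minimal sketch of how the mean access latency might change when some accesses are served by banks whose latency is halved. It is an idealized illustration and not the model used in the paper: the function name, the baseline latency of 100 units, and the access fractions are all hypothetical, and the sketch ignores queuing, cycle-time effects, and any cost of non-uniform bank accesses.

```python
def average_access_latency(base_latency, fast_fraction, fast_scale=0.5):
    """Weighted-average latency when a fraction of accesses hit faster banks.

    base_latency  -- latency of an unmodified bank (arbitrary units, assumed)
    fast_fraction -- share of accesses served by reorganized banks (assumed)
    fast_scale    -- latency of a reorganized bank relative to base_latency;
                     0.5 mirrors the "by half" figure in the abstract
    """
    if not 0.0 <= fast_fraction <= 1.0:
        raise ValueError("fast_fraction must be between 0 and 1")
    fast_latency = base_latency * fast_scale
    # Weight each bank type by how often it is accessed.
    return fast_fraction * fast_latency + (1.0 - fast_fraction) * base_latency


if __name__ == "__main__":
    base = 100.0  # hypothetical baseline latency, not a measured value
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
        avg = average_access_latency(base, fraction)
        print(f"{fraction:.2f} of accesses to fast banks: {avg:.1f} units")
```

Under these assumptions the benefit scales linearly with the share of accesses that land in the faster banks, which suggests why steering frequently accessed data toward them would matter.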

    • Published in

      ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
      ISCA '13
      June 2013
      666 pages
      ISSN: 0163-5964
      DOI: 10.1145/2508148

      ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
      June 2013
      686 pages
      ISBN: 9781450320795
      DOI: 10.1145/2485922

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States
