DOI: 10.1145/1669112.1669141
research-article

Variation-tolerant non-uniform 3D cache management in die stacked multicore processor

Published: 12 December 2009

Abstract

Process variations in integrated circuits have a significant impact on their performance, leakage, and stability. This is particularly evident in large, regular, and dense structures such as DRAMs. DRAMs are built using minimized transistors with presumably uniform speed in an organized array structure. Process variation can introduce latency disparity among different memory arrays. With the proliferation of 3D stacking technology, DRAMs have become a favorable choice for stacking on top of a multicore processor as a last-level cache, offering large capacity, high bandwidth, and low power. Hence, variations in bank speed create a unique problem of non-uniform cache accesses in 3D space.
In this paper, we investigate cache management techniques for tolerating process variation in a 3D DRAM stacked onto a multicore processor. We model the process variation in a 4-layer DRAM memory to characterize the latency variations among different banks. As a result, the notion of fast and slow banks from the core's standpoint is no longer associated with the banks' physical distance from the core; it is determined by the different bank latencies due to process variation. We develop cache migration schemes that utilize fast banks while limiting the cost of migration. Our experiments show a substantial performance benefit from exploiting fast memory banks through migration. On average, variation-aware management improves the performance of a workload by 17.8% over the baseline (in which the speed of one of the slowest banks is assumed for all banks). It also comes within 0.45% of the performance of an ideal memory in which no process variation is present.
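The abstract does not spell out the migration policy, so the following is only a minimal sketch of the general idea: promote frequently accessed blocks into the fastest bank, and use an access-count threshold to limit how often migration happens. The `Bank` class, the `access` function, the `MIGRATION_THRESHOLD` value, and the latencies are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical: number of accesses a block needs before it may migrate.
# A higher value means fewer migrations and therefore lower migration cost.
MIGRATION_THRESHOLD = 4


@dataclass
class Bank:
    latency: int                                # access latency in cycles (made-up values)
    blocks: dict = field(default_factory=dict)  # block address -> access count


def access(banks, addr):
    """Access `addr` and return the latency paid for this access.

    If the block has become hot, swap it with the coldest block of the
    fastest bank so that later accesses pay the lower latency.
    """
    src = next(b for b in banks if addr in b.blocks)
    src.blocks[addr] += 1
    paid = src.latency

    if src.blocks[addr] >= MIGRATION_THRESHOLD:
        fastest = min(banks, key=lambda b: b.latency)
        if fastest is not src and fastest.blocks:
            victim = min(fastest.blocks, key=fastest.blocks.get)
            # Swap only if the hot block is accessed more than the victim.
            if fastest.blocks[victim] < src.blocks[addr]:
                src.blocks[victim] = fastest.blocks.pop(victim)
                fastest.blocks[addr] = src.blocks.pop(addr)
    return paid


if __name__ == "__main__":
    banks = [Bank(10, {"a": 0}), Bank(20, {"b": 0})]
    print([access(banks, "b") for _ in range(5)])
```

In this toy setup, block `b` starts in the slow bank, pays the slow latency for its first four accesses, and is then swapped into the fast bank. Note that speed here is a per-bank property rather than a function of distance, which is the point the abstract makes about variation-induced non-uniformity.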

    Published In

    MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
    December 2009
    601 pages
    ISBN:9781605587981
    DOI:10.1145/1669112
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. 3D die stacking
    2. DRAM
    3. NUCA
    4. process variation

    Conference

    Micro-42

    Acceptance Rates

    Overall Acceptance Rate 484 of 2,242 submissions, 22%

    Cited By

    • (2022) Process variation aware DRAM-Cache resizing. Journal of Systems Architecture: the EUROMICRO Journal, 123:C. DOI: 10.1016/j.sysarc.2021.102364. Online publication date: 1-Feb-2022
    • (2020) Leakage-Aware Dynamic Thermal Management of 3D Memories. ACM Transactions on Design Automation of Electronic Systems, 26:2 (1-31). DOI: 10.1145/3419468. Online publication date: 23-Oct-2020
    • (2020) Reducing DRAM Access Latency via Helper Rows. 2020 57th ACM/IEEE Design Automation Conference (DAC) (1-6). DOI: 10.1109/DAC18072.2020.9218719. Online publication date: Jul-2020
    • (2017) DrMP: Mixed Precision-Aware DRAM for High Performance Approximate and Precise Computing. 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) (53-63). DOI: 10.1109/PACT.2017.34. Online publication date: Sep-2017
    • (2016) A Survey of Architectural Techniques for Managing Process Variation. ACM Computing Surveys, 48:4 (1-29). DOI: 10.1145/2871167. Online publication date: 9-Feb-2016
    • (2016) A Survey Of Techniques for Architecting DRAM Caches. IEEE Transactions on Parallel and Distributed Systems, 27:6 (1852-1863). DOI: 10.1109/TPDS.2015.2461155. Online publication date: 1-Jun-2016
    • (2016) Understanding and alleviating intra-die and intra-DIMM parameter variation in the memory system. 2016 IEEE 34th International Conference on Computer Design (ICCD) (217-224). DOI: 10.1109/ICCD.2016.7753283. Online publication date: Oct-2016
    • (2015) Achieving Yield, Density and Performance Effective DRAM at Extreme Technology Sizes. Proceedings of the 2015 International Symposium on Memory Systems (78-84). DOI: 10.1145/2818950.2818963. Online publication date: 5-Oct-2015
    • (2015) An Energy-Efficient Last-Level Cache Architecture for Process Variation-Tolerant 3D Microprocessors. IEEE Transactions on Computers, 64:9 (2460-2475). DOI: 10.1109/TC.2014.2378291. Online publication date: 1-Sep-2015
    • (2014) 3D stacking of high-performance processors. 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA) (500-511). DOI: 10.1109/HPCA.2014.6835959. Online publication date: Feb-2014
