Research article
DOI: 10.1145/1837274.1837309

Off-chip memory bandwidth minimization through cache partitioning for multi-core platforms

Published: 13 June 2010

Abstract

We present a methodology for minimizing off-chip memory bandwidth through application-driven L2 cache partitioning in multi-core systems. A major challenge in multi-core system design is the widening gap between the memory demand generated by the processor cores and the limited off-chip memory bandwidth and memory service speed. This gap severely restricts the number of cores that can be integrated into a multi-core system, and it limits the parallelism that can actually be achieved and efficiently exploited, not only for memory-demanding applications but also for workloads consisting of many tasks that together exceed the available off-chip bandwidth.
Partitioning of the last-level shared cache has been shown to be a promising technique for improving cache utilization and reducing miss rates. While most cache partitioning techniques focus on cache miss rates, our work takes a different approach: tasks' memory bandwidth requirements are taken into account when identifying a cache partitioning for multi-programmed and/or multi-threaded workloads. Cache resources are allocated so that the overall system bandwidth requirement is minimized for the target workload. The key insight is that cache miss-rate information may severely misrepresent a task's actual bandwidth demand, which ultimately determines overall system performance and power consumption.
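The abstract does not spell out the paper's partitioning algorithm, but its stated objective (allocate cache ways so that the workload's aggregate off-chip bandwidth demand is minimized) can be sketched as a small dynamic program over profiled per-task bandwidth-versus-ways curves. Everything below, including the function name, the curves, and the numbers, is an illustrative assumption rather than the paper's actual implementation:

```python
def partition_ways(bw_curves, total_ways):
    """Split `total_ways` cache ways among tasks to minimize total bandwidth demand.

    bw_curves[t][w] = off-chip bandwidth demand (e.g. GB/s) of task t when
    granted w ways; such curves would come from profiling. Each task gets
    at least one way. Returns (per-task allocation, total bandwidth).
    """
    n = len(bw_curves)
    INF = float("inf")
    # best[t][w] = minimum total bandwidth for the first t tasks using exactly w ways
    best = [[INF] * (total_ways + 1) for _ in range(n + 1)]
    choice = [[0] * (total_ways + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for t in range(1, n + 1):
        for w in range(t, total_ways + 1):
            # Leave at least one way for each of the t-1 earlier tasks.
            for give in range(1, w - (t - 1) + 1):
                cand = best[t - 1][w - give] + bw_curves[t - 1][give]
                if cand < best[t][w]:
                    best[t][w] = cand
                    choice[t][w] = give
    # Recover the per-task allocation by walking the choices backwards.
    alloc, w = [0] * n, total_ways
    for t in range(n, 0, -1):
        alloc[t - 1] = choice[t][w]
        w -= alloc[t - 1]
    return alloc, best[n][total_ways]

# Example (hypothetical numbers): a streaming task gains nothing from extra
# cache, while a cache-friendly task's bandwidth drops sharply with more ways.
streaming = [8.0, 8.0, 8.0, 8.0, 8.0]        # index = ways granted
cache_friendly = [9.0, 6.0, 3.0, 1.0, 0.5]
alloc, total = partition_ways([streaming, cache_friendly], total_ways=4)
# alloc == [1, 3], total == 9.0: the streaming task is squeezed to one way.
```

This illustrates the abstract's key insight: a miss-rate-driven partitioner might still hand ways to the streaming task (its miss rate stays high regardless), whereas the bandwidth-driven objective recognizes that those ways cannot reduce its demand and gives them to the task whose bandwidth they actually lower.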




    Published In

    DAC '10: Proceedings of the 47th Design Automation Conference
    June 2010
    1036 pages
    ISBN:9781450300025
    DOI:10.1145/1837274


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. L2 cache partitioning
    2. off-chip bandwidth reduction


    Conference

    DAC '10

    Acceptance Rates

    Overall acceptance rate: 1,770 of 5,499 submissions (32%)



    Cited By

    • (2023) CAMP: a hierarchical cache architecture for multi-core mixed criticality processors. International Journal of Parallel, Emergent and Distributed Systems 39(3), 317-352. DOI: 10.1080/17445760.2023.2293913
    • (2021) LIBRA: Clearing the Cloud Through Dynamic Memory Bandwidth Management. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 815-826. DOI: 10.1109/HPCA51647.2021.00073
    • (2020) Miss Rate Estimation (MRE) an Novel Approach Toward L2 Cache Partitioning Algorithm's for Multicore System. Intelligent System Design, 593-603. DOI: 10.1007/978-981-15-5400-1_58
    • (2019) LFOC. Proceedings of the 48th International Conference on Parallel Processing (ICPP), 1-10. DOI: 10.1145/3337821.3337925
    • (2018) Access Adaptive and Thread-Aware Cache Partitioning in Multicore Systems. Electronics 7(9), 172. DOI: 10.3390/electronics7090172
    • (2018) Combining Software Cache Partitioning and Loop Tiling for Effective Shared Cache Management. ACM Transactions on Embedded Computing Systems 17(3), 1-25. DOI: 10.1145/3202663
    • (2018) Reusing Trace Buffers as Victim Caches. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26(9), 1699-1712. DOI: 10.1109/TVLSI.2018.2827928
    • (2018) High Bandwidth Off-Chip Memory Access Through Hybrid Switching and Inter-Chip Wireless Links. 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 100-105. DOI: 10.1109/ISVLSI.2018.00028
    • (2018) Optimal Partitioning of LLC in CAT-enabled CPUs to Prevent Side-Channel Attacks. Cyberspace Safety and Security, 115-123. DOI: 10.1007/978-3-030-01689-0_9
    • (2017) A coordinated multi-agent reinforcement learning approach to multi-level cache co-partitioning. Proceedings of the Conference on Design, Automation & Test in Europe (DATE), 800-805. DOI: 10.5555/3130379.3130572
