Research article
DOI: 10.1145/1837274.1837309

Off-chip memory bandwidth minimization through cache partitioning for multi-core platforms

Published: 13 June 2010

Abstract

We present a methodology for minimizing off-chip memory bandwidth through application-driven L2 cache partitioning in multi-core systems. A major challenge in multi-core system design is the widening gap between the memory demand generated by the processor cores and the limited off-chip memory bandwidth and memory service speed. This gap severely restricts the number of cores that can be integrated into a multi-core system, and it limits the parallelism that can actually be achieved and efficiently exploited, not only for memory-demanding applications but also for workloads consisting of many tasks that together exceed the available off-chip bandwidth.
Partitioning of the last-level shared cache has been shown to be a promising technique for improving cache utilization and reducing miss rates. While most cache partitioning techniques focus on cache miss rates, our work takes a different approach: tasks' memory bandwidth requirements are taken into account when identifying a cache partitioning for multi-programmed and/or multi-threaded workloads. Cache resources are allocated so that the overall system bandwidth requirement is minimized for the target workload. The key insight is that cache miss-rate information may severely misrepresent a task's actual bandwidth demand, which ultimately determines overall system performance and power consumption.
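The abstract does not spell out the paper's partitioning algorithm, but its stated objective (allocate cache ways so that the workload's aggregate off-chip bandwidth demand is minimized) can be sketched as a small dynamic program over profiled per-task bandwidth-versus-ways curves. Everything below, including the function name, the curves, and the numbers, is an illustrative assumption rather than the paper's actual implementation:

```python
def partition_ways(bw_curves, total_ways):
    """Split `total_ways` cache ways among tasks to minimize total bandwidth demand.

    bw_curves[t][w] = off-chip bandwidth demand (e.g. GB/s) of task t when
    granted w ways; such curves would come from profiling. Each task gets
    at least one way. Returns (per-task allocation, total bandwidth).
    """
    n = len(bw_curves)
    INF = float("inf")
    # best[t][w] = minimum total bandwidth for the first t tasks using exactly w ways
    best = [[INF] * (total_ways + 1) for _ in range(n + 1)]
    choice = [[0] * (total_ways + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for t in range(1, n + 1):
        for w in range(t, total_ways + 1):
            # Leave at least one way for each of the t-1 earlier tasks.
            for give in range(1, w - (t - 1) + 1):
                cand = best[t - 1][w - give] + bw_curves[t - 1][give]
                if cand < best[t][w]:
                    best[t][w] = cand
                    choice[t][w] = give
    # Recover the per-task allocation by walking the choices backwards.
    alloc, w = [0] * n, total_ways
    for t in range(n, 0, -1):
        alloc[t - 1] = choice[t][w]
        w -= alloc[t - 1]
    return alloc, best[n][total_ways]

# Example (hypothetical numbers): a streaming task gains nothing from extra
# cache, while a cache-friendly task's bandwidth drops sharply with more ways.
streaming = [8.0, 8.0, 8.0, 8.0, 8.0]        # index = ways granted
cache_friendly = [9.0, 6.0, 3.0, 1.0, 0.5]
alloc, total = partition_ways([streaming, cache_friendly], total_ways=4)
# alloc == [1, 3], total == 9.0: the streaming task is squeezed to one way.
```

This illustrates the abstract's key insight: a miss-rate-driven partitioner might still hand ways to the streaming task (its miss rate stays high regardless), whereas the bandwidth-driven objective recognizes that those ways cannot reduce its demand and gives them to the task whose bandwidth they actually lower.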




    Published In

    DAC '10: Proceedings of the 47th Design Automation Conference
    June 2010
    1036 pages
    ISBN:9781450300025
    DOI:10.1145/1837274


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. L2 cache partitioning
    2. off-chip bandwidth reduction


    Conference

    DAC '10

    Acceptance Rates

    Overall acceptance rate: 1,770 of 5,499 submissions (32%)



    Cited By

    • (2023) CAMP: a hierarchical cache architecture for multi-core mixed criticality processors. International Journal of Parallel, Emergent and Distributed Systems 39(3), 317-352. DOI: 10.1080/17445760.2023.2293913
    • (2021) LIBRA: Clearing the Cloud Through Dynamic Memory Bandwidth Management. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 815-826. DOI: 10.1109/HPCA51647.2021.00073
    • (2020) Miss Rate Estimation (MRE) an Novel Approach Toward L2 Cache Partitioning Algorithm's for Multicore System. Intelligent System Design, 593-603. DOI: 10.1007/978-981-15-5400-1_58
    • (2019) LFOC. Proceedings of the 48th International Conference on Parallel Processing (ICPP), 1-10. DOI: 10.1145/3337821.3337925
    • (2018) Access Adaptive and Thread-Aware Cache Partitioning in Multicore Systems. Electronics 7(9), 172. DOI: 10.3390/electronics7090172
    • (2018) Combining Software Cache Partitioning and Loop Tiling for Effective Shared Cache Management. ACM Transactions on Embedded Computing Systems 17(3), 1-25. DOI: 10.1145/3202663
    • (2018) Reusing Trace Buffers as Victim Caches. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26(9), 1699-1712. DOI: 10.1109/TVLSI.2018.2827928
    • (2018) High Bandwidth Off-Chip Memory Access Through Hybrid Switching and Inter-Chip Wireless Links. 2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 100-105. DOI: 10.1109/ISVLSI.2018.00028
    • (2018) Optimal Partitioning of LLC in CAT-enabled CPUs to Prevent Side-Channel Attacks. Cyberspace Safety and Security, 115-123. DOI: 10.1007/978-3-030-01689-0_9
    • (2017) A coordinated multi-agent reinforcement learning approach to multi-level cache co-partitioning. Proceedings of the Conference on Design, Automation & Test in Europe (DATE), 800-805. DOI: 10.5555/3130379.3130572
