research-article

Cooperative cache partitioning for chip multiprocessors

Authors:
Jichuan Chang

University of Wisconsin-Madison

University of Wisconsin-Madison
View Profile

,
Gurindar S. Sohi

University of Wisconsin-Madison

University of Wisconsin-Madison
View Profile

ACM International Conference on Supercomputing 25th Anniversary VolumeJune 2014Pages 402–412https://doi.org/10.1145/2591635.2667188

Published:17 June 2007Publication History

ACM International Conference on Supercomputing 25th Anniversary Volume

Pages 402–412

ABSTRACT

This paper presents Cooperative Cache Partitioning (CCP) to allocate cache resources among threads concurrently running on CMPs. Unlike cache partitioning schemes that use a single spatial partition repeatedly throughout a stable program phase, CCP resolves cache contention with multiple time-sharing partitions. Timesharing cache resources among partitions allows each thrashing thread to speed up dramatically in at least one partition by unfairly shrinking other threads' capacity allocations, while improving fairness by giving different partitions equal chance to execute. Quality-of-Service (QoS) is guaranteed over the long term by orchestrating the shrink and expansion of each thread's capacity across partitions to bound the average slowdown. Time-sharing based cache partitioning is further integrated with CMP cooperative caching [6] to exploit the benefits of LRU-based latency optimizations, which leads to a simplified partitioning algorithm and better performance for workloads that do not benefit from cache partitioning. We evaluate the effectiveness of CCP by simulating a 4-core CMP running all combinations of 7 representative SPEC2000 benchmarks. For workloads that can benefit from cache partitioning, CCP achieves up to 60%, and on average 12%, better performance than the exhaustive search of optimal static partitions. Overall, CCP provides the best results on almost all evaluation metrics for different cache sizes.

References

L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA-27), 2000. Google ScholarDigital Library
B. M. Beckmann, M. R. Marty, and D. A. Wood. ASR: Adaptive Selective Replication for CMP Caches. In Proceedings of the 39th Annual International Symposium on Microarchitecture (MICRO-39), 2006. Google ScholarDigital Library
D. Bertsekas and R. Gallager. Data Networks (2nd ed.). Prentice-Hall, 1992. Google ScholarDigital Library
J. Bruno, E. Gabber, B. Özden, and A. Silberschatz. The Eclipse Operating System: Providing Quality of Service via Reservation Domains. In USENIX 1998 Annual Technical Conference, 1998. Google ScholarDigital Library
R. W. Carr and J. L. Hennessy. WSClock - a Simple and Effective Algorithm for Virtual Memory Management. In Proceedings of the 8th ACM Symposium on Operating Systems Principles (SOSP-8), 1981. Google ScholarDigital Library
J. Chang and G. S. Sohi. Cooperative Caching for Chip Multiprocessors. In Proceedings of the 33th Annual International Symposium on Computer Architecture (ISCA-33), 2006. Google ScholarDigital Library
D. T. Chiou. Extending the Reach of Microprocessors: Column and Curious Caching. PhD thesis, MIT, 1999. Google ScholarDigital Library
Z. Chishti, M. D. Powell, and T. N. Vijaykumar. Optimizing Replication, Communication and Capacity Allocation in CMPs. In Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA-32), 2005. Google ScholarDigital Library
P. J. Denning. Thrashing: Its Causes and Prevention. In AFIPS 1968 Fall Joint Computer Conference, volume 33, pages 915--922, 1968. Google ScholarDigital Library
H. Dybdahl and P. Stenstrom. An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors. In Proceedings of the 13th International Symposium on High-Performance Computer Architecture (HPCA-13), 2007. Google ScholarDigital Library
H. Dybdahl, P. Stenstrom, and L. Natvig. A Cache-Partition Aware Replacement Policy for Chip Multiprocessors. In ACM 2006 Conference on High Performance Computing (HiPC-13), 2006. Google ScholarDigital Library
R. Fagin and J. H. Williams. A fair carpool scheduling algorithm. IBM Journal of Research and Development, 27(2):133--139, 1983. Google ScholarDigital Library
A. Fedorova, M. Seltzer, C. Small, and D. Nussbaum. Performance Of Multithreaded Chip Multiprocessors And Implications For Operating System Design. In USENIX 2005 Annual Technical Conference, 2005. Google ScholarDigital Library
L. Hammond, B. A. Hubbert, M. Siu, M. K. Prabhu, M. Chen, and K. Olukotun. The Stanford Hydra CMP. IEEE Micro, 20(2):71--84, 2000. Google ScholarDigital Library
J. L. Hellerstein. Achieving service rate objectives with decay usage scheduling. IEEE Transaction of Software Engineering, 19(8), 1993. Google ScholarDigital Library
L. R. Hsu, S. K. Reinhardt, R. Iyer, and S. Makineni. Communist, Utilitarian, and Capitalist Cache Policies on CMPs: Caches as a Shared Resource. In Proceedings of the 15th International Conference on Parallel Architecture and Compilation Techniques (PACT-15), 2006. Google ScholarDigital Library
J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S. W. Keckler. A NUCA Substrate for Flexible CMP Cache Sharing. In Proceedings of the 19th ACM International Conference on Supercomputing (ICS-19), 2005. Google ScholarDigital Library
R. Iyer. CQoS: a Framework for Enabling QoS in Shared Caches of CMP Platforms. In Proceedings of the 18th ACM International Conference on Supercomputing (ICS-18), 2004. Google ScholarDigital Library
L. K. John. More on Finding a Single number to Indicate Overall Performance of a Benchmark Suite. SIGARCH Computer Architecture News, 32(1), 2004. Google ScholarDigital Library
S. Kim, D. Chandra, and Y. Solihin. Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture. In Proceedings of the 13th International Conference on Parallel Architecture and Compilation Techniques (PACT-13), 2004. Google ScholarDigital Library
C. Liu, A. Sivasubramaniam, and M. Kandemir. Organizing the Last Line of Defense before Hitting the Memory Wall for CMPs. In Proceedings of the 10th International Symposium on High-Performance Computer Architecture (HPCA-10), 2004. Google ScholarDigital Library
K. Luo, J. Gummaraju, and M. Franklin. Balancing Throughput and Fairness in SMT Processors. In Proceedings of the 2001 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-2001), 2001.Google Scholar
P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hållberg, J. Högberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A Full System Simulation Platform. IEEE Computer, 35(2):50--58, Feb 2002. Google ScholarDigital Library
M. M. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu, A. R. Alameldeen, K. E. Moore, M. D. Hill, and D. A. Wood. Multifacet's General Execution-driven Multiprocessor Simulator (GEMS) Toolset. Computer Architecture News, 2005. Google ScholarDigital Library
K. J. Nesbit, N. Aggarwal, J. Laudon, and J. E. Smith. Fair Queuing Memory Systems. In Proceedings of the 39th Annual International Symposium on Microarchitecture (MICRO-39), 2006. Google ScholarDigital Library
A. K. Parekh and R. G. Gallager. A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks: the Single-node Case. IEEE Transaction of Networks, 1(3):344--357, 1993. Google ScholarDigital Library
P. Petoumenos, G. Keramidas, H. Zeffer, S. Kaxiras, and E. Hagersten. STATSHARE: A Statistical Model for Managing Cache Sharing via Decay. In Second Annual Workshop on Modeling, Benchmarking and Simulation (MoBS 2006), 2006.Google Scholar
M. M. Planas, F. Cazorla, A. Ramirez, and M. Valero. Explaining Dynamic Cache Partitioning Speed Ups. IEEE Computer Architecture Letters, 6(1), 2007. Google ScholarDigital Library
M. K. Qureshi and Y. N. Patt. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches. In Proceedings of the 39th Annual International Symposium on Microarchitecture (MICRO-39), 2006. Google ScholarDigital Library
M. K. Qureshi, D. Thompson, and Y. N. Patt. Tl e V-way Cache: Demand Based Associativity via Global Replacement. In Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA-32), 2005. Google ScholarDigital Library
N. Rafique, W.-T. Lim, and M. Thottethodi. Architectural Support for Operating System-Driven CMP Cache Management. In Proceedings of the 15th International Conference on Parallel Architecture and Compilation Techniques (PACT-15), 2006. Google ScholarDigital Library
J. E. Smith. Characterizing Computer Performance with a Single Number. Communication of ACM, 31(10), 1988. Google ScholarDigital Library
A. Snavely and D. M. Tullsen. Symbiotic Jobscheduling for a Simultaneous Multithreaded Processor. In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), 2000. Google ScholarDigital Library
A. Snavely, D. M. Tullsen, and G. Voelker. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor. In Proceedings of 2002 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2002. Google ScholarDigital Library
E. Speight, H. Shafi, L. Zhang, and R. Rajamony. Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors. In Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA-32), 2005. Google ScholarDigital Library
H. S. Stone, J. Turek, and J. L. Wolf. Optimal partitioning of cache memory. IEEE Transaction of Computers, 41(9):1054--1068, 1992. Google ScholarDigital Library
G. E. Suh, S. Devadas, and L. Rudolph. A New Memory Monitoring Scheme for Memory-aware Scheduling and Partitioning. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA-8), 2002. Google ScholarDigital Library
G. E. Suh, L. Rudolph, and S. Devadas. Dynamic Partitioning of Shared Cache Memory. Journal of Supercomputing, 28(1):7--26, 2004. Google ScholarDigital Library
J. M. Tendler, J. S. Dodson, J. S. F. Jr., H. Le, and B. Sinharoy. IBM Power4 system microarchitecture. IBM Journal of Research and Development, 46(1):5--26, 2002. Google ScholarDigital Library
B. Verghese, A. Gupta, and M. Rosenblum. Performance Isolation: Sharing and Isolation in Shared-memory Multiprocessors. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), 1998. Google ScholarDigital Library
C. A. Waldspurger. Lottery and stride scheduling: Flexible proportional-share resource management. Technical Report MIT/LCS/TR-667, Cambridge, MA, USA, 1995. Google ScholarDigital Library
T. Y. Yeh and G. Reinman. Fast and Fair: Data-stream Quality of Service. In Proceedings of the 2005 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES'05), 2005. Google ScholarDigital Library
M. Zhang and K. Asanovic. Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled CMPs. In Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA-32), 2005. Google ScholarDigital Library

Index Terms

Cooperative cache partitioning for chip multiprocessors
1. General and reference
  1. Cross-computing tools and techniques
    1. Design
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Recommendations

Cooperative Caching for Chip Multiprocessors

This paper presents CMP Cooperative Caching, a unified framework to manage a CMP's aggregate on-chip cache resources. Cooperative caching combines the strengths of private and shared cache organizations by forming an aggregate "shared" cache through ...
Read More
Cooperative cache partitioning for chip multiprocessors
ICS '07: Proceedings of the 21st annual international conference on Supercomputing

This paper presents Cooperative Cache Partitioning (CCP) to allocate cache resources among threads concurrently running on CMPs. Unlike cache partitioning schemes that use a single spatial partition repeatedly throughout a stable program phase, CCP ...
Read More
IPC-Based Cache Partitioning: An IPC-Oriented Dynamic Shared Cache Partitioning Mechanism
ICHIT '08: Proceedings of the 2008 International Conference on Convergence and Hybrid Information Technology

In a chip-multiprocessor with a shared cache structure, the last level cache is shared by multiple applications executing simultaneously. The competing accesses from different applications degrade the system performance, resulting in non-predicting ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ACM International Conference on Supercomputing 25th Anniversary Volume
June 2014
94 pages
ISBN:9781450328401
DOI:10.1145/2591635
Editor:
Utpal Banerjee
Copyright © 2007 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 17 June 2007
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
CMP
QoS
cooperative cache partitioning
fairness
multiple time-sharing partitions
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate584of2,055submissions,28%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 15
  Total Citations
  View Citations
- 660
  Total Downloads
- Downloads (Last 12 months)36
- Downloads (Last 6 weeks)9
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Cooperative cache partitioning for chip multiprocessors

ACM International Conference on Supercomputing 25th Anniversary Volume

ABSTRACT

References

Cited By

Index Terms

Recommendations

Cooperative Caching for Chip Multiprocessors

Cooperative cache partitioning for chip multiprocessors

IPC-Based Cache Partitioning: An IPC-Oriented Dynamic Shared Cache Partitioning Mechanism

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Cooperative cache partitioning for chip multiprocessors

ACM International Conference on Supercomputing 25th Anniversary Volume

ABSTRACT

References

Cited By

Index Terms

Recommendations

Cooperative Caching for Chip Multiprocessors

Cooperative cache partitioning for chip multiprocessors

IPC-Based Cache Partitioning: An IPC-Oriented Dynamic Shared Cache Partitioning Mechanism

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media