skip to main content
10.1145/2591635.2667188acmconferencesArticle/Chapter ViewAbstractPublication PagesicsConference Proceedingsconference-collections
research-article

Cooperative cache partitioning for chip multiprocessors

Published:17 June 2007Publication History

ABSTRACT

This paper presents Cooperative Cache Partitioning (CCP) to allocate cache resources among threads concurrently running on CMPs. Unlike cache partitioning schemes that use a single spatial partition repeatedly throughout a stable program phase, CCP resolves cache contention with multiple time-sharing partitions. Timesharing cache resources among partitions allows each thrashing thread to speed up dramatically in at least one partition by unfairly shrinking other threads' capacity allocations, while improving fairness by giving different partitions equal chance to execute. Quality-of-Service (QoS) is guaranteed over the long term by orchestrating the shrink and expansion of each thread's capacity across partitions to bound the average slowdown. Time-sharing based cache partitioning is further integrated with CMP cooperative caching [6] to exploit the benefits of LRU-based latency optimizations, which leads to a simplified partitioning algorithm and better performance for workloads that do not benefit from cache partitioning. We evaluate the effectiveness of CCP by simulating a 4-core CMP running all combinations of 7 representative SPEC2000 benchmarks. For workloads that can benefit from cache partitioning, CCP achieves up to 60%, and on average 12%, better performance than the exhaustive search of optimal static partitions. Overall, CCP provides the best results on almost all evaluation metrics for different cache sizes.

References

  1. L. A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing. In Proceedings of the 27th Annual International Symposium on Computer Architecture (ISCA-27), 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. B. M. Beckmann, M. R. Marty, and D. A. Wood. ASR: Adaptive Selective Replication for CMP Caches. In Proceedings of the 39th Annual International Symposium on Microarchitecture (MICRO-39), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. D. Bertsekas and R. Gallager. Data Networks (2nd ed.). Prentice-Hall, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. J. Bruno, E. Gabber, B. Özden, and A. Silberschatz. The Eclipse Operating System: Providing Quality of Service via Reservation Domains. In USENIX 1998 Annual Technical Conference, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. W. Carr and J. L. Hennessy. WSClock - a Simple and Effective Algorithm for Virtual Memory Management. In Proceedings of the 8th ACM Symposium on Operating Systems Principles (SOSP-8), 1981. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. Chang and G. S. Sohi. Cooperative Caching for Chip Multiprocessors. In Proceedings of the 33th Annual International Symposium on Computer Architecture (ISCA-33), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. D. T. Chiou. Extending the Reach of Microprocessors: Column and Curious Caching. PhD thesis, MIT, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Z. Chishti, M. D. Powell, and T. N. Vijaykumar. Optimizing Replication, Communication and Capacity Allocation in CMPs. In Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA-32), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. P. J. Denning. Thrashing: Its Causes and Prevention. In AFIPS 1968 Fall Joint Computer Conference, volume 33, pages 915--922, 1968. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. H. Dybdahl and P. Stenstrom. An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors. In Proceedings of the 13th International Symposium on High-Performance Computer Architecture (HPCA-13), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. H. Dybdahl, P. Stenstrom, and L. Natvig. A Cache-Partition Aware Replacement Policy for Chip Multiprocessors. In ACM 2006 Conference on High Performance Computing (HiPC-13), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. R. Fagin and J. H. Williams. A fair carpool scheduling algorithm. IBM Journal of Research and Development, 27(2):133--139, 1983. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Fedorova, M. Seltzer, C. Small, and D. Nussbaum. Performance Of Multithreaded Chip Multiprocessors And Implications For Operating System Design. In USENIX 2005 Annual Technical Conference, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. L. Hammond, B. A. Hubbert, M. Siu, M. K. Prabhu, M. Chen, and K. Olukotun. The Stanford Hydra CMP. IEEE Micro, 20(2):71--84, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. L. Hellerstein. Achieving service rate objectives with decay usage scheduling. IEEE Transaction of Software Engineering, 19(8), 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. L. R. Hsu, S. K. Reinhardt, R. Iyer, and S. Makineni. Communist, Utilitarian, and Capitalist Cache Policies on CMPs: Caches as a Shared Resource. In Proceedings of the 15th International Conference on Parallel Architecture and Compilation Techniques (PACT-15), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Huh, C. Kim, H. Shafi, L. Zhang, D. Burger, and S. W. Keckler. A NUCA Substrate for Flexible CMP Cache Sharing. In Proceedings of the 19th ACM International Conference on Supercomputing (ICS-19), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. R. Iyer. CQoS: a Framework for Enabling QoS in Shared Caches of CMP Platforms. In Proceedings of the 18th ACM International Conference on Supercomputing (ICS-18), 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L. K. John. More on Finding a Single number to Indicate Overall Performance of a Benchmark Suite. SIGARCH Computer Architecture News, 32(1), 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. S. Kim, D. Chandra, and Y. Solihin. Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture. In Proceedings of the 13th International Conference on Parallel Architecture and Compilation Techniques (PACT-13), 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. C. Liu, A. Sivasubramaniam, and M. Kandemir. Organizing the Last Line of Defense before Hitting the Memory Wall for CMPs. In Proceedings of the 10th International Symposium on High-Performance Computer Architecture (HPCA-10), 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. K. Luo, J. Gummaraju, and M. Franklin. Balancing Throughput and Fairness in SMT Processors. In Proceedings of the 2001 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS-2001), 2001.Google ScholarGoogle Scholar
  23. P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hållberg, J. Högberg, F. Larsson, A. Moestedt, and B. Werner. Simics: A Full System Simulation Platform. IEEE Computer, 35(2):50--58, Feb 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. M. M. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu, A. R. Alameldeen, K. E. Moore, M. D. Hill, and D. A. Wood. Multifacet's General Execution-driven Multiprocessor Simulator (GEMS) Toolset. Computer Architecture News, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. K. J. Nesbit, N. Aggarwal, J. Laudon, and J. E. Smith. Fair Queuing Memory Systems. In Proceedings of the 39th Annual International Symposium on Microarchitecture (MICRO-39), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. A. K. Parekh and R. G. Gallager. A Generalized Processor Sharing Approach to Flow Control in Integrated Services Networks: the Single-node Case. IEEE Transaction of Networks, 1(3):344--357, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. P. Petoumenos, G. Keramidas, H. Zeffer, S. Kaxiras, and E. Hagersten. STATSHARE: A Statistical Model for Managing Cache Sharing via Decay. In Second Annual Workshop on Modeling, Benchmarking and Simulation (MoBS 2006), 2006.Google ScholarGoogle Scholar
  28. M. M. Planas, F. Cazorla, A. Ramirez, and M. Valero. Explaining Dynamic Cache Partitioning Speed Ups. IEEE Computer Architecture Letters, 6(1), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. M. K. Qureshi and Y. N. Patt. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches. In Proceedings of the 39th Annual International Symposium on Microarchitecture (MICRO-39), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. M. K. Qureshi, D. Thompson, and Y. N. Patt. Tl e V-way Cache: Demand Based Associativity via Global Replacement. In Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA-32), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. N. Rafique, W.-T. Lim, and M. Thottethodi. Architectural Support for Operating System-Driven CMP Cache Management. In Proceedings of the 15th International Conference on Parallel Architecture and Compilation Techniques (PACT-15), 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. J. E. Smith. Characterizing Computer Performance with a Single Number. Communication of ACM, 31(10), 1988. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. A. Snavely and D. M. Tullsen. Symbiotic Jobscheduling for a Simultaneous Multithreaded Processor. In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-IX), 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. A. Snavely, D. M. Tullsen, and G. Voelker. Symbiotic jobscheduling with priorities for a simultaneous multithreading processor. In Proceedings of 2002 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. E. Speight, H. Shafi, L. Zhang, and R. Rajamony. Adaptive Mechanisms and Policies for Managing Cache Hierarchies in Chip Multiprocessors. In Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA-32), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. H. S. Stone, J. Turek, and J. L. Wolf. Optimal partitioning of cache memory. IEEE Transaction of Computers, 41(9):1054--1068, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. G. E. Suh, S. Devadas, and L. Rudolph. A New Memory Monitoring Scheme for Memory-aware Scheduling and Partitioning. In Proceedings of the 8th International Symposium on High-Performance Computer Architecture (HPCA-8), 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. G. E. Suh, L. Rudolph, and S. Devadas. Dynamic Partitioning of Shared Cache Memory. Journal of Supercomputing, 28(1):7--26, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. J. M. Tendler, J. S. Dodson, J. S. F. Jr., H. Le, and B. Sinharoy. IBM Power4 system microarchitecture. IBM Journal of Research and Development, 46(1):5--26, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. B. Verghese, A. Gupta, and M. Rosenblum. Performance Isolation: Sharing and Isolation in Shared-memory Multiprocessors. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. C. A. Waldspurger. Lottery and stride scheduling: Flexible proportional-share resource management. Technical Report MIT/LCS/TR-667, Cambridge, MA, USA, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. T. Y. Yeh and G. Reinman. Fast and Fair: Data-stream Quality of Service. In Proceedings of the 2005 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES'05), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. M. Zhang and K. Asanovic. Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled CMPs. In Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA-32), 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Cooperative cache partitioning for chip multiprocessors

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        ACM International Conference on Supercomputing 25th Anniversary Volume
        June 2014
        94 pages
        ISBN:9781450328401
        DOI:10.1145/2591635

        Copyright © 2007 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 17 June 2007

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate584of2,055submissions,28%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader