DOI: 10.1145/2000064.2000073
research-article

Vantage: scalable and efficient fine-grain cache partitioning

Published: 04 June 2011

ABSTRACT

Cache partitioning has a wide range of uses in CMPs, from guaranteeing quality of service and controlled sharing to security-related techniques. However, existing cache partitioning schemes (such as way-partitioning) are limited to coarse-grain allocations, can support only a few partitions, and reduce cache associativity, hurting performance. Hence, these techniques can only be applied to CMPs with 2-4 cores and fail to scale to tens of cores.
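
To make the granularity and associativity limits of way-partitioning concrete, here is a minimal Python sketch, assuming a cache whose W ways are divided among partitions in whole-way units. The function name `way_partition` and the largest-remainder rounding are illustrative choices, not taken from the paper. A partition restricted to k ways sees only k-way associativity, and once there are more partitions than ways, some partitions receive no capacity at all.

```python
from typing import List


def way_partition(total_ways: int, shares: List[float]) -> List[int]:
    """Convert desired capacity shares (assumed to sum to 1) into whole ways.

    Way-partitioning can only allocate capacity in units of one way, i.e.
    1/total_ways of the cache, and a partition restricted to k ways sees an
    associativity of k. Leftover ways go to the partitions with the largest
    rounding remainders (ties broken by lowest index).
    """
    ideal = [s * total_ways for s in shares]
    ways = [int(x) for x in ideal]  # floor of each ideal allocation
    leftover = total_ways - sum(ways)
    # sorted() stays stable with reverse=True, so equal remainders keep index order.
    order = sorted(range(len(shares)), key=lambda i: ideal[i] - ways[i], reverse=True)
    for i in order[:leftover]:
        ways[i] += 1
    return ways


if __name__ == "__main__":
    print(way_partition(16, [1 / 4] * 4))    # [4, 4, 4, 4]
    print(way_partition(16, [1 / 32] * 32))  # sixteen 1s followed by sixteen 0s
```

With 4 partitions on a 16-way cache, each partition gets 4 ways and therefore behaves like a 4-way cache; with 32 partitions (one per core on a 32-core CMP), half of them get no ways at all.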

We present Vantage, a novel cache partitioning technique that overcomes the limitations of existing schemes: caches can have tens of partitions with sizes specified at cache-line granularity, while maintaining high associativity and strong isolation among partitions. Vantage leverages cache arrays with good hashing and associativity, which enable soft-pinning a large portion of cache lines. It enforces capacity allocations by controlling the replacement process. Unlike prior schemes, Vantage provides strict isolation guarantees by partitioning most (e.g., 90%) of the cache instead of all of it. Vantage is derived from analytical models, which allow us to provide strong guarantees and bounds on associativity and sizing independent of the number of partitions and their behaviors. It is simple to implement, requiring around 1.5% state overhead and simple changes to the cache controller.
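
The paragraph above combines two ideas: enforcing capacity targets through the replacement process instead of restricting where lines may be placed, and managing only most of the cache. The Python sketch below illustrates both under simplifying assumptions made here for illustration: replacement candidates are sampled uniformly at random to stand in for a well-hashed, highly associative array; when a partition exceeds its target size, its sampled lines are demoted to an unmanaged region; and victims are taken from that unmanaged region. The class `SketchCache`, its parameters, and the LRU-style timestamps are hypothetical, and this is not Vantage's actual controller logic, which the paper derives from its analytical models.

```python
import random
from dataclasses import dataclass, field
from typing import Dict

UNMANAGED = -1  # hypothetical owner id for lines in the unmanaged region


@dataclass
class Line:
    addr: int
    part: int      # owning partition, or UNMANAGED once demoted
    last_use: int  # timestamp used for LRU-style victim choice


@dataclass
class SketchCache:
    num_lines: int               # total cache capacity in lines
    targets: Dict[int, int]      # partition id -> target size in managed lines
    num_candidates: int = 16     # replacement candidates examined per miss
    seed: int = 0
    lines: Dict[int, Line] = field(default_factory=dict)
    sizes: Dict[int, int] = field(default_factory=dict)
    clock: int = 0

    def __post_init__(self) -> None:
        self.rng = random.Random(self.seed)
        self.sizes = {p: 0 for p in self.targets}

    def access(self, addr: int, part: int) -> bool:
        """Access addr on behalf of partition part; return True on a hit."""
        self.clock += 1
        line = self.lines.get(addr)
        if line is not None:
            if line.part == UNMANAGED:  # re-referenced: return it to a partition
                line.part = part
                self.sizes[part] += 1
            line.last_use = self.clock
            return True
        if len(self.lines) >= self.num_lines:
            self._replace()
        self.lines[addr] = Line(addr, part, self.clock)
        self.sizes[part] += 1
        return False

    def _replace(self) -> None:
        k = min(self.num_candidates, len(self.lines))
        cands = self.rng.sample(list(self.lines.values()), k)
        # Demote sampled lines of partitions that are above their target size.
        for c in cands:
            if c.part != UNMANAGED and self.sizes[c.part] > self.targets[c.part]:
                self.sizes[c.part] -= 1
                c.part = UNMANAGED
        # Evict the least recently used unmanaged candidate; if there is none,
        # fall back to the overall LRU candidate (a forced managed eviction).
        unmanaged = [c for c in cands if c.part == UNMANAGED]
        victim = min(unmanaged or cands, key=lambda c: c.last_use)
        if victim.part != UNMANAGED:
            self.sizes[victim.part] -= 1
        del self.lines[victim.addr]


if __name__ == "__main__":
    cache = SketchCache(num_lines=100, targets={0: 45, 1: 45}, seed=1)
    hits = 0
    for i in range(4000):
        cache.access(10_000 + i, part=0)    # partition 0 streams, no reuse
        hit = cache.access(i % 40, part=1)  # partition 1 reuses 40 lines
        if i >= 2000:
            hits += hit
    print("partition 1 hit rate:", hits / 2000)
    print("managed sizes:", cache.sizes)
```

In the demo, the streaming partition 0 is held near its 45-line target, while partition 1's 40-line working set stays under its target and is never demoted, so after warm-up nearly all of its accesses hit. Because the targets sum to 90 of the 100 lines, at least 10 lines always remain unmanaged, loosely mirroring the idea of partitioning most of the cache rather than all of it.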

We evaluate Vantage using extensive simulations. On a 32-core system, using 350 multiprogrammed workloads and one partition per core, partitioning the last-level cache with conventional techniques degrades throughput for 71% of the workloads versus an unpartitioned cache (by 7% on average and by up to 25%), even when using 64-way caches. In contrast, Vantage improves throughput for 98% of the workloads, by 8% on average (up to 20%), using a 4-way cache.


Supplemental Material

isca_3a_1.mp4 (MP4 video, 134.2 MB)


        Published in

        ISCA '11: Proceedings of the 38th annual international symposium on Computer architecture
        June 2011, 488 pages
        ISBN: 9781450304726
        DOI: 10.1145/2000064

        Also published in ACM SIGARCH Computer Architecture News, Volume 39, Issue 3 (ISCA '11)
        June 2011, 462 pages
        ISSN: 0163-5964
        DOI: 10.1145/2024723

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        Overall Acceptance Rate: 543 of 3,203 submissions, 17%
