DOI: 10.1145/3337821.3337925
Research article, ICPP Conference Proceedings

LFOC: A Lightweight Fairness-Oriented Cache Clustering Policy for Commodity Multicores

Published: 05 August 2019

ABSTRACT

Multicore processors constitute the main architecture choice for modern computing systems across different market segments. Despite their benefits, the contention that naturally appears when multiple applications compete for shared resources among cores, such as the last-level cache (LLC), may lead to substantial performance degradation. This may have a negative impact on key system aspects such as throughput and fairness. Assigning the various applications in the workload to separate LLC partitions, possibly of different sizes, has been proven effective in mitigating shared-resource contention effects.

In this article we propose LFOC, a clustering-based cache-partitioning scheme that strives to deliver fairness while providing acceptable system throughput. LFOC leverages Intel Cache Allocation Technology (CAT), which enables the system software to divide the LLC into different partitions. To accomplish its goals, LFOC tries to mimic the behavior of the optimal cache-clustering solution, which we approximated by means of a simulator for different scenarios. To this end, LFOC effectively identifies streaming aggressor programs and cache-sensitive applications, which are then assigned to separate cache partitions.
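
As a rough illustration of the partitioning mechanism the article builds on (not of LFOC's own in-kernel implementation), the sketch below shows how system software on Linux can drive Intel CAT through the resctrl filesystem to place streaming aggressors and cache-sensitive programs in separate LLC partitions. The group names, capacity bitmasks and PIDs are hypothetical, and the machine is assumed to expose a single L3 cache domain with resctrl mounted at /sys/fs/resctrl.

    # Minimal sketch (assumptions noted above): create two resctrl groups,
    # give each an L3 capacity bitmask, and move the corresponding PIDs in.
    import os

    RESCTRL = "/sys/fs/resctrl"  # mount -t resctrl resctrl /sys/fs/resctrl

    def make_partition(name, l3_bitmask, pids):
        """Create a CAT partition and assign the given tasks to it."""
        group = os.path.join(RESCTRL, name)
        os.makedirs(group, exist_ok=True)
        # Restrict the group to a contiguous set of L3 ways on cache domain 0.
        with open(os.path.join(group, "schemata"), "w") as f:
            f.write(f"L3:0={l3_bitmask}\n")
        # Move each task; the kernel removes it from its previous group.
        for pid in pids:
            with open(os.path.join(group, "tasks"), "w") as f:
                f.write(f"{pid}\n")

    # Hypothetical classification result: aggressors are confined to two
    # cache ways, while cache-sensitive programs share the remaining ten.
    make_partition("aggressors", "003", [1234, 1235])
    make_partition("sensitive", "ffc", [2001, 2002])

LFOC performs the classification and partition sizing online inside the kernel; the sketch only illustrates the CAT interface that makes per-cluster partitions possible.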

We implemented LFOC in the Linux kernel and evaluated it on a real system featuring an Intel Skylake processor, comparing its effectiveness with that of two state-of-the-art policies that optimize fairness and throughput, respectively. Our experimental analysis reveals that LFOC achieves a greater reduction in unfairness while relying on a lightweight algorithm suitable for adoption in a real OS.
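
For context on the fairness goal, unfairness in multi-program studies of this kind is commonly defined as the ratio between the largest and the smallest per-application slowdown, where an application's slowdown is its solo IPC divided by its IPC when sharing the machine. The abstract does not spell out the exact metric used, so the snippet below is only a sketch under that common definition, with made-up IPC values.

    # Sketch of a common unfairness metric (assumed, not taken from the paper):
    # slowdown_i = IPC_alone_i / IPC_shared_i, unfairness = max(slowdown) / min(slowdown).
    def unfairness(ipc_pairs):
        """ipc_pairs: list of (ipc_alone, ipc_shared) tuples, one per application."""
        slowdowns = [alone / shared for alone, shared in ipc_pairs]
        return max(slowdowns) / min(slowdowns)  # 1.0 means a perfectly fair outcome

    # Illustrative values: the second program suffers far more than the others.
    workload = [(1.8, 1.5), (0.9, 0.4), (2.1, 2.0)]
    print(f"unfairness = {unfairness(workload):.2f}")  # -> 2.14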


Published in

ICPP '19: Proceedings of the 48th International Conference on Parallel Processing
August 2019, 1107 pages
ISBN: 9781450362955
DOI: 10.1145/3337821
Copyright © 2019 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 91 of 313 submissions (29%)
