LFOC: A Lightweight Fairness-Oriented Cache Clustering Policy for Commodity Multicores

ABSTRACT
Multicore processors constitute the architecture of choice for modern computing systems across different market segments. Despite their benefits, the contention that naturally arises when multiple applications compete for shared resources among cores, such as the last-level cache (LLC), may lead to substantial performance degradation. This can negatively affect key system aspects such as throughput and fairness. Assigning the applications in the workload to separate LLC partitions, possibly of different sizes, has proven effective in mitigating the effects of shared-resource contention.
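Fairness in multiprogram workloads is commonly quantified through per-application slowdowns relative to solo execution: a workload is unfair when some applications slow down much more than others under contention. As a minimal illustration (not taken from the article), unfairness can be computed as the ratio between the largest and smallest slowdown:

```python
def unfairness(solo_ipcs, shared_ipcs):
    """Unfairness = max slowdown / min slowdown, where
    slowdown_i = solo_IPC_i / shared_IPC_i (>= 1 under contention).
    A value of 1.0 means all applications degrade equally (perfect fairness)."""
    slowdowns = [solo / shared for solo, shared in zip(solo_ipcs, shared_ipcs)]
    return max(slowdowns) / min(slowdowns)
```

Under this metric, a partitioning policy improves fairness by shrinking the gap between the most and least affected applications, rather than by maximizing aggregate throughput alone.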
In this article we propose LFOC, a clustering-based cache-partitioning scheme that strives to deliver fairness while providing acceptable system throughput. LFOC leverages Intel Cache Allocation Technology (CAT), which enables the system software to divide the LLC into different partitions. To accomplish its goals, LFOC tries to mimic the behavior of the optimal cache-clustering solution, which we approximated for different scenarios by means of a simulator. To this end, LFOC effectively identifies streaming aggressor programs and cache-sensitive applications, which are then assigned to separate cache partitions.
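The separation described above can be sketched as a simple classifier over per-application LLC metrics. The function names and thresholds below are assumptions for illustration only, not LFOC's actual algorithm: the idea is that streaming aggressors (high memory traffic, almost no reuse) are isolated so they cannot evict the working sets of cache-sensitive programs.

```python
def classify(app):
    """app: dict with 'mpki' (LLC misses per kilo-instruction) and
    'hit_ratio' (LLC hit ratio). Thresholds are illustrative assumptions."""
    if app['mpki'] > 10 and app['hit_ratio'] < 0.1:
        return 'streaming'   # high traffic, negligible reuse: cache aggressor
    if app['hit_ratio'] > 0.5:
        return 'sensitive'   # benefits from additional LLC space
    return 'light'           # largely insensitive to LLC size

def cluster(apps):
    """Group applications by class; each class maps to a separate LLC partition."""
    clusters = {'streaming': [], 'sensitive': [], 'light': []}
    for name, metrics in apps.items():
        clusters[classify(metrics)].append(name)
    return clusters
```

On a Linux system with Intel CAT, clusters like these can be mapped to actual LLC partitions through the resctrl filesystem, e.g. by creating a group under /sys/fs/resctrl and writing a cache-way bit mask to its schemata file.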
We implemented LFOC in the Linux kernel and evaluated it on a real system featuring an Intel Skylake processor, comparing its effectiveness with that of two state-of-the-art policies that optimize fairness and throughput, respectively. Our experimental analysis reveals that LFOC achieves a greater reduction in unfairness while relying on a lightweight algorithm suitable for adoption in a real OS.