Variation Aware Cache Partitioning for Multithreaded Programs

Authors:
Vivek Kozhikkottu (School of Electrical and Computer Engineering, Purdue University),
Abhisek Pan (School of Electrical and Computer Engineering, Purdue University),
Vijay Pai (School of Electrical and Computer Engineering, Purdue University),
Sujit Dey (School of Electrical and Computer Engineering, UC San Diego),
Anand Raghunathan (School of Electrical and Computer Engineering, Purdue University)

DAC '14: Proceedings of the 51st Annual Design Automation Conference, June 2014, Pages 1–6, https://doi.org/10.1145/2593069.2593240

Published: 01 June 2014

ABSTRACT

Multithreaded programs are commonly written and optimized for homogeneous multi-core processors, assuming equal performance from all the cores. This assumption greatly simplifies the partitioning and balancing of an application's workload across threads; however, it no longer holds when the frequencies of the cores differ due to within-die variations, and performance degrades as a result. We observe that, in addition to the frequency of the core on which it executes, the performance of a thread also depends on the share of shared system resources, such as last-level cache, that it receives. We propose variation-aware cache partitioning as an approach to redress the variation-induced imbalance in the execution times of threads, thereby improving the performance of multithreaded programs. We discuss the challenges involved in realizing our proposal: synchronization (e.g., barriers) across threads, which causes faster threads to be limited by slower ones; the complex, non-linear relationship between a thread's performance and the cache capacity allocated to it; and the fact that different program phases can respond quite differently to varying cache capacity. We propose a runtime scheme to perform spatio-temporal cache partitioning while considering both chip characteristics (frequency variations) and program characteristics. We evaluate the proposed technique by applying it to an ensemble of variation-impacted multi-cores executing multithreaded programs from the PARSEC and SPEC-OMP suites, and demonstrate that it results in an average performance improvement of 15% by mitigating the impact of frequency variations.
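
To make the idea concrete, the following is a minimal illustrative sketch and not the runtime scheme evaluated in the paper. It assumes a way-partitioned last-level cache, a simple per-thread time model (compute cycles divided by core frequency, plus misses times a fixed miss latency), and a greedy heuristic that repeatedly hands the next cache way to the thread currently expected to reach the barrier last. The names `partition_ways`, `miss_curves`, `compute_cycles`, and `miss_latency_ns` are hypothetical, and the greedy choice is a swapped-in simplification that is not guaranteed to be optimal for non-convex miss curves.

```python
from typing import List, Sequence


def partition_ways(
    freqs_ghz: Sequence[float],
    miss_curves: Sequence[Sequence[float]],
    compute_cycles: float,
    total_ways: int,
    miss_latency_ns: float = 100.0,
) -> List[int]:
    """Greedy way allocation that favors the slowest (critical) thread.

    miss_curves[i][w] is the assumed number of misses of thread i when it
    owns w cache ways (index 0 is present but never used, since every
    thread starts with one way).
    """
    n = len(freqs_ghz)
    if total_ways < n:
        raise ValueError("need at least one way per thread")

    ways = [1] * n  # every thread gets a minimum of one way

    def est_time_ns(i: int) -> float:
        # Assumed model: cycles / GHz gives nanoseconds of compute time,
        # and each miss adds a fixed latency that does not scale with
        # the core frequency.
        return compute_cycles / freqs_ghz[i] + miss_curves[i][ways[i]] * miss_latency_ns

    for _ in range(total_ways - n):
        # Only threads whose miss curve covers one more way can grow.
        candidates = [i for i in range(n) if ways[i] + 1 < len(miss_curves[i])]
        if not candidates:
            break
        # The thread with the largest estimated time limits the barrier.
        slowest = max(candidates, key=est_time_ns)
        ways[slowest] += 1

    return ways


if __name__ == "__main__":
    # Hypothetical miss curve shared by both threads (misses for 0..8 ways).
    curve = [1000, 800, 600, 400, 300, 250, 220, 200, 190]
    # Core 0 runs at 2 GHz, core 1 at 1 GHz because of within-die variation.
    print(partition_ways([2.0, 1.0], [curve, curve], 100_000, total_ways=8))
```

Under these assumptions, the thread on the slower core receives the larger share of the cache, which narrows the gap between the two threads' estimated execution times. A phase-aware version would re-run the allocation whenever the measured miss curves change.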

Index Terms

Variation Aware Cache Partitioning for Multithreaded Programs
1. Hardware
  1. Emerging technologies
  2. Very large scale integration design


Published in

DAC '14: Proceedings of the 51st Annual Design Automation Conference
June 2014
1249 pages
ISBN:9781450327305
DOI:10.1145/2593069

Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Multicores
Parallel Programs
Variation Aware Design
Variation Tolerance
Variations
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%