research-article

Portable performance on asymmetric multicore processors

Authors:
Ivan Jibaja, University of Texas at Austin, USA / Pure Storage, USA
Ting Cao, Institute of Computing Technology at Chinese Academy of Sciences, China / Australian National University, Australia
Stephen M. Blackburn, Australian National University, Australia
Kathryn S. McKinley, Microsoft Research, USA

CGO '16: Proceedings of the 2016 International Symposium on Code Generation and Optimization, February 2016, Pages 24–35, https://doi.org/10.1145/2854038.2854047

Published: 29 February 2016


ABSTRACT

Static and dynamic power constraints are steering chip manufacturers to build single-ISA Asymmetric Multicore Processors (AMPs) with big and small cores. To deliver on their energy efficiency potential, schedulers must consider core sensitivity, load balance, and the critical path. Applying these criteria effectively is challenging, especially for complex and non-scalable multithreaded applications. We demonstrate that runtimes for managed languages, which are now ubiquitous, provide a unique opportunity to abstract over AMP complexity and inform scheduling with rich semantics such as thread priorities, locks, and parallelism: information not directly available to the hardware, OS, or application. We present the WASH AMP scheduler, which (1) automatically identifies and accelerates critical threads in concurrent but non-scalable applications; (2) respects thread priorities; (3) considers core availability and thread sensitivity; and (4) proportionally schedules threads on big and small cores to optimize performance and energy. We introduce new dynamic analyses that identify critical threads and classify applications as sequential, scalable, or non-scalable. Compared to prior work, WASH improves performance by 20% and energy by 9% or more on frequency-scaled AMP hardware (not simulation). Performance advantages grow to 27% when asymmetry increases, and they are robust to a complex multithreaded adversary independently scheduled by the OS. WASH effectively identifies and optimizes a wider class of workloads than prior work.
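To make the abstract's scheduling ideas more concrete, the Python sketch below shows one possible way a runtime could (a) classify an application as sequential, scalable, or non-scalable and (b) decide what share of big-core time each thread receives. It is an illustrative assumption, not the WASH algorithm: the per-thread metric `lock_wait_caused`, the `lock_contention_ratio` input, the 0.1 threshold, and the even-sharing rule for scalable applications are all hypothetical stand-ins for the paper's dynamic analyses.

```python
"""Illustrative sketch only: NOT the WASH algorithm from the paper.

Classifies an application and decides what share of big-core time each
thread receives. All metrics, names, and thresholds here are hypothetical.
"""
from dataclasses import dataclass
from enum import Enum


class AppClass(Enum):
    SEQUENTIAL = "sequential"
    SCALABLE = "scalable"
    NON_SCALABLE = "non-scalable"


@dataclass
class ThreadStats:
    name: str
    priority: int            # higher = more important (assumed convention)
    lock_wait_caused: float  # hypothetical: time other threads spent waiting
                             # on locks held by this thread


def classify(threads, lock_contention_ratio, contention_threshold=0.1):
    """Hypothetical rule: one active thread means sequential; a high
    fraction of time spent blocked on locks means non-scalable; anything
    else is treated as scalable."""
    if len(threads) <= 1:
        return AppClass.SEQUENTIAL
    if lock_contention_ratio > contention_threshold:
        return AppClass.NON_SCALABLE
    return AppClass.SCALABLE


def big_core_shares(threads, app_class, n_big):
    """Return {thread name: fraction of time spent on a big core}."""
    if n_big <= 0 or not threads:
        return {t.name: 0.0 for t in threads}
    if app_class is AppClass.SCALABLE:
        # Spread big-core time evenly, e.g. by rotating threads through them.
        share = min(1.0, n_big / len(threads))
        return {t.name: share for t in threads}
    # Sequential or non-scalable: give big cores to the threads that others
    # wait on most (likely critical), breaking ties by thread priority.
    ranked = sorted(threads, key=lambda t: (t.lock_wait_caused, t.priority),
                    reverse=True)
    return {t.name: (1.0 if i < n_big else 0.0) for i, t in enumerate(ranked)}


if __name__ == "__main__":
    threads = [ThreadStats("main", 5, 0.0),
               ThreadStats("worker-1", 5, 2.5),
               ThreadStats("worker-2", 5, 0.3)]
    kind = classify(threads, lock_contention_ratio=0.4)  # AppClass.NON_SCALABLE
    print(big_core_shares(threads, kind, n_big=1))
    # prints {'worker-1': 1.0, 'worker-2': 0.0, 'main': 0.0}
```

Under these assumptions, a scalable application spreads big-core time evenly across its threads, while a non-scalable one concentrates big cores on the few threads that others wait on, which mirrors the distinction the abstract draws between proportional scheduling and critical-thread acceleration.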


Index Terms
Software and its engineering → Software notations and tools → Compilers


Published in
CGO '16: Proceedings of the 2016 International Symposium on Code Generation and Optimization
February 2016
283 pages
ISBN: 9781450337786
DOI: 10.1145/2854038
General Chair: Bjoern Franke, University of Edinburgh, UK
Program Chairs: Youfeng Wu, Intel, USA; Fabrice Rastello, Inria, France
Copyright © 2016 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Asymmetric
Energy
Heterogeneous
Managed Software
Multicore
Performance
Scheduling
Acceptance Rates
CGO '16 paper acceptance rate: 25 of 108 submissions, 23%. Overall acceptance rate: 312 of 1,061 submissions, 29%.
Article Metrics
Total citations: 17
Total downloads: 416
Downloads (last 12 months): 20
Downloads (last 6 weeks): 4