ABSTRACT
Static and dynamic power constraints are steering chip manufacturers to build single-ISA Asymmetric Multicore Processors (AMPs) with big and small cores. To deliver on their energy efficiency potential, schedulers must consider core sensitivity, load balance, and the critical path. Applying these criteria effectively is challenging, especially for complex and non-scalable multithreaded applications. We demonstrate that runtimes for managed languages, which are now ubiquitous, provide a unique opportunity to abstract over AMP complexity and to inform scheduling with rich semantics such as thread priorities, locks, and parallelism: information not directly available to the hardware, the OS, or the application. We present the WASH AMP scheduler, which (1) automatically identifies and accelerates critical threads in concurrent but non-scalable applications; (2) respects thread priorities; (3) considers core availability and thread sensitivity; and (4) proportionally schedules threads on big and small cores to optimize performance and energy. We introduce new dynamic analyses that identify critical threads and classify applications as sequential, scalable, or non-scalable. Compared to prior work, WASH improves performance by 20% and energy by 9% or more on frequency-scaled AMP hardware (not simulation). Its performance advantage grows to 27% when asymmetry increases, and it remains robust against a complex multithreaded adversary that the OS schedules independently. Overall, WASH identifies and optimizes a wider class of workloads than prior work.
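The abstract describes WASH's analyses only at a high level. The Python sketch below is a hypothetical illustration of the general idea, not the authors' algorithm: it classifies an application as sequential, scalable, or non-scalable from per-thread lock statistics, then decides what fraction of time each thread should spend on a big core. The names `ThreadStats`, `classify`, `big_core_share`, and `CONTENTION_THRESHOLD`, the 10% contention threshold, and the use of "time other threads waited on this thread's locks" as the criticality measure are all assumptions made for illustration. Core sensitivity and energy are not modeled.

```python
from dataclasses import dataclass

# Hypothetical per-thread statistics a managed runtime could collect.
@dataclass
class ThreadStats:
    name: str
    priority: int            # assumption: higher value = more important
    lock_wait_caused: float  # seconds other threads waited on locks this thread held
    run_time: float          # seconds this thread spent running

# Hypothetical threshold: fraction of total run time lost to lock waiting.
CONTENTION_THRESHOLD = 0.10

def classify(threads):
    """Classify an application as 'sequential', 'scalable', or 'non-scalable'."""
    if len(threads) <= 1:
        return "sequential"
    total_run = sum(t.run_time for t in threads)
    total_wait = sum(t.lock_wait_caused for t in threads)
    if total_run > 0 and total_wait / total_run > CONTENTION_THRESHOLD:
        return "non-scalable"
    return "scalable"

def big_core_share(threads, n_big):
    """Return, per thread name, the fraction of time it should run on a big core."""
    if n_big <= 0:
        # No big cores available: everything runs on small cores.
        return {t.name: 0.0 for t in threads}
    kind = classify(threads)
    if kind == "sequential":
        # A single thread simply gets a big core.
        return {t.name: 1.0 for t in threads}
    if kind == "scalable":
        # Proportional sharing: spread big-core time evenly over all threads.
        share = min(1.0, n_big / len(threads))
        return {t.name: share for t in threads}
    # Non-scalable: pin the most critical threads (most lock waiting caused)
    # to big cores, breaking ties by thread priority.
    ranked = sorted(threads, key=lambda t: (t.lock_wait_caused, t.priority),
                    reverse=True)
    return {t.name: (1.0 if i < n_big else 0.0) for i, t in enumerate(ranked)}
```

For example, with threads `main` (4.0 s of lock waiting caused), `w1` (0.1 s), and `w2` (0.0 s), each running 10 s, contention is about 14% of run time, so the sketch labels the application non-scalable and gives the single big core entirely to `main`. Three uncontended threads with two big cores are instead treated as scalable, and each receives two thirds of its time on a big core.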
Index Terms
- Portable performance on asymmetric multicore processors