Abstract
Asymmetric multicore processors (AMPs) consist of cores that share the same instruction-set architecture (ISA) but differ in microarchitectural features, speed, and power consumption. Because cores with more complex features and higher speed typically occupy more area and consume more energy than simpler, slower cores, they should be reserved for applications whose performance improves significantly from those features. Having cores of different types in a single system makes it possible to optimize the performance/energy trade-off. To deliver this potential to unmodified applications, the OS scheduler must map threads to cores while taking the properties of both into account. Our work describes a Comprehensive scheduler for Asymmetric Multicore Processors (CAMP) that addresses shortcomings of previous asymmetry-aware schedulers. First, previous schedulers catered to only one of the workload properties that are crucial for scheduling on AMPs: either efficiency or thread-level parallelism (TLP), but not both. CAMP overcomes this limitation by showing how using efficiency and TLP together in a single scheduling algorithm can improve performance. Second, most existing schedulers that rely on models to estimate how much faster a thread executes on a “fast” core than on a “slow” core (i.e., the speedup factor) were designed specifically for AMP systems whose cores differ only in clock frequency. However, more realistic AMP systems include cores that differ more significantly in their features. To demonstrate the effectiveness of CAMP in more realistic scenarios, we augmented the CAMP scheduler with a model that predicts the speedup factor on a real AMP prototype that closely matches future asymmetric systems.
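The idea of weighing efficiency and TLP together can be illustrated with a minimal sketch. It is not the CAMP algorithm: the `utility` heuristic, the `Thread` fields, and the sample numbers below are all assumptions made for illustration. The sketch assumes that a per-thread speedup factor estimate is already available and that speeding up one thread of an application with many runnable threads yields a proportionally smaller overall gain.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    speedup_factor: float  # assumed estimate of fast-core vs. slow-core speedup
    runnable_threads: int  # runnable threads in the owning application (its TLP)


def utility(t: Thread) -> float:
    # Hypothetical heuristic, not taken from the paper: the gain above 1.0
    # is diluted by the number of runnable threads in the application.
    return 1.0 + (t.speedup_factor - 1.0) / t.runnable_threads


def assign_fast_cores(threads: list[Thread], n_fast: int) -> list[str]:
    # Rank threads by utility and give the fast cores to the top n_fast.
    ranked = sorted(threads, key=utility, reverse=True)
    return [t.name for t in ranked[:n_fast]]


# Illustrative workload (made-up values).
threads = [
    Thread("seq_compute", speedup_factor=2.0, runnable_threads=1),
    Thread("seq_memory", speedup_factor=1.1, runnable_threads=1),
    Thread("par_worker", speedup_factor=2.4, runnable_threads=8),
]

print(assign_fast_cores(threads, n_fast=1))
```

In this sketch, `par_worker` has the highest raw speedup factor but ranks below `seq_compute`, because it is one of eight runnable threads. A scheduler that looked at the speedup factor alone would have chosen it first, which is the kind of single-property decision the abstract argues against.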
Index Terms
- Leveraging Core Specialization via OS Scheduling to Improve Performance on Asymmetric Multicore Systems