ABSTRACT
Current and future parallel programming models need to be portable and efficient when moving to heterogeneous multi-core systems. OmpSs is a task-based programming model with dependency tracking and dynamic scheduling. This paper describes the OmpSs approach to scheduling dependent tasks onto the asymmetric cores of a heterogeneous system. The proposed scheduling policy improves performance by prioritizing newly created tasks at runtime, detecting the longest path of the dynamic task dependency graph, and assigning critical tasks to fast cores. Whereas previous approaches are static and rely on profiling information, this dynamic scheduling approach uses only information that is discoverable at runtime, so it can be implemented and used without an oracle or profiling. The evaluation results show that our proposal outperforms a dynamic implementation of Heterogeneous Earliest Finish Time by up to 1.15x, and the default breadth-first OmpSs scheduler by up to 1.3x on an 8-core heterogeneous platform and up to 2.7x on a simulated 128-core chip.
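The core idea of the policy (find the longest path of the task dependency graph and send the tasks on it to fast cores) can be illustrated with a small sketch. This is not the OmpSs runtime's implementation: it works on a complete, static graph, whereas the paper's scheduler works on a graph that grows at runtime, and it assumes every task has unit cost. The function names `bottom_levels`, `critical_path`, and `assign_cores`, and the `"fast"`/`"slow"` labels, are hypothetical and chosen only for illustration.

```python
def bottom_levels(successors):
    """Map each task to the number of tasks on the longest path from it to a leaf.

    `successors` maps a task name to the list of tasks that depend on it.
    Unit task cost is an assumption of this sketch.
    """
    levels = {}

    def visit(task):
        if task not in levels:
            children = successors.get(task, ())
            levels[task] = 1 + max((visit(c) for c in children), default=0)
        return levels[task]

    for task in successors:
        visit(task)
    return levels


def critical_path(successors):
    """Return one longest path through the graph, from its first task to a leaf."""
    levels = bottom_levels(successors)
    if not levels:
        return []
    # The task with the highest bottom level starts a longest path.
    task = max(levels, key=levels.get)
    path = [task]
    # Follow the successor with the highest bottom level until a leaf is reached.
    while successors.get(task):
        task = max(successors[task], key=levels.get)
        path.append(task)
    return path


def assign_cores(successors):
    """Label tasks on the longest path "fast" and all other tasks "slow"."""
    critical = set(critical_path(successors))
    return {
        task: ("fast" if task in critical else "slow")
        for task in bottom_levels(successors)
    }


# Hypothetical five-task graph: A -> B -> D -> E, and A -> C.
graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"], "E": []}
placement = assign_cores(graph)
```

In this example the chain A, B, D, E is the longest path, so those four tasks are labeled for fast cores and C is left for a slow core. A runtime version would have to update the levels incrementally as new tasks are created, which this sketch does not attempt.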
Index Terms
- Criticality-Aware Dynamic Task Scheduling for Heterogeneous Architectures