skip to main content
10.1145/2600212.2600223acmconferencesArticle/Chapter ViewAbstractPublication PageshpdcConference Proceedingsconference-collections
research-article

Analysis of dynamic scheduling strategies for matrix multiplication on heterogeneous platforms

Published: 23 June 2014 Publication History

Abstract

The tremendous increase in the size and heterogeneity of supercomputers makes it very difficult to predict the performance of a scheduling algorithm. Therefore, dynamic solutions, where scheduling decisions are made at runtime have overpassed static allocation strategies. The simplicity and efficiency of dynamic schedulers such as Hadoop are a key of the success of the MapReduce framework. Dynamic schedulers such as StarPU, PaRSEC or StarSs are also developed for more constrained computations, e.g. task graphs coming from linear algebra. To make their decisions, these runtime systems make use of some static information, such as the distance of tasks to the critical path or the affinity between tasks and computing resources (CPU, GPU, …) and of dynamic information, such as where input data are actually located. In this paper, we concentrate on two elementary linear algebra kernels, namely the outer product and the matrix multiplication. For each problem, we propose several dynamic strategies that can be used at runtime and we provide an analytic study of their theoretical performance. We prove that the theoretical analysis provides very good estimate of the amount of communications induced by a dynamic strategy and can be used in order to efficiently determine thresholds used in dynamic scheduler, thus enabling to choose among them for a given problem and architecture.

References

[1]
C. Augonnet, S. Thibault, R. Namyst, and P.-A. Wacrenier. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures. Concurrency and Computation: Practice and Experience, 23(2):187--198, 2011.
[2]
O. Beaumont, V. Boudet, F. Rastello, and Y. Robert. Partitioning a square into rectangles: NP-completeness and approximation algorithms. Algorithmica, 34(3):217--239, 2002.
[3]
O. Beaumont, H. Larcheveque, and L. Marchal. Non linear divisible loads: There is no free lunch. In International Parallel and Distributed Processing Symposium, 2012, pages 863--873. IEEE, 2012.
[4]
M. Benaim and J.-Y. Le Boudec. A class of mean field interaction models for computer and communication systems. Performance Evaluation, 65(11):823--838, 2008.
[5]
V. Berten and B. Gaujal. Brokering strategies in computational grids using stochastic prediction models. Parallel Computing, 33(4--5):238--249, 2007.
[6]
G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Herault, and J. J. Dongarra. PaRSEC: A programming paradigm exploiting heterogeneity for enhancing scalability. IEEE Computing in Science and Engineering, to appear. available online at http://www.netlib.org/utk/people/JackDongarra/PAPERS/ieee_cise_submitted_2.pdf.
[7]
G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Lemarinier, and J. Dongarra. DAGuE: A generic distributed DAG engine for high performance computing. Parallel Computing, 38(1):37--51, 2012.
[8]
J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107--113, 2008.
[9]
N. Gast, B. Gaujal, and J.-Y. Le Boudec. Mean field for Markov decision processes: from discrete to continuous optimization. IEEE Transactions on Automatic Control, 57(9):2266--2280, 2012.
[10]
T. Gautier, X. Besseron, and L. Pigeon. Kaapi: A thread scheduling runtime system for data flow computations on cluster of multi-processors. In Proceedings of the 2007 International Workshop on Parallel Symbolic Computation, PASCO '07, pages 15--23, New York, NY, USA, 2007. ACM.
[11]
D. Gross, J. F. Shortle, J. M. Thompson, and C. M. Harris. Fundamentals of Queueing Theory, 4th Edition. John Wiley and Sons", 2008.
[12]
B. Kreaseck, L. Carter, H. Casanova, and J. Ferrante. Autonomous protocols for bandwidth-centric scheduling of independent-task applications. In Proceedings of the 17th International Parallel and Distributed Processing Symposium (IPDPS'03). IEEE, 2003.
[13]
M. Mitzenmacher. Analyses of load stealing models based on differential equations. In Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '98, pages 212--221, New York, NY, USA, 1998. ACM.
[14]
M. Mitzenmacher. How useful is old information? IEEE Trans. Parallel Distrib. Syst., 11(1):6--20, 2000.
[15]
M. Parashar and S. Hariri. Autonomic computing: concepts, infrastructure, and applications. CRC press, 2006.
[16]
J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta. Hierarchical task-based programming with StarSs. International Journal of High Performance Computing Applications, 23(3):284--299, 2009.
[17]
H. Topcuoglu, S. Hariri, and M.-y. Wu. Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Transactions on Parallel and Distributed Systems, 13(3):260--274, 2002.

Cited By

View all
  • (2022)Memory-Aware Scheduling of Tasks Sharing Data on Multiple GPUs with Dynamic Runtime Systems2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS53621.2022.00073(694-704)Online publication date: May-2022
  • (2016)Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processorsCluster Computing10.1007/s10586-016-0611-819:3(1037-1051)Online publication date: 1-Sep-2016
  • (2015)Comparison of Static and Runtime Resource Allocation Strategies for Matrix MultiplicationProceedings of the 2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)10.1109/SBAC-PAD.2015.10(170-177)Online publication date: 17-Oct-2015

Index Terms

  1. Analysis of dynamic scheduling strategies for matrix multiplication on heterogeneous platforms

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    HPDC '14: Proceedings of the 23rd international symposium on High-performance parallel and distributed computing
    June 2014
    334 pages
    ISBN:9781450327497
    DOI:10.1145/2600212
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 June 2014

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. data-aware algorithms
    2. dynamic scheduling
    3. matrix multiplication
    4. performance evaluation
    5. randomized algorithms

    Qualifiers

    • Research-article

    Conference

    HPDC'14
    Sponsor:

    Acceptance Rates

    HPDC '14 Paper Acceptance Rate 21 of 130 submissions, 16%;
    Overall Acceptance Rate 166 of 966 submissions, 17%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)1
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 05 Mar 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2022)Memory-Aware Scheduling of Tasks Sharing Data on Multiple GPUs with Dynamic Runtime Systems2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS53621.2022.00073(694-704)Online publication date: May-2022
    • (2016)Architecture-aware configuration and scheduling of matrix multiplication on asymmetric multicore processorsCluster Computing10.1007/s10586-016-0611-819:3(1037-1051)Online publication date: 1-Sep-2016
    • (2015)Comparison of Static and Runtime Resource Allocation Strategies for Matrix MultiplicationProceedings of the 2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)10.1109/SBAC-PAD.2015.10(170-177)Online publication date: 17-Oct-2015

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media