research-article

Backtracking-based load balancing

Authors:

Tasuku Hiraishi,

Masahiro Yasugi,

Taiichi YuasaAuthors Info & Claims

ACM SIGPLAN Notices, Volume 44, Issue 4

Pages 55 - 64

https://doi.org/10.1145/1594835.1504187

Published: 14 February 2009 Publication History

Abstract

High-productivity languages for parallel computing become more important as parallel environments including multicores become more common. Cilk is such a language. It provides good load balancing for many applications including irregular ones; that is, it keeps all workers busy by creating plenty of "logical" threads and adopting the oldest-first work stealing strategy. This paper proposes a "logical thread"-free framework called Tascell, which achieves a higher performance and supports a wider range of parallel environments including clusters without loss of productivity. A Tascell worker spawns a "real" task only when requested by another idle worker. The worker performs the spawning by temporarily "backtracking" and restoring its oldest task-spawnable state. Our approach eliminates the cost of spawning/managing logical threads. It also promotes the reuse of workspaces and improves the locality of reference since it does not need to prepare a workspace for each concurrently runnable logical thread. Furthermore, Tascell enables elegant and highly-efficient backtrack search algorithms with delayed workspace copying. For instance, our 16-queens problem solver is 1.86 times faster than Cilk on a system with two dual-core processors. Our approach also enables a single program to run in both shared and distributed memory environments with reasonable efficiency and scalability.

References

[1]

Thomas M. Breuel. Lexical closures for C++. In Usenix Proceedings, C++ Conference, 1988.

[2]

Philippe Charles, Christian Grothoff, Vijay Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar. X10: an object-oriented approach to non-uniform cluster computing. SIGPLAN Not., 40(10):519--538, 2005. ISSN 0362-1340.

Digital Library

[3]

Marc Feeley. A message passing implementation of lazy task creation. In Proceedings of the International Workshop on Parallel Symbolic Computing: Languages, Systems, and Applications, number 748 in Lecture Notes in Computer Science, pages 94--107. Springer-Verlag, 1993.

Digital Library

[4]

Marc Feeley. Lazy remote procedure call and its implementation in a parallel variant of C. In Proceedings of International Workshop on Parallel Symbolic Languages and Systems, number 1068 in Lecture Notes in Computer Science, pages 3--21. Springer-Verlag, 1995.

Digital Library

[5]

Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The im-plementation of the Cilk-5 multithreaded language. ACM SIGPLAN Notices (PLDI '98), 33(5):212--223, 1998.

Digital Library

[6]

Seth C. Goldstein, Klaus E. Schauser, and David E. Culler. Lazy Threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 3(1):5--20, August 1996.

Digital Library

[7]

Robert H. Halstead, Jr. New ideas in parallel Lisp: Language design, implementation, and programming tools. In T. Ito and R. H. Halstead, editors, Parallel Lisp: Languages and Systems, volume 441 of Lecture Notes in Computer Science, pages 2--57, Sendai, Japan, June 5-8, 1990. Springer, Berlin.

Digital Library

[8]

Tasuku Hiraishi, Masahiro Yasugi, and Taiichi Yuasa. A transformation-based implementation of lightweight nested functions. IPSJ Digital Courier, 2:262-279, 2006. (IPSJ Transaction on Programming, Vol. 47, No. SIG 6(PRO 29), pp. 50--67.).

[9]

Intel Corporation. Intel Threading Building Block Tutorial, 2007. http://threadingbuildingblocks.org/.

Digital Library

[10]

Richard Kelsey, William Clinger, and Jonathan Rees. Revised 5 report on the algorithmic language Scheme. ACM SIGPLAN Notices, 33(9): 26--76, September 1998.

Digital Library

[11]

B. C. Kuszmaul. Cilk provides the "best overall productivity" for high performance computing:(and won the HPC challenge award to prove it). Proceedings of the nineteenth annual ACM Symposium on Parallel Algorithms and Architectures, pages 299--300, 2007.

Digital Library

[12]

Eric Mohr, David A. Kranz, and Robert H. Halstead, Jr. Lazy task creation: A technique for increasing the granularity of parallel programs. IEEE Transactions on Parallel and Distributed Systems, 2(3): 264--280, July 1991.

Digital Library

[13]

Liang Peng, Weng Fai Wong, Ming Dong Feng, and Chung Kwong Yuen. SilkRoad: A multithreaded runtime system with software distributed shared memory for SMP clusters. In IEEE International Conferrence on Cluster Computing (Cluster2000), pages 243--249, November 2000.

[14]

K. Randall. Cilk: Efficient multithreaded computing. Technical Report MIT/LCS/TR-749, 1998.

Digital Library

[15]

R. M. Stallman. Using and porting GNU Compiler Collection. 1999.

[16]

Volker Strumpen. Indolent closure creation. Technical Report MIT-LCS-TM-580, MIT, June 1998.

Digital Library

[17]

Supercomputing Technologies Group. Cilk 5.4.6 Reference Manual. Massachusetts Institute of Technology, Laboratory for Computer Science, Cambridge, Massachusetts, USA.

[18]

Kenjiro Taura, Kunio Tabata, and Akinori Yonezawa. Stack-Threads/MP: Integrating futures into calling standards. In Proceedings of ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPoPP'99), pages 60--71, May 1999.

Digital Library

[19]

Seiji Umatani, Masahiro Yasugi, Tsuneyasu Komiya, and Taiichi Yuasa. Pursuing laziness for efficient implementation of modern multithreaded languages. In Proceedings of the Fifth International Symposium on High Performance Computing, number 2858 in Lecture Notes in Computer Science, pages 174--188, October 2003.

[20]

Rob V. van Nieuwpoort, Thilo Kielmann, and Henri E. Bal. Efficient load balancing for wide-area divide-and-conquer applications. In In Proceedings of the eighth ACM SIGPLAN symposium on Principles and Practices of Parallel Programming (PPoPP'01), pages 34--43, New York, NY, USA, 2001. ACM. ISBN 1-58113-346-4.

Digital Library

[21]

Mark T. Vandevoorde and Eric S. Roberts. WorkCrews: An abstraction for controlling parallelism. International Journal of Parallel Program-ming, 17(4):347--366, 1988.

Digital Library

[22]

David B. Wagner and Bradley G. Calder. Leapfrogging: A portable technique for implementing efficient futures. In Proceedings of Principles and Practice of Parallel Programming (PPoPP'93), pages 208--217, 1993.

Digital Library

[23]

Masahiro Yasugi, Tasuku Hiraishi, and Taiichi Yuasa. Lightweight lexical closures for legitimate execution stack access. In Proceedings of the Fifteenth International Conference on Compiler Construction (CC2006), number 3923 in Lecture Notes in Computer Science, pages 170--184. Springer-Verlag, 2006.

Digital Library

Cited By

Westrick SFluet MRainey MAcar U(2024)Automatic Parallelism ManagementProceedings of the ACM on Programming Languages10.1145/36328808:POPL(1118-1149)Online publication date: 5-Jan-2024
https://doi.org/10.1145/3632880
Tao T(2023)Synchronization Efficient Scheduling of Fine-grained Irregular Programs2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)10.1109/PDP59025.2023.00044(235-243)Online publication date: Mar-2023
https://doi.org/10.1109/PDP59025.2023.00044
Bai ZHiraishi TIda AYasugi M(2022)Parallelization of Matrix Partitioning in Hierarchical Matrix Construction on Distributed Memory SystemsJournal of Information Processing10.2197/ipsjjip.30.74230(742-754)Online publication date: 2022
https://doi.org/10.2197/ipsjjip.30.742
Show More Cited By

Index Terms

Backtracking-based load balancing
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Concurrent programming structures

Recommendations

Backtracking-based load balancing
PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming

High-productivity languages for parallel computing become more important as parallel environments including multicores become more common. Cilk is such a language. It provides good load balancing for many applications including irregular ones; that is, ...
Study on Dynamic Load Balancing Algorithm Based on MPICH
WCSE '09: Proceedings of the 2009 WRI World Congress on Software Engineering - Volume 01

MPICH is the most important parallel programming tool in cluster currently. It implements communication in parallel program by message. Implementing load balance in MPI parallel program is very important. It may reduce running time and improve ...
Token-based adaptive load balancing for dynamically parallel computations on multicomputer platforms
CompSysTech '07: Proceedings of the 2007 international conference on Computer systems and technologies

The paper suggests an algorithm for token-based adaptive load balancing for dynamically parallel computations on multicomputer platforms. The proposed algorithm for load balance is initiated and performed by the idle or under-loaded processes and ...

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices

ACM SIGPLAN Notices Volume 44, Issue 4

PPoPP '09

April 2009

294 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/1594835

Issue’s Table of Contents

PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
February 2009
322 pages
ISBN:9781605583976
DOI:10.1145/1504176
General Chair:
Daniel Reed
Microsoft Research, USA
,
Program Chair:
Vivek Sarkar
Rice University, USA

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 February 2009

Published in SIGPLAN Volume 44, Issue 4

Check for updates

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

51
Total Citations
View Citations
781
Total Downloads

Downloads (Last 12 months)32
Downloads (Last 6 weeks)3

Reflects downloads up to 08 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Westrick SFluet MRainey MAcar U(2024)Automatic Parallelism ManagementProceedings of the ACM on Programming Languages10.1145/36328808:POPL(1118-1149)Online publication date: 5-Jan-2024
https://doi.org/10.1145/3632880
Tao T(2023)Synchronization Efficient Scheduling of Fine-grained Irregular Programs2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)10.1109/PDP59025.2023.00044(235-243)Online publication date: Mar-2023
https://doi.org/10.1109/PDP59025.2023.00044
Bai ZHiraishi TIda AYasugi M(2022)Parallelization of Matrix Partitioning in Hierarchical Matrix Construction on Distributed Memory SystemsJournal of Information Processing10.2197/ipsjjip.30.74230(742-754)Online publication date: 2022
https://doi.org/10.2197/ipsjjip.30.742
Nakashima RYasugi MYoritaka HHiraishi TUmatani S(2021)Work-stealing Strategies That Consider Work Amount and HierarchyJournal of Information Processing10.2197/ipsjjip.29.47829(478-489)Online publication date: 2021
https://doi.org/10.2197/ipsjjip.29.478
Abe THiraishi T(2021)An Extensionally Equivalence-ensured Language for Task Parallel Processing with Backtracking-based Load BalancingJournal of Information Processing10.2197/ipsjjip.29.43429(434-448)Online publication date: 2021
https://doi.org/10.2197/ipsjjip.29.434
Rainey MNewton RHale KHardavellas NCampanoni SDinda PAcar UFreund SYahav E(2021)Task parallel assembly language for uncompromising parallelismProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3460969(1064-1079)Online publication date: 19-Jun-2021
https://dl.acm.org/doi/10.1145/3453483.3460969
Ouyang KSi MHori AChen ZBalaji PCuicchi CQualters IKramer W(2020)CAB-MPIProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.5555/3433701.3433748(1-15)Online publication date: 9-Nov-2020
https://dl.acm.org/doi/10.5555/3433701.3433748
Iwasaki SAmer ATaura KBalaji P(2020)Analyzing the Performance Trade-Off in Implementing User-Level ThreadsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2020.297605731:8(1859-1877)Online publication date: 1-Aug-2020
https://doi.org/10.1109/TPDS.2020.2976057
Ouyang KSi MHori AChen ZBalaji P(2020)CAB-MPI: Exploring Interprocess Work-Stealing towards Balanced MPI CommunicationSC20: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41405.2020.00040(1-15)Online publication date: Nov-2020
https://doi.org/10.1109/SC41405.2020.00040
Nakashima RYoritaka HYasugi MHiraishi TUmatani S(2019)Extending a Work-Stealing Framework with Priorities and Weights2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms (IA3)10.1109/IA349570.2019.00008(9-16)Online publication date: Nov-2019
https://doi.org/10.1109/IA349570.2019.00008
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents