skip to main content
research-article

Backtracking-based load balancing

Published: 14 February 2009 Publication History

Abstract

High-productivity languages for parallel computing become more important as parallel environments including multicores become more common. Cilk is such a language. It provides good load balancing for many applications including irregular ones; that is, it keeps all workers busy by creating plenty of "logical" threads and adopting the oldest-first work stealing strategy. This paper proposes a "logical thread"-free framework called Tascell, which achieves a higher performance and supports a wider range of parallel environments including clusters without loss of productivity. A Tascell worker spawns a "real" task only when requested by another idle worker. The worker performs the spawning by temporarily "backtracking" and restoring its oldest task-spawnable state. Our approach eliminates the cost of spawning/managing logical threads. It also promotes the reuse of workspaces and improves the locality of reference since it does not need to prepare a workspace for each concurrently runnable logical thread. Furthermore, Tascell enables elegant and highly-efficient backtrack search algorithms with delayed workspace copying. For instance, our 16-queens problem solver is 1.86 times faster than Cilk on a system with two dual-core processors. Our approach also enables a single program to run in both shared and distributed memory environments with reasonable efficiency and scalability.

References

[1]
Thomas M. Breuel. Lexical closures for C++. In Usenix Proceedings, C++ Conference, 1988.
[2]
Philippe Charles, Christian Grothoff, Vijay Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar. X10: an object-oriented approach to non-uniform cluster computing. SIGPLAN Not., 40(10):519--538, 2005. ISSN 0362-1340.
[3]
Marc Feeley. A message passing implementation of lazy task creation. In Proceedings of the International Workshop on Parallel Symbolic Computing: Languages, Systems, and Applications, number 748 in Lecture Notes in Computer Science, pages 94--107. Springer-Verlag, 1993.
[4]
Marc Feeley. Lazy remote procedure call and its implementation in a parallel variant of C. In Proceedings of International Workshop on Parallel Symbolic Languages and Systems, number 1068 in Lecture Notes in Computer Science, pages 3--21. Springer-Verlag, 1995.
[5]
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The im-plementation of the Cilk-5 multithreaded language. ACM SIGPLAN Notices (PLDI '98), 33(5):212--223, 1998.
[6]
Seth C. Goldstein, Klaus E. Schauser, and David E. Culler. Lazy Threads: Implementing a fast parallel call. Journal of Parallel and Distributed Computing, 3(1):5--20, August 1996.
[7]
Robert H. Halstead, Jr. New ideas in parallel Lisp: Language design, implementation, and programming tools. In T. Ito and R. H. Halstead, editors, Parallel Lisp: Languages and Systems, volume 441 of Lecture Notes in Computer Science, pages 2--57, Sendai, Japan, June 5-8, 1990. Springer, Berlin.
[8]
Tasuku Hiraishi, Masahiro Yasugi, and Taiichi Yuasa. A transformation-based implementation of lightweight nested functions. IPSJ Digital Courier, 2:262-279, 2006. (IPSJ Transaction on Programming, Vol. 47, No. SIG 6(PRO 29), pp. 50--67.).
[9]
Intel Corporation. Intel Threading Building Block Tutorial, 2007. http://threadingbuildingblocks.org/.
[10]
Richard Kelsey, William Clinger, and Jonathan Rees. Revised 5 report on the algorithmic language Scheme. ACM SIGPLAN Notices, 33(9): 26--76, September 1998.
[11]
B. C. Kuszmaul. Cilk provides the "best overall productivity" for high performance computing:(and won the HPC challenge award to prove it). Proceedings of the nineteenth annual ACM Symposium on Parallel Algorithms and Architectures, pages 299--300, 2007.
[12]
Eric Mohr, David A. Kranz, and Robert H. Halstead, Jr. Lazy task creation: A technique for increasing the granularity of parallel programs. IEEE Transactions on Parallel and Distributed Systems, 2(3): 264--280, July 1991.
[13]
Liang Peng, Weng Fai Wong, Ming Dong Feng, and Chung Kwong Yuen. SilkRoad: A multithreaded runtime system with software distributed shared memory for SMP clusters. In IEEE International Conferrence on Cluster Computing (Cluster2000), pages 243--249, November 2000.
[14]
K. Randall. Cilk: Efficient multithreaded computing. Technical Report MIT/LCS/TR-749, 1998.
[15]
R. M. Stallman. Using and porting GNU Compiler Collection. 1999.
[16]
Volker Strumpen. Indolent closure creation. Technical Report MIT-LCS-TM-580, MIT, June 1998.
[17]
Supercomputing Technologies Group. Cilk 5.4.6 Reference Manual. Massachusetts Institute of Technology, Laboratory for Computer Science, Cambridge, Massachusetts, USA.
[18]
Kenjiro Taura, Kunio Tabata, and Akinori Yonezawa. Stack-Threads/MP: Integrating futures into calling standards. In Proceedings of ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming (PPoPP'99), pages 60--71, May 1999.
[19]
Seiji Umatani, Masahiro Yasugi, Tsuneyasu Komiya, and Taiichi Yuasa. Pursuing laziness for efficient implementation of modern multithreaded languages. In Proceedings of the Fifth International Symposium on High Performance Computing, number 2858 in Lecture Notes in Computer Science, pages 174--188, October 2003.
[20]
Rob V. van Nieuwpoort, Thilo Kielmann, and Henri E. Bal. Efficient load balancing for wide-area divide-and-conquer applications. In In Proceedings of the eighth ACM SIGPLAN symposium on Principles and Practices of Parallel Programming (PPoPP'01), pages 34--43, New York, NY, USA, 2001. ACM. ISBN 1-58113-346-4.
[21]
Mark T. Vandevoorde and Eric S. Roberts. WorkCrews: An abstraction for controlling parallelism. International Journal of Parallel Program-ming, 17(4):347--366, 1988.
[22]
David B. Wagner and Bradley G. Calder. Leapfrogging: A portable technique for implementing efficient futures. In Proceedings of Principles and Practice of Parallel Programming (PPoPP'93), pages 208--217, 1993.
[23]
Masahiro Yasugi, Tasuku Hiraishi, and Taiichi Yuasa. Lightweight lexical closures for legitimate execution stack access. In Proceedings of the Fifteenth International Conference on Compiler Construction (CC2006), number 3923 in Lecture Notes in Computer Science, pages 170--184. Springer-Verlag, 2006.

Cited By

View all
  • (2024)Automatic Parallelism ManagementProceedings of the ACM on Programming Languages10.1145/36328808:POPL(1118-1149)Online publication date: 5-Jan-2024
  • (2023)Synchronization Efficient Scheduling of Fine-grained Irregular Programs2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)10.1109/PDP59025.2023.00044(235-243)Online publication date: Mar-2023
  • (2022)Parallelization of Matrix Partitioning in Hierarchical Matrix Construction on Distributed Memory SystemsJournal of Information Processing10.2197/ipsjjip.30.74230(742-754)Online publication date: 2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 44, Issue 4
PPoPP '09
April 2009
294 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1594835
Issue’s Table of Contents
  • cover image ACM Conferences
    PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
    February 2009
    322 pages
    ISBN:9781605583976
    DOI:10.1145/1504176
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 February 2009
Published in SIGPLAN Volume 44, Issue 4

Check for updates

Author Tags

  1. backtrack search
  2. backtracking
  3. load balancing
  4. parallel computing

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)32
  • Downloads (Last 6 weeks)3
Reflects downloads up to 08 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Automatic Parallelism ManagementProceedings of the ACM on Programming Languages10.1145/36328808:POPL(1118-1149)Online publication date: 5-Jan-2024
  • (2023)Synchronization Efficient Scheduling of Fine-grained Irregular Programs2023 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)10.1109/PDP59025.2023.00044(235-243)Online publication date: Mar-2023
  • (2022)Parallelization of Matrix Partitioning in Hierarchical Matrix Construction on Distributed Memory SystemsJournal of Information Processing10.2197/ipsjjip.30.74230(742-754)Online publication date: 2022
  • (2021)Work-stealing Strategies That Consider Work Amount and HierarchyJournal of Information Processing10.2197/ipsjjip.29.47829(478-489)Online publication date: 2021
  • (2021)An Extensionally Equivalence-ensured Language for Task Parallel Processing with Backtracking-based Load BalancingJournal of Information Processing10.2197/ipsjjip.29.43429(434-448)Online publication date: 2021
  • (2021)Task parallel assembly language for uncompromising parallelismProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3460969(1064-1079)Online publication date: 19-Jun-2021
  • (2020)CAB-MPIProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.5555/3433701.3433748(1-15)Online publication date: 9-Nov-2020
  • (2020)Analyzing the Performance Trade-Off in Implementing User-Level ThreadsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2020.297605731:8(1859-1877)Online publication date: 1-Aug-2020
  • (2020)CAB-MPI: Exploring Interprocess Work-Stealing towards Balanced MPI CommunicationSC20: International Conference for High Performance Computing, Networking, Storage and Analysis10.1109/SC41405.2020.00040(1-15)Online publication date: Nov-2020
  • (2019)Extending a Work-Stealing Framework with Priorities and Weights2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms (IA3)10.1109/IA349570.2019.00008(9-16)Online publication date: Nov-2019
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media