research-article

Idempotent work stealing

Authors:

Maged M. Michael,

Martin T. Vechev,

Vijay A. SaraswatAuthors Info & Claims

ACM SIGPLAN Notices, Volume 44, Issue 4

Pages 45 - 54

https://doi.org/10.1145/1594835.1504186

Published: 14 February 2009 Publication History

Get Access

Abstract

Load balancing is a technique which allows efficient parallelization of irregular workloads, and a key component of many applications and parallelizing runtimes. Work-stealing is a popular technique for implementing load balancing, where each parallel thread maintains its own work set of items and occasionally steals items from the sets of other threads.

The conventional semantics of work stealing guarantee that each inserted task is eventually extracted exactly once. However, correctness of a wide class of applications allows for relaxed semantics, because either: i) the application already explicitly checks that no work is repeated or ii) the application can tolerate repeated work.

In this paper, we introduce idempotent work tealing, and present several new algorithms that exploit the relaxed semantics to deliver better performance. The semantics of the new algorithms guarantee that each inserted task is eventually extracted at least once-instead of exactly once.

On mainstream processors, algorithms for conventional work stealing require special atomic instructions or store-load memory ordering fence instructions in the owner's critical path operations. In general, these instructions are substantially slower than regular memory access instructions. By exploiting the relaxed semantics, our algorithms avoid these instructions in the owner's operations.

We evaluated our algorithms using common graph problems and micro-benchmarks and compared them to well-known conventional work stealing algorithms, the THE Cilk and Chase-Lev algorithms. We found that our best algorithm (with LIFO extraction) outperforms existing algorithms in nearly all cases, and often by significant margins.

References

[1]

Nimar S. Arora, Robert D. Blumofe, and C. Greg Plaxton. Thread scheduling for multiprogrammed multiprocessors. In Proceedings of the Tenth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA, pages 119--129, June 1998.

Digital Library

Google Scholar

[2]

David A. Bader and Guojing Cong. A fast, parallel spanning tree algorithm for symmetric multiprocessors (SMPs). Journal of Parallel and Distributed Computing, 65(9):994--1006, September 2005.

Digital Library

Google Scholar

[3]

David A. Bader and Joseph JáJá. SIMPLE: a methodology for programming high performance algorithms on clusters of symmetric multiprocessors (SMPs). Journal of Parallel and Distributed Computing, 58(1):92--108, July 1999.

Digital Library

Google Scholar

[4]

Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. Cilk: an efficient multithreaded runtime system. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP, pages 207--216, October 1995.

Digital Library

Google Scholar

[5]

Philippe Charles, Christian Grothoff, Vijay A. Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar. X10: an object-oriented approach to non-uniform cluster computing. In Proceedings of the Twentieth Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA, pages 519--538, October 2005.

Digital Library

Google Scholar

[6]

David Chase and Yossi Lev. Dynamic circular work-stealing deque. In Proceedings of the Seventeenth Annual ACM Symposium on Parallelism in Algorithms and Architectures, SPAA, pages 21--28, July 2005.

Digital Library

Google Scholar

[7]

Christine H. Flood, David Detlefs, Nir Shavit, and Xiaolan Zhang. Parallel garbage collection for shared memory multiprocessors. In Proceedings of the First Java Virtual Machine Research and Technology Symposium, JVM, pages 21--21, April 2001.

Digital Library

Google Scholar

[8]

Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation, PLDI, pages 212--223, June 1998.

Digital Library

Google Scholar

[9]

John Greiner. A comparison of parallel algorithms for connected components. In Proceedings of the Sixth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA, pages 16--25, June 1994.

Digital Library

Google Scholar

[10]

Danny Hendler, Yossi Lev, Mark Moir, and Nir Shavit. A dynamic-sized nonblocking work stealing deque. Distributed Computing, 18(3):189--207, 2006.

Digital Library

Google Scholar

[11]

Danny Hendler and Nir Shavit. Non-blocking steal-half work queues. In Proceedings Twenty-First Annual ACM Symposium on Principles of Distributed Computing, PODC, pages 280--289, July 2002.

Digital Library

Google Scholar

[12]

IBM System/370 Extended Architecture, Principles of Operation, 1983. Publication No. SA22-7085.

Google Scholar

[13]

Doug Lea. The JSR-133 Cookbook for Compiler Writers. Web page.

Google Scholar

[14]

Maged M. Michael. Practical lock-free and wait-free LL/SC/VL implementations using 64-bit CAS. In Proceedings of the Eighteenth International Conference on Distributed Computing, DISC, pages 144--158, October 2004.

Crossref

Google Scholar

Cited By

View all

Silva HCastro MSilva FMelo A(2024)A Framework for Automated Parallel Execution of Scientific Multi-workflow Applications in the Cloud with Work StealingEuro-Par 2024: Parallel Processing10.1007/978-3-031-69583-4_21(298-311)Online publication date: 26-Aug-2024
https://doi.org/10.1007/978-3-031-69583-4_21
Wang CPetri GLv YLong TLiu Z(2022)Decidability of Liveness for Concurrent Objects on the TSO Memory ModelDependable Software Engineering. Theories, Tools, and Applications10.1007/978-3-031-21213-0_10(149-165)Online publication date: 11-Dec-2022
https://doi.org/10.1007/978-3-031-21213-0_10
Frey DGuillou LRaynal MTaïani F(2021)Consensus-Free Ledgers When Operations of Distinct Processes are CommutativeParallel Computing Technologies10.1007/978-3-030-86359-3_27(359-370)Online publication date: 7-Sep-2021
https://doi.org/10.1007/978-3-030-86359-3_27
Show More Cited By

Index Terms

Idempotent work stealing
1. Computing methodologies
  1. Concurrent computing methodologies
    1. Concurrent programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language features
        Concurrent programming structures
      2. Language types
        Concurrent programming languages
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Process management
        Concurrency control
        Multithreading
        Process synchronization
        Scheduling

Recommendations

Idempotent work stealing
PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming

Load balancing is a technique which allows efficient parallelization of irregular workloads, and a key component of many applications and parallelizing runtimes. Work-stealing is a popular technique for implementing load balancing, where each parallel ...
Dynamic circular work-stealing deque
SPAA '05: Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures

The non-blocking work-stealing algorithm of Arora, Blumofe, and Plaxton (henceforth ABP work-stealing) is on its way to becoming the multiprocessor load balancing technology of choice in both industry and academia. This highly efficient scheme is based ...
Fence-free work stealing on bounded TSO processors
ASPLOS '14

Work stealing is the method of choice for load balancing in task parallel programming languages and frameworks. Yet despite considerable effort invested in optimizing work stealing task queues, existing algorithms issue a costly memory fence when ...

Comments

Information & Contributors

Information

Published In

ACM SIGPLAN Notices Volume 44, Issue 4

PPoPP '09

April 2009

294 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/1594835

Issue’s Table of Contents

PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
February 2009
322 pages
ISBN:9781605583976
DOI:10.1145/1504176
General Chair:
Daniel Reed
Microsoft Research, USA
,
Program Chair:
Vivek Sarkar
Rice University, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 February 2009

Published in SIGPLAN Volume 44, Issue 4

Check for updates

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

107
Total Citations
View Citations
1,219
Total Downloads

Downloads (Last 12 months)29
Downloads (Last 6 weeks)6

Reflects downloads up to 08 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

View all

Silva HCastro MSilva FMelo A(2024)A Framework for Automated Parallel Execution of Scientific Multi-workflow Applications in the Cloud with Work StealingEuro-Par 2024: Parallel Processing10.1007/978-3-031-69583-4_21(298-311)Online publication date: 26-Aug-2024
https://doi.org/10.1007/978-3-031-69583-4_21
Wang CPetri GLv YLong TLiu Z(2022)Decidability of Liveness for Concurrent Objects on the TSO Memory ModelDependable Software Engineering. Theories, Tools, and Applications10.1007/978-3-031-21213-0_10(149-165)Online publication date: 11-Dec-2022
https://doi.org/10.1007/978-3-031-21213-0_10
Frey DGuillou LRaynal MTaïani F(2021)Consensus-Free Ledgers When Operations of Distinct Processes are CommutativeParallel Computing Technologies10.1007/978-3-030-86359-3_27(359-370)Online publication date: 7-Sep-2021
https://doi.org/10.1007/978-3-030-86359-3_27
Mutluergil STasiran S(2019)A mechanized refinement proof of the Chase---Lev deque using a proof systemComputing10.1007/s00607-018-0635-4101:1(59-74)Online publication date: 1-Jan-2019
https://dl.acm.org/doi/10.1007/s00607-018-0635-4
Horie MHorii HOgata KOnodera T(2018)Balanced double queues for GC work-stealing on weak memory modelsACM SIGPLAN Notices10.1145/3299706.321057053:5(109-119)Online publication date: 18-Jun-2018
https://dl.acm.org/doi/10.1145/3299706.3210570
Horie MHorii HOgata KOnodera TPayer HSartor J(2018)Balanced double queues for GC work-stealing on weak memory modelsProceedings of the 2018 ACM SIGPLAN International Symposium on Memory Management10.1145/3210563.3210570(109-119)Online publication date: 18-Jun-2018
https://dl.acm.org/doi/10.1145/3210563.3210570
Popovic MKordic BPopovic MBasicevic I(2018)A Solution of Concurrent Queue on PSTM2018 26th Telecommunications Forum (TELFOR)10.1109/TELFOR.2018.8611903(1-4)Online publication date: Nov-2018
https://doi.org/10.1109/TELFOR.2018.8611903
Yang JHe Q(2018)Scheduling Parallel Computations by Work StealingInternational Journal of Parallel Programming10.1007/s10766-016-0484-846:2(173-197)Online publication date: 1-Apr-2018
https://dl.acm.org/doi/10.1007/s10766-016-0484-8
Alglave JKroening DNimal VPoetzl D(2017)Don’t Sit on the FenceACM Transactions on Programming Languages and Systems10.1145/299459339:2(1-38)Online publication date: 29-May-2017
https://dl.acm.org/doi/10.1145/2994593
Ribic HLiu Y(2016)AEQUITASProceedings of the 2016 International Conference on Supercomputing10.1145/2925426.2926260(1-12)Online publication date: 1-Jun-2016
https://dl.acm.org/doi/10.1145/2925426.2926260
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Abstract

References

Cited By

Index Terms

Recommendations