skip to main content
research-article

Helper locks for fork-join parallel programming

Published: 09 January 2010 Publication History

Abstract

Helper locks allow programs with large parallel critical sections, called parallel regions, to execute more efficiently by enlisting processors that might otherwise be waiting on the helper lock to aid in the execution of the parallel region. Suppose that a processor p is executing a parallel region A after having acquired the lock L protecting A. If another processor p′ tries to acquire L, then instead of blocking and waiting for p to complete A, processor p′ joins p to help it complete A. Additional processors not blocked on L may also help to execute A.
The HELPER runtime system can execute fork-join computations augmented with helper locks and parallel regions. HELPER supports the unbounded nesting of parallel regions. We provide theoretical completion-time and space-usage bounds for a design of HELPER based on work stealing. Specifically, let V be the number of parallel regions in a computation, let T1 be its work, and let T∞ be its "aggregate span" --- the sum of the spans (critical-path lengths) of all its parallel regions. We prove that HELPER completes the computation in expected time O(T1/PP + T∞+ PV on P processors. This bound indicates that programs with a small number of highly parallel critical sections can attain linear speedup. For the space bound, we prove that HELPER completes a program using only O(PS1 stack space, where S1 is the sum, over all regions, of the stack space used by each region in a serial execution. Finally, we describe a prototype of HELPER implemented by modifying the Cilk multithreaded runtime system. We used this prototype to implement a concurrent hash table with a resize operation protected by a helper lock.

References

[1]
E. Allen, D. Chase, J. Hallett, V. Luchango, J.-W. Maessen, S. Ryu, G. L. S. Jr., and S. Tobin-Hochstadt. The Fortress language specification, version 1.0. Technical report, Sun Microsystems, Inc., March 2008. URL http://research.sun.com/projects/plrg/Publications/fortress.1.0.pdf.
[2]
N. S. Arora, R. D. Blumofe, and C. G. Plaxton. Thread scheduling for multiprogrammed multiprocessors. In Proceedings of the ACM Symposium on Parallel Algorithms and Architectures, pages 119--129, Puerto Vallarta, Mexico, 1998.
[3]
G. Barnes. A method for implementing lock-free shared-data structures. In SPAA '93: Proceedings of the Fifth Annual ACM Symposium on Parallel Algorithms and Architectures, pages 261--270, New York, NY, USA, 1993. ACM. ISBN 0-89791-599-2.
[4]
R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. Journal of the ACM, 46(5):720--748, 1999.
[5]
K. Ebcioglu, V. Saraswat, and V. Sarkar. X10: an experimental language for high productivity programming of scalable systems. In Proceedings of the Second Workshop on Productivity and Performance in High-End Computing (PPHEC-05), Feb. 2005. Held in conjunction with the Eleventh Symposium on High Performance Computer Architecture.
[6]
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 212--223, 1998.
[7]
M. Herlihy, N. Shavit, and M. Tzafrir. Hopscotch hashing. In DISC '08: Proceedings of the 22nd International Symposium on Distributed Computing, pages 350--364, Berlin, Heidelberg, 2008. Springer-Verlag.
[8]
A. Israeli and L. Rappoport. Disjoint-access-parallel implementations of strong shared memory primitives. In PODC '94: Proceedings of the Thirteenth Annual ACM Symposium on Principles of Distributed Computing, pages 151--160, New York, NY, USA, 1994. ACM.
[9]
C. E. Leiserson. The Cilk++ concurrency platform. In DAC '09: Proceedings of the 46th Annual Design Automation Conference, pages 522--527, New York, NY, 2009. ACM.
[10]
S.-C. Lim, J. Ahn, and M. H. Kim. A concurrent Blink-tree algorithm using a cooperative locking protocol. In Lecture Notes in Computer Science, volume 2712, pages 253--260. Springer Berlin / Heidelberg, 2003.
[11]
OpenMP Architecture Review Board. OpenMP application program interface, version 3.0. http://www.openmp.org/mp-documents/spec30.pdf, May 2008.
[12]
J. Reinders. Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism. O'Reilly, 2007.
[13]
J. Turek, D. Shasha, and S. Prakash. Locking without blocking: making lock based concurrent data structure algorithms nonblocking. In PODS '92: Proceedings of the Eleventh ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 212--222, New York, NY, USA, 1992. ACM.

Cited By

View all
  • (2022)Early scheduling on steroids: Boosting parallel state machine replicationJournal of Parallel and Distributed Computing10.1016/j.jpdc.2022.02.001163(269-282)Online publication date: May-2022
  • (2021)Atomic power in forksProceedings of the Thirty-Second Annual ACM-SIAM Symposium on Discrete Algorithms10.5555/3458064.3458192(2141-2153)Online publication date: 10-Jan-2021
  • (2023)Co-Located Parallel Scheduling of Threads to Optimize Cache Sharing2023 IEEE Real-Time Systems Symposium (RTSS)10.1109/RTSS59052.2023.00030(251-264)Online publication date: 5-Dec-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 45, Issue 5
PPoPP '10
May 2010
346 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1837853
Issue’s Table of Contents
  • cover image ACM Conferences
    PPoPP '10: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
    January 2010
    372 pages
    ISBN:9781605588773
    DOI:10.1145/1693453
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 January 2010
Published in SIGPLAN Volume 45, Issue 5

Check for updates

Author Tags

  1. cilk
  2. dynamic multithreading
  3. helper lock
  4. nested parallelism
  5. parallel region
  6. scheduling
  7. work stealing

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)10
  • Downloads (Last 6 weeks)2
Reflects downloads up to 22 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2022)Early scheduling on steroids: Boosting parallel state machine replicationJournal of Parallel and Distributed Computing10.1016/j.jpdc.2022.02.001163(269-282)Online publication date: May-2022
  • (2021)Atomic power in forksProceedings of the Thirty-Second Annual ACM-SIAM Symposium on Discrete Algorithms10.5555/3458064.3458192(2141-2153)Online publication date: 10-Jan-2021
  • (2023)Co-Located Parallel Scheduling of Threads to Optimize Cache Sharing2023 IEEE Real-Time Systems Symposium (RTSS)10.1109/RTSS59052.2023.00030(251-264)Online publication date: 5-Dec-2023
  • (2018)Scheduling Parallel Computations by Work StealingInternational Journal of Parallel Programming10.1007/s10766-016-0484-846:2(173-197)Online publication date: 1-Apr-2018
  • (2017)Adaptive lock-free data structures in Haskell: a general method for concurrent implementation swappingACM SIGPLAN Notices10.1145/3156695.312297352:10(197-211)Online publication date: 7-Sep-2017
  • (2017)Adaptive lock-free data structures in Haskell: a general method for concurrent implementation swappingProceedings of the 10th ACM SIGPLAN International Symposium on Haskell10.1145/3122955.3122973(197-211)Online publication date: 7-Sep-2017
  • (2015)Notified AccessProceedings of the 2015 IEEE International Parallel and Distributed Processing Symposium10.1109/IPDPS.2015.30(871-881)Online publication date: 25-May-2015
  • (2014)Provably good scheduling for parallel programs that use data structures through implicit batchingProceedings of the 26th ACM symposium on Parallelism in algorithms and architectures10.1145/2612669.2612688(84-95)Online publication date: 23-Jun-2014
  • (2011)CABProceedings of the 2011 International Conference on Parallel Processing10.1109/ICPP.2011.32(722-732)Online publication date: 13-Sep-2011
  • (2010)Brief announcementProceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures10.1145/1810479.1810517(186-188)Online publication date: 13-Jun-2010

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media