article

Free Access

Algorithms for scalable synchronization on shared-memory multiprocessors

Authors:
John M. Mellor-Crummey

Rice Univ., Houston, TX

Rice Univ., Houston, TX
View Profile

,
Michael L. Scott

Univ. of Rochester, Rochester, NY

Univ. of Rochester, Rochester, NY
View Profile

Authors Info & Claims

ACM Transactions on Computer Systems Volume 9 Issue 1pp 21–65https://doi.org/10.1145/103727.103729

Published:01 February 1991Publication History

ACM Transactions on Computer Systems

Abstract

Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally-accessible flag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue of allocation in the local portion of physically distributed shared memory.

We present a new scalable algorithm for spin locks that generates 0(1) remote references per lock acquisition, independent of the number of processors attempting to acquire the lock. Our algorithm provides reasonable latency in the absence of contention, requires only a constant amount of space per lock, and requires no hardware support other than a swap-with-memory instruction. We also present a new scalable barrier algorithm that generates 0(1) remote references per processor reaching the barrier, and observe that two previously-known barriers can likewise be cast in a form that spins only on locally-accessible flag variables. None of these barrier algorithms requires hardware support beyond the usual atomicity of memory reads and writes.

We compare the performance of our scalable algorithms with other software approaches to busy-wait synchronization on both a Sequent Symmetry and a BBN Butterfly. Our principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors. The existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides a case against so-called “dance hall” architectures, in which shared memory locations are equally far from all processors. —From the Authors' Abstract

References

1 ADVE, S. V., ANn HILL, M. D. Weak ordering--A new definition. In Proceedings of the International Symposium on Computer Architecture (May 1990), 2-14. Google Scholar
2 AGARWAL, A., AND CHEmAN, M. Adaptive backoff synchronization techniques. In Proceedings of the International Symposium on Computer Architecture (May 1989), 396 406. Google Scholar
3 AGARWAL, A., SIMONI, R., HENNESSY, J., AND HOROWITZ, M. An evaluation of directory schemes for cache coherence. In Proceedings of the International Symposium on Computer Architecture (June 1988), 280-289. Google Scholar
4 ALMASI, G. S., ANn GOTTLIEB, A. Highly Parallel Computing. Benjamin/Cummings, Redwood City, Calif., 1989. Google Scholar
5 ANDERSON, T.E. The performance of spin lock alternatives for shared-memory multiprocessors. IEEE Trans. Parallel Distributed Syst. 1, I (Jan. 1990), 6-16. Google Scholar
6 ANDEaSON, T. E., L~ZOWSKA, E. D., AND LEVY, H. M. The performance implications of thread management alternatives for shared-memory multiprocessors IEEE Trans. Comput. 38, 12 (Dec. 1989), 1631-1644. Google Scholar
7 A~CnmALn, J. AND BAER, J.-L. An economical solution to the cache coherence problem. In Proceedings of the International Symposium on Computer Architecture (1984), 355-362. Google Scholar
8 BBN Laboratories. Butterfly parallel processor overview. Tech. Rep. 6148, version 1, BBN Laboratories, Cambridge, Mass, Mar. 1986.Google Scholar
9 BROOKS, E. D., III. The butterfly barrier Int. J Parallel Program. 15, 4 (1986), 295-307. Google Scholar
10 BZOOKS, E. D., III. The shared memory hypercube. Parallel Comput. 6 (1988), 235-245.Google Scholar
11 DAVIS, H, AND HENNESSY, J. Characterizing the synchronization behavior of parallel programs. In Proceedings of the ACM Conference on Parallel Programming: Experzence with Apphcat~ons, Languages and Systems (July 1988), 198-211. Google Scholar
12 DIJKSTRA, E Solution of a problem in concurrent programming and control. Commun. ACM. 8, 9 (Sept. 1965), 569. Google Scholar
13 GHARACHORLOO, K., LENOSKI, D., LAUDON, J., GmBONS, P., GUPTA, A., AND HENNESSY, J. L. Memory consistency and event ordering in scalable shared.memory multiprocessors. In Proceedlngs of the International Symposium on Computer Architecture (May 1990), 15-26. Google Scholar
14 GOODMAN, J.R, VERNON, M. K., AND WOES% P.J. Efficient synchronization primitives for large-scale cache-coherent multiprocessors. In Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems (Apr. 1989), 64-75 Google Scholar
15 GOOVMAN, J R., AND WOEST, P. J. The Wisconsin Multlcube: A new large-scale cache coherent multiprocessor. In Proceedings of the International Symposium on Computer Architecture (May 1988), 422-431. Google Scholar
16 GOTTLIEB, A. Scalability, combining and the NYU ultracomputer. Ohio State University Parallel Computing Workshop, Mar. 1990. Invited Lecture.Google Scholar
17 GOTTLIEB, A., GRISHMAN, R., KRUSKAL, C. P., McAuLIFFE, K. P., RUDOLPH, L., AND SNIR, M. The NYU Ultracomputer--Designing an MIMD shared memory parallel computer. IEEE Trans. Comput. C-32, 2 (Feb. 1983), 175-189.Google Scholar
18 GRAUNI<E, G., AND THAKKAR, S. Synchronization algorithms for shared-memory multiprocessors. Computer 23, 6 (June 1990), 60-69 Google Scholar
19 HENSGEN, D , FINKEL, R , AND MANBER, U. Two algorithms for barrier synchroniztion. Int. J. Parallel Program. 17, 1 (1988) 1-17. Google Scholar
20 HERLIHY, M. A methodology for implementing highly concurrent data structures. In Proceed~ngs of the Second A CM Symposium on Principles and Practice of Parallel Programming (Mar 1990), 197-206. Google Scholar
21 JAYASIMHA, D. N. Distributed synchronizers. In Proceedings of the 1988 Internatwnal Conference on Parallel Processing (Aug. 1988), 23-27.Google Scholar
22 KRUSKAL, C.P., RUDOLPH, L., AND SNm, M. Efficient synchronization on multiprocessors with shared memory. In Proceedings of the F~fih ACM Symposium on Principles of Dzstributecl Computing (1986), 218-228. Google Scholar
23 KUMAR, M., AND PFISTER, G.F. The onset of hot spot contention In Proceedings of the 1986 International Conference on Parallel Processing (1986), 28-34.Google Scholar
24 LAMPORT, L. A new solution of Dijkstra's concurrent programming problem. Commun. ACM 17, 8 (Aug. 1974), 453-455. Google Scholar
25 LAMPORT, L. The mutual exclusion problem: Part I--A theory of interprocess communication: Part II--Statement and solutions. J ACM 33, 2 (Apr. 1986), 313-348. Google Scholar
26 LAMPORT, L. A fast mutual exclusion algorithm. ACM Trans. Comput. Syst. 5, 1 (Feb. 1987) 1-11. Google Scholar
27 LEE, C.A. Barrier synchronization over multistage interconnection networks. In Proceedrags of the Second IEEE Symposium on Parallel and Distributed Processing (Dallas, Tex, Dec 1990), 130-133.Google Scholar
28 LEE, J., AND RAMACHANDRAN, U. Synchronization with multiprocessor caches. In Proceedrags of the Internatwnal Symposlum on Computer Architecture (May 1990), 27-37. Google Scholar
29 LUBACHEVSK~, B. An approach to automating the verification of compact parallel coordination programs. I Acta Inf. 21 (1984), 125-169.Google Scholar
30 LUBACnEVSKY, B. Synchronization barrier and related tools for shared memory parallel programming. In Proceedings of the 1989 International Converence on Parallel Processing (Aug 1989), II-175-II-179.Google Scholar
31 MAEKAWA, M. A ~ algorithm for mutual exclusion in decentralized systems. ACM Trans. Comput. Syst. 3, 2 (May 1985), 145-159. Google Scholar
32 MELLOR-CRUMMEY, J.M. Concurrent queues: Practical fetch-and-~ algorithms. Tech. Rep. 229, Computer Science Dept., Univ. of Rochester, Nov. 1987.Google Scholar
33 MELLOR-CRUMMEY, J. M., AND SCOTT, M. L. Algorithms for scalable synchronization on shared-memory multiprocessors. Tech. Rep. 342, Computer Science Dept., Univ. of Rochester, Apr. 1990. Also COMP TR90-114, Dept. of Computer Science, Rice Univ., May 1990 Google Scholar
34 OUSTERHOUT, J. K, SCELZA, D. A., AND SINDHU, P. S. Medusa: An experiment in distributed operating system structure. Commun. ACM 23, 2 (Feb. 1980), 92-105. Google Scholar
35 P1596 Working Group of the IEEE Computer Society Microprocessor Standards Committee. SCI (scalable coherent interface): An overview of extended cache-coherence protocols, Feb 5, 1990. Draft 0.59 P1596/Part III-D.Google Scholar
36 PETERSON, G. L. A new solution to Lamport's concurrent programming problem using small shared variables. ACM Trans. Program. Lang. Syst. 5, 1 (Jan. 1983), 56-65. Google Scholar
37 PFISTER, G., BRANTLEY, W. C., GEORGE, D. A., HARVEY, S. L., KLEINFELDER, W. J., McAvLIFFE, K. P., MELTON, T. A., NORTON, V A., AND WEISS, J. The IBM research parallel processor prototype (RP3): Introduction and architecture. In Proceedings of the 1985 International Conference on Parallel Processing (Aug. 1985), 764-771.Google Scholar
38 PFISTER, G., AND NORTON, V. A. "Hot spot" contention and combining in multistage interconnection networks. IEEE Trans. Comput. C-34, 10 (Oct. 1985), 943 948.Google Scholar
39 RAYMOND, K. A tree-based algorithm for distributed mutual exclusion. ACM Trans. Cornput. Syst. 7, i (Feb. 1989), 61-77. Google Scholar
40 RAYNAL, M. Algorithms for Mutual Exclusion. MIT Press Series in Scientific Computation. MIT Press, Cambridge, Mass., 1986. Translated from the French by D. Beeson. Google Scholar
41 REED, D. P., A~D KANODIA, R K. Synchronization with eventcounts and sequencers. Commun. ACM22, 2 (Feb. 1979), 115-123. Google Scholar
42 RETTBERG, R. D., CROWTHER, W. R., CARVEY, P. P., AND TOMHNSON, R. S. The Monarch parallel processor hardware design. Computer 23, 4 (Apr. 1990), 18-30 Google Scholar
43 RICART, G., AND AGRAWALA, A.K. An optimal algorithm for mutual exclusion in computer networks. Commun. ACM. 24, i (Jan. 1981) 9-17. Google Scholar
44 RUDOLPH, L., A~D SEGALL, Z. Dynamic decentralized cache schemes for MIMD parallel processors. In Proceedings of the International Symposium on Computer Architecture (1984), 340-347. Google Scholar
45 SANDERS, B. A. The information structure of distributed mutual exclusion algorithms. ACM Trans. Comput. Syst. 5, 3 (Aug. 1987), 284--299 Google Scholar
46 SCHNEIDER, F.B. Synchronization in distributed programs. ACM Trans. Program. Lang. Syst. 4, 2 (Apr. 1982), 179-195. Google Scholar
47 SCOTT, M. L., LEBLANC, T. J., AND MARSH, B. D. Multi-model parallel programming in Psyche. In Proceedings of the Second ACM Symposium on Principles and Practice of Parallel Programming (Mar. 1990), 70-78. Google Scholar
48 TANG, P., AND YEW, P.-C. Processor self-scheduling for multiple-nested parallel loops. In Proceedtngs of the 1986 International Conference on Parallel Processing (Aug. 1986), 528-535.Google Scholar
49 TANG, P., AND Y~W, P.-C. Algorithms for distributing hot-spot addressing CSRD Rep. 617, Center for Supercomputing Research and Development, Univ. of Illinois at Urbana- Champaign, Jan. 1987.Google Scholar
50 YEW, P.-C. Architecture of the Cedar parallel supercomputer. CSRD Rep 609, Center for Supercomputing Research and Development, Univ. of Illinois at Urbana-Champaign, Aug. 1986.Google Scholar
51 YEW, P.-C., TZENG, N.-F., AND LAWRIE, D.H. Distributing hot-spot addressing in large-scale multiprocessors. IEEE Trans. Comput. C-36, 4 (Apr. 1987), 388-395. Google Scholar
52 ZAHORIAN, J., LAZOWSKA, E. D., AND EAGER, D. L. The effect of scheduling discipline on spin overhead in shared memory parallel processors. Tech. Rep. TR-89-07-03, Computer Science Dept., Univ. of Washington, July 1989.Google Scholar

Index Terms

Recommendations

Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors

Most multiprocessors are multiprogrammed to achieve acceptable response time and to increase their utilization. Unfortunately, inopportune preemption may significantly degrade the performance of synchronized parallel applications. To address this ...
Read More
Speculative Locks for Concurrent Execution of Critical Sections in Shared-Memory Multiprocessors
Read More
A low-latency scalable locking algorithm for shared memory multiprocessors
SPDP '94: Proceedings of the 1994 6th IEEE Symposium on Parallel and Distributed Processing

In this paper we deal exclusively with spin locks for small- to medium-scale shared memory multiprocessors. The performance of a lock depends on the contention for the critical section being protected by the lock. The selection of the most appropriate ...
Read More

Reviews

Reviewer: Roy Levin

The authors have compiled a competent survey of spin-lock and barrier synchronization algorithms, including some new ones of their own. Two sections of the paper present these algorithms with clear pseudocode. Performance comparisons on a BBN Butterfly and a Sequent Symmetry support the conclusions in the subsequent section on architectural implications. The paper closes with a concise list of recommendations that will aid software practitioners who must navigate this sometimes bewildering space of possible algorithms. I was distracted by several small inconsistencies in notation and presentation that should have been picked up in the editorial process (like the references to the nonexistent Algorithms 8 and 11). It also took me a while to realize that a direct correspondence does not always exist between the algorithms whose code is presented and the algorithms whose performance is graphed. Finally, the space devoted to a lame attempt to summarize the correctness proof of the authors' intriguing new spin-lock algorithm would have been better spent explaining how a key race-condition is resolved. These inconsistencies are relatively small flaws in an otherwise well-organized and well-reasoned paper. I encourage designers of parallel architectures and parallel algorithms to read it.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Computer Systems Volume 9, Issue 1
Feb. 1991
97 pages
ISSN:0734-2071
EISSN:1557-7333
DOI:10.1145/103727
Issue’s Table of Contents

Copyright © 1991 ACM
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 February 1991
Published in tocs Volume 9, Issue 1

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 1,043
  Total Citations
  View Citations
- 8,483
  Total Downloads
- Downloads (Last 12 months)731
- Downloads (Last 6 weeks)99
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Algorithms for scalable synchronization on shared-memory multiprocessors

ACM Transactions on Computer Systems

Abstract

References

Cited By

Index Terms

Recommendations

Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors

Speculative Locks for Concurrent Execution of Critical Sections in Shared-Memory Multiprocessors

A low-latency scalable locking algorithm for shared memory multiprocessors

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Algorithms for scalable synchronization on shared-memory multiprocessors

ACM Transactions on Computer Systems

Abstract

References

Cited By

Index Terms

Recommendations

Nonblocking Algorithms and Preemption-Safe Locking on Multiprogrammed Shared Memory Multiprocessors

Speculative Locks for Concurrent Execution of Critical Sections in Shared-Memory Multiprocessors

A low-latency scalable locking algorithm for shared memory multiprocessors

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media