ABSTRACT
In this paper we investigate how the performance and speedup of parallel applications are affected by using non-blocking rather than blocking synchronisation. The results show that, for many applications, non-blocking synchronisation leads to significant speedups on a fairly large number of processors, and that it did not slow down any of the applications studied. As part of this investigation, the paper also provides a set of simple and efficient translations. These show how typical blocking operations found in parallel applications, such as simple locks, queues and lock trees, can be converted into non-blocking equivalents that use hardware primitives common in modern multiprocessor systems. The translations demonstrate that application designers and programmers can easily replace the blocking operations commonly found in parallel applications with non-blocking equivalents. The empirical results were obtained with a set of representative applications running on a large-scale ccNUMA machine.
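To illustrate the general idea of translating a lock-protected operation into a non-blocking one, here is a minimal sketch. It is not the paper's own translation. It assumes C11 `<stdatomic.h>` and POSIX threads, and all names (`blocking_counter`, `nb_counter`, `nb_add`, `run_nb_workers`) are hypothetical. The critical section of a mutex-guarded counter is replaced by a read, compute, compare-and-swap retry loop. On hardware that offers load-linked/store-conditional rather than CAS, compilers typically implement `atomic_compare_exchange_weak` with an LL/SC sequence.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Blocking original (hypothetical): a shared counter guarded by a mutex.
   If the lock holder is preempted, every other thread has to wait. */
typedef struct {
    pthread_mutex_t lock;
    long value;
} blocking_counter;

long blocking_add(blocking_counter *c, long delta) {
    pthread_mutex_lock(&c->lock);
    long old = c->value;
    c->value = old + delta;
    pthread_mutex_unlock(&c->lock);
    return old;
}

/* Non-blocking translation (hypothetical): no lock; the update is
   published with a single compare-and-swap. */
typedef struct {
    _Atomic long value;
} nb_counter;

long nb_add(nb_counter *c, long delta) {
    long old = atomic_load(&c->value);
    /* On failure, the CAS writes the current value back into `old`, so the
       next iteration recomputes from fresh data. The weak form may also fail
       spuriously; otherwise a failure means another thread's update succeeded,
       so the system as a whole keeps making progress (lock-freedom). */
    while (!atomic_compare_exchange_weak(&c->value, &old, old + delta)) {
        /* retry */
    }
    return old;
}

/* Hypothetical harness: NTHREADS threads each add 1, ITERS times. */
enum { NTHREADS = 4, ITERS = 100000 };

static void *nb_worker(void *arg) {
    nb_counter *c = arg;
    for (int i = 0; i < ITERS; i++)
        nb_add(c, 1);
    return NULL;
}

long run_nb_workers(void) {
    nb_counter c;
    atomic_init(&c.value, 0);
    pthread_t tid[NTHREADS];
    for (int t = 0; t < NTHREADS; t++) {
        int rc = pthread_create(&tid[t], NULL, nb_worker, &c);
        assert(rc == 0);
        (void)rc; /* silence unused-variable warnings under NDEBUG */
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    return atomic_load(&c.value); /* NTHREADS * ITERS if no update was lost */
}
```

For a plain counter, `atomic_fetch_add(&c->value, delta)` would perform the same update in one call. The CAS loop is shown because the same pattern applies to any update of the form new = f(old). The loop is lock-free but not wait-free: an individual thread may retry several times under heavy contention, yet no thread ever waits for a delayed lock holder.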