ABSTRACT
Scalable parallel computers are often nonuniform communication architectures (NUCAs), in which the access time to other processors' caches varies with their physical location. Still, few attempts have been made to explore cache-to-cache communication locality. This paper introduces a new kind of synchronization primitive (lock-unlock) that favors neighboring processors when a lock is released. This improves the lock handover time as well as the access time to the shared data of the critical region. A critical section guarded by our new RH lock takes less than half the time to execute compared with the same critical section guarded by any other lock on our NUCA hardware. The execution time for Raytrace with 28 processors was improved 2.23--4.68 times, while global traffic was dramatically decreased compared with all the other locks. Averaged over the seven applications studied, the execution time was improved 7--24% while the global traffic was decreased 8--28%.