Efficient synchronization for nonuniform communication architectures
Source: Conference on High Performance Networking and Computing
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Baltimore, Maryland
Pages: 1 - 13  
Year of Publication: 2002
Authors
Zoran Radović  Uppsala University, Uppsala, Sweden
Erik Hagersten  Uppsala University, Uppsala, Sweden
Sponsors
IEEE-CS\DATC : IEEE Computer Society
ACM: Association for Computing Machinery
SIGARCH: ACM Special Interest Group on Computer Architecture
Publisher
IEEE Computer Society Press  Los Alamitos, CA, USA
ABSTRACT

Scalable parallel computers are often nonuniform communication architectures (NUCAs), where the access time to other processors' caches varies with their physical location. Still, few attempts to explore cache-to-cache communication locality have been made. This paper introduces a new kind of synchronization primitive (lock-unlock) that favors neighboring processors when a lock is released. This improves the lock handover time as well as the access time to the shared data of the critical region.

A critical section guarded by our new RH lock takes less than half the time to execute compared with the same critical section guarded by any other lock on our NUCA hardware. The execution time for Raytrace with 28 processors was improved 2.23--4.68 times, while global traffic was dramatically decreased compared with all the other locks. Averaged over the seven applications studied, execution time was improved 7--24% and global traffic was decreased 8--28%.
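The abstract does not spell out the RH lock algorithm, so the following is only a toy model of the general idea it describes: on release, prefer a waiter on the same node as the current holder. The function name `count_remote_handovers`, the `prefer_local` flag, and the `max_local_streak` fairness bound are all assumptions made for this sketch and do not come from the paper. It counts how many handovers cross a node boundary under plain FIFO order versus the node-preferring policy.

```python
from collections import deque


def count_remote_handovers(waiters, prefer_local, max_local_streak=4):
    """Count lock handovers that cross a node boundary (toy model).

    waiters: node ids, one per contending processor, in arrival order.
    prefer_local: if True, hand the lock to the oldest waiter on the
        holder's node when one exists (hypothetical policy, not the
        paper's RH lock).
    max_local_streak: cap on consecutive preferred local handovers, an
        assumption added here so remote waiters are not starved.
    """
    if not waiters:
        return 0
    queue = deque(waiters)
    holder_node = queue.popleft()  # first arrival takes the free lock
    remote = 0
    streak = 0
    while queue:
        pick = 0  # default: FIFO order, take the oldest waiter
        if prefer_local and streak < max_local_streak:
            # Search for the oldest waiter on the holder's node.
            for i, node in enumerate(queue):
                if node == holder_node:
                    pick = i
                    break
        node = queue[pick]
        del queue[pick]
        if node == holder_node:
            streak += 1   # lock stayed within the node
        else:
            remote += 1   # lock crossed to another node
            streak = 0
        holder_node = node
    return remote


# Eight contenders alternating between two nodes.
arrivals = [0, 1, 0, 1, 0, 1, 0, 1]
fifo_remote = count_remote_handovers(arrivals, prefer_local=False)
local_remote = count_remote_handovers(arrivals, prefer_local=True)
print(fifo_remote, local_remote)
```

In this model, fewer remote handovers stand in for the reduced global traffic the abstract reports; the model says nothing about actual timings on NUCA hardware.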



