ABSTRACT
This paper presents our experience mapping the OpenMP parallel programming model to the IBM Cyclops-64 (C64) architecture. The C64 employs a many-core-on-a-chip design that integrates processing logic (160 thread units), embedded memory (5 MB), and communication hardware on the same die. Such a unique architecture presents new opportunities for optimization. Specifically, we consider the following three areas: (1) a memory-aware runtime library that places frequently used data structures in scratchpad memory; (2) a unique spin lock algorithm for shared-memory synchronization based on in-memory atomic instructions and native support for thread-level execution; (3) a fast barrier that directly uses C64 hardware support for collective synchronization. Together, the three optimizations result in an 80% overhead reduction for language constructs in OpenMP. We believe that such a drastic reduction in the cost of managing parallelism makes OpenMP more amenable to writing parallel programs on the C64 platform.