ABSTRACT
We present compiler techniques for translating OpenMP shared-memory parallel applications into MPI message-passing programs for execution on distributed memory systems. This translation aims to extend the ease of creating parallel applications with OpenMP to a wider variety of platforms, such as commodity cluster systems. We present key concepts and describe techniques to analyze and efficiently handle both regular and irregular accesses to shared data. We evaluate the performance achieved by our translation scheme on seven representative OpenMP applications, two from SPEC OMPM2001 and five from the NAS Parallel Benchmarks suite, on two different platforms. The average scalability (execution time relative to the serial version) achieved is within 12% of that achieved by corresponding hand-tuned MPI applications. We also compare our programs with versions deployed for a Software Distributed Shared Memory (SDSM) system and find that the direct translation to MPI achieves up to 30% higher scalability. A comparison with High Performance Fortran (HPF) versions of two NAS benchmarks indicates that our translated OpenMP versions achieve 12% to 89% better performance than the HPF versions.