ABSTRACT
The Cray X1 was recently introduced as the first in a new line of parallel systems to combine high-bandwidth vector processing with an MPP system architecture. Alongside capabilities such as automatic fine-grained data parallelism through the use of vector instructions, the X1 offers hardware support for a transparent global-address space (GAS), which makes it an interesting target for GAS languages. In this paper, we describe our experience with developing a portable, open-source, and high-performance compiler for Unified Parallel C (UPC), an SPMD global-address space language extension of ISO C. As part of our implementation effort, we evaluate the X1's hardware support for GAS languages and provide empirical performance characterizations in the context of leveraging features such as vectorization and global pointers for the Berkeley UPC compiler. We discuss several difficulties encountered in the Cray C compiler that are likely to present challenges for many users, especially implementors of libraries and source-to-source translators. Finally, we analyze the performance of our compiler on some benchmark programs and show that, while there are some limitations of the current compilation approach, the Berkeley UPC compiler uses the X1 network more effectively than MPI or SHMEM, and generates serial code whose vectorizability is comparable to that of the original C code.