| A case study in top-down performance estimation for a large-scale parallel application |
| Source
|
Principles and Practice of Parallel Programming
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
New York, New York, USA
SESSION: Performance characterization
Pages: 81 - 89
Year of Publication: 2006
ISBN:1-59593-189-9
Authors
Ilya Sharapov, Sun Microsystems, Santa Clara, CA
Robert Kroeger, Sun Microsystems, Santa Clara, CA
Guy Delamarter, Sun Microsystems, Santa Clara, CA
Razvan Cheveresan, Sun Microsystems, Santa Clara, CA
Matthew Ramsay, Sun Microsystems, Santa Clara, CA
ABSTRACT
This work presents a general methodology for estimating the performance of an HPC workload when running on a future hardware architecture. Further, it demonstrates the methodology by estimating the performance of a significant scientific application -- the Gyrokinetic Toroidal Code (GTC) -- when executing on Sun's proposed next-generation petascale computer architecture. For GTC, we identify the important phases of the iteration and perform low-level analysis that includes instruction tracing and component simulations of processor and memory systems. Low-level analysis is complemented with scalability estimates based on modeling MPI, OpenMP and I/O activity in the code. The work's approach permits accurate end-to-end performance projections from the microarchitecture level to the petascale.