ABSTRACT
Benchmarks that measure memory bandwidth, such as STREAM, Apex-MAPS, and MultiMAPS, are increasingly popular because of the "von Neumann" bottleneck of modern processors, which causes many calculations to be memory-bound. We present a scheme for predicting the performance of HPC applications based on the results of such benchmarks. A genetic algorithm is used to "learn" bandwidth as a function of cache hit rates for each machine, with MultiMAPS as the fitness test. The results are 56 individual performance predictions, including 3 full-scale parallel applications run on 5 different modern HPC architectures with various CPU counts and inputs, predicted to within 10% average difference of independently verified runtimes.