|
ABSTRACT
Empirical program optimizers estimate the values of key optimization parameters by generating different program versions and running them on the actual hardware to determine which values give the best performance. In contrast, conventional compilers use models of programs and machines to choose these parameters. It is widely believed that model-driven optimization does not compete with empirical optimization, but few quantitative comparisons have been done to date. To make such a comparison, we replaced the empirical optimization engine in ATLAS (a system for generating a dense numerical linear algebra library called the BLAS) with a model-driven optimization engine that used detailed models to estimate values for optimization parameters, and then measured the relative performance of the two systems on three different hardware platforms. Our experiments show that model-driven optimization can be surprisingly effective, and can generate code whose performance is comparable to that of code generated by empirical optimizers for the BLAS.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
| |
2
|
|
 |
3
|
|
 |
4
|
|
 |
5
|
|
| |
6
|
|
| |
7
|
|
| |
8
|
M. Frigo and S. G. Johnson. FFTW: An adaptive software architecture for the FFT. In proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 3, pages 1381--1384, 1998
|
| |
9
|
|
| |
10
|
T. Kisuki, P. M. W. Knijnenburg, M. F. P. O'Boyle, and H. A. G. Wijshoff . Iterative compilation in program optimization. In proceedings of Compilers for Parallel Computers, pages (CPC), pages 35--44, 2000
|
| |
11
|
|
 |
12
|
Monica D. Lam , Edward E. Rothberg , Michael E. Wolf, The cache performance and optimizations of blocked algorithms, Proceedings of the fourth international conference on Architectural support for programming languages and operating systems, p.63-74, April 08-11, 1991, Santa Clara, California, United States
|
| |
13
|
R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal 9, 2, 78--117, 1970
|
 |
14
|
|
 |
15
|
|
 |
16
|
|
| |
17
|
J. Ramanujam and P. Sadayappan, Tiling multidimensional iteration spaces for multicomputers, Journal of Parallel and Distributed Computing, 16(2):108--120, 1992
|
| |
18
|
|
| |
19
|
|
| |
20
|
|
 |
21
|
Jianxin Xiong , Jeremy Johnson , Robert Johnson , David Padua, SPL: a language and compiler for DSP algorithms, Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, p.298-308, June 2001, Snowbird, Utah, United States
|
CITED BY 25
|
Cemal Yilmaz , Arvind S. Krishna , Atif Memon , Adam Porter , Douglas C. Schmidt , Aniruddha Gokhale , Balachandran Natarajan, Main effects screening: a distributed continuous quality assurance process for monitoring performance degradation in evolving software systems, Proceedings of the 27th international conference on Software engineering, May 15-21, 2005, St. Louis, MO, USA
|
|
|
|
|
Albert Cohen , Sébastien Donadio , Maria-Jesus Garzaran , Christoph Herrmann , Oleg Kiselyov , David Padua, In search of a program generator to implement generic transformations for high-performance computing, Science of Computer Programming, v.62 n.1, p.25-46, September 2006
|
|
|
J. Dongarra , G. Bosilca , Z. Chen , V. Eijkhout , G. E. Fagg , E. Fuentes , J. Langou , P. Luszczek , J. Pjesivac-Grbovic , K. Seymour , H. You , S. S. Vadhiyar, Self-adapting numerical software (SANS) effort, IBM Journal of Research and Development, v.50 n.2/3, p.223-238, March 2006
|
|
|
|
|
|
|
|
|
|
F. Agakov , E. Bonilla , J. Cavazos , B. Franke , G. Fursin , M. F. P. O'Boyle , J. Thomson , M. Toussaint , C. K. I. Williams, Using Machine Learning to Focus Iterative Optimization, Proceedings of the International Symposium on Code Generation and Optimization, p.295-305, March 26-29, 2006
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Leonardo Bachega , Siddhartha Chatterjee , Kenneth A. Dockser , John A. Gunnels , Manish Gupta , Fred G. Gustavson , Christopher A. Lapkowski , Gary K. Liu , Mark P. Mendell , Charles D. Wait , T. J. Chris Ward, A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm Design, Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, p.85-96, September 29-October 03, 2004
|
|
|
|
|
|
|
|
|
|
|
|
|
|
John Cavazos , Christophe Dubach , Felix Agakov , Edwin Bonilla , Michael F. P. O'Boyle , Grigori Fursin , Olivier Temam, Automatic performance model construction for the fast software exploration of new hardware designs, Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, October 22-25, 2006, Seoul, Korea
|
|
|
John Cavazos , Grigori Fursin , Felix Agakov , Edwin Bonilla , Michael F. P. O'Boyle , Olivier Temam, Rapidly Selecting Good Compiler Optimizations using Performance Counters, Proceedings of the International Symposium on Code Generation and Optimization, p.185-197, March 11-14, 2007
|
|
|
|
Albert Cohen , Marc Sigler , Sylvain Girbal , Olivier Temam , David Parello , Nicolas Vasilache, Facilitating the search for compositions of program transformations, Proceedings of the 19th annual international conference on Supercomputing, June 20-22, 2005, Cambridge, Massachusetts
|
|
|
|
|
|
|
|
INDEX TERMS
Primary Classification:
D.
Software
D.3
PROGRAMMING LANGUAGES
D.3.4
Processors
Subjects:
Compilers
Additional Classification:
D.
Software
D.3
PROGRAMMING LANGUAGES
D.3.4
Processors
Subjects:
Code generation;
Optimization
G.
Mathematics of Computing
G.4
MATHEMATICAL SOFTWARE
I.
Computing Methodologies
I.2
ARTIFICIAL INTELLIGENCE
I.2.2
Automatic Programming
Subjects:
Program transformation
General Terms:
Algorithms,
Experimentation,
Measurement,
Performance
Keywords:
BLAS,
blocking,
code generation,
compilers,
empirical optimization,
memory hierarchy,
model-driven optimization,
program transformation,
tiling,
unrolling
Peer to Peer - Readers of this Article have also read:
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE conference on Design automation
Gwo-Dong Chen
, Daniel D. Gajski
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
|