|
ABSTRACT
Storage mapping optimization is a flexible approach to folding array dimensions in numerical codes. It is designed to reduce the memory footprint after a wide spectrum of loop transformations, whether based on uniform dependence vectors or more expressive polyhedral abstractions. Conversely, few loop transformations have been proposed to facilitate register promotion, namely loop fusion, unroll-and-jam or tiling. Building on array data-flow analysis and expansion, we extend storage mapping optimization to improve opportunities for register promotion.Our work is motivated by the empirical study of a computational biology benchmark, the approximate string matching algorithm BPR from NR-grep, on a wide issue micro-architecture. Our experiments confirm the major benefit of register tiling (even on non-numerical benchmarks) but also shed the light on two novel issues: prior array expansion may be necessary to enable loop transformations that finally authorize profitable register promotion, and more advanced scheduling techniques (beyond tiling and unroll-and-jam) may significantly improve performance in fine-tuning register usage and instruction-level parallelism.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Denis Barthou , Albert Cohen , Jean-François Collard, Maximal static expansion, Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, p.98-106, January 19-21, 1998, San Diego, California, United States
[doi> 10.1145/268946.268955]
|
| |
3
|
|
| |
4
|
C. Bastoul. Efficient code generation for automatic parallelization and optimization. In ISPDC'2 IEEE International Symposium on Parallel and Distributed Computing, Ljubjana, Slovenia, Oct. 2003.
|
| |
5
|
C. Bastoul, A. Cohen, S. Girbal, S. Sharma, and O. Temam. Putting polyhedral loop transformations to work. In Workshop on Languages and Compilers for Parallel Computing (LCPC'03), LNCS, College Station, Texas, Oct. 2003.
|
| |
6
|
C. Bastoul and P. Feautrier. Improving data locality by chunking. In CC'12 Intl. Conference on Compiler Construction, LNCS 2622, pages 320--335, Warsaw, Poland, april 2003.
|
 |
7
|
Rastislav Bodík , Rajiv Gupta , Mary Lou Soffa, Load-reuse analysis: design and evaluation, Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, p.64-76, May 01-04, 1999, Atlanta, Georgia, United States
|
 |
8
|
|
| |
9
|
|
 |
10
|
|
| |
11
|
L. Carter, J. Ferrante, and S. F. Hummel. Efficient multiprocessor parallelism via hierarchical tiling. In SIAM Conference on Parallel Processing for Scientific Computing, Feb. 1995.
|
| |
12
|
A. Cohen, S. Girbal, and O. Temam. Facilitating the exploration of compositions of program transformations. Research report 5114, INRIA Futurs, France, Feb. 2004.
|
| |
13
|
|
 |
14
|
Jean-François Collard , Denis Barthou , Paul Feautrier, Fuzzy array dataflow analysis, Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming, p.92-101, July 19-21, 1995, Santa Barbara, California, United States
|
| |
15
|
|
 |
16
|
Evelyn Duesterwald , Rajiv Gupta , Mary Lou Soffa, A practical data flow framework for array reference analysis and its use in optimizations, Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, p.68-77, June 21-25, 1993, Albuquerque, New Mexico, United States
|
 |
17
|
|
| |
18
|
P. Feautrier. Parametric integer programming. RAIRO Recherche Opérationnelle, 22:243--268, Sept. 1988.
|
| |
19
|
|
| |
20
|
G. Fursin, M. O'Boyle, and P. Knijnenburg. Evaluating iterative compilation. In 11 th Workshop on Languages and Compilers for Parallel Computing, LNCS, Washington DC, July 2002. Springer-Verlag.
|
 |
21
|
|
| |
22
|
W. Kelly. Optimization within a unified transformation framework. Technical Report CS-TR-3725, University of Maryland, 1996.
|
| |
23
|
T. Kisuki, P. Knijnenburg, K. Gallivan, and M. O'Boyle. The effect of cache models on iterative compilation for combined tiling and unrolling. In Parallel Architectures and Compilation Techniques (PACT'00). IEEE Computer Society Press, Oct. 2001.
|
 |
24
|
|
 |
25
|
Induprakas Kodukula , Nawaaz Ahmed , Keshav Pingali, Data-centric multi-level blocking, Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, p.346-357, June 16-18, 1997, Las Vegas, Nevada, United States
|
| |
26
|
|
 |
27
|
|
 |
28
|
|
| |
29
|
|
| |
30
|
|
 |
31
|
|
 |
32
|
|
 |
33
|
|
| |
34
|
|
| |
35
|
|
| |
36
|
|
| |
37
|
G.-R. Perrin and A. Darte, editors. The Data Parallel Programming Model. Number 1132 in LNCS. Springer-Verlag, 1996.
|
| |
38
|
|
 |
39
|
|
 |
40
|
|
| |
41
|
|
| |
42
|
R. Schreiber, S. Aditya, B. Rau, V. Kathail, S. Mahlke, S. Abraham, and G. Snider. High-level synthesis of nonprogrammable hardware accelerators. Technical report, Hewlett-Packard, May 2000.
|
 |
43
|
|
 |
44
|
Michelle Mills Strout , Larry Carter , Jeanne Ferrante , Beth Simon, Schedule-independent storage mapping for loops, Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, p.24-33, October 02-07, 1998, San Jose, California, United States
|
 |
45
|
William Thies , Frédéric Vivien , Jeffrey Sheldon , Saman Amarasinghe, A unified framework for schedule and storage optimization, Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation, p.232-242, June 2001, Snowbird, Utah, United States
|
| |
46
|
|
| |
47
|
D. Wonnacott and W. Pugh. Nonlinear array dependence analysis. In Proc. Third Workshop on Languages, Compilers and Run-Time Systems for Scalable Computers, 1995. Troy, New York.
|
CITED BY
|
Mohammed Fellahi , Albert Cohen , Sid Touati, Code-size conscious pipelining of imperfectly nested loops, Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, p.49-55, September 16-16, 2007, Brasov, Romania
|
Peer to Peer - Readers of this Article have also read:
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
-
An intelligent component database for behavioral synthesis
Proceedings of the 27th ACM/IEEE conference on Design automation
Gwo-Dong Chen
, Daniel D. Gajski
|