Applications of storage mapping optimization to register promotion
Source
International Conference on Supercomputing
Proceedings of the 18th Annual International Conference on Supercomputing
Saint-Malo, France
SESSION: Compilers
Pages: 247-256
Year of Publication: 2004
ISBN: 1-58113-839-3
Authors
Patrick Carribault  Bull, Les Clayes-sous-Bois and PRiSM, Université de Versailles
Albert Cohen  Université Paris-Sud
Sponsors
SIGARCH: ACM Special Interest Group on Computer Architecture
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1006209.1006244

ABSTRACT

Storage mapping optimization is a flexible approach to folding array dimensions in numerical codes. It is designed to reduce the memory footprint after a wide spectrum of loop transformations, whether based on uniform dependence vectors or more expressive polyhedral abstractions. Conversely, only a few loop transformations have been proposed to facilitate register promotion, namely loop fusion, unroll-and-jam, and tiling. Building on array data-flow analysis and expansion, we extend storage mapping optimization to improve opportunities for register promotion.

Our work is motivated by the empirical study of a computational biology benchmark, the approximate string matching algorithm BPR from NR-grep, on a wide-issue micro-architecture. Our experiments confirm the major benefit of register tiling (even on non-numerical benchmarks) but also shed light on two novel issues: prior array expansion may be necessary to enable the loop transformations that ultimately permit profitable register promotion, and more advanced scheduling techniques (beyond tiling and unroll-and-jam) may significantly improve performance by fine-tuning register usage and instruction-level parallelism.
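To make the terms in the abstract concrete, the following is a minimal sketch in C, not code from the paper. It assumes a toy two-term recurrence chosen only for illustration. The function names (recur_expanded, recur_folded, recur_promoted) and the bound MAX_N are hypothetical. The first version keeps one array cell per iteration (expanded storage), the second folds the array to two cells with a modulo storage mapping, and the third unrolls the loop by two so that the folded cells have constant indices and can be held in scalars, which a compiler can then keep in registers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_N 64  /* hypothetical bound for the expanded array */

/* Expanded storage: every iteration writes its own cell x[i]. */
unsigned long recur_expanded(size_t n)
{
    unsigned long x[MAX_N + 2];
    assert(n <= MAX_N);
    x[0] = 0;
    x[1] = 1;
    for (size_t i = 2; i <= n; i++)
        x[i] = x[i - 1] + x[i - 2];
    return x[n];
}

/* Folded storage: x[i] is mapped to x[i % 2], since a value is dead
 * once the two following iterations have read it. */
unsigned long recur_folded(size_t n)
{
    unsigned long x[2] = {0, 1};
    for (size_t i = 2; i <= n; i++)
        x[i % 2] = x[(i - 1) % 2] + x[(i - 2) % 2];
    return x[n % 2];
}

/* Unrolled by two: the modulo indices become the constants 0 and 1,
 * so the two cells can be replaced by the scalars x0 and x1. */
unsigned long recur_promoted(size_t n)
{
    unsigned long x0 = 0, x1 = 1;
    size_t i = 2;
    for (; i + 1 <= n; i += 2) {
        x0 = x1 + x0;  /* even iteration i writes cell 0 */
        x1 = x0 + x1;  /* odd iteration i + 1 writes cell 1 */
    }
    if (i <= n)        /* leftover even iteration when n is even */
        x0 = x1 + x0;
    return (n % 2 == 0) ? x0 : x1;
}
```

All three functions compute the same value for any n up to MAX_N; only the storage differs. The sketch shows the folding and promotion steps in isolation and does not attempt the array data-flow analysis or scheduling discussed in the abstract.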


