ABSTRACT
Fusion is one of the most important code transformations as it has the potential to substantially optimize both the memory hierarchy time overhead and, sometimes asymptotically, the space requirement.
In functional languages, fusion is naturally and relatively easily derived as a producer-consumer relation between program constructs that expose a richer, higher-order algebra of program invariants, such as the map-reduce list homomorphisms.
In imperative languages, fusing producer-consumer loops requires dependency analysis on arrays applied at loop-nest level. Such analysis, however, has often been labeled as "heroic effort" and, if at all, is supported only in its simplest and most conservative form in industrial compilers.
Related implementations in the functional context typically apply fusion only when the to-be-fused producer is used exactly once, i.e., in the consumer. This guarantees that the transformation is conservative: the resulting program does not duplicate computation.
We show that the above restriction is more conservative than needed, and present a structural-analysis technique, inspired from the T1--T2 transformation for reducible data flow, that enables fusion even in some cases when the producer is used in different consumers and without duplicating computation.
We report an implementation of the fusion algorithm for a functional-core language, named L0, which is intended to support nested parallelism across regular multi-dimensional arrays. We succinctly describe L0's semantics and the compiler infrastructure on which the fusion transformation relies, and present compiler-generated statistics related to fusion on a set of six benchmarks.
- A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman. Compilers, Principles, Techniques, and Tools. Pearson Addison Wesley, 2007. ISBN 0-321-49169-6. Google ScholarDigital Library
- R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann, 2002. ISBN 1-55860-286-0. Google ScholarDigital Library
- J. Backus. Can Programming Be Liberated from the Von Neumann Style?: A Functional Style and Its Algebra of Programs. Commun. ACM, 21 (8): 613--641, Aug. 1978. Google ScholarDigital Library
- E. Barendsen and S. Smetsers. Conventional and Uniqueness Typing in Graph Rewrite Systems. In Found. of Soft. Tech. and Theoretical Comp. Sci. (FSTTCS), volume 761 of LNCS, pages 41--51, 1993. Google ScholarDigital Library
- E. Barendsen and S. Smetsers. Uniqueness Typing for Functional Languages with Graph Rewriting Semantics. Mathematical Structures in Computer Science, 6 (6): 579--612, 1996.Google Scholar
- L. Bergstrom and J. Reppy. Nested Data-Parallelism on the GPU. In Procs. of Int. Conf. Funct. Prog. (ICFP), pages 247--258. ACM, 2012. Google ScholarDigital Library
- R. S. Bird. An Introduction to the Theory of Lists. In NATO Inst. on Logic of Progr. and Calculi of Discrete Design, pages 5--42, 1987. Google ScholarDigital Library
- R. S. Bird. Algebraic Identities for Program Calculation. The Computer Journal, 32 (2): 122--126, 1989. Google ScholarDigital Library
- G. Blelloch. Programming Parallel Algorithms. Communications of the ACM (CACM), 39 (3): 85--97, 1996. Google ScholarDigital Library
- G. E. Blelloch. Scans as Primitive Parallel Operations. Computers, IEEE Transactions, 38 (11): 1526--1538, 1989. Google ScholarDigital Library
- G. E. Blelloch, J. C. Hardwick, J. Sipelstein, M. Zagha, and S. Chatterjee. Implementation of a Portable Nested Data-Parallel Language. Journal of parallel and distributed computing, 21 (1): 4--14, 1994. Google ScholarDigital Library
- W. Blume and R. Eigenmann. The Range Test: A Dependence Test for Symbolic, Non-Linear Expressions. In Procs. Int. Conf. on Supercomp. (ICS), pages 528--537, 1994. Google ScholarDigital Library
- M. Chakravarty, R. Leshchinskiy, S. P. Jones, G. Keller, and S. Marlow. Data Parallel Haskell: A Status Report. In Int. Work. on Decl. Aspects of Multicore Prog. (DAMP), pages 10--18, 2007. Google ScholarDigital Library
- T. E. Cheatham Jr. Programming Language Design Issues. In Design and Implem. of Prog. Lang., pages 399--435. Springer, 1977. Google ScholarDigital Library
- K. Claessen, M. Sheeran, and B. J. Svensson. Expressive Array Constructs in an Embedded GPU Kernel Programming Language. In Work. on Decl. Aspects of Multicore Prog DAMP, pages 21--30, 2012. Google ScholarDigital Library
- D. Coutts, D. Stewart, and R. Leshchinskiy. Rewriting Haskell Strings. In Practical Aspects of Decl. Lang., pages 50--64. Springer, 2007. Google ScholarDigital Library
- A. Gill, J. Launchbury, and S. L. Peyton Jones. A Short Cut to Deforestation. In Procs. of Int. Conf. on Functional Prog. Lang. and Computer Arch., pages 223--232. ACM, 1993. Google ScholarDigital Library
- C. Grelck and S.-B. Scholz. SAC - A Functional Array Language for Efficient Multi-Threaded Execution. International Journal of Parallel Programming, 34 (4): 383--427, 2006. Google ScholarDigital Library
- M. W. Hall, S. P. Amarasinghe, B. R. Murphy, S.-W. Liao, and M. S. Lam. Interprocedural Parallelization Analysis in SUIF. Trans. on Prog. Lang. and Sys. (TOPLAS), 27(4): 662--731, 2005. Google ScholarDigital Library
- G. Hoare. The Rust Programming Language, June 2013. URL http://www.rust-lang.org/.Google Scholar
- S. P. Jones, A. Tolmach, and T. Hoare. Playing by the Rules: Rewriting as a Practical Optimisation Technique in GHC. In Haskell Workshop, volume 1, pages 203--233, 2001.Google Scholar
- G. Keller, M. M. Chakravarty, R. Leshchinskiy, S. Peyton Jones, and B. Lippmeier. Regular, Shape-Polymorphic, Parallel Arrays in Haskell. ACM Sigplan Notices, 45 (9): 261--272, 2010. Google ScholarDigital Library
- T. L. McDonell, M. M. Chakravarty, G. Keller, and B. Lippmeier. Optimising Purely Functional GPU Programs. In Procs. of Int. Conf. Funct. Prog. (ICFP), 2013. Google ScholarDigital Library
- S. P. Midki and D. A. Padua. Issues in the Compile-Time Optimization of Parallel Programs. In Procs. of Int. Conf. on Parallel Processing, volume 2, pages 105--113, 1990.Google Scholar
- C. Munk. Introduction to the Numerical Solution of Partial Differential Equations in Finance. 2007.Google Scholar
- C. Oancea, C. Andreetta, J. Berthold, A. Frisch, and F. Henglein. Financial Software on GPUs: between Haskell and Fortran. In Funct. High-Perf. Comp. (FHPC'12), 2012. Google ScholarDigital Library
- C. E. Oancea and L. Rauchwerger. Logical Inference Techniques for Loop Parallelization. In Procs. of Int. Conf. Prog. Lang. Design and Impl. (PLDI), pages 509--520, 2012. Google ScholarDigital Library
- C. E. Oancea and L. Rauchwerger. A Hybrid Approach to Proving Memory Reference Monotonicity. In Int. Lang. Comp. Par. Comp. (LCPC'11), volume 7146 of LNCS, pages 61--75, 2013.Google Scholar
- S. Rus, J. Hoeflinger, and L. Rauchwerger. Hybrid Analysis: Static & Dynamic Memory Reference Analysis. Int. Journal of Par. Prog, 31(3): 251--283, 2003. Google ScholarDigital Library
Index Terms
A T2 graph-reduction approach to fusion
Recommendations
Financial software on GPUs: between Haskell and Fortran
FHPC '12: Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computingThis paper presents a real-world pricing kernel for financial derivatives and evaluates the language and compiler tool chain that would allow expressive, hardware-neutral algorithm implementation and efficient execution on graphics-processing units (GPU)...
Bounds Checking: An Instance of Hybrid Analysis
ARRAY'14: Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array ProgrammingThis paper presents an analysis for bounds checking of array subscripts that lifts checking assertions to program level under the form of an arbitrarily-complex predicate (inspector), whose runtime evaluation guards the execution of the code of ...
Using fusion to enable late design decisions for pipelined computations
FHPC 2016: Proceedings of the 5th International Workshop on Functional High-Performance ComputingWe present an embedded language in Haskell for programming pipelined computations. The language is a combination of Feldspar (a functional language for array computations) and a new implementation of Ziria (a language for describing streaming ...
Comments