skip to main content
10.1145/2502323.2502328acmconferencesArticle/Chapter ViewAbstractPublication PagesicfpConference Proceedingsconference-collections
research-article

A T2 graph-reduction approach to fusion

Published:23 September 2013Publication History

ABSTRACT

Fusion is one of the most important code transformations as it has the potential to substantially optimize both the memory hierarchy time overhead and, sometimes asymptotically, the space requirement.

In functional languages, fusion is naturally and relatively easily derived as a producer-consumer relation between program constructs that expose a richer, higher-order algebra of program invariants, such as the map-reduce list homomorphisms.

In imperative languages, fusing producer-consumer loops requires dependency analysis on arrays applied at loop-nest level. Such analysis, however, has often been labeled as "heroic effort" and, if at all, is supported only in its simplest and most conservative form in industrial compilers.

Related implementations in the functional context typically apply fusion only when the to-be-fused producer is used exactly once, i.e., in the consumer. This guarantees that the transformation is conservative: the resulting program does not duplicate computation.

We show that the above restriction is more conservative than needed, and present a structural-analysis technique, inspired from the T1--T2 transformation for reducible data flow, that enables fusion even in some cases when the producer is used in different consumers and without duplicating computation.

We report an implementation of the fusion algorithm for a functional-core language, named L0, which is intended to support nested parallelism across regular multi-dimensional arrays. We succinctly describe L0's semantics and the compiler infrastructure on which the fusion transformation relies, and present compiler-generated statistics related to fusion on a set of six benchmarks.

References

  1. A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman. Compilers, Principles, Techniques, and Tools. Pearson Addison Wesley, 2007. ISBN 0-321-49169-6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. R. Allen and K. Kennedy. Optimizing Compilers for Modern Architectures. Morgan Kaufmann, 2002. ISBN 1-55860-286-0. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. J. Backus. Can Programming Be Liberated from the Von Neumann Style?: A Functional Style and Its Algebra of Programs. Commun. ACM, 21 (8): 613--641, Aug. 1978. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. E. Barendsen and S. Smetsers. Conventional and Uniqueness Typing in Graph Rewrite Systems. In Found. of Soft. Tech. and Theoretical Comp. Sci. (FSTTCS), volume 761 of LNCS, pages 41--51, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. E. Barendsen and S. Smetsers. Uniqueness Typing for Functional Languages with Graph Rewriting Semantics. Mathematical Structures in Computer Science, 6 (6): 579--612, 1996.Google ScholarGoogle Scholar
  6. L. Bergstrom and J. Reppy. Nested Data-Parallelism on the GPU. In Procs. of Int. Conf. Funct. Prog. (ICFP), pages 247--258. ACM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. S. Bird. An Introduction to the Theory of Lists. In NATO Inst. on Logic of Progr. and Calculi of Discrete Design, pages 5--42, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. R. S. Bird. Algebraic Identities for Program Calculation. The Computer Journal, 32 (2): 122--126, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. G. Blelloch. Programming Parallel Algorithms. Communications of the ACM (CACM), 39 (3): 85--97, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. E. Blelloch. Scans as Primitive Parallel Operations. Computers, IEEE Transactions, 38 (11): 1526--1538, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. G. E. Blelloch, J. C. Hardwick, J. Sipelstein, M. Zagha, and S. Chatterjee. Implementation of a Portable Nested Data-Parallel Language. Journal of parallel and distributed computing, 21 (1): 4--14, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. W. Blume and R. Eigenmann. The Range Test: A Dependence Test for Symbolic, Non-Linear Expressions. In Procs. Int. Conf. on Supercomp. (ICS), pages 528--537, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. M. Chakravarty, R. Leshchinskiy, S. P. Jones, G. Keller, and S. Marlow. Data Parallel Haskell: A Status Report. In Int. Work. on Decl. Aspects of Multicore Prog. (DAMP), pages 10--18, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. T. E. Cheatham Jr. Programming Language Design Issues. In Design and Implem. of Prog. Lang., pages 399--435. Springer, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. K. Claessen, M. Sheeran, and B. J. Svensson. Expressive Array Constructs in an Embedded GPU Kernel Programming Language. In Work. on Decl. Aspects of Multicore Prog DAMP, pages 21--30, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. Coutts, D. Stewart, and R. Leshchinskiy. Rewriting Haskell Strings. In Practical Aspects of Decl. Lang., pages 50--64. Springer, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. A. Gill, J. Launchbury, and S. L. Peyton Jones. A Short Cut to Deforestation. In Procs. of Int. Conf. on Functional Prog. Lang. and Computer Arch., pages 223--232. ACM, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. C. Grelck and S.-B. Scholz. SAC - A Functional Array Language for Efficient Multi-Threaded Execution. International Journal of Parallel Programming, 34 (4): 383--427, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. M. W. Hall, S. P. Amarasinghe, B. R. Murphy, S.-W. Liao, and M. S. Lam. Interprocedural Parallelization Analysis in SUIF. Trans. on Prog. Lang. and Sys. (TOPLAS), 27(4): 662--731, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. G. Hoare. The Rust Programming Language, June 2013. URL http://www.rust-lang.org/.Google ScholarGoogle Scholar
  21. S. P. Jones, A. Tolmach, and T. Hoare. Playing by the Rules: Rewriting as a Practical Optimisation Technique in GHC. In Haskell Workshop, volume 1, pages 203--233, 2001.Google ScholarGoogle Scholar
  22. G. Keller, M. M. Chakravarty, R. Leshchinskiy, S. Peyton Jones, and B. Lippmeier. Regular, Shape-Polymorphic, Parallel Arrays in Haskell. ACM Sigplan Notices, 45 (9): 261--272, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. T. L. McDonell, M. M. Chakravarty, G. Keller, and B. Lippmeier. Optimising Purely Functional GPU Programs. In Procs. of Int. Conf. Funct. Prog. (ICFP), 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. S. P. Midki and D. A. Padua. Issues in the Compile-Time Optimization of Parallel Programs. In Procs. of Int. Conf. on Parallel Processing, volume 2, pages 105--113, 1990.Google ScholarGoogle Scholar
  25. C. Munk. Introduction to the Numerical Solution of Partial Differential Equations in Finance. 2007.Google ScholarGoogle Scholar
  26. C. Oancea, C. Andreetta, J. Berthold, A. Frisch, and F. Henglein. Financial Software on GPUs: between Haskell and Fortran. In Funct. High-Perf. Comp. (FHPC'12), 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. C. E. Oancea and L. Rauchwerger. Logical Inference Techniques for Loop Parallelization. In Procs. of Int. Conf. Prog. Lang. Design and Impl. (PLDI), pages 509--520, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. C. E. Oancea and L. Rauchwerger. A Hybrid Approach to Proving Memory Reference Monotonicity. In Int. Lang. Comp. Par. Comp. (LCPC'11), volume 7146 of LNCS, pages 61--75, 2013.Google ScholarGoogle Scholar
  29. S. Rus, J. Hoeflinger, and L. Rauchwerger. Hybrid Analysis: Static & Dynamic Memory Reference Analysis. Int. Journal of Par. Prog, 31(3): 251--283, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A T2 graph-reduction approach to fusion

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          FHPC '13: Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
          September 2013
          104 pages
          ISBN:9781450323819
          DOI:10.1145/2502323

          Copyright © 2013 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 23 September 2013

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          FHPC '13 Paper Acceptance Rate8of14submissions,57%Overall Acceptance Rate18of25submissions,72%

          Upcoming Conference

          ICFP '24

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader