ABSTRACT
Functional algorithmic skeletons promise a high-level programming interface for distributed-memory clusters that frees developers from concerns of task decomposition, scheduling, and communication. Unfortunately, prior distributed functional skeleton frameworks do not deliver performance comparable to that of a low-level distributed programming model such as C with MPI and OpenMP, even when used in concert with high-performance array libraries. There are several causes: they do not take advantage of shared memory on each cluster node; they impose a fixed partitioning strategy on input data; and they have limited ability to fuse loops involving skeletons that produce a variable number of outputs per input.
We address these shortcomings in the Triolet programming language through a modular library design that separates concerns of parallelism, loop nesting, and data partitioning. We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell. We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23--100% of its performance on a 128-core cluster.
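To make the fusion problem concrete, the sketch below illustrates skeleton-style traversal composition in plain Python (a stand-in, not Triolet's actual API; the names `par_map`, `concat_map`, and `reduce_skel` are hypothetical). The key case is `concat_map`, which yields a variable number of outputs per input; by building the pipeline from generators, no intermediate collection is materialized between stages, which is the kind of loop fusion the abstract says prior frameworks struggle to perform.

```python
# Hypothetical generator-based "skeletons" (illustrative only; not Triolet's API).

def par_map(f, xs):
    # Sequential stand-in for a parallel map skeleton.
    return (f(x) for x in xs)

def concat_map(f, xs):
    # Each input may yield zero or more outputs -- the variable-size-loop case.
    return (y for x in xs for y in f(x))

def reduce_skel(op, init, xs):
    # Fold the (fused) stream into a single result.
    acc = init
    for x in xs:
        acc = op(acc, x)
    return acc

# Fused pipeline: concat_map -> par_map -> reduce, with no intermediate lists.
xs = range(5)
result = reduce_skel(lambda a, b: a + b, 0,
                     par_map(lambda y: y * y,
                             concat_map(lambda x: range(x), xs)))
# result == 20: squares of 0; 0,1; 0,1,2; 0,1,2,3 summed.
```

Because each stage consumes the previous stage's generator lazily, the three skeletons execute as a single loop nest; a distributed implementation would additionally have to partition `xs` across nodes even though the output size per element is not known in advance.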
Triolet: a programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing