DOI: 10.1145/2555243.2555268 · PPoPP Conference Proceedings · Research article

Triolet: a programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing

Published: 6 February 2014

ABSTRACT

Functional algorithmic skeletons promise a high-level programming interface for distributed-memory clusters, one that frees developers from concerns of task decomposition, scheduling, and communication. Unfortunately, prior distributed functional skeleton frameworks do not deliver performance comparable to that achievable in a low-level distributed programming model such as C with MPI and OpenMP, even when used in concert with high-performance array libraries. There are several causes: they do not take advantage of shared memory on each cluster node; they impose a fixed partitioning strategy on input data; and they have limited ability to fuse loops involving skeletons that produce a variable number of outputs per input.
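As a rough illustration of the skeleton style the abstract describes (this is a hypothetical Python sketch, not Triolet's actual API), the hard case is a skeleton like `flat_map`, where each input element produces a variable number of outputs, so output sizes are unknown until runtime and fusion across skeletons must handle a nested, variable-size inner loop:

```python
# Hypothetical sketch of two algorithmic skeletons as higher-order functions.
# Names (sk_map, sk_flat_map) are illustrative, not Triolet's interface.

def sk_map(f, xs):
    # One output per input: uniform work per element, easy to partition.
    return [f(x) for x in xs]

def sk_flat_map(f, xs):
    # Variable number of outputs per input: the output size is data-dependent,
    # which complicates both static partitioning and loop fusion.
    return [y for x in xs for y in f(x)]

# Unfused composition: materializes an intermediate list between skeletons.
def pipeline_unfused(xs):
    return sk_map(lambda y: y * y, sk_flat_map(lambda x: range(x), xs))

# Fused form: a single traversal with no intermediate list -- the kind of
# cross-skeleton transformation a framework must perform even when the
# inner loop's trip count varies per element.
def pipeline_fused(xs):
    return [y * y for x in xs for y in range(x)]

print(pipeline_unfused([1, 2, 3]))  # [0, 0, 1, 0, 1, 4]
print(pipeline_fused([1, 2, 3]))    # [0, 0, 1, 0, 1, 4]
```

The two pipelines compute the same result; the fused version avoids allocating and traversing the intermediate collection, which is the optimization the abstract says prior frameworks struggle to apply to variable-size skeletons.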

We address these shortcomings in the Triolet programming language through a modular library design that separates concerns of parallelism, loop nesting, and data partitioning. We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell. We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23--100% of its performance on a 128-core cluster.
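To make the programming-effort gap concrete (again a hypothetical Python sketch, not Triolet or actual MPI code), compare the same reduction written skeleton-style, where the framework owns decomposition and scheduling, against a hand-decomposed version of the kind an MPI/OpenMP port requires:

```python
# Hypothetical contrast: one dot-product computation, two programming styles.
# All names here are illustrative; neither function is real Triolet/MPI code.

def dot_skeleton(xs, ys):
    # Skeleton style: the framework decides partitioning, scheduling,
    # and how to combine partial results.
    return sum(map(lambda p: p[0] * p[1], zip(xs, ys)))

def dot_manual(xs, ys, num_workers=4):
    # Low-level style: the programmer decomposes the index space by hand,
    # the way an MPI rank or OpenMP thread would be assigned a block.
    n = len(xs)
    partials = []
    for w in range(num_workers):
        lo = w * n // num_workers
        hi = (w + 1) * n // num_workers
        acc = 0
        for i in range(lo, hi):
            acc += xs[i] * ys[i]
        partials.append(acc)  # in MPI this combine step is a reduce across ranks
    return sum(partials)
```

Both return the same value; the manual version makes the partitioning and combining explicit, which is exactly the bookkeeping the skeleton interface hides, at the cost (per the abstract) of some fraction of hand-tuned performance.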


Published in

PPoPP '14: Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2014, 412 pages
ISBN: 9781450326568
DOI: 10.1145/2555243

Copyright © 2014 ACM. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance rates: PPoPP '14 accepted 28 of 184 submitted papers (15%); overall, the conference has accepted 230 of 1,014 submissions (23%).
