ABSTRACT
Functional algorithmic skeletons promise a high-level programming interface for distributed-memory clusters that frees developers from concerns of task decomposition, scheduling, and communication. Unfortunately, prior distributed functional skeleton frameworks do not deliver performance comparable to that of a low-level distributed programming model such as C with MPI and OpenMP, even when used in concert with high-performance array libraries. There are several causes: they do not take advantage of shared memory on each cluster node; they impose a fixed partitioning strategy on input data; and they have limited ability to fuse loops involving skeletons that produce a variable number of outputs per input.
We address these shortcomings in the Triolet programming language through a modular library design that separates concerns of parallelism, loop nesting, and data partitioning. We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell. We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23--100% of its performance on a 128-core cluster.
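To make the fusion problem concrete, the sketch below illustrates skeleton-style traversal composition in plain Python (a stand-in, not Triolet's actual API; the names `par_map`, `concat_map`, and `reduce_skel` are hypothetical). The key case is `concat_map`, which yields a variable number of outputs per input; by building the pipeline from generators, no intermediate collection is materialized between stages, which is the kind of loop fusion the abstract says prior frameworks struggle to perform.

```python
# Hypothetical generator-based "skeletons" (illustrative only; not Triolet's API).

def par_map(f, xs):
    # Sequential stand-in for a parallel map skeleton.
    return (f(x) for x in xs)

def concat_map(f, xs):
    # Each input may yield zero or more outputs -- the variable-size-loop case.
    return (y for x in xs for y in f(x))

def reduce_skel(op, init, xs):
    # Fold the (fused) stream into a single result.
    acc = init
    for x in xs:
        acc = op(acc, x)
    return acc

# Fused pipeline: concat_map -> par_map -> reduce, with no intermediate lists.
xs = range(5)
result = reduce_skel(lambda a, b: a + b, 0,
                     par_map(lambda y: y * y,
                             concat_map(lambda x: range(x), xs)))
# result == 20: squares of 0; 0,1; 0,1,2; 0,1,2,3 summed.
```

Because each stage consumes the previous stage's generator lazily, the three skeletons execute as a single loop nest; a distributed implementation would additionally have to partition `xs` across nodes even though the output size per element is not known in advance.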
Triolet: a programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing