
Tapir: Embedding Recursive Fork-join Parallelism into LLVM’s Intermediate Representation

Published: 17 December 2019

Abstract

Tapir (pronounced TAY-per) is a compiler intermediate representation (IR) that embeds recursive fork-join parallelism, as supported by task-parallel programming platforms such as Cilk and OpenMP, into a mainstream compiler’s IR. Mainstream compilers typically treat parallel linguistic constructs as syntactic sugar for function calls into a parallel runtime. These calls prevent the compiler from performing optimizations on and across parallel control constructs. Remedying this situation has generally been thought to require an extensive reworking of compiler analyses and code transformations to handle parallel semantics. Tapir leverages the “serial-projection property,” which is commonly satisfied by task-parallel programs, to handle the semantics of these programs without an extensive rework of the compiler.

For recursive fork-join programs that satisfy the serial-projection property, Tapir enables effective compiler optimization of parallel programs with only minor changes to existing compiler analyses and code transformations. Tapir uses the serial-projection property to order logically parallel fine-grained tasks in the program’s control-flow graph. This ordered representation of parallel tasks allows the compiler to optimize parallel codes effectively with only minor modifications. For example, to implement Tapir/LLVM, a prototype of Tapir in the LLVM compiler, we added or modified less than 3,000 lines of LLVM’s half-million-line core middle-end functionality.

These changes sufficed to enable LLVM’s existing compiler optimizations for serial code—including loop-invariant-code motion, common-subexpression elimination, and tail-recursion elimination—to work with parallel control constructs such as parallel loops and Cilk’s cilk_spawn keyword. Tapir also supports parallel optimizations, such as loop scheduling, which restructure the parallel control flow of the program. By making use of existing LLVM optimizations and new parallel optimizations, Tapir/LLVM can optimize recursive fork-join programs more effectively than traditional compilation methods. On a suite of 35 Cilk application benchmarks, Tapir/LLVM produces more efficient executables for 30 benchmarks, with faster 18-core running times for 26 of them, compared to a nearly identical compiler that compiles parallel linguistic constructs the traditional way.



      Reviews

      William M. Waite

      A typical compiler has a front end that analyzes the source text and converts it to a language-independent intermediate representation (IR) whose structure and operations support general-purpose optimization in the compiler's middle end. The quality of the optimization depends on the IR's ability to express the intent of source-language constructs in a way that the middle end understands and can improve. Although some compilers now support task-parallel linguistic constructs, their front ends convert those constructs into more primitive IR sequences, often involving opaque procedure calls that block further optimization. The authors took a different approach with Tapir: they extended the existing LLVM IR to encode logical parallelism and expose opportunities for standard compiler optimizations to work on parallel code. Integrating these extensions was relatively simple, and they turned out to be easy to maintain as LLVM changed. By avoiding the problems with opaque procedure calls, the extensions opened parallel code to a wide variety of standard optimizations.

      According to the authors, the Tapir approach is premised on the notion that parallel programs can usually be executed serially and that there is a normative serial execution order for tasks. This allows them to define the serial projection: a transformation that replaces a program's parallel constructs with serial constructs. If the serial projection produces one of the valid behaviors of the original program, then the compiler and runtime system may execute parallel tasks serially. The authors argue that the serial-projection property is satisfied by most task-parallel programming models, but not necessarily by arbitrary concurrency mechanisms; they therefore strongly advocate treating task parallelism and concurrency as separate concepts.

      The paper describes how Tapir represents logically parallel tasks, how LLVM's analyses are adapted to Tapir programs, and how the authors evaluated Tapir in practice. Each of these topics is covered in detail, with examples and discussion. The authors discuss related work and present a comprehensive retrospective on where they have been and where they are going. I found the paper easy to read and engrossing; it should be accessible to anyone with a basic understanding of code optimization.


      • Published in

        ACM Transactions on Parallel Computing, Volume 6, Issue 4 (December 2019), 188 pages
        ISSN: 2329-4949
        EISSN: 2329-4957
        DOI: 10.1145/3372747

        Copyright © 2019 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 17 December 2019
        • Accepted: 1 April 2019
        • Revised: 1 January 2019
        • Received: 1 April 2018


        Qualifiers

        • research-article
        • Research
        • Refereed
