Abstract
Tapir (pronounced TAY-per) is a compiler intermediate representation (IR) that embeds recursive fork-join parallelism, as supported by task-parallel programming platforms such as Cilk and OpenMP, into a mainstream compiler’s IR. Mainstream compilers typically treat parallel linguistic constructs as syntactic sugar for function calls into a parallel runtime. These calls prevent the compiler from performing optimizations on and across parallel control constructs. Remedying this situation has generally been thought to require an extensive reworking of compiler analyses and code transformations to handle parallel semantics. Tapir leverages the “serial-projection property,” which is commonly satisfied by task-parallel programs, to handle the semantics of these programs without an extensive rework of the compiler.
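To make the serial-projection property concrete, the following sketch shows a recursive fork-join routine in Cilk-style C. It assumes that the serial projection is obtained by eliding the parallel keywords, so that a spawn becomes an ordinary call and a sync becomes a no-op. The `USE_CILK` macro, the fallback `#define`s, and the `fib` function are illustrative names chosen here, not part of Tapir.

```c
#include <assert.h>

/* Assumption: USE_CILK is a hypothetical build flag. When it is absent,
 * the parallel keywords are elided, which yields the serial projection. */
#ifdef USE_CILK
#include <cilk/cilk.h>
#else
#define cilk_spawn /* a spawn becomes an ordinary call */
#define cilk_sync  /* a sync becomes an empty statement */
#endif

/* Recursive fork-join Fibonacci: the spawned call may run in parallel
 * with the continuation, and cilk_sync waits for it before the sum. */
static long fib(int n) {
    assert(n >= 0);
    if (n < 2)
        return n;
    long x = cilk_spawn fib(n - 1);
    long y = fib(n - 2);
    cilk_sync;
    return x + y;
}
```

Built without `USE_CILK`, `fib(20)` runs as ordinary serial C and returns 6765. The spawned child and its continuation share no mutable state, so every parallel schedule computes the same value as this serial execution.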
For recursive fork-join programs that satisfy the serial-projection property, Tapir enables effective compiler optimization of parallel programs with only minor changes to existing compiler analyses and code transformations. Tapir uses the serial-projection property to order logically parallel fine-grained tasks in the program’s control-flow graph. This ordered representation is what lets existing analyses and transformations apply to parallel code largely unchanged. For example, to implement Tapir/LLVM, a prototype of Tapir in the LLVM compiler, we added or modified fewer than 3,000 lines of LLVM’s half-million-line core middle-end functionality.
These changes sufficed to enable LLVM’s existing compiler optimizations for serial code (including loop-invariant code motion, common-subexpression elimination, and tail-recursion elimination) to work with parallel control constructs such as parallel loops and Cilk’s cilk_spawn keyword. Tapir also supports parallel optimizations, such as loop scheduling, which restructure the parallel control flow of the program. By making use of existing LLVM optimizations and new parallel optimizations, Tapir/LLVM can optimize recursive fork-join programs more effectively than traditional compilation methods. On a suite of 35 Cilk application benchmarks, Tapir/LLVM produces more efficient executables for 30 benchmarks, with faster 18-core running times for 26 of them, compared to a nearly identical compiler that compiles parallel linguistic constructs the traditional way.
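As an illustration of what loop-invariant code motion across a parallel loop means, the sketch below pairs a parallel loop that recomputes an invariant value in every iteration with a hand-hoisted equivalent. This is a hypothetical before/after written by hand, not output from Tapir/LLVM. The names `norm1`, `normalize_naive`, `normalize_hoisted`, and `USE_CILK` are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: USE_CILK is a hypothetical build flag. Without it, the
 * parallel loop is replaced by its serial projection, a plain for loop. */
#ifdef USE_CILK
#include <cilk/cilk.h>
#else
#define cilk_for for
#endif

/* 1-norm of a vector: a pure function of its inputs, Theta(n) work. */
static double norm1(const double *a, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += (a[i] < 0.0) ? -a[i] : a[i];
    return sum;
}

/* Before: the invariant call sits inside the parallel loop body,
 * so the loop performs Theta(n^2) total work. */
static void normalize_naive(double *out, const double *in, size_t n) {
    assert(out != NULL && in != NULL);
    cilk_for (size_t i = 0; i < n; ++i)
        out[i] = in[i] / norm1(in, n);
}

/* After: the invariant call is hoisted out of the loop by hand,
 * which reduces the total work to Theta(n). */
static void normalize_hoisted(double *out, const double *in, size_t n) {
    assert(out != NULL && in != NULL);
    const double d = norm1(in, n);
    cilk_for (size_t i = 0; i < n; ++i)
        out[i] = in[i] / d;
}
```

Both functions produce the same output for non-overlapping `in` and `out`. When the parallel loop is lowered to opaque runtime calls before optimization, a compiler generally cannot see that the call is invariant across iterations, which is the situation the abstract describes.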
Index Terms
- Tapir: Embedding Recursive Fork-join Parallelism into LLVM’s Intermediate Representation