ABSTRACT
With advances in multi-core processors and accelerators, modern applications are increasingly turning to compiler-assisted parallel and vector programming models such as OpenMP, OpenCL, Halide, Python and TensorFlow. It is therefore crucial that LLVM-based compilers optimize parallel and vector code as effectively as possible. In this paper, we first present a set of updated LLVM IR extensions for explicitly parallel, vector, and offloading program constructs in the context of C/C++/OpenCL. Second, we describe our LLVM design and implementation for advanced OpenMP features such as parallel loop reduction, task and taskloop, and SIMD loops and functions, and we discuss the impact of our updated implementation on existing LLVM optimization passes. Finally, we present a reuse case of our infrastructure that enables explicit parallelization and vectorization extensions in our OpenCL compiler, achieving a ~35x performance speedup on a well-known autonomous driving workload running on a multi-core platform configured with Intel® Xeon® Scalable Processors.
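The abstract lists several OpenMP constructs that the described LLVM implementation has to represent and lower: parallel loop reduction, taskloop, SIMD loops and SIMD functions. As a minimal sketch (not code from the paper), the C fragment below shows what each construct looks like at the source level, which is the input the proposed IR extensions are meant to capture. The function names are hypothetical, `grainsize(64)` is an arbitrary illustrative value, and the paper's actual IR representation is not shown here.

```c
/* Source-level sketch of the OpenMP constructs named in the abstract.
 * Function names are hypothetical illustrations. Compile with an
 * OpenMP-enabled compiler (e.g. `cc -fopenmp`); without OpenMP support
 * the pragmas are ignored and the functions compute the same results
 * serially. */
#include <stddef.h>

/* SIMD function: asks the compiler to also emit vector variants of
 * scale() that process several x values per call; `factor` is the
 * same for all lanes (uniform). */
#pragma omp declare simd uniform(factor)
double scale(double x, double factor) {
    return x * factor;
}

/* Parallel loop with a sum reduction: each thread accumulates a private
 * partial sum, and the partial sums are combined when the loop ends. */
double parallel_sum(const double *a, size_t n) {
    double sum = 0.0;
    #pragma omp parallel for reduction(+ : sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* SIMD loop: iterations are executed in vector lanes; calling the
 * declare-simd function lets the compiler use its vector variant. */
void simd_scale(double *restrict out, const double *restrict in,
                size_t n, double factor) {
    #pragma omp simd
    for (size_t i = 0; i < n; i++)
        out[i] = scale(in[i], factor);
}

/* Taskloop: one thread of the team splits the loop into tasks of
 * roughly `grainsize` iterations that any thread in the team may run.
 * The taskloop waits for its tasks, and single ends with a barrier. */
void taskloop_square(double *a, size_t n) {
    #pragma omp parallel
    #pragma omp single
    #pragma omp taskloop grainsize(64)
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] * a[i];
}
```

One design point worth noting: a parallel or SIMD floating-point reduction may combine partial sums in a different order than the serial loop, so for general inputs its result can differ from the sequential sum in the last bits.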
LLVM Compiler Implementation for Explicit Parallelization and SIMD Vectorization