DOI: 10.1145/3148173.3148191
research-article

LLVM Compiler Implementation for Explicit Parallelization and SIMD Vectorization

Published: 12 November 2017

ABSTRACT

With advances in modern multi-core processors and accelerators, many applications increasingly rely on compiler-assisted parallel and vector programming models such as OpenMP, OpenCL, Halide, Python, and TensorFlow. It is therefore crucial that LLVM-based compilers optimize parallel and vector code as effectively as possible. In this paper, we first present a set of updated LLVM IR extensions for explicitly parallel, vector, and offloading program constructs in the context of C/C++/OpenCL. Second, we describe our LLVM design and implementation of advanced OpenMP features such as parallel loop reduction, task and taskloop, and SIMD loops and functions, and we discuss the impact of the updated implementation on existing LLVM optimization passes. Finally, we present a reuse case of our infrastructure: enabling explicit parallelization and vectorization extensions in our OpenCL compiler, which achieves a ~35x speedup on a well-known autonomous driving workload running on a multi-core platform configured with Intel® Xeon® Scalable Processors.

    • Published in

      ACM Conferences
      LLVM-HPC'17: Proceedings of the Fourth Workshop on the LLVM Compiler Infrastructure in HPC
      November 2017
      106 pages
      ISBN:9781450355650
      DOI:10.1145/3148173

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      • LLVM-HPC'17 paper acceptance rate: 9 of 10 submissions (90%)
      • Overall acceptance rate: 16 of 22 submissions (73%)
