ABSTRACT
Graphics Processing Units (GPUs) have recently emerged as a promising and powerful resource for scientific computing. Algorithmic Differentiation is a technique for efficiently evaluating first and higher derivatives of a function specified by a computer program, accurate up to machine precision. The derivative programs used to compute these derivatives are known as tangent-linear and adjoint programs. This paper aims to offload independent loops in a tangent-linear program to GPUs. The proposed technique uses the OpenACC API to annotate an independent loop for parallel execution on a GPU. Our case study of OpenACC tangent-linear code shows a substantial speedup. OpenACC also simplifies the acceleration of tangent-linear code by hiding data movement between CPU and GPU memory.
Index Terms
- Towards tangent-linear GPU programs using OpenACC