Accelerating the multi-zone scalar pentadiagonal CFD algorithm with OpenACC

Authors:
Christopher P. Stone, Computational Science and Engineering, LLC, Chicago, IL
Bracy H. Elton, Engility Corporation, Wright-Patterson Air Force Base, OH

WACCPD '15: Proceedings of the Second Workshop on Accelerator Programming using Directives, November 2015, Article No. 2, Pages 1–7. https://doi.org/10.1145/2832105.2832110

Published: 15 November 2015

ABSTRACT

The multi-zone scalar pentadiagonal (SP-MZ) benchmark, part of the multi-zone NAS Parallel Benchmark suite, is ported to graphics processing units (GPUs) using OpenACC compiler directives. The sequence of optimizations necessary to transform the SP-MZ algorithm from CPU-oriented to GPU-oriented is presented. The performance of the OpenACC implementation on GPUs is measured using predefined mesh sizes. We observe a 30% speed-up with the OpenACC implementation on an NVIDIA Kepler K40 GPU compared to an eight-core Intel Xeon E5-2670 CPU with the small Class-A mesh (256 thousand points). Setting inter-zone boundary conditions directly on the device reduced run-time by 22% because host-device communication is expensive. Multi-device benchmarks with the larger Class-C mesh (4.3 million points) were scaled to 32 GPU nodes and matched or outperformed the CPU baseline with ten cores per node. Combining both CPU and GPU computing power improved the throughput on the Class-C mesh by 75%. We define a larger zone size with one million points per node to better reflect modern usage with codes similar to SP-MZ. The OpenACC GPU implementation outperformed the baseline multi-core CPU by 29% on this real-world mesh size.
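The idea of setting inter-zone boundary conditions on the device can be illustrated with a minimal sketch. This is not the authors' code: the two-zone setup, the ghost-column layout, the `IDX` macro, and the function names `exchange_boundaries` and `run_steps` are all assumptions made for illustration. The sketch keeps both zones resident on the GPU inside an OpenACC `data` region and copies edge values between them in a device loop, so no host-device transfer is needed per time step.

```c
#include <stddef.h>

/* Row-major index into a zone with nx interior columns plus one ghost
   column on each side (hypothetical layout, for illustration only). */
#define IDX(i, j, nx) ((j) * ((nx) + 2) + (i))

/* Copy edge data between two x-adjacent zones, with a to the left of b.
   The loop runs on the device; both arrays must already be present. */
void exchange_boundaries(double *a, double *b, size_t nx, size_t ny)
{
    size_t n = (nx + 2) * ny;
    (void)n; /* only referenced by the directive below */

    #pragma acc parallel loop present(a[0:n], b[0:n])
    for (size_t j = 0; j < ny; ++j) {
        a[IDX(nx + 1, j, nx)] = b[IDX(1, j, nx)];  /* a's right ghost */
        b[IDX(0, j, nx)]      = a[IDX(nx, j, nx)]; /* b's left ghost  */
    }
}

/* Keep both zones on the device for the whole run; data moves to the
   device once on entry and back to the host once on exit. */
void run_steps(double *a, double *b, size_t nx, size_t ny, int nsteps)
{
    size_t n = (nx + 2) * ny;
    (void)n; /* only referenced by the directive below */

    #pragma acc data copy(a[0:n], b[0:n])
    {
        for (int s = 0; s < nsteps; ++s) {
            /* A real solver would update the interior points here. */
            exchange_boundaries(a, b, nx, ny);
        }
    }
}
```

Compiled without OpenACC support, the directives are ignored and the same code runs on the host with identical results, which makes the sketch easy to check before trying it on a GPU.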

Index Terms

Accelerating the multi-zone scalar pentadiagonal CFD algorithm with OpenACC
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel programming languages
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Parallel programming languages

Published in
WACCPD '15: Proceedings of the Second Workshop on Accelerator Programming using Directives
November 2015
68 pages
ISBN:9781450340144
DOI:10.1145/2832105
Program Chairs:
Sunita Chandrasekaran, University of Houston
Fernanda Foertter, ORNL
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
ADI
CUDA
GPGPU
OpenACC
Qualifiers
- research-article
Acceptance Rates
WACCPD '15 Paper Acceptance Rate: 7 of 14 submissions, 50%
Overall Acceptance Rate: 7 of 14 submissions, 50%