ABSTRACT
We present a highly parallel CUDA kernel based on the Lattice Monte Carlo (LMC) method for transient thermal conduction, which achieves a peak speed-up of more than 100x over a single-threaded Fortran implementation. A number of memory and branching optimizations for Graphics Processing Unit (GPU) architectures are described. With all of these optimizations combined, the fully optimized kernel improves on the roughly 13x speed-up observed for a naïve CUDA implementation by another order of magnitude, reaching the reported peak performance on a single NVIDIA Tesla C2050. Comparison benchmarks are also provided for the Tesla C1060; the reference Fortran code was executed on an Intel i5 CPU running at 3.6 GHz.