ABSTRACT
We present a highly parallel CUDA kernel based on the Lattice Monte Carlo (LMC) method for transient thermal conduction, which achieves a peak speed-up of more than 100x over a single-threaded Fortran implementation. A number of memory and branching optimizations for Graphics Processing Unit (GPU) architectures are described. With all of these optimizations combined, the fully optimized kernel improves on the roughly 13x speed-up observed for a naïve CUDA implementation by another order of magnitude, reaching the reported peak performance on a single NVIDIA Tesla C2050. Comparison benchmarks are also provided for the Tesla C1060; the reference Fortran code was executed on an Intel i5 CPU running at 3.6 GHz.