Poster: determining code segments that can benefit from execution on GPUs

Authors:
Ashay Rane, Saurabh Sardeshpande, James Browne
The University of Texas at Austin, Austin, TX, USA

SC '11 Companion: Proceedings of the 2011 companion on High Performance Computing Networking, Storage and Analysis Companion, November 2011, Pages 55–56
https://doi.org/10.1145/2148600.2148629

Published: 12 November 2011

ABSTRACT

Graphics Processing Units (GPUs) are a low-cost, low-power means of exploiting large-scale parallelism. Source-to-source transformation tools for mapping CPU code to GPU code (e.g., PGI Accelerator) are available. However, identifying the code segments in an application that will attain significant performance enhancement when run on a GPU requires expert knowledge of algorithms, architectures, compilers, and program structure, which many application developers may not possess. This poster demonstrates a process for identifying the code segments in programs optimized for multicore chip execution that are candidates for GPU execution, and for ranking these code segments by probable speedup. The identification and ranking are based on measurements of the programs by the PerfExpert tool and by a new tool, MACPO, which measures execution properties of data structures. The poster describes the identification and ranking process, gives the results of applying the process to the Rodinia parallel benchmarks, and states the underlying assumptions and limitations of the process.
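
The abstract does not state how probable speedup is computed, so the following is only an illustrative sketch of what "ranking code segments by probable speedup" could look like. It uses Amdahl's law as a stand-in estimate, not the authors' model. The `Segment` type, its fields, the threshold, and the sample numbers are all hypothetical; in practice the runtime fractions would come from a profiler, and the per-segment GPU speedup is a guess supplied by the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A candidate code segment with hypothetical, hand-entered measurements."""
    name: str
    runtime_fraction: float     # share of total runtime spent in the segment (0..1)
    assumed_gpu_speedup: float  # guessed speedup of the segment itself on a GPU


def overall_speedup(seg: Segment) -> float:
    """Amdahl's law: whole-program speedup if only this segment is accelerated."""
    f, s = seg.runtime_fraction, seg.assumed_gpu_speedup
    return 1.0 / ((1.0 - f) + f / s)


def rank_candidates(segments, min_speedup=1.05):
    """Keep segments whose estimated whole-program speedup clears the
    (hypothetical) threshold, sorted from most to least promising."""
    scored = [(overall_speedup(seg), seg) for seg in segments]
    kept = [(sp, seg) for sp, seg in scored if sp >= min_speedup]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Made-up sample data, purely for illustration.
    demo = [
        Segment("stencil_loop", 0.60, 10.0),
        Segment("reduction", 0.25, 4.0),
        Segment("io_setup", 0.02, 2.0),
    ]
    for speedup, seg in rank_candidates(demo):
        print(f"{seg.name}: estimated overall speedup {speedup:.2f}x")
```

A segment that occupies little of the total runtime ranks low even if its own GPU speedup is large, which is one reason a ranking step is useful before any porting effort begins.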

References

M. Wolfe, "Implementing the PGI Accelerator model," in GPGPU, 2010, pp. 43--50.
M. Burtscher, B. D. Kim, J. Diamond, J. McCalpin, L. Koesterke, and J. Browne, "PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications," in Computer. IEEE, 2010, pp. 1--11.
O. A. Sopeju, M. Burtscher, A. Rane, and J. Browne, "AutoSCOPE: Automatic Suggestions for Code Optimizations using PerfExpert," Evaluation.
A. Rane and J. Browne, "Performance optimization of data structures using memory access characterization," in CLUSTER. IEEE, 2011, pp. 570--574.
S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in 2009 IEEE International Symposium on Workload Characterization (IISWC), 2009, pp. 44--54.

Index Terms

Computing methodologies → Modeling and simulation → Model development and analysis → Modeling methodologies

Published in
SC '11 Companion: Proceedings of the 2011 companion on High Performance Computing Networking, Storage and Analysis Companion
November 2011, 166 pages
ISBN: 9781450310307
DOI: 10.1145/2148600
Conference Chair: Scott Lathrop (University of Chicago)
Program Chairs: Jim Costa (Sandia National Laboratories), William Kramer (National Center for Supercomputing Applications)
Copyright © 2011 Authors
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags: GPU
Qualifiers: poster
Acceptance Rates
Overall Acceptance Rate: 1,516 of 6,373 submissions, 24%