ABSTRACT
Graphics Processing Units (GPUs) are a low-cost, low-power means of exploiting large-scale parallelism. Source-to-source transformation tools that map CPU code to GPU code (e.g., PGI Accelerator) are available. However, identifying the code segments of an application that will attain a significant performance gain when run on a GPU requires expert knowledge of algorithms, architectures, compilers, and program structure that many application developers may lack. This poster demonstrates a process for identifying the code segments in programs optimized for multicore chip execution that are candidates for GPU execution, and for ranking these segments by probable speedup. The identification and ranking are based on measurements of the programs by the PerfExpert tool and by MACPO, a new tool that measures execution properties of data structures. The poster describes the identification and ranking process, presents the results of applying it to the Rodinia parallel benchmarks, and states the underlying assumptions and limitations of the process.
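The abstract does not give the poster's actual ranking formula, so the following is only a minimal sketch of what "ranking by probable speedup" can mean, assuming Amdahl's law as the estimator: a segment that takes fraction f of total runtime and runs s times faster on a GPU yields a whole-program speedup of 1 / ((1 - f) + f / s). The segment names, runtime fractions, and GPU speedups below are hypothetical inputs, not PerfExpert or MACPO measurements, and this is not the authors' method. It also ignores host-to-device data transfer cost.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str                # hypothetical code-segment identifier
    runtime_fraction: float  # fraction f of total runtime spent in this segment (0..1)
    gpu_speedup: float       # assumed local speedup s of this segment on a GPU


def amdahl_speedup(f: float, s: float) -> float:
    """Whole-program speedup when a fraction f of runtime is accelerated by a factor s."""
    return 1.0 / ((1.0 - f) + f / s)


def rank_segments(segments):
    """Return (name, estimated program speedup) pairs, best GPU candidate first."""
    scored = [
        (seg.name, amdahl_speedup(seg.runtime_fraction, seg.gpu_speedup))
        for seg in segments
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    candidates = [
        Segment("stencil_loop", 0.60, 8.0),
        Segment("reduction", 0.25, 20.0),
        Segment("io_setup", 0.10, 1.0),
    ]
    for name, est in rank_segments(candidates):
        print(f"{name}: {est:.2f}x")
```

With these made-up inputs the ranking is stencil_loop (about 2.11x), reduction (about 1.31x), then io_setup (1.00x): the segment that dominates runtime ranks first even though its assumed kernel speedup is the smaller one.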
References
- M. Wolfe, "Implementing the PGI Accelerator model," in GPGPU, 2010, pp. 43–50.
- M. Burtscher, B. D. Kim, J. Diamond, J. McCalpin, L. Koesterke, and J. Browne, "PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications," in Computer. IEEE, 2010, pp. 1–11.
- O. A. Sopeju, M. Burtscher, A. Rane, and J. Browne, "AutoSCOPE: Automatic Suggestions for Code Optimizations using PerfExpert," Evaluation.
- A. Rane and J. Browne, "Performance optimization of data structures using memory access characterization," in CLUSTER. IEEE, 2011, pp. 570–574.
- S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in 2009 IEEE International Symposium on Workload Characterization (IISWC), 2009, pp. 44–54.
Index Terms
- Poster: determining code segments that can benefit from execution on GPUs