Cross-Accelerator Performance Profiling

ABSTRACT
The computing requirements of scientific applications have influenced processor design and have motivated the introduction and use of many-core processors, i.e., accelerators, for high performance computing (HPC). Consequently, it is now common for the compute nodes of HPC clusters to be composed of multiple computing devices, including accelerators. Although execution time can be used to compare the performance of different computing devices, there exists no standard way to analyze application performance across devices with very different architectural designs and, thus, to understand why one outperforms another. Without this knowledge, a developer is handicapped when attempting to tune application performance effectively, as is a hardware designer when trying to understand how best to improve the design of computing devices. In this paper, we use the LULESH 1.0 proxy application to compare and analyze the performance of three accelerators: the Intel® Xeon Phi™ and the NVIDIA Fermi and Kepler GPUs. Our study shows that LULESH 1.0 exhibits similar execution-time behavior across the three accelerators, but runs up to 7X faster on the Kepler. Despite the significant architectural differences between the Xeon Phi™ and the GPUs, and the differences in the metrics used to characterize their performance, we were able to quantify why the Kepler outperforms both the Fermi and the Xeon Phi™. To do this, we compared their achieved instructions per cycle (IPC) and vectorization usage, as well as their memory behavior and power and energy consumption.
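To make the comparison concrete, the sketch below shows one way to obtain achieved IPC on the host or the Xeon Phi™ using PAPI hardware counters. This is an illustrative example rather than the paper's actual instrumentation: it assumes a PAPI installation on which the preset events PAPI_TOT_INS and PAPI_TOT_CYC are supported, and kernel_of_interest() is a hypothetical stand-in for a LULESH loop nest.

```c
/* Illustrative sketch: achieved IPC via PAPI's low-level API.
 * Assumes the PAPI_TOT_INS and PAPI_TOT_CYC presets are supported;
 * kernel_of_interest() is a hypothetical stand-in for a LULESH kernel. */
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

static void kernel_of_interest(void)
{
    volatile double acc = 0.0;          /* placeholder compute loop */
    for (long i = 0; i < 100000000L; i++)
        acc += (double)i * 1.0e-9;
}

int main(void)
{
    int eventset = PAPI_NULL;
    long long counts[2];                /* [0] = instructions, [1] = cycles */

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI initialization failed\n");
        return EXIT_FAILURE;
    }
    if (PAPI_create_eventset(&eventset) != PAPI_OK ||
        PAPI_add_event(eventset, PAPI_TOT_INS) != PAPI_OK ||
        PAPI_add_event(eventset, PAPI_TOT_CYC) != PAPI_OK) {
        fprintf(stderr, "could not set up counters\n");
        return EXIT_FAILURE;
    }

    PAPI_start(eventset);               /* start counting ...            */
    kernel_of_interest();
    PAPI_stop(eventset, counts);        /* ... and read the event totals */

    /* Achieved IPC = retired instructions / elapsed cycles. */
    printf("instructions = %lld  cycles = %lld  IPC = %.3f\n",
           counts[0], counts[1], (double)counts[0] / (double)counts[1]);
    return EXIT_SUCCESS;
}
```

On the GPUs, where PAPI presets of this kind are not directly available, an analogous quantity can be collected with NVIDIA's profiling tools (for example, nvprof reports a per-kernel `ipc` metric), and board power can be sampled with `nvidia-smi --query-gpu=power.draw --format=csv -l 1` to estimate energy consumption by integrating over a run.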