
Modeling performance and energy for applications offloaded to Intel Xeon Phi

Authors:
Gary Lawson, Old Dominion University, Norfolk, Virginia
Vaibhav Sundriyal, Old Dominion University, Norfolk, Virginia
Masha Sosonkina, Old Dominion University, Norfolk, Virginia
Yuzhong Shen, Old Dominion University, Norfolk, Virginia

Co-HPC '15: Proceedings of the 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing, November 2015, Article No.: 7, Pages 1–8. https://doi.org/10.1145/2834899.2834903

Published: 15 November 2015

ABSTRACT

Accelerators are adopted to increase performance, reduce time-to-solution, and minimize energy-to-solution. However, employing them efficiently, given system and application characteristics, is often a daunting task. A goal of this work is to propose a general model that predicts performance and power requirements for an application, computational portions of which are offloaded to an accelerator. Intel Xeon Phi is the only accelerator type investigated here, and only in offload execution mode. This mode is also employed by other accelerator types, such as GPU; thus the proposed model is applicable directly. The predictive capabilities of the model are demonstrated by determining the best hardware-software configuration instances with respect to the minimum energy consumption for the CoMD proxy application executed on single or multiple nodes. For the CoMD problem sizes investigated here, the best modeled configuration was relatively close to the best measured configuration with relative error under 5% of the energy consumed for most configurations. Initial model validation also confirmed the model accuracy for a variety of model parameters, such as host computation time and power consumption on the host and accelerator. The model also provides estimates of the total data movement and computational throughput as well as of some key metrics, such as FLOPs-per-joule and bytes-per-joule, which are commonly used to study the energy-performance trade-offs.
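
The abstract names energy-to-solution, FLOPs-per-joule, bytes-per-joule, and relative error without giving the model equations. The sketch below only illustrates how such metrics relate to one another. It assumes a run can be split into phases with an average host and accelerator power per phase; the `Phase` structure, the function names, and every number are hypothetical and are not taken from the paper's model.

```python
from dataclasses import dataclass


@dataclass
class Phase:
    """One phase of an offloaded run (hypothetical structure, not the paper's model)."""
    seconds: float      # wall-clock duration of the phase
    host_watts: float   # average host power during the phase
    accel_watts: float  # average accelerator power during the phase


def energy_joules(phases):
    """Energy-to-solution: sum of (host + accelerator power) * time over all phases."""
    return sum(p.seconds * (p.host_watts + p.accel_watts) for p in phases)


def relative_error(modeled, measured):
    """Relative error of a modeled value against a measured one."""
    return abs(modeled - measured) / measured


# Illustrative numbers only; none of them come from the paper.
phases = [
    Phase(seconds=2.0, host_watts=100.0, accel_watts=100.0),  # host computation
    Phase(seconds=1.0, host_watts=80.0, accel_watts=120.0),   # host-accelerator data transfer
    Phase(seconds=4.0, host_watts=50.0, accel_watts=200.0),   # offloaded kernel
]
total_flops = 3.2e12  # assumed number of floating-point operations executed
total_bytes = 1.6e10  # assumed bytes moved between host and accelerator

energy = energy_joules(phases)
flops_per_joule = total_flops / energy
bytes_per_joule = total_bytes / energy
```

Under these assumptions, comparing hardware-software configurations amounts to evaluating `energy_joules` for each candidate and keeping the minimum, while `relative_error` compares a modeled energy against a measured one.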



Published in

Co-HPC '15: Proceedings of the 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing
November 2015
61 pages
ISBN:9781450339926
DOI:10.1145/2834899

Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
HPC cluster
Xeon Phi
energy
heterogeneous architecture modeling
offload
performance
Qualifiers
- research-article
Acceptance Rates
Co-HPC '15 Paper Acceptance Rate: 7 of 13 submissions, 54%
Overall Acceptance Rate: 7 of 13 submissions, 54%