DOI: 10.1145/2834899.2834903
research-article

Modeling performance and energy for applications offloaded to Intel Xeon Phi

Published: 15 November 2015

ABSTRACT

Accelerators are adopted to increase performance, reduce time-to-solution, and minimize energy-to-solution. However, employing them efficiently, given system and application characteristics, is often a daunting task. A goal of this work is to propose a general model that predicts the performance and power requirements of an application whose computational portions are offloaded to an accelerator. Intel Xeon Phi is the only accelerator type investigated here, and only in offload execution mode. Because this mode is also employed by other accelerator types, such as GPUs, the proposed model is directly applicable to them. The predictive capabilities of the model are demonstrated by determining the best hardware-software configurations with respect to minimum energy consumption for the CoMD proxy application executed on single or multiple nodes. For the CoMD problem sizes investigated here, the best modeled configuration was close to the best measured configuration, with a relative error under 5% of the energy consumed for most configurations. Initial model validation also confirmed the model's accuracy for a variety of model parameters, such as host computation time and power consumption on the host and accelerator. The model also provides estimates of the total data movement and computational throughput, as well as of key metrics such as FLOPs-per-joule and bytes-per-joule, which are commonly used to study energy-performance trade-offs.
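To make the quantities named in the abstract concrete, the following is a minimal Python sketch of how energy-to-solution, FLOPs-per-joule, and bytes-per-joule might be computed and used to pick a minimum-energy configuration. It is not the model proposed in the paper, which is not reproduced on this page. It assumes, purely for illustration, that host computation, data transfer, and offloaded computation run one after another and that each device draws a constant average power for the whole run. The `Config` class, its field names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """One hypothetical hardware-software configuration (illustrative only)."""
    name: str
    host_time_s: float      # time spent computing on the host
    transfer_time_s: float  # time spent moving data to/from the accelerator
    offload_time_s: float   # time spent in offloaded computation
    host_power_w: float     # assumed constant average host power
    acc_power_w: float      # assumed constant average accelerator power
    flops: float            # total floating-point operations
    bytes_moved: float      # total bytes moved


def total_time_s(c: Config) -> float:
    # Assumption: the three phases are serialized, with no overlap.
    return c.host_time_s + c.transfer_time_s + c.offload_time_s


def energy_joules(c: Config) -> float:
    # Assumption: both devices draw their average power for the whole run.
    return (c.host_power_w + c.acc_power_w) * total_time_s(c)


def flops_per_joule(c: Config) -> float:
    return c.flops / energy_joules(c)


def bytes_per_joule(c: Config) -> float:
    return c.bytes_moved / energy_joules(c)


def relative_error(modeled: float, measured: float) -> float:
    return abs(modeled - measured) / measured


def best_config(configs: list[Config]) -> Config:
    # Select the configuration with the minimum modeled energy.
    return min(configs, key=energy_joules)


# Made-up numbers, chosen only to exercise the functions.
configs = [
    Config("A", 10.0, 2.0, 8.0, 100.0, 150.0, 9.0e12, 4.5e11),
    Config("B", 5.0, 5.0, 5.0, 120.0, 180.0, 9.0e12, 4.5e11),
]

if __name__ == "__main__":
    best = best_config(configs)
    print(best.name, energy_joules(best), flops_per_joule(best))
```

A modeled energy can be compared with a measured one through `relative_error(modeled, measured)`, which is the kind of comparison behind the "under 5%" figure quoted above; the measured values themselves would have to come from instrumentation on the actual system.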

Published in

Co-HPC '15: Proceedings of the 2nd International Workshop on Hardware-Software Co-Design for High Performance Computing
November 2015
61 pages
ISBN: 9781450339926
DOI: 10.1145/2834899

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Co-HPC '15 paper acceptance rate: 7 of 13 submissions, 54%
Overall acceptance rate: 7 of 13 submissions, 54%
