Research article
DOI: 10.1145/2925426.2926289

Variation Among Processors Under Turbo Boost in HPC Systems

Published: 01 June 2016

Abstract

The design and manufacture of present-day CPUs cause inherent variation in supercomputer architectures, such as variation in the power and temperature of the chips. This variation also manifests itself as frequency differences among processors under Turbo Boost dynamic overclocking, which can lead to unpredictable and suboptimal performance in tightly coupled HPC applications. In this study, we use compute-intensive kernels and applications to analyze the variation among processors in four top supercomputers: Edison, Cab, Stampede, and Blue Waters. We observe an execution time difference of up to 16% among processors on the Turbo Boost-enabled supercomputers (Edison, Cab, and Stampede), but less than 1% variation on Blue Waters, which does not have a dynamic overclocking feature. By analyzing measurements from temperature and power instrumentation, we find that intrinsic differences in the chips' power efficiency are the culprit behind the frequency variation. Moreover, we analyze potential solutions to mitigate the variation, such as disabling Turbo Boost, leaving some cores idle, and replacing slow chips. We also propose a speed-aware dynamic task redistribution (load balancing) algorithm to reduce the negative effects of performance variation. Our speed-aware load balancing algorithm improves performance by up to 18% compared to no load balancing, and performs 6% better than its non-speed-aware counterpart.
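The abstract does not spell out the load balancing algorithm, so the following is only a minimal Python sketch of the general idea behind speed-aware task redistribution, not the authors' method. It uses a generic greedy largest-task-first heuristic and assumes each processor's relative speed is already known; in a real runtime these speeds would have to be measured (for example, from recent per-processor iteration times). The function name `greedy_assign`, the task weights, and the speed values are all hypothetical. Setting `speed_aware=False` gives a non-speed-aware counterpart that balances raw work only.

```python
from typing import List, Tuple


def greedy_assign(
    task_loads: List[float], speeds: List[float], speed_aware: bool = True
) -> Tuple[List[List[int]], List[float]]:
    """Greedy largest-task-first assignment of tasks to processors (illustrative).

    task_loads: work per task, in arbitrary hypothetical units.
    speeds: hypothetical relative processor speeds, e.g. 1.0 for a nominal
        chip and 0.85 for one that sustains a lower Turbo Boost frequency.
    speed_aware: if False, the balancer assumes every processor runs at
        speed 1.0 and balances raw work only; the returned times still use
        the true speeds.
    Returns (task indices assigned to each processor, predicted time per processor).
    """
    n = len(speeds)
    assumed = speeds if speed_aware else [1.0] * n
    work = [0.0] * n
    assignment: List[List[int]] = [[] for _ in range(n)]
    # Place the heaviest tasks first.
    order = sorted(range(len(task_loads)), key=lambda i: task_loads[i], reverse=True)
    for t in order:
        w = task_loads[t]
        # Choose the processor with the earliest estimated finish time after adding t.
        p = min(range(n), key=lambda q: (work[q] + w) / assumed[q])
        work[p] += w
        assignment[p].append(t)
    # Predicted completion time on each processor uses its real speed.
    times = [work[p] / speeds[p] for p in range(n)]
    return assignment, times


if __name__ == "__main__":
    tasks = [1.0] * 6
    speeds = [1.0, 0.5]  # hypothetical: the second processor runs at half speed
    _, naive = greedy_assign(tasks, speeds, speed_aware=False)
    _, aware = greedy_assign(tasks, speeds, speed_aware=True)
    print(max(naive), max(aware))  # 6.0 4.0
```

With six equal tasks and one processor running at half speed, balancing raw work gives a makespan of 6.0 time units, while weighting by speed gives 4.0. These numbers come from this toy input only and say nothing about the 18% and 6% improvements reported for the paper's algorithm.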



Information

Published In

ICS '16: Proceedings of the 2016 International Conference on Supercomputing
June 2016
547 pages
ISBN: 9781450343619
DOI: 10.1145/2925426

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICS '16

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 25
  • Downloads (last 6 weeks): 3
Reflects downloads up to 09 Mar 2025.

Cited By

  • (2024) Toward Sustainable HPC: In-Production Deployment of Incentive-Based Power Efficiency Mechanism on the Fugaku Supercomputer. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-16. DOI: 10.1109/SC41406.2024.00030. Online publication date: 17-Nov-2024.
  • (2024) Demystifying a CXL Type-2 Device: A Heterogeneous Cooperative Computing Perspective. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1504-1517. DOI: 10.1109/MICRO61859.2024.00110. Online publication date: 2-Nov-2024.
  • (2024) HAL: Hardware-assisted Load Balancing for Energy-efficient SNIC-Host Cooperative Computing. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), pp. 613-627. DOI: 10.1109/ISCA59077.2024.00051. Online publication date: 29-Jun-2024.
  • (2024) TacVar: Tackling Variability in Short-Interval Timing Measurements on X86 Processors. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 496-506. DOI: 10.1109/CCGrid59990.2024.00062. Online publication date: 6-May-2024.
  • (2023) Evaluation of Performance and Power Consumption on Supercomputer Fugaku Using SPEC HPC Benchmarks. IEICE Transactions on Electronics, E106.C(6), pp. 303-311. DOI: 10.1587/transele.2022LHP0001. Online publication date: 1-Jun-2023.
  • (2023) Characterizing Performance Impacts of Subnormal Numbers on Vector Instructions and Transcendental Functions. 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), pp. 799-806. DOI: 10.1109/ICPADS60453.2023.00120. Online publication date: 17-Dec-2023.
  • (2023) Reducing energy consumption using heterogeneous voltage frequency scaling of data-parallel applications for multicore systems. Journal of Parallel and Distributed Computing, 175, pp. 121-133. DOI: 10.1016/j.jpdc.2023.01.005. Online publication date: May-2023.
  • (2023) AOA: Adaptive Overclocking Algorithm on CPU-GPU Heterogeneous Platforms. Algorithms and Architectures for Parallel Processing, pp. 253-272. DOI: 10.1007/978-3-031-22677-9_14. Online publication date: 11-Jan-2023.
  • (2022) Not all GPUs are created equal. Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.5555/3571885.3571971. Online publication date: 13-Nov-2022.
  • (2022) Analyzing Performance and Power-Efficiency Variations among NVIDIA GPUs. Proceedings of the 51st International Conference on Parallel Processing, pp. 1-12. DOI: 10.1145/3545008.3545084. Online publication date: 29-Aug-2022.
