Variation Among Processors Under Turbo Boost in HPC Systems

Authors:

Laxmikant V. Kale

ICS '16: Proceedings of the 2016 International Conference on Supercomputing

Article No.: 6, Pages 1 - 12

https://doi.org/10.1145/2925426.2926289

Published: 01 June 2016

Abstract

The design and manufacture of present-day CPUs cause inherent variation in supercomputer architectures, such as variation in the power and temperature of the chips. This variation also manifests as frequency differences among processors under Turbo Boost dynamic overclocking, and it can lead to unpredictable and suboptimal performance in tightly coupled HPC applications. In this study, we use compute-intensive kernels and applications to analyze the variation among processors in four top supercomputers: Edison, Cab, Stampede, and Blue Waters. We observe an execution time difference of up to 16% among processors on the Turbo Boost-enabled supercomputers (Edison, Cab, and Stampede). There is less than 1% variation on Blue Waters, which does not have a dynamic overclocking feature. We analyze measurements from temperature and power instrumentation and find that intrinsic differences in the chips' power efficiency are the culprit behind the frequency variation. Moreover, we analyze potential solutions for mitigating the variation, such as disabling Turbo Boost, leaving cores idle, and replacing slow chips. We also propose a speed-aware dynamic task redistribution (load balancing) algorithm to reduce the negative effects of performance variation. Our speed-aware load balancing algorithm improves performance by up to 18% compared to no load balancing and is 6% better than its non-speed-aware counterpart.
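To make the idea of speed-aware task redistribution concrete, the following is a minimal sketch and not the algorithm from the paper, whose details the abstract does not give. It assumes a simple greedy heuristic: tasks are placed heaviest first on the core that would finish them earliest, given each core's relative speed. The names `greedy_assign` and `makespan`, the task loads, and the speed values are all hypothetical and chosen only for illustration.

```python
def greedy_assign(task_loads, core_speeds):
    """Greedily map tasks to cores (illustrative heuristic, not the paper's method).

    task_loads:  work per task in arbitrary units (hypothetical values).
    core_speeds: relative speed per core, e.g. a measured frequency ratio.
    Returns a list where entry i is the core index chosen for task i.
    """
    if not core_speeds or any(s <= 0 for s in core_speeds):
        raise ValueError("core_speeds must be non-empty and positive")
    work = [0.0] * len(core_speeds)          # work already placed on each core
    assignment = [0] * len(task_loads)
    # Heaviest tasks first; ties keep their original order (sorted is stable).
    order = sorted(range(len(task_loads)), key=lambda t: task_loads[t], reverse=True)
    for t in order:
        # Choose the core whose finish time would be smallest after adding task t.
        best = min(
            range(len(core_speeds)),
            key=lambda c: (work[c] + task_loads[t]) / core_speeds[c],
        )
        assignment[t] = best
        work[best] += task_loads[t]
    return assignment


def makespan(assignment, task_loads, core_speeds):
    """Time at which the slowest core finishes, using the real core speeds."""
    work = [0.0] * len(core_speeds)
    for t, c in enumerate(assignment):
        work[c] += task_loads[t]
    return max(w / s for w, s in zip(work, core_speeds))


tasks = [1.0] * 12                 # twelve equal tasks (hypothetical)
speeds = [1.0, 1.0, 0.5]           # third core runs at half speed (hypothetical)

# Speed-oblivious baseline: balance as if all cores were equally fast.
oblivious = greedy_assign(tasks, [1.0] * len(speeds))
# Speed-aware variant: balance using the actual relative speeds.
aware = greedy_assign(tasks, speeds)

print(makespan(oblivious, tasks, speeds))  # 8.0
print(makespan(aware, tasks, speeds))      # 5.0
```

In this toy setup the speed-oblivious balancer gives every core four tasks, so the half-speed core finishes last at time 8.0. The speed-aware variant gives the slow core only two tasks and finishes at 5.0. The numbers illustrate the mechanism only and say nothing about the gains measured in the paper.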

Published In

ICS '16: Proceedings of the 2016 International Conference on Supercomputing

June 2016

547 pages

ISBN: 9781450343619

DOI: 10.1145/2925426

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICS '16

Sponsor:

SIGARCH

ICS '16: 2016 International Conference on Supercomputing

June 1 - 3, 2016

Istanbul, Turkey

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
