DOI: 10.1145/2925426.2926279
research-article

Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes

Published: 01 June 2016

Abstract

Current large-scale systems show increasing power demands, to the point that power has become a huge strain on facilities and budgets. Researchers in academia, labs and industry are focusing on dealing with this "power wall", striving to find a balance between performance and power consumption. Some commodity processors enable power capping, which opens up new opportunities for applications to directly manage their power behavior at user level. However, while power capping ensures a system will never exceed a given power limit, it also leads to a new form of heterogeneity: natural manufacturing variability, which was previously hidden by varying power to achieve homogeneous performance, now results in heterogeneous performance, because enforcing the power limit yields different CPU frequencies, potentially for each core.
In this work we show how a parallel runtime system can be used to effectively deal with this new kind of performance heterogeneity by compensating for the uneven effects of power capping. In the context of a NUMA node composed of several multi-core sockets, our system optimizes the energy and concurrency levels assigned to each socket to maximize performance. Applied transparently within the parallel runtime system, it does not require any programmer interaction, such as changing the application source code or manually reconfiguring the parallel system. We compare our novel runtime analysis with an offline approach and demonstrate that it can achieve equal performance at a fraction of the cost.
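To make the optimization problem in the abstract concrete, the following is a minimal sketch of choosing per-socket power caps and thread counts under a fixed node budget. It is not the authors' runtime analysis: it is a brute-force search over a toy performance model, closer in spirit to an offline approach. The power model, the constants `IDLE_W`, `CORE_STATIC_W` and `F_MAX_GHZ`, the per-socket coefficients that stand in for manufacturing variability, and the function names are all assumptions made for illustration.

```python
from itertools import product

# All constants below are hypothetical and chosen only for illustration.
IDLE_W = 10.0         # assumed fixed (uncore/idle) power per socket, in watts
CORE_STATIC_W = 3.0   # assumed static power per active core, in watts
F_MAX_GHZ = 3.5       # assumed maximum core frequency


def throughput(cap_w, threads, c):
    """Toy model: dynamic power per core is c * f**3, so a socket under cap
    `cap_w` with `threads` active cores sustains the frequency f that fits
    the remaining budget. A larger `c` models a less efficient chip."""
    per_core_dynamic = (cap_w - IDLE_W) / threads - CORE_STATIC_W
    if per_core_dynamic <= 0:
        return 0.0  # the cap cannot sustain this many active cores
    freq = min(F_MAX_GHZ, (per_core_dynamic / c) ** (1.0 / 3.0))
    return threads * freq  # aggregate "work rate" in core-GHz


def best_config(budget_w, coeffs, max_threads=8, step_w=5, min_w=20):
    """Exhaustively search power splits and thread counts for two sockets.
    Returns (total_throughput, (cap0, cap1), (threads0, threads1))."""
    best = (-1.0, None, None)
    for cap0 in range(min_w, budget_w - min_w + 1, step_w):
        cap1 = budget_w - cap0  # the two caps always sum to the node budget
        for t0, t1 in product(range(1, max_threads + 1), repeat=2):
            total = (throughput(cap0, t0, coeffs[0])
                     + throughput(cap1, t1, coeffs[1]))
            if total > best[0]:
                best = (total, (cap0, cap1), (t0, t1))
    return best


if __name__ == "__main__":
    # Socket 0 is assumed slightly more power-efficient than socket 1.
    total, caps, threads = best_config(80, coeffs=(0.4, 0.5))
    print("caps (W):", caps, "threads:", threads, "throughput:", round(total, 2))
```

Under this toy model, a tight cap can make it preferable to run fewer threads at a higher frequency, which is why the search covers concurrency as well as the power split. A runtime system would replace the exhaustive search and the made-up model with measurements taken while the application runs.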

Published In

ICS '16: Proceedings of the 2016 International Conference on Supercomputing
June 2016
547 pages
ISBN: 9781450343619
DOI: 10.1145/2925426
This paper is authored by an employee(s) of the United States Government and is in the public domain. Non-exclusive copying or redistribution is allowed, provided that the article citation is given and the authors and agency are clearly identified as its source.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. High Performance Computing
  2. Manufacturing Variability
  3. Parallel Architectures
  4. Parallel Programming
  5. Parallel Runtimes
  6. Power and Energy

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICS '16
Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 12
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 09 Mar 2025

Citations

Cited By

  • (2024) PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-18. DOI: 10.1109/SC41406.2024.00032. Online publication date: 17-Nov-2024
  • (2024) Toward Sustainable HPC: In-Production Deployment of Incentive-Based Power Efficiency Mechanism on the Fugaku Supercomputer. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-16. DOI: 10.1109/SC41406.2024.00030. Online publication date: 17-Nov-2024
  • (2022) TD-NUCA: Runtime Driven Management of NUCA Caches in Task Dataflow Programming Models. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1109/SC41404.2022.00085. Online publication date: Nov-2022
  • (2022) Not All GPUs Are Created Equal: Characterizing Variability in Large-Scale, Accelerator-Rich Systems. SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-15. DOI: 10.1109/SC41404.2022.00070. Online publication date: Nov-2022
  • (2021) Efficient and Precise Profiling, Modeling and Management on Power and Performance for Power Constrained HPC Systems. IEICE Transactions on Electronics, E104.C(6), pp. 237-246. DOI: 10.1587/transele.2020LHP0005. Online publication date: 1-Jun-2021
  • (2021) Mitigating Process Variations with Cooperative Tuning for Performance and Power through a Simple DSL. 2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW), pp. 94-100. DOI: 10.1109/CANDARW53999.2021.00023. Online publication date: Nov-2021
  • (2020) A Case Study and Characterization of a Many-socket, Multi-tier NUMA HPC Platform. 2020 IEEE/ACM 6th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC) and Workshop on Hierarchical Parallelism for Exascale Computing (HiPar), pp. 74-84. DOI: 10.1109/LLVMHPCHiPar51896.2020.00013. Online publication date: Nov-2020
  • (2020) Compiler Abstractions and Runtime for Extreme-scale SAR and CFD Workloads. 2020 IEEE/ACM Fifth International Workshop on Extreme Scale Programming Models and Middleware (ESPM2), pp. 1-7. DOI: 10.1109/ESPM251964.2020.00010. Online publication date: Nov-2020
  • (2019) Power efficient job scheduling by predicting the impact of processor manufacturing variability. Proceedings of the ACM International Conference on Supercomputing, pp. 296-307. DOI: 10.1145/3330345.3330372. Online publication date: 26-Jun-2019
  • (2019) Contention Aware Workload and Resource Co-Scheduling on Power-Bounded Systems. 2019 IEEE International Conference on Networking, Architecture and Storage (NAS), pp. 1-8. DOI: 10.1109/NAS.2019.8834721. Online publication date: Aug-2019
