A Reconfiguration Algorithm for Power-Aware Parallel Applications

Authors:

Daniele De Sensi,

Massimo Torquati,

Marco Danelutto

ACM Transactions on Architecture and Code Optimization (TACO), Volume 13, Issue 4

Article No.: 43, Pages 1 - 25

https://doi.org/10.1145/3004054

Published: 02 December 2016

Abstract

In current computing systems, many applications require guarantees that their maximum power consumption will not exceed the available power budget. Conversely, some applications can tolerate reduced performance, provided that it stays at an acceptable level, in exchange for lower power consumption. One way to provide such guarantees is to change the number of cores assigned to the application, their clock frequency, and the placement of application threads over the cores. However, power consumption and performance follow different trends depending on the application considered and on its input, so finding a configuration of resources that satisfies user requirements is, in the general case, a challenging task.
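The selection problem outlined in this paragraph can be stated compactly. The formulation below is only a sketch and is not taken from the article: the symbols $\mathcal{C}$, $n$, $f$, $m$, $T$, $P$, $T_{\min}$, and $P_{\max}$ are notation assumed here for illustration, and maximizing performance under a power budget is one plausible reading of the power guarantee.

```latex
% Assumed notation: a configuration c = (n, f, m) fixes the number of cores n,
% their clock frequency f, and the placement m of threads over the cores.
% T(c) is the performance and P(c) the power consumption observed under c.

% Performance requirement: least power among configurations that are fast enough.
\[
  c^{*} = \arg\min_{c \in \mathcal{C}} P(c)
  \quad \text{subject to} \quad T(c) \ge T_{\min}
\]

% Power requirement: best performance among configurations within the budget.
\[
  c^{*} = \arg\max_{c \in \mathcal{C}} T(c)
  \quad \text{subject to} \quad P(c) \le P_{\max}
\]
```

The difficulty is that $T$ and $P$ are not known in advance and differ across applications and inputs, so they have to be estimated before either problem can be solved.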

In this article, we propose Nornir, an algorithm that automatically derives performance and power consumption models of an application in different configurations, without relying on historical data about previous executions. Using these models, we select a close-to-optimal configuration for the given user requirement, expressed on either performance or power consumption. The configuration of the application is changed on the fly throughout the execution to adapt to workload fluctuations, external interference, and/or application phase changes. We validate the algorithm by simulating it over the applications of the PARSEC benchmark suite. We then implement the algorithm and analyse its accuracy and overhead over some of these applications in a real execution environment. Finally, we compare the quality of our proposal with that of the optimal algorithm and of some state-of-the-art solutions.
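The general idea of fitting models from a few observed configurations and then choosing a configuration from the predictions can be sketched as follows. This is a minimal illustration, not Nornir itself: the abstract does not give the model forms, so the features used here (throughput linear in cores × frequency, power linear in cores × frequency²), the function names `fit_line` and `select_config`, and the synthetic `measure` function are all assumptions made for the example. Thread placement is left out for brevity.

```python
from itertools import product


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def select_config(configs, measure, samples, min_throughput):
    """Fit throughput and power models from the sampled configurations,
    then return the configuration with the lowest predicted power among
    those predicted to reach min_throughput."""
    obs = [(c, *measure(c)) for c in samples]
    # Assumed feature forms (illustrative only, not from the article).
    pa, pb = fit_line([n * f for (n, f), _, _ in obs], [t for _, t, _ in obs])
    wa, wb = fit_line([n * f ** 2 for (n, f), _, _ in obs], [p for _, _, p in obs])

    def predict_throughput(c):
        return pa + pb * c[0] * c[1]

    def predict_power(c):
        return wa + wb * c[0] * c[1] ** 2

    feasible = [c for c in configs if predict_throughput(c) >= min_throughput]
    if not feasible:
        # No configuration is predicted to meet the target: fall back to
        # the one with the highest predicted throughput.
        return max(configs, key=predict_throughput)
    return min(feasible, key=predict_power)


# Hypothetical machine: 1 to 8 cores, three frequency levels (GHz).
CONFIGS = list(product(range(1, 9), [1.0, 1.5, 2.0]))
# Configurations observed at run time before the models are fitted.
SAMPLES = [(1, 1.0), (8, 2.0), (4, 1.5)]


def measure(config):
    """Synthetic stand-in for run-time monitoring: (throughput, watts)."""
    n, f = config
    return 10.0 * n * f, 20.0 + 3.0 * n * f ** 2


best = select_config(CONFIGS, measure, SAMPLES, min_throughput=55.0)
print(best)
```

In a running system, `measure` would be replaced by actual monitoring of the application, and the fit would be repeated whenever the observed behaviour drifts from the predictions, for example after a phase change.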


Published In

ACM Transactions on Architecture and Code Optimization Volume 13, Issue 4

December 2016

648 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3012405

Editor:
Koen De Bosschere
Ghent University

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 December 2016

Accepted: 01 September 2016

Revised: 01 August 2016

Received: 01 June 2016

Published in TACO Volume 13, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

EU FP7-ICT-2013-10 project REPARA
University of Pisa Project PRA_2016_64
EU H2020-ICT-2014-1 project REPHRASE
