
A Reconfiguration Algorithm for Power-Aware Parallel Applications

Published: 02 December 2016

Abstract

In current computing systems, many applications require guarantees on their maximum power consumption so that it does not exceed the available power budget. For other applications, it may be possible to lower performance, while keeping it at an acceptable level, in order to reduce power consumption. One way to provide such guarantees is to change the number of cores assigned to the application, their clock frequency, and the placement of application threads over the cores. However, power consumption and performance follow different trends depending on the application and on its input, so finding a configuration of resources that satisfies user requirements is, in the general case, a challenging task.
In this article, we propose Nornir, an algorithm that automatically derives, without relying on historical data about previous executions, performance and power consumption models of an application in different configurations. Using these models, we can select a close-to-optimal configuration for the given user requirement, on either performance or power consumption. The configuration of the application is changed on the fly throughout the execution to adapt to workload fluctuations, external interference, and/or application phase changes. We validate the algorithm by simulating it over the applications of the PARSEC benchmark suite. Then, we implement our algorithm and analyse its accuracy and overhead over some of these applications in a real execution environment. Finally, we compare the quality of our proposal with that of the optimal algorithm and of some state-of-the-art solutions.
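
The selection step described above can be illustrated with a minimal sketch. This is not the Nornir implementation: the function name select_configuration, the toy models toy_perf and toy_power, and the numeric values are all hypothetical, chosen only to show how predicted performance and power could be used to pick a configuration under a single user requirement. Thread placement is left out for brevity, and the models are assumed to be given rather than learned online.

```python
from itertools import product


def select_configuration(configs, predict_perf, predict_power,
                         min_perf=None, max_power=None):
    """Pick a configuration under exactly one user requirement.

    With min_perf set: the lowest predicted power among configurations
    whose predicted performance is at least min_perf.
    With max_power set: the highest predicted performance among
    configurations whose predicted power is at most max_power.
    Returns None when no configuration satisfies the requirement.
    """
    if (min_perf is None) == (max_power is None):
        raise ValueError("specify exactly one of min_perf or max_power")
    if min_perf is not None:
        feasible = [c for c in configs if predict_perf(c) >= min_perf]
        return min(feasible, key=predict_power) if feasible else None
    feasible = [c for c in configs if predict_power(c) <= max_power]
    return max(feasible, key=predict_perf) if feasible else None


# Hypothetical toy models (assumptions, not taken from the article):
# performance grows linearly with cores and frequency, power has a
# constant part plus a per-core term that is cubic in frequency.
def toy_perf(config):
    cores, freq_ghz = config
    return cores * freq_ghz


def toy_power(config):
    cores, freq_ghz = config
    return 10.0 + cores * freq_ghz ** 3


# Candidate configurations: (number of cores, clock frequency in GHz).
CONFIGS = list(product([1, 2, 4, 8], [1.0, 2.0]))

best_for_perf = select_configuration(CONFIGS, toy_perf, toy_power,
                                     min_perf=4.0)
best_for_power = select_configuration(CONFIGS, toy_perf, toy_power,
                                      max_power=30.0)
infeasible = select_configuration(CONFIGS, toy_perf, toy_power,
                                  min_perf=100.0)
```

In an adaptive runtime, a selection of this kind would be repeated whenever the models are refreshed, for example after a workload fluctuation or a phase change; an exhaustive scan is used here only because the toy configuration space is tiny.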



Information

Published In

ACM Transactions on Architecture and Code Optimization, Volume 13, Issue 4
December 2016
648 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/3012405

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 December 2016
Accepted: 01 September 2016
Revised: 01 August 2016
Received: 01 June 2016
Published in TACO Volume 13, Issue 4

Author Tags

  1. DVFS
  2. Power-aware computing
  3. dynamic concurrency throttling
  4. multi-core
  5. online learning
  6. power capping
  7. self-adaptive runtime

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • EU FP7-ICT-2013-10 project REPARA
  • University of Pisa Project PRA_2016_64
  • EU H2020-ICT-2014-1 project REPHRASE
