ABSTRACT
Energy consumption is an important concern in modern multicore processors. The energy consumed while executing an application can be minimized by tuning hardware knobs such as frequency and voltage. Existing theoretical work on energy minimization using global DVFS (Dynamic Voltage and Frequency Scaling), although thorough, ignores both the energy the CPU consumes on memory accesses and the dynamic energy consumed by idle cores. This article presents an analytical energy-performance model for parallel workloads that accounts for the energy consumed by the CPU chip on memory accesses, in addition to the energy consumed on CPU instructions, and for the dynamic energy consumed by idle cores. We build an analytical framework around this model to predict the global DVFS operating frequencies that minimize overall CPU energy consumption. We show how the optimal frequencies in our model differ from those in a model that does not account for memory accesses.
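The abstract does not state the model's equations, so the following is only a toy sketch of the kind of effect it describes, not the authors' model. It assumes a single global frequency, dynamic power proportional to the cube of the frequency, a constant static power, and a memory stall time that does not shrink as the frequency rises. All names and constants (`c_dyn`, `c_stall`, `p_static`, the workload sizes) are hypothetical values chosen for illustration.

```python
def energy(f_ghz, compute_gcycles, mem_stall_s, c_dyn, c_stall, p_static):
    """Toy CPU energy in joules at one global frequency f_ghz (assumed model)."""
    # Compute time scales inversely with frequency; stall time is assumed fixed.
    compute_s = compute_gcycles / f_ghz
    # Assumption: dynamic power grows as f^3 (voltage scaled with frequency).
    dyn_compute = c_dyn * f_ghz**3 * compute_s
    # Assumption: the chip still draws dynamic power while waiting on memory.
    dyn_stall = c_stall * f_ghz**3 * mem_stall_s
    # Static power is paid for the whole run.
    static = p_static * (compute_s + mem_stall_s)
    return dyn_compute + dyn_stall + static


def best_frequency(freqs, **params):
    """Return the candidate frequency with the lowest toy energy."""
    return min(freqs, key=lambda f: energy(f, **params))


# Candidate global frequencies from 0.8 GHz to 3.0 GHz in 10 MHz steps.
freqs = [0.8 + 0.01 * i for i in range(221)]
# Hypothetical workload and power constants.
params = dict(compute_gcycles=10.0, c_dyn=2.0, c_stall=1.0, p_static=10.0)

f_no_mem = best_frequency(freqs, mem_stall_s=0.0, **params)
f_with_mem = best_frequency(freqs, mem_stall_s=5.0, **params)
print(f"optimum ignoring memory stalls: {f_no_mem:.2f} GHz")
print(f"optimum with memory stalls:     {f_with_mem:.2f} GHz")
```

Under these assumptions, the energy-minimizing frequency is lower once the stall-time term is included, because running faster raises power during stalls without shortening them. A grid search is used here for transparency; nothing in the abstract indicates how the authors solve for the optimum.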
Index Terms
Brief Announcement: Energy Optimization of Memory Intensive Parallel Workloads