GPUWattch: enabling energy optimizations in GPGPUs

Authors:
Jingwen Leng, The University of Texas at Austin
Tayler Hetherington, University of British Columbia
Ahmed ElTantawy, University of British Columbia
Syed Gilani, University of Wisconsin-Madison
Nam Sung Kim, University of Wisconsin-Madison
Tor M. Aamodt, University of British Columbia
Vijay Janapa Reddi, The University of Texas at Austin

ACM SIGARCH Computer Architecture News, Volume 41, Issue 3, June 2013, pp. 487–498
https://doi.org/10.1145/2508148.2485964

Published: 23 June 2013

Abstract

General-purpose GPUs (GPGPUs) are becoming prevalent in mainstream computing, and performance per watt has emerged as a more crucial evaluation metric than peak performance. As such, GPU architects require robust tools that enable them to quickly explore new ways to optimize GPGPUs for energy efficiency. We propose a new GPGPU power model that is configurable, capable of cycle-level calculations, and carefully validated against real hardware measurements. To achieve configurability, we use a bottom-up methodology and abstract parameters from the microarchitectural components as the model's inputs. We developed a rigorous suite of 80 microbenchmarks that we use to bound any modeling uncertainties and inaccuracies. The power model is comprehensively validated against measurements of two commercially available GPUs, and the measured error is within 9.9% and 13.4% for the two target GPUs (GTX 480 and Quadro FX5600, respectively). The model also accurately tracks the power consumption trend over time. We integrated the power model with the cycle-level simulator GPGPU-Sim and demonstrate the energy savings achievable with dynamic voltage and frequency scaling (DVFS) and clock gating. Traditional DVFS reduces GPU energy consumption by 14.4% by leveraging within-kernel runtime variations. Finer-grained SM cluster-level DVFS improves the energy savings from 6.6% to 13.6% for those benchmarks that show clustered execution behavior. We also show that clock gating inactive lanes during divergence reduces dynamic power by 11.2%.
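The abstract names three ideas that a small example can make concrete: a bottom-up estimate built from per-component activity, a relative error against a hardware measurement, and the effect of DVFS and lane clock gating on dynamic power. The Python sketch below is illustrative only and is not GPUWattch's implementation. The component names, energy values, and function names are hypothetical, and the V^2 * f relation is the standard first-order CMOS approximation rather than anything stated in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """One microarchitectural block. Names and numbers are hypothetical."""
    name: str
    energy_per_access_nj: float  # dynamic energy per access, in nanojoules
    static_power_w: float        # leakage power, in watts


def average_power_w(components, access_counts, cycles, freq_hz):
    """Bottom-up estimate: each component's dynamic energy over the
    elapsed time, plus its static power, summed over all components."""
    elapsed_s = cycles / freq_hz
    total_w = 0.0
    for c in components:
        dynamic_j = access_counts.get(c.name, 0) * c.energy_per_access_nj * 1e-9
        total_w += dynamic_j / elapsed_s + c.static_power_w
    return total_w


def relative_error(modeled_w, measured_w):
    """Absolute modeling error as a fraction of the measured power."""
    return abs(modeled_w - measured_w) / measured_w


def dvfs_dynamic_scale(v_ratio, f_ratio):
    """First-order assumption: dynamic power scales with V^2 * f."""
    return v_ratio ** 2 * f_ratio


def gateable_lane_fraction(active_mask, warp_size=32):
    """Fraction of SIMD lanes that are inactive under a divergence mask
    and could therefore be clock gated in this simplified view."""
    active = bin(active_mask & ((1 << warp_size) - 1)).count("1")
    return 1.0 - active / warp_size


if __name__ == "__main__":
    comps = [
        Component("register_file", 0.5, 1.0),
        Component("alu", 0.25, 0.5),
    ]
    counts = {"register_file": 4_000_000, "alu": 8_000_000}
    modeled = average_power_w(comps, counts, cycles=1_000_000, freq_hz=1e9)
    print(f"modeled power: {modeled:.2f} W")
    print(f"error vs. a 5.0 W measurement: {relative_error(modeled, 5.0):.1%}")
    print(f"dynamic power at 0.9x V, 0.8x f: {dvfs_dynamic_scale(0.9, 0.8):.3f}x")
    print(f"gateable lanes for mask 0x0000FFFF: {gateable_lane_fraction(0x0000FFFF):.0%}")
```

The percentages reported in the abstract come from full simulation and hardware measurement; the sketch only shows the kind of arithmetic such a model performs.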

Index Terms

GPUWattch: enabling energy optimizations in GPGPUs
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
2. Computing methodologies
  1. Modeling and simulation
    1. Model development and analysis
      1. Modeling methodologies

Published in
ACM SIGARCH Computer Architecture News, Volume 41, Issue 3
ISCA '13
June 2013
666 pages
ISSN: 0163-5964
DOI: 10.1145/2508148
ISCA '13: Proceedings of the 40th Annual International Symposium on Computer Architecture
June 2013
686 pages
ISBN:9781450320795
DOI:10.1145/2485922
General Chair: Avi Mendelson (Technion)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
CUDA
GPU architecture
energy
power
power estimation
Qualifiers
- research-article