ThUnderVolt: Enabling Aggressive Voltage Underscaling and Timing Error Resilience for Energy Efficient Deep Learning Accelerators

ABSTRACT
Hardware accelerators are increasingly being deployed to boost the performance and energy efficiency of deep neural network (DNN) inference. In this paper, we propose Thundervolt, a new framework that enables aggressive voltage underscaling of high-performance DNN accelerators without compromising classification accuracy, even in the presence of high timing error rates. Using post-synthesis timing simulations of a DNN accelerator modeled on the Google TPU, we show that Thundervolt enables between 34% and 57% energy savings on state-of-the-art speech and image recognition benchmarks, with less than 1% loss in classification accuracy and no performance loss. Further, we show that Thundervolt is synergistic with, and can further increase the energy efficiency of, commonly used run-time DNN pruning techniques such as Zero-Skip.
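The Zero-Skip technique mentioned above prunes work at run time by skipping multiply-accumulate (MAC) operations whose activation operand is zero, which is common in post-ReLU feature maps; the abstract's claim is that Thundervolt's voltage-underscaling savings stack on top of these skipped operations. As a point of reference only, below is a minimal Python sketch of the Zero-Skip idea for a single dot product. The function name and the sparsity statistics are illustrative assumptions, not the paper's hardware implementation.

```python
import numpy as np

def mac_with_zero_skip(weights, activations):
    """Dot product that skips MACs whose activation operand is zero.

    Post-ReLU activation vectors are typically sparse, so skipping
    zero operands avoids a large fraction of the multiply-accumulate
    work (and, in hardware, the associated switching energy).
    """
    acc = 0.0
    skipped = 0
    for w, a in zip(weights, activations):
        if a == 0.0:  # Zero-Skip: no multiply, no accumulate this cycle
            skipped += 1
            continue
        acc += w * a
    return acc, skipped

# Illustrative usage: ReLU of random input yields roughly 50% zeros.
rng = np.random.default_rng(0)
acts = np.maximum(rng.standard_normal(1024), 0.0)  # post-ReLU activations
wts = rng.standard_normal(1024)
out, n_skipped = mac_with_zero_skip(wts, acts)
print(f"output={out:.3f}, skipped {n_skipped}/1024 MACs")
```

In hardware, the same zero check would gate the multiplier and accumulator rather than branch in software, so skipped cycles save switching energy instead of instruction count.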