ThUnderVolt: Enabling Aggressive Voltage Underscaling and Timing Error Resilience for Energy Efficient Deep Learning Accelerators

ABSTRACT
Hardware accelerators are increasingly being deployed to boost the performance and energy efficiency of deep neural network (DNN) inference. In this paper, we propose Thundervolt, a new framework that enables aggressive voltage underscaling of high-performance DNN accelerators without compromising classification accuracy, even in the presence of high timing error rates. Using post-synthesis timing simulations of a DNN accelerator modeled on the Google TPU, we show that Thundervolt enables between 34% and 57% energy savings on state-of-the-art speech and image recognition benchmarks, with less than 1% loss in classification accuracy and no performance loss. Further, we show that Thundervolt is synergistic with, and can further increase the energy efficiency of, commonly used run-time DNN pruning techniques such as Zero-Skip.
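The Zero-Skip technique mentioned above prunes work at run time by skipping multiply-accumulate (MAC) operations whose activation operand is zero, which is common in post-ReLU feature maps; the abstract's claim is that Thundervolt's voltage-underscaling savings stack on top of these skipped operations. As a point of reference only, below is a minimal Python sketch of the Zero-Skip idea for a single dot product. The function name and the sparsity statistics are illustrative assumptions, not the paper's hardware implementation.

```python
import numpy as np

def mac_with_zero_skip(weights, activations):
    """Dot product that skips MACs whose activation operand is zero.

    Post-ReLU activation vectors are typically sparse, so skipping
    zero operands avoids a large fraction of the multiply-accumulate
    work (and, in hardware, the associated switching energy).
    """
    acc = 0.0
    skipped = 0
    for w, a in zip(weights, activations):
        if a == 0.0:  # Zero-Skip: no multiply, no accumulate this cycle
            skipped += 1
            continue
        acc += w * a
    return acc, skipped

# Illustrative usage: ReLU of random input yields roughly 50% zeros.
rng = np.random.default_rng(0)
acts = np.maximum(rng.standard_normal(1024), 0.0)  # post-ReLU activations
wts = rng.standard_normal(1024)
out, n_skipped = mac_with_zero_skip(wts, acts)
print(f"output={out:.3f}, skipped {n_skipped}/1024 MACs")
```

In hardware, the same zero check would gate the multiplier and accumulator rather than branch in software, so skipped cycles save switching energy instead of instruction count.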