DOI: 10.1145/3195970.3196129

Thundervolt: enabling aggressive voltage underscaling and timing error resilience for energy efficient deep learning accelerators

Published: 24 June 2018

ABSTRACT

Hardware accelerators are increasingly being deployed to boost the performance and energy efficiency of deep neural network (DNN) inference. In this paper, we propose Thundervolt, a new framework that enables aggressive voltage underscaling of high-performance DNN accelerators without compromising classification accuracy, even in the presence of high timing error rates. Using post-synthesis timing simulations of a DNN accelerator modeled on the Google TPU, we show that Thundervolt enables between 34% and 57% energy savings on state-of-the-art speech and image recognition benchmarks, with less than 1% loss in classification accuracy and no performance loss. Further, we show that Thundervolt is synergistic with, and can further increase the energy efficiency of, commonly used run-time DNN pruning techniques such as Zero-Skip.
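Two of the ideas the abstract leans on are easy to illustrate in isolation: switching energy falls roughly quadratically as the supply voltage is underscaled, and run-time pruning schemes like Zero-Skip avoid issuing multiplications when an operand is zero. The minimal Python sketch below shows only those two effects; the function names, voltage values, and toy data are assumptions chosen for illustration and do not reproduce Thundervolt's actual mechanisms or results.

# Illustrative sketch only: quadratic voltage/energy scaling and zero-skipping.
# Numbers are made up and are not taken from the paper.

def dynamic_energy_ratio(v_scaled: float, v_nominal: float) -> float:
    """Switching energy per transition scales roughly with V^2 (E ~ C * V^2)."""
    return (v_scaled / v_nominal) ** 2

def zero_skip_mac(activations, weights):
    """Multiply-accumulate that skips multiplies for zero activations,
    as run-time pruning schemes like Zero-Skip do to save energy."""
    acc = 0
    skipped = 0
    for a, w in zip(activations, weights):
        if a == 0:
            skipped += 1        # no multiply issued for a zero operand
            continue
        acc += a * w
    return acc, skipped

if __name__ == "__main__":
    # Hypothetical underscaling from 0.9 V to 0.7 V: ~40% less dynamic energy.
    print(f"dynamic energy ratio: {dynamic_energy_ratio(0.7, 0.9):.2f}")
    acts = [0, 3, 0, 0, 5, 0, 2, 0]   # sparse post-ReLU activations
    wts  = [1, -2, 4, 1, 3, -1, 2, 5]
    acc, skipped = zero_skip_mac(acts, wts)
    print(f"dot product = {acc}, multiplies skipped = {skipped}/{len(acts)}")

With these illustrative numbers, underscaling from 0.9 V to 0.7 V cuts dynamic energy by roughly 40%, and five of the eight multiplications are skipped. The paper's reported 34%-57% savings come instead from its timing-error-resilient design evaluated in post-synthesis timing simulation.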


Published in

DAC '18: Proceedings of the 55th Annual Design Automation Conference
June 2018
1089 pages
ISBN: 9781450357005
DOI: 10.1145/3195970

    Copyright © 2018 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 24 June 2018


    Qualifiers

    • research-article

    Acceptance Rates

Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%

