ABSTRACT
The world's appetite for abundant-data computing, where a massive amount of structured and unstructured data is analyzed, has increased dramatically. The computational demands of these applications, such as deep learning, far exceed the capabilities of today's systems, especially for energy-constrained embedded systems (e.g., mobile systems with limited battery capacity). These demands are unlikely to be met by isolated improvements in transistor technologies, memory technologies, or integrated circuit (IC) architectures. Transformative nanosystems, which leverage the unique properties of emerging nanotechnologies to create new IC architectures, are required to deliver unprecedented functionality, performance, and energy efficiency. We show that the projected energy efficiency benefits of domain-specific 3D nanosystems are in the range of 1,000x (quantified using the product of system-level energy consumption and execution time) over today's domain-specific 2D systems with off-chip DRAM. Such a drastic improvement is key to enabling new capabilities such as deep learning in embedded systems.
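The metric named in the abstract, the product of system-level energy consumption and execution time, can be sketched as a short calculation. This is a minimal illustration only: the function names and the numeric values are hypothetical and are not taken from the paper.

```python
def energy_delay_product(energy_joules: float, time_seconds: float) -> float:
    """Return the product of energy consumption and execution time (J*s)."""
    return energy_joules * time_seconds


def edp_benefit(baseline: tuple[float, float], candidate: tuple[float, float]) -> float:
    """Ratio of baseline product to candidate product; values above 1 favor the candidate."""
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)


# Hypothetical (energy in J, execution time in s) pairs, NOT results from the paper.
baseline_2d = (2.0, 0.5)          # assumed 2D system with off-chip DRAM
candidate_3d = (0.125, 0.0625)    # assumed 3D nanosystem

print(edp_benefit(baseline_2d, candidate_3d))  # prints 128.0
```

Because the metric multiplies the two quantities, a system that is 16x lower in energy and 8x faster, as in these made-up numbers, shows a combined 128x benefit.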