invited-talk

Across the Stack Opportunities for Deep Learning Acceleration

Authors:
Vijayalakshmi Srinivasan, IBM TJ Watson Research Center, Yorktown Heights, NY
Bruce Fleischer, IBM TJ Watson Research Center, Yorktown Heights, NY
Sunil Shukla, IBM TJ Watson Research Center, Yorktown Heights, NY
Matthew Ziegler, IBM TJ Watson Research Center, Yorktown Heights, NY
Joel Silberman, IBM TJ Watson Research Center, Yorktown Heights, NY
Jinwook Oh, IBM TJ Watson Research Center, Yorktown Heights, NY
Jungwook Choi, IBM TJ Watson Research Center, Yorktown Heights, NY
Silvia Mueller, Boeblingen, Germany
Ankur Agrawal, IBM TJ Watson Research Center, Yorktown Heights, NY
Tina Babinsky, Boeblingen, Germany
Nianzheng Cao, IBM TJ Watson Research Center, Yorktown Heights, NY
Chia-Yu Chen, IBM TJ Watson Research Center, Yorktown Heights, NY
Pierce Chuang, IBM TJ Watson Research Center, Yorktown Heights, NY
Thomas Fox, IBM TJ Watson Research Center, Yorktown Heights, NY
George Gristede, IBM TJ Watson Research Center, Yorktown Heights, NY
Michael Guillorn, IBM TJ Watson Research Center, Yorktown Heights, NY
Howard Haynie, IBM Systems Group, Poughkeepsie, NY
Michael Klaiber, Boeblingen, Germany
Dongsoo Lee, IBM TJ Watson Research Center, Yorktown Heights, NY
Shih-Hsien Lo, IBM TJ Watson Research Center, Yorktown Heights, NY
Gary Maier, East Fishkill, NY
Michael Scheuermann, IBM TJ Watson Research Center, Yorktown Heights, NY
Swagath Venkataramani, IBM TJ Watson Research Center, Yorktown Heights, NY
Christos Vezyrtzis, IBM TJ Watson Research Center, Yorktown Heights, NY
Naigang Wang, IBM TJ Watson Research Center, Yorktown Heights, NY
Fanchieh Yee, IBM TJ Watson Research Center, Yorktown Heights, NY
Ching Zhou, IBM TJ Watson Research Center, Yorktown Heights, NY
Pong-Fei Lu, IBM TJ Watson Research Center, Yorktown Heights, NY
Brian Curran, IBM Systems Group, Poughkeepsie, NY
Leland Chang, IBM TJ Watson Research Center, Yorktown Heights, NY
Kailash Gopalakrishnan, IBM TJ Watson Research Center, Yorktown Heights, NY

ISLPED '18: Proceedings of the International Symposium on Low Power Electronics and Design, July 2018, Article No.: 35, Pages 1–2, https://doi.org/10.1145/3218603.3241339

Published: 23 July 2018

ABSTRACT

The combination of growth in compute capabilities and the availability of large datasets has led to a rebirth of deep learning. Deep Neural Networks (DNNs) have become state-of-the-art in a variety of machine learning tasks spanning domains such as vision, speech, and machine translation. Deep Learning (DL) achieves high accuracy in these tasks at the expense of 100s of ExaOps of computation, posing significant challenges to efficient large-scale deployment in both resource-constrained environments and data centers.

One of the key enablers for improving the operational efficiency of DNNs is the observation that, when extracting deep insight from vast quantities of structured and unstructured data, the exactness imposed by traditional computing is not required. Relaxing the "exactness" constraint opens up opportunities for approximate computing across all layers of the system stack.

In this talk we present a multi-TOPS AI core [3] for acceleration of deep learning training and inference in systems from edge devices to data centers. We demonstrate that deriving high sustained utilization and energy efficiency from the AI core requires a ground-up rethinking to exploit approximate computing across the stack, including algorithms, architecture, programmability, and hardware.

Model accuracy is the fundamental measure of deep learning quality. The compute engine precision in our AI core is carefully calibrated to realize a significant reduction in area and power without compromising numerical accuracy. Our research at the DL algorithms/applications level [2] shows that it is possible to carefully tune the precision of both weights and activations to as low as 2 bits for inference; this result was used to guide the choices of compute precision supported in the architecture and hardware for both training and inference. Similarly, the scalability of distributed DL training is impacted by the communication overhead of exchanging gradients and weights after each mini-batch. Our research on gradient compression [1] shows that by selectively sending gradients larger than a threshold, and by further choosing the threshold based on the importance of the gradient, we achieve a compression ratio of 40X for convolutional layers and up to 200X for fully-connected layers of the network without losing model accuracy. These results guide the exploration of interconnection network topologies for a system of accelerators built using the AI core.
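The threshold-based gradient exchange described above can be illustrated with a minimal sketch. It is not the adaptive scheme of [1]: it assumes a single fixed threshold, a plain Python list as the gradient, and hypothetical helper names (`compress_gradient`, `compression_ratio`). Entries that are not sent are kept in a local residual and added to the next mini-batch's gradient.

```python
from typing import Dict, List, Tuple


def compress_gradient(
    grad: List[float], residual: List[float], threshold: float
) -> Tuple[Dict[int, float], List[float]]:
    """Select accumulated gradient entries whose magnitude exceeds threshold.

    Assumption: a fixed threshold is used here for illustration only.
    Entries that are not sent stay in the residual, so they can still be
    transmitted once enough gradient has accumulated.
    """
    sent: Dict[int, float] = {}
    new_residual: List[float] = []
    for i, (g, r) in enumerate(zip(grad, residual)):
        acc = g + r                   # add what was held back earlier
        if abs(acc) > threshold:
            sent[i] = acc             # transmit as an (index, value) pair
            new_residual.append(0.0)
        else:
            new_residual.append(acc)  # hold back for a later mini-batch
    return sent, new_residual


def compression_ratio(num_elements: int, num_sent: int) -> float:
    """Dense element count divided by transmitted element count.

    Index overhead is ignored in this simplified measure.
    """
    return float("inf") if num_sent == 0 else num_elements / num_sent


if __name__ == "__main__":
    residual = [0.0] * 4
    sent, residual = compress_gradient([0.5, 0.125, -0.75, 0.0], residual, 0.2)
    print(sent, residual, compression_ratio(4, len(sent)))
```

In this sketch a small entry such as 0.125 is withheld in the first step and only transmitted once the accumulated value crosses the threshold, which is how sparse communication can avoid permanently discarding updates.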

Overall, our work shows how the benefits of approximation, which exploit the robustness of algorithms and applications to reduced precision and compressed data communication, can be combined effectively with an accelerator architecture and hardware designed to support reduced-precision computation and compressed data communication. Our results demonstrate improved end-to-end efficiency of the DL accelerator across different metrics, such as high sustained TOPS, high TOPS/watt, and TOPS/mm2, catering to different operating environments for both training and inference.
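As a rough illustration of what reduced precision means for activations, the sketch below clips a value to a range [0, alpha] and maps it onto the 2^bits evenly spaced levels in that range. The function name `quantize_activation` and the fixed clipping level `alpha` are assumptions made for this example; how the clipping level is actually chosen in [2] is not described in this abstract.

```python
def quantize_activation(x: float, alpha: float, bits: int = 2) -> float:
    """Clip x to [0, alpha], then round to the nearest of 2**bits levels.

    Assumptions: alpha > 0 is a fixed, hand-picked clipping level, and
    uniform level spacing is used. Python's round() resolves exact ties
    to the nearest even integer.
    """
    if alpha <= 0 or bits < 1:
        raise ValueError("alpha must be positive and bits at least 1")
    levels = (1 << bits) - 1           # number of steps, e.g. 3 for 2 bits
    step = alpha / levels              # spacing between representable values
    clipped = min(max(x, 0.0), alpha)  # saturate out-of-range inputs
    return round(clipped / step) * step


if __name__ == "__main__":
    # With alpha = 3.0 and 2 bits the representable values are 0, 1, 2, 3.
    for value in (-1.0, 1.2, 2.6, 5.0):
        print(value, "->", quantize_activation(value, alpha=3.0, bits=2))
```

With 2 bits only four distinct activation values remain, which is the kind of narrow datapath that reduces area and power in a compute engine.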

References

[1] C. Chen, J. Choi, D. Brand, A. Agrawal, W. Zhang, and K. Gopalakrishnan. AdaComp: Adaptive residual gradient compression for data-parallel distributed training. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, Louisiana, USA, February 2-7, 2018.
[2] J. Choi, Z. Wang, S. Venkataramani, P. I. Chuang, V. Srinivasan, and K. Gopalakrishnan. PACT: Parameterized clipping activation for quantized neural networks. CoRR, abs/1805.06085, 2018.
[3] B. Fleischer, S. Shukla, M. Ziegler, J. Silberman, J. Oh, V. Srinivasan, J. Choi, S. Mueller, A. Agrawal, T. Babinsky, N. Cao, C.-Y. Chen, P. Chuang, T. Fox, G. Gristede, M. Guillorn, H. Haynie, M. Klaiber, D. Lee, S.-H. Lo, G. Maier, M. Scheuermann, S. Venkataramani, C. Vezyrtzis, N. Wang, F. Yee, C. Zhou, P.-F. Lu, B. Curran, L. Chang, and K. Gopalakrishnan. A scalable multi-TeraOPS deep learning processor core for AI training and inference. In Proceedings of VLSI Symposium, 2018.

Index Terms

Across the Stack Opportunities for Deep Learning Acceleration
1. Computing methodologies
  1. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators
  2. Very large scale integration design
    1. Application-specific VLSI designs
      1. Application specific processors


Published in

ISLPED '18: Proceedings of the International Symposium on Low Power Electronics and Design
July 2018
327 pages
ISBN:9781450357043
DOI:10.1145/3218603

Copyright © 2018 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
DL Training
DL inference
DNN Optimizations
Deep Learning Accelerators
Qualifiers
- invited-talk
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 398 of 1,159 submissions, 34%