research-article

Leveraging the VTA-TVM Hardware-Software Stack for FPGA Acceleration of 8-bit ResNet-18 Inference

Authors:
Thierry Moreau

University of Washington, USA

University of Washington, USA
View Profile

,
Tianqi Chen

University of Washington, USA

University of Washington, USA
View Profile

,
Luis Ceze

University of Washington, USA

University of Washington, USA
View Profile

ReQuEST '18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-designing Pareto-efficient Deep LearningJune 2018Article No.: 5https://doi.org/10.1145/3229762.3229766

Published:20 June 2018Publication History

ReQuEST '18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-designing Pareto-efficient Deep Learning

ABSTRACT

We present a full-stack design to accelerate deep learning inference with FPGAs. Our contribution is two-fold. At the software layer, we leverage and extend TVM, the end-to-end deep learning optimizing compiler, in order to harness FPGA-based acceleration. At the the hardware layer, we present the Versatile Tensor Accelerator (VTA) which presents a generic, modular, and customizable architecture for TPU-like accelerators. Our results take a ResNet-18 description in MxNet and compiles it down to perform 8-bit inference on a 256-PE accelerator implemented on a low-cost Xilinx Zynq FPGA, clocked at 100MHz. Our full hardware acceleration stack will be made available for the community to reproduce, and build upon at http://github.com/uwsaml/vta.

References

2018. ReQuEST at ASPLOS'18: 1st Reproducible Tournament on Pareto-efficient Image Classification. http://cknowledge.org/request-cfp-asplos2018.html.Google Scholar
Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. In Neural Information Processing Systems, Workshop on Machine Learning Systems (LearningSys'15).Google Scholar
Tianqi Chen, Thierry Moreau, Ziheng Jiang, Haichen Shen, Eddie Q. Yan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: End-to-End Optimization Stack for Deep Learning. CoRR abs/1802.04799 (2018). arXiv:1802.04799 http://arxiv.org/abs/1802.04799.Google Scholar
Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). ACM, New York, NY, USA, 1-12. Google ScholarDigital Library
Thierry Moreau, Anton Lokhmotov, and Grigori Fursin. 2018. Introducing ReQuEST: an Open Platform for Reproducible and Quality-Efficient Systems-ML Tournaments. ArXiv e-prints. arXiv:stat.ML/1801.06378.Google Scholar
James E. Smith. 1982. Decoupled Access/Execute Computer Architectures. In Proceedings of the 9th Annual Symposium on Computer Architecture (ISCA '82). IEEE Computer Society Press, Los Alamitos, CA, USA, 112-119. http://dl.acm.org/citation.cfm?id=800048.801719. Google ScholarDigital Library

Index Terms

Recommendations

Hardware-software co-design of AES on FPGA
ICACCI '12: Proceedings of the International Conference on Advances in Computing, Communications and Informatics

This paper presents a compact hardware-software co-design of Advanced Encryption Standard (AES) on the field programmable gate arrays (FPGA) designed for low-cost embedded systems. The design uses MicroBlaze, a soft-core processor from Xilinx. The ...
Read More
Acceleration of an FPGA router
FCCM '97: Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines

The authors describe their experience and progress in accelerating an FPGA router. Placement and routing is undoubtedly the most time-consuming process in automatic chip design or configuring programmable logic devices as reconfigurable computing ...
Read More
A framework for dynamically-loaded hardware library (HLL) in FPGA acceleration
ISSPIT '15: Proceedings of the 2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)

Hardware acceleration is often used to address the need for speed and computing power in embedded systems. FPGAs always represented a good solution for HW acceleration and, recently, new SoC platforms extended the flexibility of the FPGAs by combining ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ReQuEST '18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-designing Pareto-efficient Deep Learning
June 2018
44 pages
ISBN:9781450359238
DOI:10.1145/3229762
Organizing Chairs:
Luis Ceze
University of Washington, USA
,
Natalie Enright Jerger
University of Toronto, Canada
,
Babak Falsafi
EPFL, Switzerland
,
Grigori Fursin
cTuning foundation, France
,
Anton Lokhmotov
dividiti, UK
,
Thierry Moreau
University of Washington, USA
,
Adrian Sampson
Cornell University, USA
,
Phillip Stanley Marbell
University of Cambridge, UK
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 June 2018
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Badges
- Artifacts Evaluated & Functional
- Results Reproduced / v1.1
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 9
  Total Citations
  View Citations
- 1,080
  Total Downloads
- Downloads (Last 12 months)98
- Downloads (Last 6 weeks)11
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Leveraging the VTA-TVM Hardware-Software Stack for FPGA Acceleration of 8-bit ResNet-18 Inference

ReQuEST '18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-designing Pareto-efficient Deep Learning

ABSTRACT

References

Cited By

Index Terms

Recommendations

Hardware-software co-design of AES on FPGA

Acceleration of an FPGA router

A framework for dynamically-loaded hardware library (HLL) in FPGA acceleration