skip to main content
10.1145/3229762.3229766acmconferencesArticle/Chapter ViewAbstractPublication PagesasplosConference Proceedingsconference-collections

Leveraging the VTA-TVM Hardware-Software Stack for FPGA Acceleration of 8-bit ResNet-18 Inference

Published:20 June 2018Publication History

ABSTRACT

We present a full-stack design to accelerate deep learning inference with FPGAs. Our contribution is two-fold. At the software layer, we leverage and extend TVM, the end-to-end deep learning optimizing compiler, in order to harness FPGA-based acceleration. At the the hardware layer, we present the Versatile Tensor Accelerator (VTA) which presents a generic, modular, and customizable architecture for TPU-like accelerators. Our results take a ResNet-18 description in MxNet and compiles it down to perform 8-bit inference on a 256-PE accelerator implemented on a low-cost Xilinx Zynq FPGA, clocked at 100MHz. Our full hardware acceleration stack will be made available for the community to reproduce, and build upon at http://github.com/uwsaml/vta.

References

  1. 2018. ReQuEST at ASPLOS'18: 1st Reproducible Tournament on Pareto-efficient Image Classification. http://cknowledge.org/request-cfp-asplos2018.html.Google ScholarGoogle Scholar
  2. Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, and Zheng Zhang. 2015. MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. In Neural Information Processing Systems, Workshop on Machine Learning Systems (LearningSys'15).Google ScholarGoogle Scholar
  3. Tianqi Chen, Thierry Moreau, Ziheng Jiang, Haichen Shen, Eddie Q. Yan, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy. 2018. TVM: End-to-End Optimization Stack for Deep Learning. CoRR abs/1802.04799 (2018). arXiv:1802.04799 http://arxiv.org/abs/1802.04799.Google ScholarGoogle Scholar
  4. Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg, John Hu, Robert Hundt, Dan Hurt, Julian Ibarz, Aaron Jaffey, Alek Jaworski, Alexander Kaplan, Harshit Khaitan, Daniel Killebrew, Andy Koch, Naveen Kumar, Steve Lacy, James Laudon, James Law, Diemthu Le, Chris Leary, Zhuyuan Liu, Kyle Lucke, Alan Lundin, Gordon MacKean, Adriana Maggiore, Maire Mahony, Kieran Miller, Rahul Nagarajan, Ravi Narayanaswami, Ray Ni, Kathy Nix, Thomas Norrie, Mark Omernick, Narayana Penukonda, Andy Phelps, Jonathan Ross, Matt Ross, Amir Salek, Emad Samadiani, Chris Severn, Gregory Sizikov, Matthew Snelham, Jed Souter, Dan Steinberg, Andy Swing, Mercedes Tan, Gregory Thorson, Bo Tian, Horia Toma, Erick Tuttle, Vijay Vasudevan, Richard Walter, Walter Wang, Eric Wilcox, and Doe Hyun Yoon. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). ACM, New York, NY, USA, 1-12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Thierry Moreau, Anton Lokhmotov, and Grigori Fursin. 2018. Introducing ReQuEST: an Open Platform for Reproducible and Quality-Efficient Systems-ML Tournaments. ArXiv e-prints. arXiv:stat.ML/1801.06378.Google ScholarGoogle Scholar
  6. James E. Smith. 1982. Decoupled Access/Execute Computer Architectures. In Proceedings of the 9th Annual Symposium on Computer Architecture (ISCA '82). IEEE Computer Society Press, Los Alamitos, CA, USA, 112-119. http://dl.acm.org/citation.cfm?id=800048.801719. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Leveraging the VTA-TVM Hardware-Software Stack for FPGA Acceleration of 8-bit ResNet-18 Inference

                          Recommendations

                          Comments

                          Login options

                          Check if you have access through your login credentials or your institution to get full access on this article.

                          Sign in
                          • Published in

                            cover image ACM Conferences
                            ReQuEST '18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-designing Pareto-efficient Deep Learning
                            June 2018
                            44 pages
                            ISBN:9781450359238
                            DOI:10.1145/3229762

                            Copyright © 2018 ACM

                            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                            Publisher

                            Association for Computing Machinery

                            New York, NY, United States

                            Publication History

                            • Published: 20 June 2018

                            Permissions

                            Request permissions about this article.

                            Request Permissions

                            Check for updates

                            Qualifiers

                            • research-article

                          PDF Format

                          View or Download as a PDF file.

                          PDF

                          eReader

                          View online with eReader.

                          eReader