Abstract
The growing accuracy and robustness of Deep Neural Network (DNN) models are accompanied by growing model capacity (going deeper or wider). However, the high memory requirements of these models make it difficult to execute the training process on a single GPU. To address this, we first identify the memory usage characteristics of deep and wide convolutional networks, and demonstrate opportunities for memory reuse at both the intra-layer and inter-layer levels. We then present Layrub, a runtime data placement strategy that orchestrates the execution of the training process. It achieves layer-centric reuse to reduce memory consumption for extreme-scale deep learning models that cannot otherwise be trained on a single GPU.
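To make the inter-layer reuse idea concrete, the sketch below gives one hedged interpretation of layer-centric data migration: each layer's input activation is migrated to host memory as soon as the layer finishes its forward computation, so device memory only holds the current layer's working set, and each activation is migrated back on demand during the backward pass. This is an illustrative reconstruction in plain Python, with NumPy arrays standing in for GPU buffers; the names (`OffloadingNet`, `host_store`) are hypothetical and this is not the authors' Caffe-based implementation.

```python
# Hypothetical sketch of inter-layer activation offloading ("data migration").
# NumPy arrays stand in for GPU buffers; host_store stands in for pinned CPU memory.
import numpy as np

class OffloadingNet:
    """A stack of ReLU(x @ W) layers that keeps only the current layer's
    activation in 'device' memory, migrating earlier activations to a
    host-side store and bringing them back during the backward pass."""

    def __init__(self, weights):
        self.weights = weights      # one weight matrix per layer
        self.host_store = {}        # layer index -> offloaded activation

    def forward(self, x):
        act = x
        for i, w in enumerate(self.weights):
            self.host_store[i] = act.copy()   # migrate layer i's input to host
            act = np.maximum(act @ w, 0.0)    # ReLU(x @ W); only this stays "on device"
        return act

    def backward(self, grad_out):
        grads = [None] * len(self.weights)
        g = grad_out
        for i in reversed(range(len(self.weights))):
            act_in = self.host_store.pop(i)   # migrate saved input back on demand
            pre = act_in @ self.weights[i]
            g = g * (pre > 0)                 # ReLU gradient
            grads[i] = act_in.T @ g           # weight gradient for layer i
            g = g @ self.weights[i].T         # propagate to the previous layer
        return grads

rng = np.random.default_rng(0)
net = OffloadingNet([rng.standard_normal((64, 64)) * 0.1 for _ in range(8)])
out = net.forward(rng.standard_normal((4, 64)))
grads = net.backward(np.ones_like(out))
print(len(grads), grads[0].shape)  # 8 (64, 64)
```

A real system would use pinned host buffers and asynchronous copy streams to overlap migration with computation; the sketch only illustrates the layer-centric lifetime of each activation buffer.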