ABSTRACT
Convolutional neural networks (CNNs) are widely adopted to make predictions on large amounts of data in modern embedded systems. Multiply-accumulate (MAC) operations constitute the most computationally expensive portion of a CNN. Compared with executing MAC operations on GPUs and FPGAs, implementing CNNs in an RRAM crossbar-based computing system (RCS) offers outstanding advantages in performance and power. However, current designs incur very high overhead in peripheral circuits and memory accesses, which limits the gains of RCS.
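As a concrete illustration (not taken from the paper), the MAC-dominated inner loops of a convolutional layer can be sketched as follows; every innermost step is one multiply-accumulate, which is exactly the operation an RRAM crossbar evaluates in the analog domain:

```python
import numpy as np

def conv2d_mac(inp, weights):
    """Naive convolution that counts MAC operations.

    inp:     (C_in, H, W) input feature maps
    weights: (C_out, C_in, K, K) kernels
    Returns the (C_out, H-K+1, W-K+1) output maps and the MAC count.
    """
    c_out, c_in, k, _ = weights.shape
    _, h, w = inp.shape
    out = np.zeros((c_out, h - k + 1, w - k + 1))
    macs = 0
    for co in range(c_out):
        for y in range(h - k + 1):
            for x in range(w - k + 1):
                acc = 0.0
                for ci in range(c_in):
                    for ky in range(k):
                        for kx in range(k):
                            # One multiply-accumulate per innermost iteration.
                            acc += inp[ci, y + ky, x + kx] * weights[co, ci, ky, kx]
                            macs += 1
                out[co, y, x] = acc
    return out, macs
```

Even for this toy size, the MAC count grows as C_out · C_in · K² per output pixel, which is why the MAC loops dominate CNN compute cost.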
To address this problem, a Multi-CLP (Convolutional Layer Processor) structure was recently proposed, in which the FPGA control resources are shared by multiple computation units. Building on this idea, the Peripheral Circuit Unit (PeriCU)-Reuse scheme was proposed, whose underlying idea is to put the expensive ADCs/DACs in the spotlight and arrange for multiple convolution layers to be served sequentially by the same PeriCU. This paper adopts both structures. We further observe that memory accesses can be bypassed if two adjacent layers are assigned to different CLPs. A loop tiling technique is proposed to enable this memory-access bypassing and further reduce the energy consumption of RCS. To guarantee correct data dependencies between layers, we also derive the safe starting time of a layer whose predecessor is tiled in a different CLP. Experiments on two convolutional applications validate that the loop tiling technique, integrated with the Multi-CLP structure, can efficiently meet power budgets and reduce energy consumption by a further 61.7%.
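The data-dependency constraint above can be sketched with a hypothetical timing model (the function name and the row-based throughput model are assumptions for illustration, not the paper's implementation): when adjacent layers are tiled onto different CLPs, the consumer layer may only start once the producer has emitted the input rows its first tile depends on, which defines the safe starting time.

```python
def safe_start_time(producer_start, rows_needed, rows_per_cycle):
    """Earliest cycle at which the consumer CLP may begin its first tile.

    producer_start: cycle at which the producer layer begins
    rows_needed:    producer output rows the consumer's first tile reads
    rows_per_cycle: producer's output-row throughput (rows per cycle)
    """
    # Ceiling division: cycles the producer needs to emit the dependency.
    cycles = -(-rows_needed // rows_per_cycle)
    return producer_start + cycles
```

Starting the consumer any earlier would read rows the producer has not yet written; starting later wastes the opportunity to bypass off-chip memory by consuming tiles as they are produced.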
Index Terms
- Low power driven loop tiling for RRAM crossbar-based CNN