DOI: 10.1145/3167132.3167174

Low power driven loop tiling for RRAM crossbar-based CNN

Published: 09 April 2018

ABSTRACT

Convolutional neural networks (CNNs) are widely adopted to make predictions over large amounts of data in modern embedded systems. Multiply-and-accumulate (MAC) operations constitute the most computationally expensive portion of a CNN. Compared with executing MAC operations on GPUs and FPGAs, implementing CNNs on an RRAM crossbar-based computing system (RCS) offers outstanding advantages in performance and power. However, current designs incur a very high overhead from peripheral circuits and memory accesses, which limits the gains of RCS.

To address this problem, a Multi-CLP (Convolutional Layer Processor) structure was recently proposed in which the FPGA control resources are shared by multiple computation units. Exploiting this idea, the Peripheral Circuit Unit (PeriCU)-Reuse scheme was proposed; its underlying idea is to put the expensive AD/DA converters in the spotlight and arrange multiple convolution layers to be served sequentially by the same PeriCU. This paper adopts both structures. It is further observed that memory accesses can be bypassed if two adjacent layers are assigned to different CLPs. A loop tiling technique is proposed to enable this memory-access bypassing and further improve the energy efficiency of RCS. To guarantee correct data dependencies between layers, the safe starting time of a layer is discussed for the case where its previous layer is tiled in a different CLP. Experiments on two convolutional applications validate that the loop tiling technique, integrated with the Multi-CLP structure, efficiently meets power budgets and further reduces energy consumption by 61.7%.
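To illustrate the kind of memory-access bypassing that loop tiling enables, the Python/NumPy sketch below (not taken from the paper; a minimal single-channel, 'valid'-padding example with a hypothetical tile size) computes two stacked convolution layers one output tile at a time. Each output tile of the second layer needs only a small halo of the first layer's output, so the intermediate feature map can stay in a local buffer instead of being written to and re-read from memory:

    import numpy as np

    def conv2d_valid(x, w):
        # Plain 'valid' 2-D convolution (single channel); stands in for the
        # MAC-heavy work that the RRAM crossbars would perform in hardware.
        k = w.shape[0]
        out = np.zeros((x.shape[0] - k + 1, x.shape[1] - k + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
        return out

    def fused_tiled_two_layers(x, w1, w2, tile=8):
        # Compute layer2(layer1(x)) tile by tile.  A tile of the final output
        # needs only a (tile + k2 - 1)-wide patch of layer 1's output, which in
        # turn needs a (tile + k1 + k2 - 2)-wide input patch, so the
        # intermediate tile never leaves the local buffer.
        k1, k2 = w1.shape[0], w2.shape[0]
        H = x.shape[0] - k1 - k2 + 2   # final height after two 'valid' convs
        W = x.shape[1] - k1 - k2 + 2
        out = np.zeros((H, W))
        for ti in range(0, H, tile):
            for tj in range(0, W, tile):
                th, tw = min(tile, H - ti), min(tile, W - tj)
                patch = x[ti: ti + th + k1 + k2 - 2, tj: tj + tw + k1 + k2 - 2]
                mid = conv2d_valid(patch, w1)            # intermediate tile
                out[ti: ti + th, tj: tj + tw] = conv2d_valid(mid, w2)
        return out

    rng = np.random.default_rng(0)
    x = rng.standard_normal((32, 32))
    w1, w2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
    untiled = conv2d_valid(conv2d_valid(x, w1), w2)
    print(np.allclose(untiled, fused_tiled_two_layers(x, w1, w2)))  # True

The paper's safe-starting-time analysis addresses the corresponding scheduling question: a layer running on a different CLP may only start consuming a tile once the previous layer has finished producing every element that tile depends on.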


    • Published in

      SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing
      April 2018
      2327 pages
      ISBN: 9781450351911
      DOI: 10.1145/3167132

      Copyright © 2018 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 April 2018


      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
