DOI: 10.1145/3195970.3196079
Research article

NNsim: fast performance estimation based on sampled simulation of GPGPU kernels for neural networks

Published: 24 June 2018

ABSTRACT

Existing GPU simulators are too slow to use for neural networks implemented on GPUs. For fast performance estimation, we propose a novel hybrid method that combines analytical performance modeling with sampled simulation of GPUs. By taking full advantage of the repeated computation in neural networks, three sampling techniques are devised: Inter-Kernel sampling, Intra-Kernel sampling, and Streaming Multiprocessor sampling. The key technique is to estimate the average IPC through sampled simulation, considering the effects of the warp scheduler and memory access contention. Compared with GPGPU-Sim, the proposed technique reduces simulation time by up to 450 times with less than 5.0% accuracy loss.
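As a rough illustration of the inter-kernel sampling idea described in the abstract, the Python sketch below estimates total execution cycles by simulating only one representative launch per group of repeated kernel invocations and scaling the result by the launch count, using cycles = instructions / IPC. All names and numbers here (KernelGroup, estimate_total_cycles, the toy workload) are hypothetical and not taken from the paper; this is a minimal sketch of the general sampling-and-extrapolation idea, not the authors' implementation.

# Hypothetical sketch of inter-kernel sampling for performance estimation.
# Repeated kernel launches in a neural network (e.g., the same conv layer
# invoked many times) are grouped; only one representative launch per group
# is simulated in detail, and its cost is scaled by the launch count.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class KernelGroup:
    name: str          # kernel identifier (hypothetical)
    launches: int      # number of times the kernel is invoked in the network
    instructions: int  # dynamic instruction count of a single launch

def estimate_total_cycles(groups: List[KernelGroup],
                          sampled_ipc: Callable[[KernelGroup], float]) -> float:
    """Per group: obtain an average IPC from one sampled (detailed) simulation,
    convert it to cycles for a single launch, then scale by the launch count."""
    total = 0.0
    for g in groups:
        ipc = sampled_ipc(g)                   # average IPC from sampled simulation
        cycles_per_launch = g.instructions / ipc
        total += cycles_per_launch * g.launches
    return total

if __name__ == "__main__":
    # Toy, purely illustrative workload: two groups of repeated kernels.
    groups = [KernelGroup("conv", launches=64, instructions=2_000_000),
              KernelGroup("fc", launches=8, instructions=500_000)]
    fake_ipc = lambda g: 1.5                   # stand-in for a detailed sampled simulation
    print(f"Estimated total cycles: {estimate_total_cycles(groups, fake_ipc):,.0f}")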

References

  1. Greg Diamos. 2016. Baidu Releases AI Benchmark. EE Times (26 September 2016). https://www.eetimes.com/document.asp?doc_id=1330521
  2. Bakhoda et al. 2009. Analyzing CUDA workloads using a detailed GPU simulator. In ISPASS. 163--174.
  3. Farooqui et al. 2011. A framework for dynamically instrumenting GPU compute applications within GPU Ocelot. In GPGPU-4. 9.
  4. Fang et al. 2013. FastLanes: An FPGA accelerated GPU microarchitecture simulator. In ICCD. 241--248.
  5. Huang et al. 2014. TBPoint: Reducing simulation time for large-scale GPGPU kernels. In IPDPS. 437--446.
  6. He et al. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on CVPR. 770--778.
  7. Huang et al. 2016. Densely connected convolutional networks. arXiv preprint arXiv:1608.06993 (2016).
  8. Ko et al. 2014. Hardware-in-the-loop simulation for CPU/GPU heterogeneous platforms. In Proceedings of the 51st Annual DAC. 1--6.
  9. Lee et al. 2016. Parallel GPU Architecture Simulation Framework Exploiting Architectural-Level Parallelism with Timing Error Prediction. IEEE TC 65, 4 (2016), 1253--1265.
  10. Redmon et al. 2016. YOLO9000: Better, Faster, Stronger. arXiv preprint arXiv:1612.08242 (2016).
  11. Sim et al. 2012. A performance analysis framework for identifying potential benefits in GPGPU applications. In ACM SIGPLAN Notices, Vol. 47. 11--22.
  12. Wang et al. 2017. CGPredict: Embedded GPU Performance Estimation from Single-Threaded Applications. TECS 16 (2017), 146.
  13. Yu et al. 2015. GPGPU-MiniBench: Accelerating GPGPU micro-architecture simulation. IEEE TC 64, 11 (2015), 3153--3166.

Published in

DAC '18: Proceedings of the 55th Annual Design Automation Conference
June 2018, 1089 pages
ISBN: 9781450357005
DOI: 10.1145/3195970
Copyright © 2018 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
