ABSTRACT
We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming termed synchronous, delayed visibility (SDV) semantics. The SDV semantics provides a precise definition of barrier divergence in GPU kernels and allows kernel verification to be reduced to analysis of a sequential program, thereby completely avoiding the need to reason about thread interleavings, and allowing existing modular techniques for program verification to be leveraged. We describe an efficient encoding for data race detection and propose a method for automatically inferring loop invariants required for verification. We have implemented these techniques as a practical verification tool, GPUVerify, which can be applied directly to OpenCL and CUDA source code. We evaluate GPUVerify with respect to a set of 163 kernels drawn from public and commercial sources. Our evaluation demonstrates that GPUVerify is capable of efficient, automatic verification of a large number of real-world kernels.
- AMD. AMD Accelerated Parallel Processing (APP) SDK. developer.amd.com/sdks/amdappsdk/pages/default.aspxGoogle Scholar
- M. Barnett, B. Chang, R. DeLine, B. Jacobs, and K. R. M. Leino. Boogie: A modular reusable verifier for object-oriented programs. In FMCO, 2005. Google ScholarDigital Library
- D. Beyer, T. A. Henzinger, R. Majumdar, and A. Rybalchenko. Path invariants. In PLDI, 2007. Google ScholarDigital Library
- M. Boyer, K. Skadron, and W. Weimer. Automated dynamic analysis of CUDA programs. In STMCS, 2008.Google Scholar
- C. Cadar, D. Dunbar, and D. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In OSDI, 2008. Google ScholarDigital Library
- P. Collingbourne, C. Cadar, and P. H. J. Kelly. Symbolic testing of OpenCL code. In HVC, 2011.Google Scholar
- P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, 1977. Google ScholarDigital Library
- L. de Moura and N. Bjørner. Z3: An efficient SMT solver. In TACAS, 2008. Google ScholarDigital Library
- A. Donaldson, D. Kroening and P. Rummer. Automatic analysis of scratch-pad memory code for heterogeneous multicore processors. In TACAS, 2010. Google ScholarDigital Library
- A. Donaldson, D. Kroening and P. Rummer. Automatic analysis of DMA races using model checking and k-induction. Formal Methods in System Design 39(1):83--113. Google ScholarDigital Library
- M. D. Ernst, J. H. Perkins, P. J. Guo, S. McCamant, C. Pacheco, M. S. Tschantz, and C. Xiao The Daikon system for dynamic detection of likely invariants. Sci. Comput. Program. 69(1-3):35--45. Google ScholarDigital Library
- C. Flanagan and K. R. M. Leino. Houdini, an annotation assistant for ESC/Java. In FME, 2001. Google ScholarDigital Library
- S. Graf and H. Sa idi. Construction of abstract state graphs with PVS. In CAV, 1997. Google ScholarDigital Library
- A. Habermaier and A. Knapp. On the correctness of the SIMT execution model of GPUs. In ESOP, 2012. Google ScholarDigital Library
- T. Kahsai, Y. Ge, and C. Tinelli. Instantiation-based invariant discovery. In NFM, 2011. Google ScholarDigital Library
- R. K. Karmani, P. Madhusudan, and B. Moore. Thread contracts for safe parallelism. In PPOPP, 2011. Google ScholarDigital Library
- Khronos OpenCL Working Group. The OpenCL specifica-tion, version 1.1, 2011. Document Revision: 44.Google Scholar
- S. K. Lahiri and S. Qadeer. Complexity and algorithms for monomial and clausal predicate. In CADE, 2009. Google ScholarDigital Library
- K. R. M. Leino, G. Nelson, and J. B. Saxe. ESC/Java user's manual. Technical Note 2000-002, Compaq Systems Research Center, October 2000.Google Scholar
- A. Leung, M. Gupta, Y. Agarwal, R. Gupta, R. Jhala, and S. Lerner. Verifying GPU kernels by test amplification. In PLDI, 2012. Google ScholarDigital Library
- G. Li and G. Gopalakrishnan. Scalable SMT-based verifica-tion of GPU kernel functions. In FSE, 2010. Google ScholarDigital Library
- G. Li, P. Li, G. Gopalakrishnan. Parametric flows: automated behaviour equivalencing for symbolic analysis of races in CUDA programs. In SC, 2012. Google ScholarDigital Library
- G. Li, P. Li, G. Sawaya, G. Gopalakrishnan, I. Ghosh, and S. P. Rajan. GKLEE: concolic verification and test generation for GPUs. In PPOPP, 2012. Google ScholarDigital Library
- llvm.org. clang: a C language family frontend for LLVM. clang.llvm.orgGoogle Scholar
- A. Lokhmotov. Mobile and embedded computing on Mali GPUs. In UK GPU Computing Conference, 2011.Google Scholar
- K. L. McMillan. Lazy abstraction with interpolants. In CAV, 2006. Google ScholarDigital Library
- Microsoft Corporation. C++ AMP sample projects for download (MSDN blog). blogs.msdn.com/b/nativeconcurrency/archive/2012/01/30/c-amp-sample-projects-for-download.aspxGoogle Scholar
- D. Moth and Y. Levanoni. Microsoft's C++ AMP unveiled. www.drdobbs.com/windows/231600761Google Scholar
- NVIDIA. CUDA Toolkit Release Archive. developer.nvidia.com/cuda-toolkit-archiveGoogle Scholar
- NVIDIA. CUDA C programming guide, v4.0, 2011.Google Scholar
- NVIDIA. PTX: Parallel thread execution ISA, v2.3, 2011.Google Scholar
- L. Nyland. Personal communication, April 2012.Google Scholar
- L. Nyland, M. Harris, and J. Prins. Fast N-body simulation with CUDA. GPU Gems 3, Chapter 31. Addison-Wesley, 2007.Google Scholar
- Rightware Oy. Basemark CL. www.rightware.com/en/Benchmarking+Software/Basemark%99+CLGoogle Scholar
- S. Srivastava and S. Gulwani. Program verification using templates over predicate abstraction. In PLDI, 2009. Google ScholarDigital Library
- B. Steensgaard. Points-to analysis in almost linear time. In POPL, 1996. Google ScholarDigital Library
- S. Stone, J. Haldar, S. Tsao, W. Hwu, B. Sutton, and Z. Liang. Accelerating advanced MRI reconstructions on GPUs. J. Parallel Distrib. Comput., 68(10):1307--1318, 2008. Google ScholarDigital Library
- S. Tripakis, C. Stergiou, and R. Lublinerman. Checking non-interference in SPMD programs. In HotPar, 2010.Google Scholar
Index Terms
- GPUVerify: a verifier for GPU kernels
Recommendations
The Design and Implementation of a Verification Technique for GPU Kernels
We present a technique for the formal verification of GPU kernels, addressing two classes of correctness properties: data races and barrier divergence. Our approach is founded on a novel formal operational semantics for GPU kernels termed <i>synchronous, ...
GPUVerify: a verifier for GPU kernels
OOPSLA '12We present a technique for verifying race- and divergence-freedom of GPU kernels that are written in mainstream kernel programming languages such as OpenCL and CUDA. Our approach is founded on a novel formal operational semantics for GPU programming ...
Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels
OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applicationsData-dependent GPU kernels, whose data or control flow are dependent on the input of the program, are difficult to verify because they require reasoning about shared state manipulated by many parallel threads. Existing verification techniques for GPU ...
Comments