Efficient Instrumentation of GPGPU Applications Using Information Flow Analysis and Symbolic Execution

Authors: Naila Farooqui, Karsten Schwan, Sudhakar Yalamanchili

GPGPU-7: Proceedings of Workshop on General Purpose Processing Using GPUs

Pages 19 - 27

https://doi.org/10.1145/2588768.2576782

Published: 01 March 2014

Abstract

Dynamic instrumentation of GPGPU binaries enables real-time introspection for performance debugging, correctness checks, workload characterization, and runtime optimization. Such instrumentation inserts code at the instruction level of an application while the application is running, and can therefore accurately profile data-dependent application behavior. The runtime overheads of instrumentation, however, can negate its utility. This paper shows how a combination of information flow analysis and symbolic execution can be used to reduce these overheads. The methods and their effectiveness are demonstrated for a variety of GPGPU codes written in OpenCL that run on AMD GPU target backends. Kernels that can be analyzed entirely via symbolic execution need not be instrumented, which eliminates kernel runtime overheads altogether. For the remaining GPU kernels, our results show 5-38% improvements in kernel runtime overheads.
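
The idea that some kernels need no instrumentation can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a hypothetical three-operand toy IR (the `Instr` type and the opcodes `load`, `alu`, and `branch` are invented here), treats every value loaded from memory as data-dependent, and uses a conservative, flow-insensitive taint propagation. A kernel whose branches never depend on loaded data has control flow determined by launch parameters alone, so in principle its behavior could be derived symbolically instead of being measured at run time.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Instr:
    op: str                 # hypothetical toy opcodes: "load", "alu", "branch"
    dst: str                # destination register ("" for branches)
    srcs: Tuple[str, ...]   # source registers


def tainted_registers(kernel: List[Instr]) -> Set[str]:
    """Information flow: mark registers whose value may depend on memory contents."""
    tainted: Set[str] = set()
    changed = True
    while changed:  # iterate to a fixed point (conservative, flow-insensitive)
        changed = False
        for ins in kernel:
            if not ins.dst or ins.dst in tainted:
                continue
            if ins.op == "load" or any(s in tainted for s in ins.srcs):
                tainted.add(ins.dst)
                changed = True
    return tainted


def needs_instrumentation(kernel: List[Instr]) -> bool:
    """True if any branch condition may depend on loaded (runtime) data."""
    tainted = tainted_registers(kernel)
    return any(
        ins.op == "branch" and any(s in tainted for s in ins.srcs)
        for ins in kernel
    )


# Branch depends only on the thread id and a launch parameter.
static_kernel = [
    Instr("alu", "r0", ("tid", "n")),
    Instr("branch", "", ("r0",)),
    Instr("load", "r1", ("tid",)),
    Instr("alu", "r2", ("r1", "r1")),
]

# Branch depends on a value loaded from memory.
data_dependent_kernel = [
    Instr("load", "r1", ("tid",)),
    Instr("alu", "r2", ("r1", "zero")),
    Instr("branch", "", ("r2",)),
]

print(needs_instrumentation(static_kernel))
print(needs_instrumentation(data_dependent_kernel))
```

In this toy model only the second kernel would have to be instrumented; the first is a candidate for purely symbolic analysis.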

Published In

GPGPU-7: Proceedings of Workshop on General Purpose Processing Using GPUs

March 2014

110 pages

ISBN:9781450327664

DOI:10.1145/2588768

Conference Chairs:
John Cavazos,
Xiang Gong,
David Kaeli

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

GPGPU-7

GPGPU-7: Seventh Workshop on General Purpose Processing Using GPUs

March 1, 2014

Salt Lake City, UT, USA

Acceptance Rates

GPGPU-7 Paper Acceptance Rate 12 of 27 submissions, 44%;

Overall Acceptance Rate 57 of 129 submissions, 44%

Article Metrics

Total Citations: 5
Total Downloads: 316
Downloads (Last 12 months): 5
Downloads (Last 6 weeks): 2

Reflects downloads up to 02 Mar 2025

Cited By

Farooqui N, Kaeli D, Cavazos J (2016). A systems perspective on GPU computing. Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit, pages 72-81. DOI: 10.1145/2884045.2884057. Online publication date: 12-Mar-2016.
https://dl.acm.org/doi/10.1145/2884045.2884057

(2016). Toward high-performance key-value stores through GPU encoding and locality-aware encoding. Journal of Parallel and Distributed Computing, 96:C, pages 27-37. DOI: 10.1016/j.jpdc.2016.04.015. Online publication date: 1-Oct-2016.
https://dl.acm.org/doi/10.1016/j.jpdc.2016.04.015

Zhao D, Mahakode A, Lakshminarasaiah S, Raicu I (2016). High-Performance Storage Support for Scientific Big Data Applications on the Cloud. Resource Management for Big Data Platforms, pages 147-170. DOI: 10.1007/978-3-319-44881-7_8. Online publication date: 28-Oct-2016.
https://doi.org/10.1007/978-3-319-44881-7_8

Sengupta D, Song S, Agarwal K, Schwan K, Kern J, Vetter J (2015). GraphReduce. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-12. DOI: 10.1145/2807591.2807655. Online publication date: 15-Nov-2015.
https://dl.acm.org/doi/10.1145/2807591.2807655

Stewart C, Gupta V (2014). The workshop on diversity in systems research 2013. ACM SIGOPS Operating Systems Review, 48:1, pages 103-106. DOI: 10.1145/2626401.2626422. Online publication date: 15-May-2014.
https://dl.acm.org/doi/10.1145/2626401.2626422
