research-article

BSGP: bulk-synchronous GPU programming

Authors:

Baining GuoAuthors Info & Claims

ACM Transactions on Graphics (TOG), Volume 27, Issue 3

Pages 1 - 12

https://doi.org/10.1145/1360612.1360618

Published: 01 August 2008 Publication History

Abstract

We present BSGP, a new programming language for general purpose computation on the GPU. A BSGP program looks much the same as a sequential C program. Programmers only need to supply a bare minimum of extra information to describe parallel processing on GPUs. As a result, BSGP programs are easy to read, write, and maintain. Moreover, the ease of programming does not come at the cost of performance. A well-designed BSGP compiler converts BSGP programs to kernels and combines them using optimally allocated temporary streams. In our benchmark, BSGP programs achieve similar or better performance than well-optimized CUDA programs, while the source code complexity and programming time are significantly reduced. To test BSGP's code efficiency and ease of programming, we implemented a variety of GPU applications, including a highly sophisticated X3D parser that would be extremely difficult to develop with existing GPU programming languages.

Supplementary Material

MOV File (a19-hou.mov)

Download
28.25 MB

References

[1]

ARTIS, 2004. X3DToolKit homepage. http://artis.imag.fr/Software/X3D/.

[2]

ATI, 2006. Researcher CTM documentation. http://ati.amd.com/companyinfo/researcher/documents.html.

[3]

Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., and Hanrahan, P. 2004. Brook for GPUs: stream computing on graphics hardware. ACM Trans. Graph. 23, 3, 777--786.

Digital Library

[4]

Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., and Zadeck, F. K. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13, 4, 451--490.

Digital Library

[5]

Foley, T., and Sugerman, J. 2005. Kd-tree acceleration structures for a gpu raytracer. In Proceedings of Graphics Hardware, 15--22.

Digital Library

[6]

Ford, L. R., and Fulkerson, D. R. 1962. Flows in Networks. Princeton University Press.

[7]

Fortune, S., and Wyllie, J. 1978. Parallelism in random access machines. In STOC '78: Proceedings of the tenth annual ACM symposium on Theory of computing, 114--118.

Digital Library

[8]

Gress, A., and Zachmann, G. 2006. GPU-ABiSort: optimal parallel sorting on stream architectures. In Parallel and Distributed Processing Symposium, 45--54.

Digital Library

[9]

Harris, M., Owens, J., Sengupta, S., Zhang, Y., and Davidson, A., 2007. CUDPP homepage. http://www.gpgpu.org/developer/cudpp/.

[10]

Hill, J. M. D., McColl, B., Stefanescu, D. C., Goudreau, M. W., Lang, K., Rao, S. B., Suel, T., Tsantilas, T., and Bisseling, R. H. 1998. Bsplib: The bsp programming library. Parallel Comput. 24, 14, 1947--1980.

Digital Library

[11]

Horn, D. R., Sugerman, J., Houston, M., and Hanrahan, P. 2007. Interactive k-d tree gpu raytracing. In I3D'07, 167--174.

Digital Library

[12]

Kale, L. V., and Krishnan, S. 1993. Charm++: a portable concurrent object oriented system based on c++. SIGPLAN Notices 28, 10, 91--108.

Digital Library

[13]

Mark, W. R., Glanville, R. S., Akeley, K., and Kilgard, M. J. 2003. Cg: a system for programming graphics hardware in a c-like language. In Proceedings of SIGGRAPH '03, 896--907.

Digital Library

[14]

McCool, M., and Du Toit, S. 2004. Metaprogramming GPUs with Sh. AK Peters Ltd.

Digital Library

[15]

McCool, M. D., Qin, Z., and Popa, T. S. 2002. Shader metaprogramming. In Proceedings of Graphics hardware, 57--68.

Digital Library

[16]

McCool, M., Du Toit, S., Popa, T., Chan, B., and Moule, K. 2004. Shader algebra. ACM Trans. Graph. 23, 3, 787--795.

Digital Library

[17]

Moreton, H. 2001. Watertight tessellation using forward differencing. In Proceedings of Graphics Hardware, 25--32.

Digital Library

[18]

NVIDIA, 2007. CUDA homepage. http://developer.nvidia.com/object/cuda.html.

[19]

Olano, M., and Lastra, A. 1998. A shading language on graphics hardware: the pixelflow shading system. In Proceedings of SIGGRAPH'98, 159--168.

Digital Library

[20]

Parisi, T. 2003. Flux: lightweight, standards-based web graphics in xml. In ACM SIGGRAPH 2003 Web Graphics, 1--1.

Digital Library

[21]

Peercy, M. S., Olano, M., Airey, J., and Ungar, P. J. 2000. Interactive multi-pass programmable shading. In Proceedings of SIGGRAPH'00, 425--432.

Digital Library

[22]

Popov, S., Günther, J., Seidel, H.-P., and Slusallek, P. 2007. Stackless kd-tree traversal for high performance gpu ray tracing. In Proceedings of Eurographics'07, 415--424.

[23]

Proudfoot, K., Mark, W. R., Tzvetkov, S., and Hanrahan, P. 2001. A real-time procedural shading system for programmable graphics hardware. In Proceedings of SIGGRAPH'01, 159--170.

Digital Library

[24]

Sengupta, S., Harris, M., Zhang, Y., and Owens, J. D. 2007. Scan primitives for gpu computing. In Graphics Hardware, 97--106.

Digital Library

[25]

Skillicorn, D. B., Hill, J. M. D., and McColl, W. F. 1997. Questions and Answers about BSP. Scientific Programming 6, 3, 249--274.

[26]

Tarditi, D., Puri, S., and Oglesby, J. 2006. Accelerator: using data parallelism to program gpus for general-purpose uses. In ASPLOS-XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, 325--335.

Digital Library

[27]

Valiant, L. G. 1990. A bridging model for parallel computation. Communications of the ACM 33, 8, 103--111.

Digital Library

[28]

Wegman, M. N., and Zadeck, F. K. 1991. Constant propagation with conditional branches. ACM Trans. Program. Lang. Syst. 13, 2, 181--210.

Digital Library

Cited By

Zheng SChen XShi ZYan LXu K(2024)GPU Coroutines for Flexible Splitting and Scheduling of Rendering TasksACM Transactions on Graphics10.1145/368776643:6(1-24)Online publication date: 19-Dec-2024
https://dl.acm.org/doi/10.1145/3687766
Yang JDev SCampbell A(2024)RenderKernel: High-level programming for real-time rendering systemsVisual Informatics10.1016/j.visinf.2024.09.0048:3(82-95)Online publication date: Sep-2024
https://doi.org/10.1016/j.visinf.2024.09.004
Amaris MCamargo RCordeiro DGoldman ATrystram D(2023)Evaluating execution time predictions on GPU kernels using an analytical model and machine learning techniquesJournal of Parallel and Distributed Computing10.1016/j.jpdc.2022.09.002171:C(66-78)Online publication date: 1-Jan-2023
https://dl.acm.org/doi/10.1016/j.jpdc.2022.09.002
Show More Cited By

Index Terms

BSGP: bulk-synchronous GPU programming

Recommendations

Real-time KD-tree construction on graphics hardware

We present an algorithm for constructing kd-trees on GPUs. This algorithm achieves real-time performance by exploiting the GPU's streaming architecture at all stages of kd-tree construction. Unlike previous parallel kd-tree algorithms, our method builds ...
Design and implementation of stream processing system and library for CELL broadband engine processors
PDCS '07: Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems

To bring out high performance in applications running on a CELL Broadband Engine processor (CELL processor), developers have to know its architecture and have special skills of the programming. As we know, the CELL processor is suitable for stream ...
Stream Processing on Hybrid CPU/Intel^® Xeon Phi^™ Systems
Euro-Par 2018: Parallel Processing
Abstract
Stream processing is currently central to handle large volumes of data generated at high rates. However, the efficient processing of such quantity of data demands massively parallel hardware. The usual approach is to rely on clusters of multi-...

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Graphics

ACM Transactions on Graphics Volume 27, Issue 3

August 2008

844 pages

ISSN:0730-0301

EISSN:1557-7368

DOI:10.1145/1360612

Issue’s Table of Contents

Copyright © 2008 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 August 2008

Published in TOG Volume 27, Issue 3

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

67
Total Citations
View Citations
1,917
Total Downloads

Downloads (Last 12 months)14
Downloads (Last 6 weeks)1

Reflects downloads up to 14 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Zheng SChen XShi ZYan LXu K(2024)GPU Coroutines for Flexible Splitting and Scheduling of Rendering TasksACM Transactions on Graphics10.1145/368776643:6(1-24)Online publication date: 19-Dec-2024
https://dl.acm.org/doi/10.1145/3687766
Yang JDev SCampbell A(2024)RenderKernel: High-level programming for real-time rendering systemsVisual Informatics10.1016/j.visinf.2024.09.0048:3(82-95)Online publication date: Sep-2024
https://doi.org/10.1016/j.visinf.2024.09.004
Amaris MCamargo RCordeiro DGoldman ATrystram D(2023)Evaluating execution time predictions on GPU kernels using an analytical model and machine learning techniquesJournal of Parallel and Distributed Computing10.1016/j.jpdc.2022.09.002171:C(66-78)Online publication date: 1-Jan-2023
https://dl.acm.org/doi/10.1016/j.jpdc.2022.09.002
Girolamo JHope JQasem A(2022)Uncovering input-sensitive energy bottlenecks in oversubscribed GPU workloadsSustainable Computing: Informatics and Systems10.1016/j.suscom.2022.10065435(100654)Online publication date: Sep-2022
https://doi.org/10.1016/j.suscom.2022.100654
Sewall JChhugani JKim CSatish NDubey P(2020)PALMProceedings of the VLDB Endowment10.14778/3402707.34027194:11(795-806)Online publication date: 3-Jun-2020
https://dl.acm.org/doi/10.14778/3402707.3402719
Andon PDoroshenko AZhereb KYatsenko O(2018)Algebra-Algorithmic Models and Methods of Parallel Programing10.15407/akademperiodyka.367.192Online publication date: 2018
https://doi.org/10.15407/akademperiodyka.367.192
Jo SJeong YLee S(2018)GPU-Driven Scalable Parser for OBJ ModelsJournal of Computer Science and Technology10.1007/s11390-018-1827-233:2(417-428)Online publication date: 23-Mar-2018
https://doi.org/10.1007/s11390-018-1827-2
Allombert VGava FTesson J(2017)Multi-MLInternational Journal of Parallel Programming10.1007/s10766-016-0417-645:2(340-361)Online publication date: 1-Apr-2017
https://dl.acm.org/doi/10.1007/s10766-016-0417-6
Yang XYang BWang PXu D(2016)MSKDMultimedia Tools and Applications10.1007/s11042-014-2371-x75:2(1349-1364)Online publication date: 1-Jan-2016
https://dl.acm.org/doi/10.1007/s11042-014-2371-x
Koike ASadakane K(2015)A Novel Computational Model for GPUs with Applications to Efficient AlgorithmsInternational Journal of Networking and Computing10.15803/ijnc.5.1_265:1(26-60)Online publication date: 2015
https://doi.org/10.15803/ijnc.5.1_26
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents