skip to main content
research-article

BSGP: bulk-synchronous GPU programming

Published: 01 August 2008 Publication History

Abstract

We present BSGP, a new programming language for general purpose computation on the GPU. A BSGP program looks much the same as a sequential C program. Programmers only need to supply a bare minimum of extra information to describe parallel processing on GPUs. As a result, BSGP programs are easy to read, write, and maintain. Moreover, the ease of programming does not come at the cost of performance. A well-designed BSGP compiler converts BSGP programs to kernels and combines them using optimally allocated temporary streams. In our benchmark, BSGP programs achieve similar or better performance than well-optimized CUDA programs, while the source code complexity and programming time are significantly reduced. To test BSGP's code efficiency and ease of programming, we implemented a variety of GPU applications, including a highly sophisticated X3D parser that would be extremely difficult to develop with existing GPU programming languages.

Supplementary Material

MOV File (a19-hou.mov)

References

[1]
ARTIS, 2004. X3DToolKit homepage. http://artis.imag.fr/Software/X3D/.
[2]
ATI, 2006. Researcher CTM documentation. http://ati.amd.com/companyinfo/researcher/documents.html.
[3]
Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., and Hanrahan, P. 2004. Brook for GPUs: stream computing on graphics hardware. ACM Trans. Graph. 23, 3, 777--786.
[4]
Cytron, R., Ferrante, J., Rosen, B. K., Wegman, M. N., and Zadeck, F. K. 1991. Efficiently computing static single assignment form and the control dependence graph. ACM Trans. Program. Lang. Syst. 13, 4, 451--490.
[5]
Foley, T., and Sugerman, J. 2005. Kd-tree acceleration structures for a gpu raytracer. In Proceedings of Graphics Hardware, 15--22.
[6]
Ford, L. R., and Fulkerson, D. R. 1962. Flows in Networks. Princeton University Press.
[7]
Fortune, S., and Wyllie, J. 1978. Parallelism in random access machines. In STOC '78: Proceedings of the tenth annual ACM symposium on Theory of computing, 114--118.
[8]
Gress, A., and Zachmann, G. 2006. GPU-ABiSort: optimal parallel sorting on stream architectures. In Parallel and Distributed Processing Symposium, 45--54.
[9]
Harris, M., Owens, J., Sengupta, S., Zhang, Y., and Davidson, A., 2007. CUDPP homepage. http://www.gpgpu.org/developer/cudpp/.
[10]
Hill, J. M. D., McColl, B., Stefanescu, D. C., Goudreau, M. W., Lang, K., Rao, S. B., Suel, T., Tsantilas, T., and Bisseling, R. H. 1998. Bsplib: The bsp programming library. Parallel Comput. 24, 14, 1947--1980.
[11]
Horn, D. R., Sugerman, J., Houston, M., and Hanrahan, P. 2007. Interactive k-d tree gpu raytracing. In I3D'07, 167--174.
[12]
Kale, L. V., and Krishnan, S. 1993. Charm++: a portable concurrent object oriented system based on c++. SIGPLAN Notices 28, 10, 91--108.
[13]
Mark, W. R., Glanville, R. S., Akeley, K., and Kilgard, M. J. 2003. Cg: a system for programming graphics hardware in a c-like language. In Proceedings of SIGGRAPH '03, 896--907.
[14]
McCool, M., and Du Toit, S. 2004. Metaprogramming GPUs with Sh. AK Peters Ltd.
[15]
McCool, M. D., Qin, Z., and Popa, T. S. 2002. Shader metaprogramming. In Proceedings of Graphics hardware, 57--68.
[16]
McCool, M., Du Toit, S., Popa, T., Chan, B., and Moule, K. 2004. Shader algebra. ACM Trans. Graph. 23, 3, 787--795.
[17]
Moreton, H. 2001. Watertight tessellation using forward differencing. In Proceedings of Graphics Hardware, 25--32.
[18]
NVIDIA, 2007. CUDA homepage. http://developer.nvidia.com/object/cuda.html.
[19]
Olano, M., and Lastra, A. 1998. A shading language on graphics hardware: the pixelflow shading system. In Proceedings of SIGGRAPH'98, 159--168.
[20]
Parisi, T. 2003. Flux: lightweight, standards-based web graphics in xml. In ACM SIGGRAPH 2003 Web Graphics, 1--1.
[21]
Peercy, M. S., Olano, M., Airey, J., and Ungar, P. J. 2000. Interactive multi-pass programmable shading. In Proceedings of SIGGRAPH'00, 425--432.
[22]
Popov, S., Günther, J., Seidel, H.-P., and Slusallek, P. 2007. Stackless kd-tree traversal for high performance gpu ray tracing. In Proceedings of Eurographics'07, 415--424.
[23]
Proudfoot, K., Mark, W. R., Tzvetkov, S., and Hanrahan, P. 2001. A real-time procedural shading system for programmable graphics hardware. In Proceedings of SIGGRAPH'01, 159--170.
[24]
Sengupta, S., Harris, M., Zhang, Y., and Owens, J. D. 2007. Scan primitives for gpu computing. In Graphics Hardware, 97--106.
[25]
Skillicorn, D. B., Hill, J. M. D., and McColl, W. F. 1997. Questions and Answers about BSP. Scientific Programming 6, 3, 249--274.
[26]
Tarditi, D., Puri, S., and Oglesby, J. 2006. Accelerator: using data parallelism to program gpus for general-purpose uses. In ASPLOS-XII: Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, 325--335.
[27]
Valiant, L. G. 1990. A bridging model for parallel computation. Communications of the ACM 33, 8, 103--111.
[28]
Wegman, M. N., and Zadeck, F. K. 1991. Constant propagation with conditional branches. ACM Trans. Program. Lang. Syst. 13, 2, 181--210.

Cited By

View all
  • (2024)GPU Coroutines for Flexible Splitting and Scheduling of Rendering TasksACM Transactions on Graphics10.1145/368776643:6(1-24)Online publication date: 19-Dec-2024
  • (2024)RenderKernel: High-level programming for real-time rendering systemsVisual Informatics10.1016/j.visinf.2024.09.0048:3(82-95)Online publication date: Sep-2024
  • (2023)Evaluating execution time predictions on GPU kernels using an analytical model and machine learning techniquesJournal of Parallel and Distributed Computing10.1016/j.jpdc.2022.09.002171:C(66-78)Online publication date: 1-Jan-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Transactions on Graphics
ACM Transactions on Graphics  Volume 27, Issue 3
August 2008
844 pages
ISSN:0730-0301
EISSN:1557-7368
DOI:10.1145/1360612
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 August 2008
Published in TOG Volume 27, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. bulk synchronous parallel programming
  2. programable graphics hardware
  3. stream processing
  4. thread manipulation

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)14
  • Downloads (Last 6 weeks)1
Reflects downloads up to 14 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)GPU Coroutines for Flexible Splitting and Scheduling of Rendering TasksACM Transactions on Graphics10.1145/368776643:6(1-24)Online publication date: 19-Dec-2024
  • (2024)RenderKernel: High-level programming for real-time rendering systemsVisual Informatics10.1016/j.visinf.2024.09.0048:3(82-95)Online publication date: Sep-2024
  • (2023)Evaluating execution time predictions on GPU kernels using an analytical model and machine learning techniquesJournal of Parallel and Distributed Computing10.1016/j.jpdc.2022.09.002171:C(66-78)Online publication date: 1-Jan-2023
  • (2022)Uncovering input-sensitive energy bottlenecks in oversubscribed GPU workloadsSustainable Computing: Informatics and Systems10.1016/j.suscom.2022.10065435(100654)Online publication date: Sep-2022
  • (2020)PALMProceedings of the VLDB Endowment10.14778/3402707.34027194:11(795-806)Online publication date: 3-Jun-2020
  • (2018)Algebra-Algorithmic Models and Methods of Parallel Programing10.15407/akademperiodyka.367.192Online publication date: 2018
  • (2018)GPU-Driven Scalable Parser for OBJ ModelsJournal of Computer Science and Technology10.1007/s11390-018-1827-233:2(417-428)Online publication date: 23-Mar-2018
  • (2017)Multi-MLInternational Journal of Parallel Programming10.1007/s10766-016-0417-645:2(340-361)Online publication date: 1-Apr-2017
  • (2016)MSKDMultimedia Tools and Applications10.1007/s11042-014-2371-x75:2(1349-1364)Online publication date: 1-Jan-2016
  • (2015)A Novel Computational Model for GPUs with Applications to Efficient AlgorithmsInternational Journal of Networking and Computing10.15803/ijnc.5.1_265:1(26-60)Online publication date: 2015
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media