Brook for GPUs: stream computing on graphics hardware

Authors: Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, Pat Hanrahan (all Stanford University)

ACM Transactions on Graphics, Volume 23, Issue 3, pp. 777–786. https://doi.org/10.1145/1015706.1015800

Published: 01 August 2004

Abstract

In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming co-processor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications: the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and run up to seven times faster than their CPU counterparts.
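As a rough illustration of the data-parallel style the abstract describes, the sketch below expresses SAXPY (the BLAS operation alpha * x + y) as a small per-element function applied independently across whole input sequences. It is written in plain Python, not Brook, and it runs on the CPU; the names `Stream`, `map_kernel`, and `saxpy` are hypothetical and chosen only for this example, not taken from the paper or the Brook API.

```python
from typing import Callable, List

# Hypothetical stand-in for a stream: an ordered collection of elements
# that a kernel processes independently of one another.
Stream = List[float]


def map_kernel(kernel: Callable[..., float], *streams: Stream) -> Stream:
    """Apply `kernel` to corresponding elements of equal-length streams.

    Each output element depends only on the matching input elements,
    which is the property that makes this kind of computation easy to
    run in parallel.
    """
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all input streams must have the same length")
    return [kernel(*elems) for elems in zip(*streams)]


def saxpy(alpha: float, x: Stream, y: Stream) -> Stream:
    """Compute alpha * x + y element by element."""
    return map_kernel(lambda xi, yi: alpha * xi + yi, x, y)


if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
    # prints [12.0, 24.0, 36.0]
```

The point of the sketch is only the shape of the computation: the per-element function has no dependence on neighbouring elements or on evaluation order, so a runtime is free to evaluate all elements concurrently.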

Supplemental Material

pps068.mov (11.7 KB)

Index Terms

1. Computing methodologies
  1. Computer graphics
    1. Graphics systems and interfaces
      1. Graphics processors
2. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Concurrent programming languages
        Distributed programming languages
        Parallel programming languages

Published in
ACM Transactions on Graphics, Volume 23, Issue 3
August 2004
684 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/1015706
Editor: John C. Hart
Copyright © 2004 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Brook
Data Parallel Computing
GPU Computing
Programmable Graphics Hardware
Stream Computing