
Brook for GPUs: stream computing on graphics hardware

Published: 01 August 2004

Abstract

In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming co-processor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications: the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.
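
As a rough illustration of the data-parallel style the abstract refers to, SAXPY (the BLAS operation that computes a*x + y element by element) can be written as a small per-element function applied independently across equal-length input streams. The sketch below is plain Python, not Brook syntax and not the authors' implementation; the names `map_kernel` and `saxpy` are hypothetical and chosen only for this example.

```python
def map_kernel(kernel, *streams):
    """Apply `kernel` independently at every position of equal-length streams.

    Each output element depends only on the input elements at the same
    position, which is what makes the operation data-parallel.
    (Hypothetical helper for illustration; not part of Brook.)
    """
    length = len(streams[0])
    if any(len(s) != length for s in streams):
        raise ValueError("all input streams must have the same length")
    return [kernel(*elems) for elems in zip(*streams)]


def saxpy(a, x, y):
    """Compute a*x + y element-wise over the streams x and y."""
    def kernel(xi, yi):
        # Per-element body: no access to neighbouring elements.
        return a * xi + yi
    return map_kernel(kernel, x, y)


result = saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(result)  # [12.0, 24.0, 36.0]
```

Because the per-element function never reads or writes any other position, the elements could in principle be processed in any order or all at once, which is the property a streaming co-processor would exploit.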

Published in

ACM Transactions on Graphics, Volume 23, Issue 3
August 2004
684 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/1015706

Copyright © 2004 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher: Association for Computing Machinery, New York, NY, United States
