Abstract
In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming co-processor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications: the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.
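As a rough illustration of the data-parallel style the abstract describes, SAXPY (one of the five evaluated applications) computes a*x + y independently for every element. The sketch below is plain C, not Brook syntax: the names `saxpy_kernel` and `saxpy_stream` are hypothetical, and the loop merely stands in for what a streaming co-processor would execute in parallel.

```c
#include <stddef.h>

/* Per-element "kernel": produces one output element from one element of
   each input stream. Hypothetical name; this is not Brook syntax. */
static float saxpy_kernel(float a, float x, float y)
{
    return a * x + y;
}

/* Applies the kernel to every element of the input arrays.
   No iteration reads the result of another iteration, which is the
   property that lets a streaming processor run them in parallel.
   Here the elements are simply processed sequentially on the CPU. */
void saxpy_stream(size_t n, float a, const float *x, const float *y,
                  float *result)
{
    for (size_t i = 0; i < n; ++i) {
        result[i] = saxpy_kernel(a, x[i], y[i]);
    }
}
```

Calling `saxpy_stream(4, 2.0f, x, y, r)` with `x = {1, 2, 3, 4}` and `y = {10, 20, 30, 40}` fills `r` with `{12, 24, 36, 48}`. Keeping the per-element function separate from the loop mirrors the split between a kernel and the streams it is applied to, under the assumptions stated above.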
Brook for GPUs: stream computing on graphics hardware
SIGGRAPH '04: ACM SIGGRAPH 2004 Papers