research-article

Understanding the efficiency of ray traversal on GPUs

Authors:
Timo Aila (NVIDIA Research)
Samuli Laine (NVIDIA Research)

HPG '09: Proceedings of the Conference on High Performance Graphics 2009, August 2009, Pages 145–149, https://doi.org/10.1145/1572769.1572792

Published: 01 August 2009

ABSTRACT

We discuss the mapping of elementary ray tracing operations (acceleration structure traversal and primitive intersection) onto wide SIMD/SIMT machines. Our focus is on NVIDIA GPUs, but some of the observations should be valid for other wide machines as well. While several fast GPU tracing methods have been published, very little is actually understood about their performance. Nobody knows whether the methods are anywhere near the theoretically obtainable limits, and if not, what might be causing the discrepancy. We study this question by comparing the measurements against a simulator that tells the upper bound of performance for a given kernel. We observe that previously known methods are a factor of 1.5–2.5x off from the theoretical optimum, and most of the gap is not explained by memory bandwidth, but rather by previously unidentified inefficiencies in hardware work distribution. We then propose a simple solution that significantly narrows the gap between simulation and measurement. This results in the fastest GPU ray tracer to date. We provide results for primary, ambient occlusion and diffuse interreflection rays.
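As a rough illustration of why per-ray workload matters on a wide SIMT machine, here is a minimal sketch. It is not the authors' simulator. It assumes a toy model in which each ray needs a known number of traversal steps, rays are packed into fixed-width warps in order, and a warp occupies all of its lanes until its slowest ray finishes. The name `simd_utilization`, the default warp width of 32, and the step counts in the usage note are all illustrative assumptions.

```python
from typing import Sequence


def simd_utilization(steps_per_ray: Sequence[int], warp_width: int = 32) -> float:
    """Fraction of issued lane-slots that do useful work under the toy model.

    Assumption (hypothetical model): a warp runs for as many steps as its
    slowest ray, and every lane of the warp is occupied for that long.
    """
    if warp_width <= 0:
        raise ValueError("warp_width must be positive")

    useful = 0  # lane-slots in which a ray actually advances
    issued = 0  # lane-slots occupied by the warp, useful or not

    for start in range(0, len(steps_per_ray), warp_width):
        warp = steps_per_ray[start:start + warp_width]
        useful += sum(warp)
        # A partially filled warp still occupies all warp_width lanes.
        issued += max(warp) * warp_width

    # With no work issued, nothing is wasted.
    return useful / issued if issued else 1.0
```

Under this model, a warp of 32 rays that all need 10 steps is fully utilized, while a warp in which half the rays need 10 steps and half need 40 reaches only 800 / 1280 = 0.625. Utilization of this kind is only one ingredient of an upper bound; the abstract attributes most of the measured gap to work distribution in hardware, which this sketch does not model.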


Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Image and video acquisition
        1. 3D imaging
  2. Computer graphics



Published in
HPG '09: Proceedings of the Conference on High Performance Graphics 2009
August 2009
185 pages
ISBN: 9781605586038
DOI: 10.1145/1572769
Editors: Stephen N. Spencer (University of Washington), David McAllister (NVIDIA), Matt Pharr (Intel), Ingo Wald (Intel)
General Chairs: David Luebke (NVIDIA), Philipp Slusallek (DFKI & Saarland University)
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
SIMD
SIMT
ray tracing
Acceptance Rates
Overall acceptance rate: 15 of 44 submissions, 34%
