skip to main content
10.1145/2155620.2155676acmconferencesArticle/Chapter ViewAbstractPublication PagesmicroConference Proceedingsconference-collections
research-article

SIMD re-convergence at thread frontiers

Published: 03 December 2011 Publication History

Abstract

Hardware and compiler techniques for mapping data-parallel programs with divergent control flow to SIMD architectures have recently enabled the emergence of new GPGPU programming models such as CUDA, OpenCL, and DirectX Compute. The impact of branch divergence can be quite different depending upon whether the program's control flow is structured or unstructured. In this paper, we show that unstructured control flow occurs frequently in applications and can lead to significant code expansion when executed using existing approaches for handling branch divergence.
This paper proposes a new technique for automatically mapping arbitrary control flow onto SIMD processors that relies on a concept of a Thread Frontier, which is a bounded region of the program containing all threads that have branched away from the current warp. This technique is evaluated on a GPU emulator configured to model i) a commodity GPU (Intel Sandybridge), and ii) custom hardware support not realized in current GPU architectures. It is shown that this new technique performs identically to the best existing method for structured control flow, and re-converges at the earliest possible point when executing unstructured control flow. This leads to i) between 1.5 - 633.2% reductions in dynamic instruction counts for several real applications, ii) simplification of the compilation process, and iii) ability to efficiently add high level unstructured programming constructs (e.g., exceptions) to existing data-parallel languages.

References

[1]
T. Hatazaki, "Tsubame-2 - a 2.4 pflops peak performance system," in Optical Fiber Communication Conference. Optical Society of America, 2011.
[2]
K. Spafford, J. S. Meredith, and J. S. Vetter, "Quantifying numa and contention effects in multi-gpu systems," in Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units, ser. GPGPU-4. New York, NY, USA: ACM, 2011, pp. 11:1--11:7.
[3]
L. G. Valiant, "A bridging model for parallel computation," Commun. ACM, 1990.
[4]
H. Wu, G. Diamos, S. Li, and S. Yalamanchili, "Characterization and transformation of unstructured control flow in gpu applications," in The First International Workshop on Characterizing Applications for Heterogeneous Exascale Systems. ACM, June 2011.
[5]
A. Levinthal and T. Porter, "Chap - a simd graphics processor," SIGGRAPH Comput. Graph., vol. 18, no. 3, pp. 77--82, 1984.
[6]
W. W. L. Fung, I. Sham, G. Yuan, and T. M. Aamodt, "Dynamic warp formation and scheduling for efficient gpu control flow," in Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture, 2007.
[7]
Intel, Intel HD Graphics OpenSource Programmer Reference Manual, June 2010.
[8]
H. Wong, M.-M. Papadopoulou, M. Sadooghi-Alvandi, and A. Moshovos, "Demystifying gpu microarchitecture through microbenchmarking," in Performance Analysis of Systems Software (ISPASS), 2010 IEEE International Symposium on, 2010, pp. 235--246.
[9]
Wen-mei W. Hwu, "Real-time adaptive gpu multi-agent path planning," in GPU Computing Gems, Wen-mei W. Hwu, Ed. Morgan Kaufmann, Sep. 2010, vol. 2.
[10]
C. Trapnell and M. C. Schatz, "Optimizing data intensive gpgpu computations for dna sequence alignment," Parallel Computing, vol. 35, pp. 429--440, August 2009.
[11]
A. Badal and A. Badano, "Accelerating monte carlo simulations of photon transport in a voxelized geometry using a massively parallel gpu," Medical Physics 36, 2009.
[12]
V. Pham, P. Vo, H. V. Thanh, and B. L. Hoai, "Gpu implementation of extended gaussian mixture model for background subtraction," International Conference on Computing and Telecommunication Technologies, 2010.
[13]
Q. Fang and D. A. Boas, "Monte carlo simulation of photon migration in 3d turbid media accelerated by graphics processing units," Optical Express 17, vol. 17, pp. 20 178--20 190.
[14]
T. Tsiodras, "A real-time raytracer of triangle meshes in cuda," Tech. Rep., February 2011. {Online}. Available: http://users.softlab.ntua.gr/~ttsiod/cudarenderer-BVH.html
[15]
S. G. Parker, J. Bigler, A. Dietrich, H. Friedrich, J. Hoberock, D. Luebke, D. McAllister, M. McGuire, K. Morley, A. Robison, and M. Stich, "Optix: a general purpose ray tracing engine," ACM Transactions on Graphics, vol. 29, pp. 66:1--66:13, July 2010.
[16]
G. Diamos, A. Kerr, S. Yalamanchili, and N. Clark, "Ocelot: A dynamic compiler for bulk-synchronous applications in heterogeneous systems," in Proceedings of PACT '10, 2010.
[17]
A. Kerr, G. Diamos, and S. Yalamanchili, "A characterization and analysis of ptx kernels," in IISWC09: IEEE International Symposium on Workload Characterization, Austin, TX, USA, October 2009.
[18]
W. Bouknight, S. Denenberg, D. McIntyre, J. Randall, A. Sameh, and D. Slotnick, "The illiac iv system," Proceedings of the IEEE, vol. 60, no. 4, pp. 369--388, apr. 1972.
[19]
AMD, Evergreen Family Instruction Set Architecture Instructions and Microcode, 2010.
[20]
F. Zhang and E. H. D'Hollander, "Using hammock graphs to structure programs," IEEE Trans. Softw. Eng., pp. 231--245, 2004.
[21]
J. Meng, D. Tarjan, and K. Skadron, "Dynamic warp subdivision for integrated branch and memory divergence tolerance," in Proceedings of the 37th annual international symposium on Computer architecture, ser. ISCA '10. New York, NY, USA: ACM, 2010, pp. 235--246.
[22]
W. Fung and T. Aamodt, "Thread block compaction for efficient simt control flow," in 17th International Symposium on High Performance Computer Architecture, feb. 2011, pp. 25--36.
[23]
B. W. Coon and E. J. Lindholm, "System and method for managing divergent threads in a simd architecture," Patent US 7 353 369, April, 2008.

Cited By

View all
  • (2023)Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous HardwareProceedings of the VLDB Endowment10.14778/3565838.356584215:13(3869-3882)Online publication date: 20-Jan-2023
  • (2023)A Relational Program Logic with Data Abstraction and Dynamic FramingACM Transactions on Programming Languages and Systems10.1145/355149744:4(1-136)Online publication date: 10-Jan-2023
  • (2023)Scalable Color Quantization for Task-centric Image CompressionACM Transactions on Multimedia Computing, Communications, and Applications10.1145/355138919:2s(1-18)Online publication date: 17-Feb-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
MICRO-44: Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
December 2011
519 pages
ISBN:9781450310536
DOI:10.1145/2155620
  • Conference Chair:
  • Carlo Galuzzi,
  • General Chair:
  • Luigi Carro,
  • Program Chairs:
  • Andreas Moshovos,
  • Milos Prvulovic
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 03 December 2011

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Funding Sources

Conference

MICRO-44
Sponsor:

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)20
  • Downloads (Last 6 weeks)0
Reflects downloads up to 02 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Enabling Transparent Acceleration of Big Data Frameworks Using Heterogeneous HardwareProceedings of the VLDB Endowment10.14778/3565838.356584215:13(3869-3882)Online publication date: 20-Jan-2023
  • (2023)A Relational Program Logic with Data Abstraction and Dynamic FramingACM Transactions on Programming Languages and Systems10.1145/355149744:4(1-136)Online publication date: 10-Jan-2023
  • (2023)Scalable Color Quantization for Task-centric Image CompressionACM Transactions on Multimedia Computing, Communications, and Applications10.1145/355138919:2s(1-18)Online publication date: 17-Feb-2023
  • (2023)Making Sense of the Unknown: How Managers Make Cyber Security DecisionsACM Transactions on Software Engineering and Methodology10.1145/354868232:4(1-33)Online publication date: 27-May-2023
  • (2022)Novice Use of the Java Programming LanguageACM Transactions on Computing Education10.1145/355139323:1(1-24)Online publication date: 29-Dec-2022
  • (2022)Using User-Generated YouTube Videos to Understand Unguided Interactions with Robots in Public PlacesACM Transactions on Human-Robot Interaction10.1145/355028012:1(1-40)Online publication date: 1-Aug-2022
  • (2022)Repurposing GPU Microarchitectures with Light-Weight Out-Of-Order ExecutionIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.309323133:2(388-402)Online publication date: 1-Feb-2022
  • (2020)Speculative reconvergence for improved SIMT efficiencyProceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization10.1145/3368826.3377911(121-132)Online publication date: 22-Feb-2020
  • (2019)Distributed multi-agent optimization for smart grids and home automationIntelligenza Artificiale10.3233/IA-18003712:2(67-87)Online publication date: 29-Jan-2019
  • (2019)Crowdsourced Targeted Feedback Collection for Multicriteria Data Source SelectionJournal of Data and Information Quality10.1145/328493411:1(1-27)Online publication date: 4-Jan-2019
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media