Gunrock: a high-performance graph processing library on the GPU

Authors:
Yangzihao Wang

University of California, Davis

University of California, Davis
View Profile

,
Andrew Davidson

University of California, Davis

University of California, Davis
View Profile

,
Yuechao Pan

University of California, Davis

University of California, Davis
View Profile

,
Yuduo Wu

University of California, Davis

University of California, Davis
View Profile

,
Andy Riffel

University of California, Davis

University of California, Davis
View Profile

,
John D. Owens

University of California, Davis

University of California, Davis
View Profile

Authors Info & Claims

ACM SIGPLAN Notices Volume 51 Issue 8August 2016Article No.: 11pp 1–12https://doi.org/10.1145/3016078.2851145

Published:27 February 2016Publication History

ACM SIGPLAN Notices

Abstract

For large-scale graph analytics on the GPU, the irregularity of data access/control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock," our high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock instead implements a novel data-centric abstraction centered on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five graph primitives (BFS, BC, SSSP, CC, and PageRank) and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.

Supplemental Material

Available for Download

zip

Supplemental material. (107.6 KB)

References

S. Beamer, K. Asanović, and D. Patterson. Direction-optimizing breadth-first search. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '12, pages 12:1--12:10, Nov. 2012. Google ScholarDigital Library
U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25(2):163--177, 2001.Google ScholarCross Ref
M. Burtscher, R. Nasre, and K. Pingali. A quantitative study of irregular programs on GPUs. In IEEE International Symposium on Workload Characterization, IISWC-2012, pages 141--151, Nov. 2012. Google ScholarDigital Library
D. Cederman and P. Tsigas. On dynamic load-balancing on graphics processors. In Graphics Hardware 2008, pages 57--64, June 2008. Google ScholarDigital Library
A. Davidson, S. Baxter, M. Garland, and J. D. Owens. Work-efficient parallel GPU methods for single source shortest paths. In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium, pages 349--359, May 2014. Google ScholarDigital Library
D. Delling, A. V. Goldberg, A. Nowatzyk, and R. F. Werneck. PHAST: Hardware-accelerated shortest path trees. Journal of Parallel and Distributed Computing, 73:940--952, Sept. 2010.Google ScholarCross Ref
E. Elsen and V. Vaidyanathan. A vertex-centric CUDA/C++ API for large graph analytics on GPUs using the gather-apply-scatter abstraction, 2013. http://www.github.com/RoyalCaliber/vertexAPI2.Google Scholar
Z. Fu, M. Personick, and B. Thompson. MapGraph: A high level API for fast development of high performance graph analytics on GPUs. In Proceedings of the Workshop on GRAph Data Management Experiences and Systems, GRADES '14, pages 2:1--2:6, June 2014. Google ScholarDigital Library
A. Geil, Y. Wang, and J. D. Owens. WTF, GPU! Computing Twitter's who-to-follow on the GPU. In Proceedings of the Second ACM Conference on Online Social Networks, COSN '14, Oct. 2014. Google ScholarDigital Library
R. Geisberger, P. Sanders, and D. Schultes. Better approximation of betweenness centrality. In Proceedings of the Tenth Workshop on Algorithm Engineering and Experiments, ALENEX08, pages 90--100, Jan. 2008. Google ScholarDigital Library
A. Goel. The "who-to-follow" system at Twitter: Algorithms, impact, and further research. WWW 2014 industry track, 2014.Google Scholar
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI '12, pages 17--30. USENIX Association, Oct. 2012. Google ScholarDigital Library
D. Gregor and A. Lumsdaine. The parallel BGL: A generic library for distributed graph computations. In Parallel Object-Oriented Scientific Computing (POOSC), July 2005.Google Scholar
J. Greiner. A comparison of parallel algorithms for connected components. In Proceedings of the Sixth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '94, pages 16--25, June 1994. Google ScholarDigital Library
S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. Green-Marl: A DSL for easy and efficient graph analysis. In Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII, pages 349--362, Mar. 2012. Google ScholarDigital Library
Y. Jia, V. Lu, J. Hoberock, M. Garland, and J. C. Hart. Edge v. node parallelism for graph centrality metrics. In W. W. Hwu, editor, GPU Computing Gems Jade Edition, chapter 2, pages 15--28. Morgan Kaufmann, Oct. 2011.Google Scholar
S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, and D. Glasco. GPUs and the future of parallel computing. IEEE Micro, 31(5):7--17, Sept. 2011. Google ScholarDigital Library
F. Khorasani, K. Vora, R. Gupta, and L. N. Bhuyan. CuSha: Vertex-centric graph processing on GPUs. In Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, HPDC '14, pages 239--252, June 2014. Google ScholarDigital Library
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new parallel framework for machine learning. In Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence, UAI-10, pages 340--349, July 2010.Google Scholar
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, pages 135--146, June 2010. Google ScholarDigital Library
R. C. McColl, D. Ediger, J. Poovey, D. Campbell, and D. A. Bader. A performance evaluation of open source graph databases. In Proceedings of the First Workshop on Parallel Programming for Analytics Applications, PPAA '14, pages 11--18, Feb. 2014. Google ScholarDigital Library
A. McLaughlin and D. A. Bader. Scalable and high performance betweenness centrality on the GPU. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC14, pages 572--583, Nov. 2014. Google ScholarDigital Library
A. McLaughlin, J. Riedy, and D. A. Bader. A fast, energy-efficient abstraction for simultaneous breadth-first searches. In 2015 IEEE High Performance Extreme Computing Conference, HPEC '15, Sept. 2015.Google ScholarCross Ref
D. Merrill, M. Garland, and A. Grimshaw. Scalable GPU graph traversal. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 117--128, Feb. 2012. Google ScholarDigital Library
U. Meyer and P. Sanders. Δ-stepping: a parallelizable shortest path algorithm. Journal of Algorithms, 49(1):114--152, Oct. 2003. 1998 European Symposium on Algorithms. Google ScholarDigital Library
D. Nguyen, A. Lenharth, and K. Pingali. A lightweight infrastructure for graph analytics. In Proceedings of ACM Symposium on Operating Systems Principles, SOSP '13, pages 456--471, Nov. 2013. Google ScholarDigital Library
P. R. Pande and D. A. Bader. Computing betweenness centrality for small world networks on a GPU. In HPEC, 2011.Google Scholar
K. Pingali, D. Nguyen, M. Kulkarni, M. Burtscher, M. A. Hassaan, R. Kaleem, T.-H. Lee, A. Lenharth, R. Manevich, M. Méndez-Lojo, D. Prountzos, and X. Sui. The tao of parallelism in algorithms. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '11, pages 12--25, June 2011. Google ScholarDigital Library
S. Salihoglu and J. Widom. HelP: High-level primitives for large-scale graph processing. In Proceedings of the Workshop on GRAph Data Management Experiences and Systems, GRADES '14, pages 3:1--3:6, June 2014. Google ScholarDigital Library
S. Sallinen, A. Gharaibeh, and M. Ripeanu. Accelerating direction-optimized breadth first search on hybrid architectures. CoRR, abs/1503.04359(1503.04359v1), Mar. 2015.Google Scholar
A. E. Sariyüce, K. Kaya, E. Saule, and U. V. Çatalyürek. Betweenness centrality on GPUs and heterogeneous architectures. In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU-6, pages 76--85, Mar. 2013. Google ScholarDigital Library
J. Shun and G. E. Blelloch. Ligra: a lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, pages 135--146, Feb. 2013. Google ScholarDigital Library
J. G. Siek, L.-Q. Lee, and A. Lumsdaine. The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley, Dec. 2001.Google Scholar
J. Soman, K. Kishore, and P. J. Narayanan. A fast GPU algorithm for graph connectivity. In 24th IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum, IPDPSW 2010, pages 1--8, Apr. 2010.Google ScholarCross Ref
S. Tzeng, B. Lloyd, and J. D. Owens. A GPU task-parallel model with dependency resolution. IEEE Computer, 45(8):34--41, Aug. 2012. Google ScholarDigital Library
Y. Wu, Y. Wang, Y. Pan, C. Yang, and J. D. Owens. Performance characterization for high-level programming models for GPU graph analytics. In IEEE International Symposium on Workload Characterization, IISWC-2015, pages 66--75, Oct. 2015. Google ScholarDigital Library
J. Zhong and B. He. Medusa: Simplified graph processing on GPUs. IEEE Transactions on Parallel and Distributed Systems, 25(6):1543--1552, June 2014. Google ScholarDigital Library

Recommendations

Gunrock: a high-performance graph processing library on the GPU
PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

For large-scale graph analytics on the GPU, the irregularity of data access/control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock," our high-level ...
Read More
Gunrock: GPU Graph Analytics
Special Issue: Invited papers from PPoPP 2016, Part 1

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. “Gunrock,” our ...
Read More
Gunrock: a high-performance graph processing library on the GPU
PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

For large-scale graph analytics on the GPU, the irregularity of data access and control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGPLAN Notices Volume 51, Issue 8
PPoPP '16
August 2016
405 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3016078
Editor:
Matthew Fluet
Issue’s Table of Contents
PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2016
420 pages
ISBN:9781450340922
DOI:10.1145/2851141
General Chair:
Rafael Asenjo
University of Málaga, Spain
,
Program Chair:
Tim Harris
Oracle Labs, Cambridge, UK
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 February 2016
Check for updates
Badges
- Distinguished Paper
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 307
  Total Citations
  View Citations
- 5,220
  Total Downloads
- Downloads (Last 12 months)642
- Downloads (Last 6 weeks)80
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Gunrock: a high-performance graph processing library on the GPU

ACM SIGPLAN Notices

Abstract

Supplemental Material

Available for Download

References

Cited By

Recommendations

Gunrock: a high-performance graph processing library on the GPU

Gunrock: GPU Graph Analytics

Gunrock: a high-performance graph processing library on the GPU