skip to main content
research-article
Public Access
Distinguished Paper

Gunrock: a high-performance graph processing library on the GPU

Published:27 February 2016Publication History
Skip Abstract Section

Abstract

For large-scale graph analytics on the GPU, the irregularity of data access/control flow and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock," our high-level bulk-synchronous graph-processing system targeting the GPU, takes a new approach to abstracting GPU graph analytics: rather than designing an abstraction around computation, Gunrock instead implements a novel data-centric abstraction centered on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five graph primitives (BFS, BC, SSSP, CC, and PageRank) and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.

Skip Supplemental Material Section

Supplemental Material

References

  1. S. Beamer, K. Asanović, and D. Patterson. Direction-optimizing breadth-first search. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC '12, pages 12:1--12:10, Nov. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. U. Brandes. A faster algorithm for betweenness centrality. Journal of Mathematical Sociology, 25(2):163--177, 2001.Google ScholarGoogle ScholarCross RefCross Ref
  3. M. Burtscher, R. Nasre, and K. Pingali. A quantitative study of irregular programs on GPUs. In IEEE International Symposium on Workload Characterization, IISWC-2012, pages 141--151, Nov. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. D. Cederman and P. Tsigas. On dynamic load-balancing on graphics processors. In Graphics Hardware 2008, pages 57--64, June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Davidson, S. Baxter, M. Garland, and J. D. Owens. Work-efficient parallel GPU methods for single source shortest paths. In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium, pages 349--359, May 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. D. Delling, A. V. Goldberg, A. Nowatzyk, and R. F. Werneck. PHAST: Hardware-accelerated shortest path trees. Journal of Parallel and Distributed Computing, 73:940--952, Sept. 2010.Google ScholarGoogle ScholarCross RefCross Ref
  7. E. Elsen and V. Vaidyanathan. A vertex-centric CUDA/C++ API for large graph analytics on GPUs using the gather-apply-scatter abstraction, 2013. http://www.github.com/RoyalCaliber/vertexAPI2.Google ScholarGoogle Scholar
  8. Z. Fu, M. Personick, and B. Thompson. MapGraph: A high level API for fast development of high performance graph analytics on GPUs. In Proceedings of the Workshop on GRAph Data Management Experiences and Systems, GRADES '14, pages 2:1--2:6, June 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Geil, Y. Wang, and J. D. Owens. WTF, GPU! Computing Twitter's who-to-follow on the GPU. In Proceedings of the Second ACM Conference on Online Social Networks, COSN '14, Oct. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Geisberger, P. Sanders, and D. Schultes. Better approximation of betweenness centrality. In Proceedings of the Tenth Workshop on Algorithm Engineering and Experiments, ALENEX08, pages 90--100, Jan. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Goel. The "who-to-follow" system at Twitter: Algorithms, impact, and further research. WWW 2014 industry track, 2014.Google ScholarGoogle Scholar
  12. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI '12, pages 17--30. USENIX Association, Oct. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. D. Gregor and A. Lumsdaine. The parallel BGL: A generic library for distributed graph computations. In Parallel Object-Oriented Scientific Computing (POOSC), July 2005.Google ScholarGoogle Scholar
  14. J. Greiner. A comparison of parallel algorithms for connected components. In Proceedings of the Sixth Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA '94, pages 16--25, June 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Hong, H. Chafi, E. Sedlar, and K. Olukotun. Green-Marl: A DSL for easy and efficient graph analysis. In Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII, pages 349--362, Mar. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Y. Jia, V. Lu, J. Hoberock, M. Garland, and J. C. Hart. Edge v. node parallelism for graph centrality metrics. In W. W. Hwu, editor, GPU Computing Gems Jade Edition, chapter 2, pages 15--28. Morgan Kaufmann, Oct. 2011.Google ScholarGoogle Scholar
  17. S. W. Keckler, W. J. Dally, B. Khailany, M. Garland, and D. Glasco. GPUs and the future of parallel computing. IEEE Micro, 31(5):7--17, Sept. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. F. Khorasani, K. Vora, R. Gupta, and L. N. Bhuyan. CuSha: Vertex-centric graph processing on GPUs. In Proceedings of the 23rd International Symposium on High-performance Parallel and Distributed Computing, HPDC '14, pages 239--252, June 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new parallel framework for machine learning. In Proceedings of the Twenty-Sixth Annual Conference on Uncertainty in Artificial Intelligence, UAI-10, pages 340--349, July 2010.Google ScholarGoogle Scholar
  20. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD '10, pages 135--146, June 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. C. McColl, D. Ediger, J. Poovey, D. Campbell, and D. A. Bader. A performance evaluation of open source graph databases. In Proceedings of the First Workshop on Parallel Programming for Analytics Applications, PPAA '14, pages 11--18, Feb. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. McLaughlin and D. A. Bader. Scalable and high performance betweenness centrality on the GPU. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC14, pages 572--583, Nov. 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. A. McLaughlin, J. Riedy, and D. A. Bader. A fast, energy-efficient abstraction for simultaneous breadth-first searches. In 2015 IEEE High Performance Extreme Computing Conference, HPEC '15, Sept. 2015.Google ScholarGoogle ScholarCross RefCross Ref
  24. D. Merrill, M. Garland, and A. Grimshaw. Scalable GPU graph traversal. In Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '12, pages 117--128, Feb. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. U. Meyer and P. Sanders. Δ-stepping: a parallelizable shortest path algorithm. Journal of Algorithms, 49(1):114--152, Oct. 2003. 1998 European Symposium on Algorithms. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. D. Nguyen, A. Lenharth, and K. Pingali. A lightweight infrastructure for graph analytics. In Proceedings of ACM Symposium on Operating Systems Principles, SOSP '13, pages 456--471, Nov. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. P. R. Pande and D. A. Bader. Computing betweenness centrality for small world networks on a GPU. In HPEC, 2011.Google ScholarGoogle Scholar
  28. K. Pingali, D. Nguyen, M. Kulkarni, M. Burtscher, M. A. Hassaan, R. Kaleem, T.-H. Lee, A. Lenharth, R. Manevich, M. Méndez-Lojo, D. Prountzos, and X. Sui. The tao of parallelism in algorithms. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '11, pages 12--25, June 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Salihoglu and J. Widom. HelP: High-level primitives for large-scale graph processing. In Proceedings of the Workshop on GRAph Data Management Experiences and Systems, GRADES '14, pages 3:1--3:6, June 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. S. Sallinen, A. Gharaibeh, and M. Ripeanu. Accelerating direction-optimized breadth first search on hybrid architectures. CoRR, abs/1503.04359(1503.04359v1), Mar. 2015.Google ScholarGoogle Scholar
  31. A. E. Sariyüce, K. Kaya, E. Saule, and U. V. Çatalyürek. Betweenness centrality on GPUs and heterogeneous architectures. In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU-6, pages 76--85, Mar. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. J. Shun and G. E. Blelloch. Ligra: a lightweight graph processing framework for shared memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP '13, pages 135--146, Feb. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. J. G. Siek, L.-Q. Lee, and A. Lumsdaine. The Boost Graph Library: User Guide and Reference Manual. Addison-Wesley, Dec. 2001.Google ScholarGoogle Scholar
  34. J. Soman, K. Kishore, and P. J. Narayanan. A fast GPU algorithm for graph connectivity. In 24th IEEE International Symposium on Parallel and Distributed Processing, Workshops and PhD Forum, IPDPSW 2010, pages 1--8, Apr. 2010.Google ScholarGoogle ScholarCross RefCross Ref
  35. S. Tzeng, B. Lloyd, and J. D. Owens. A GPU task-parallel model with dependency resolution. IEEE Computer, 45(8):34--41, Aug. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Y. Wu, Y. Wang, Y. Pan, C. Yang, and J. D. Owens. Performance characterization for high-level programming models for GPU graph analytics. In IEEE International Symposium on Workload Characterization, IISWC-2015, pages 66--75, Oct. 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. J. Zhong and B. He. Medusa: Simplified graph processing on GPUs. IEEE Transactions on Parallel and Distributed Systems, 25(6):1543--1552, June 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

  • Published in

    cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 51, Issue 8
    PPoPP '16
    August 2016
    405 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/3016078
    Issue’s Table of Contents
    • cover image ACM Conferences
      PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
      February 2016
      420 pages
      ISBN:9781450340922
      DOI:10.1145/2851141

    Copyright © 2016 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 27 February 2016

    Check for updates

    Qualifiers

    • research-article

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader