skip to main content
10.1145/3079079.3079097acmconferencesArticle/Chapter ViewAbstractPublication PagesicsConference Proceedingsconference-collections
research-article

GraphGrind: addressing load imbalance of graph partitioning

Published:14 June 2017Publication History

ABSTRACT

We investigate how graph partitioning adversely affects the performance of graph analytics. We demonstrate that graph partitioning induces extra work during graph traversal and that graph partitions have markedly different connectivity than the original graph. By consequence, increasing the number of partitions reaches a tipping point after which overheads quickly dominate performance gains. Moreover, we show that the heuristic to balance CPU load between graph partitions by balancing the number of edges is inappropriate for a range of graph analyses. However, even when it is appropriate, it is sub-optimal due to the skewed degree distribution of social networks. Based on these observations, we propose GraphGrind, a new graph analytics system that addresses the limitations incurred by graph partitioning. We moreover propose a NUMA-aware extension to the Cilk programming language and obtain a scale-free yet NUMA-aware parallel programming environment which underpins NUMA-aware scheduling in GraphGrind. We demonstrate that Graph-Grind outperforms state-of-the-art graph analytics systems for shared memory including Ligra, Polymer and Galois.

References

  1. V. Agarwal, F. Petrini, D. Pasetto, and D. A. Bader. 2010. Scalable Graph Exploration on Multicore Processors. In Proc. of the Intl. Conf. for High Performance Computing, Networking, Storage and Analysis. 1--11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Beamer, K. Asanović, and D. Patterson. 2012. Direction-optimizing Breadth-first Search. In Proc. of the Intl. Conf. on High Performance Computing, Networking, Storage and Analysis Article 12, 10 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. R. D. Blumofe and C. E. Leiserson. 1994. Scheduling multi-threaded computations by work stealing. In Proc. of the Annual Symp. on Foundations of Computer Science. 356--368. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. B. Florian, L. Marc, and V. Milan. 2014. Balanced graph edge partition. In Proc. of the 20th SIGKDD Intl. Conf. on Knowledge discovery and data mining. 1456--1465. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Frasca, K. Madduri, and P. Raghavan. 2012. NUMA-aware Graph Mining Techniques for Performance and Energy Efficiency. In Proc. of the Intl. Conf. on High Performance Computing, Networking, Storage and Analysis Article 95, 11 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Frigo, P. Halpern, C. E Leiserson, and S. Lewin-Berlin. 2009. Reducers and other Cilk++ hyperobjects. In Proc. of the Annual Symp. on Parallelism in algorithms and architectures. 79--90. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. 1999. Cache-Oblivious Algorithms. In Proc. of the Annual Symp. on Foundations of Computer Science 285--. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Frigo, C. E. Leiserson, and K. H. Randall. 1998. The Implementation of the Cilk-5 Multithreaded Language. In Proc. of the Conf. on Programming Language Design and Implementation 212--223. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. E Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. 2012. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs.. In Proc. of the Intl. Symp. on Operating System Design and Implementation, Vol. 12. 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. E Gonzalez, R. S Xin, A. Dave, D. Crankshaw, M. J Franklin, and I. Stoica. 2014. Graphx: Graph processing in a distributed dataflow framework. In Proc. of the Intl. Symp. on Operating System Design and Implementation. 599--613. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. S. Hong, T. Oguntebi, and K. Olukotun. 2011. Efficient parallel graph exploration on multi-core CPU and GPU. In Parallel Architectures and Compilation Techniques (PACT), 2011 Intl. Conf. on. 78--88. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Intel 2013. Intel Cilk Plus Language Extension Specification (version 1.2. 324396-003us ed.). Intel.Google ScholarGoogle Scholar
  13. G. Karypis and V. Kumar. 1998. Multilevel k-way Partitioning Scheme for Irregular Graphs. J. Parallel and Distrib. Comput. 48, 1 (1998), 96 -- 129. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H. Kwak, C. Lee, H. Park, and S. Moon. 2010. What is Twitter, a social network or a news media?. In Proc. of the international conference on World wide web. 591--600. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. A. Kyrola, G. E Blelloch, and C. Guestrin. 2012. GraphChi: Large-Scale Graph Computation on Just a PC.. In Proc. of the Intl. Symp. on Operating System Design and Implementation, Vol. 12. 31--46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. C. E. Leiserson, T. B. Schardl, and J. Sukha. 2012. Deterministic Parallel Random-number Generation for Dynamic-multithreading Platforms. In Proc. of the Symp. on Principles and Practice of Parallel Programming. 193--204. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Margo and M. Seltzer. 2015. A Scalable Distributed Graph Partitioner. Proc. VLDB Endow. 8, 12 (Aug. 2015), 1478--1489. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. 2007. Measurement and Analysis of Online Social Networks. In Proc. of the Internet Measurement Conf. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. D. Nguyen, A. Lenharth, and K. Pingali. 2013. A lightweight infrastructure for graph analytics. In Proc. of the Symp. on Operating Systems Principles. 456--471. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999--66. Stanford InfoLab.Google ScholarGoogle Scholar
  21. A. Roy, I. Mihailovic, and W. Zwaenepoel. 2013. X-stream: Edge-centric graph processing using streaming partitions. In Proc. of the Symp. on Operating Systems Principles. 472--488. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Y. Saad. 1990. SPARSKIT: A basic tool for sparse matrix computations. Technical Report NASA-CR-185876. NASA.Google ScholarGoogle Scholar
  23. J. Shun and G. E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In Proc. of the Symp. on Principles and Practice of Parallel Programming. 135--146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. J. Shun, L. Dhulipala, and G. E Blelloch. 2015. Smaller and faster: Parallel processing of compressed graphs with Ligra+. In Data Compression Conf. 403--412. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Y. Vigfusson. 2010. Affinity in distributed systems. Ph.D. Dissertation. Cornell University.Google ScholarGoogle Scholar
  26. J. Yang and J. Leskovec. 2012. Defining and Evaluating Network Communities based on Ground-truth. CoRR abs/1205.6233. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. K. Yotov, Tom R., K. Pingali, J. Gunnels, and F. Gustavson. 2007. An experimental comparison of cache-oblivious and cache-conscious programs. In Proc. of the Annual Symp. on Parallel algorithms and architectures. 93--104. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. K. Zhang, R. Chen, and H. Chen. 2015. NUMA-aware graph-structured analytics. In Proc. of the Symp. on Principles and Practice of Parallel Programming. 183--193. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. GraphGrind: addressing load imbalance of graph partitioning

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          ICS '17: Proceedings of the International Conference on Supercomputing
          June 2017
          300 pages
          ISBN:9781450350204
          DOI:10.1145/3079079
          • General Chairs:
          • William D. Gropp,
          • Pete Beckman,
          • Program Chairs:
          • Zhiyuan Li,
          • Francisco J. Cazorla

          Copyright © 2017 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 14 June 2017

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

          Acceptance Rates

          Overall Acceptance Rate584of2,055submissions,28%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader