ABSTRACT
We investigate how graph partitioning adversely affects the performance of graph analytics. We demonstrate that graph partitioning induces extra work during graph traversal and that graph partitions have markedly different connectivity than the original graph. By consequence, increasing the number of partitions reaches a tipping point after which overheads quickly dominate performance gains. Moreover, we show that the heuristic to balance CPU load between graph partitions by balancing the number of edges is inappropriate for a range of graph analyses. However, even when it is appropriate, it is sub-optimal due to the skewed degree distribution of social networks. Based on these observations, we propose GraphGrind, a new graph analytics system that addresses the limitations incurred by graph partitioning. We moreover propose a NUMA-aware extension to the Cilk programming language and obtain a scale-free yet NUMA-aware parallel programming environment which underpins NUMA-aware scheduling in GraphGrind. We demonstrate that Graph-Grind outperforms state-of-the-art graph analytics systems for shared memory including Ligra, Polymer and Galois.
- V. Agarwal, F. Petrini, D. Pasetto, and D. A. Bader. 2010. Scalable Graph Exploration on Multicore Processors. In Proc. of the Intl. Conf. for High Performance Computing, Networking, Storage and Analysis. 1--11. Google ScholarDigital Library
- S. Beamer, K. Asanović, and D. Patterson. 2012. Direction-optimizing Breadth-first Search. In Proc. of the Intl. Conf. on High Performance Computing, Networking, Storage and Analysis Article 12, 10 pages. Google ScholarDigital Library
- R. D. Blumofe and C. E. Leiserson. 1994. Scheduling multi-threaded computations by work stealing. In Proc. of the Annual Symp. on Foundations of Computer Science. 356--368. Google ScholarDigital Library
- B. Florian, L. Marc, and V. Milan. 2014. Balanced graph edge partition. In Proc. of the 20th SIGKDD Intl. Conf. on Knowledge discovery and data mining. 1456--1465. Google ScholarDigital Library
- M. Frasca, K. Madduri, and P. Raghavan. 2012. NUMA-aware Graph Mining Techniques for Performance and Energy Efficiency. In Proc. of the Intl. Conf. on High Performance Computing, Networking, Storage and Analysis Article 95, 11 pages. Google ScholarDigital Library
- M. Frigo, P. Halpern, C. E Leiserson, and S. Lewin-Berlin. 2009. Reducers and other Cilk++ hyperobjects. In Proc. of the Annual Symp. on Parallelism in algorithms and architectures. 79--90. Google ScholarDigital Library
- M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. 1999. Cache-Oblivious Algorithms. In Proc. of the Annual Symp. on Foundations of Computer Science 285--. Google ScholarDigital Library
- M. Frigo, C. E. Leiserson, and K. H. Randall. 1998. The Implementation of the Cilk-5 Multithreaded Language. In Proc. of the Conf. on Programming Language Design and Implementation 212--223. Google ScholarDigital Library
- J. E Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. 2012. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs.. In Proc. of the Intl. Symp. on Operating System Design and Implementation, Vol. 12. 2. Google ScholarDigital Library
- J. E Gonzalez, R. S Xin, A. Dave, D. Crankshaw, M. J Franklin, and I. Stoica. 2014. Graphx: Graph processing in a distributed dataflow framework. In Proc. of the Intl. Symp. on Operating System Design and Implementation. 599--613. Google ScholarDigital Library
- S. Hong, T. Oguntebi, and K. Olukotun. 2011. Efficient parallel graph exploration on multi-core CPU and GPU. In Parallel Architectures and Compilation Techniques (PACT), 2011 Intl. Conf. on. 78--88. Google ScholarDigital Library
- Intel 2013. Intel Cilk Plus Language Extension Specification (version 1.2. 324396-003us ed.). Intel.Google Scholar
- G. Karypis and V. Kumar. 1998. Multilevel k-way Partitioning Scheme for Irregular Graphs. J. Parallel and Distrib. Comput. 48, 1 (1998), 96 -- 129. Google ScholarDigital Library
- H. Kwak, C. Lee, H. Park, and S. Moon. 2010. What is Twitter, a social network or a news media?. In Proc. of the international conference on World wide web. 591--600. Google ScholarDigital Library
- A. Kyrola, G. E Blelloch, and C. Guestrin. 2012. GraphChi: Large-Scale Graph Computation on Just a PC.. In Proc. of the Intl. Symp. on Operating System Design and Implementation, Vol. 12. 31--46. Google ScholarDigital Library
- C. E. Leiserson, T. B. Schardl, and J. Sukha. 2012. Deterministic Parallel Random-number Generation for Dynamic-multithreading Platforms. In Proc. of the Symp. on Principles and Practice of Parallel Programming. 193--204. Google ScholarDigital Library
- D. Margo and M. Seltzer. 2015. A Scalable Distributed Graph Partitioner. Proc. VLDB Endow. 8, 12 (Aug. 2015), 1478--1489. Google ScholarDigital Library
- A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. 2007. Measurement and Analysis of Online Social Networks. In Proc. of the Internet Measurement Conf. Google ScholarDigital Library
- D. Nguyen, A. Lenharth, and K. Pingali. 2013. A lightweight infrastructure for graph analytics. In Proc. of the Symp. on Operating Systems Principles. 456--471. Google ScholarDigital Library
- L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999--66. Stanford InfoLab.Google Scholar
- A. Roy, I. Mihailovic, and W. Zwaenepoel. 2013. X-stream: Edge-centric graph processing using streaming partitions. In Proc. of the Symp. on Operating Systems Principles. 472--488. Google ScholarDigital Library
- Y. Saad. 1990. SPARSKIT: A basic tool for sparse matrix computations. Technical Report NASA-CR-185876. NASA.Google Scholar
- J. Shun and G. E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In Proc. of the Symp. on Principles and Practice of Parallel Programming. 135--146. Google ScholarDigital Library
- J. Shun, L. Dhulipala, and G. E Blelloch. 2015. Smaller and faster: Parallel processing of compressed graphs with Ligra+. In Data Compression Conf. 403--412. Google ScholarDigital Library
- Y. Vigfusson. 2010. Affinity in distributed systems. Ph.D. Dissertation. Cornell University.Google Scholar
- J. Yang and J. Leskovec. 2012. Defining and Evaluating Network Communities based on Ground-truth. CoRR abs/1205.6233. Google ScholarDigital Library
- K. Yotov, Tom R., K. Pingali, J. Gunnels, and F. Gustavson. 2007. An experimental comparison of cache-oblivious and cache-conscious programs. In Proc. of the Annual Symp. on Parallel algorithms and architectures. 93--104. Google ScholarDigital Library
- K. Zhang, R. Chen, and H. Chen. 2015. NUMA-aware graph-structured analytics. In Proc. of the Symp. on Principles and Practice of Parallel Programming. 183--193. Google ScholarDigital Library
Index Terms
- GraphGrind: addressing load imbalance of graph partitioning
Recommendations
Partitioning of a graph into induced subgraphs not containing prescribed cliques
AbstractLet K p be a complete graph of order p ≥ 2. A K p-free k-coloring of a graph H is a partition of V ( H ) into V 1 , V 2 … , V k such that H [ V i ] does not contain K p for each i ≤ k. In 1977 Borodin and Kostochka conjectured that any graph H ...
Partitioning a Graph into Complementary Subgraphs
WALCOM: Algorithms and ComputationAbstractIn the Partition Into Complementary Subgraphs (Comp-Sub) problem we are given a graph , and an edge set property , and asked whether G can be decomposed into two graphs, H and its complement , for some graph H, in such a way that the edge cut-set (...
Planar Graph Coloring with Forbidden Subgraphs: Why Trees and Paths Are Dangerous
SWAT '02: Proceedings of the 8th Scandinavian Workshop on Algorithm TheoryWe consider the problem of coloring a planar graph with the minimum number of colors such that each color class avoids one or more forbidden graphs as subgraphs. We perform a detailed study of the computational complexity of this problem.We present a ...
Comments