research-article

GraphGrind: addressing load imbalance of graph partitioning

Authors:
Jiawen Sun

The Queen's University of Belfast

The Queen's University of Belfast
View Profile

,
Hans Vandierendonck

The Queen's University of Belfast

The Queen's University of Belfast
View Profile

,
Dimitrios S. Nikolopoulos

The Queen's University of Belfast

The Queen's University of Belfast
View Profile

ICS '17: Proceedings of the International Conference on SupercomputingJune 2017Article No.: 16Pages 1–10https://doi.org/10.1145/3079079.3079097

Published:14 June 2017Publication History

ICS '17: Proceedings of the International Conference on Supercomputing

Pages 1–10

ABSTRACT

We investigate how graph partitioning adversely affects the performance of graph analytics. We demonstrate that graph partitioning induces extra work during graph traversal and that graph partitions have markedly different connectivity than the original graph. By consequence, increasing the number of partitions reaches a tipping point after which overheads quickly dominate performance gains. Moreover, we show that the heuristic to balance CPU load between graph partitions by balancing the number of edges is inappropriate for a range of graph analyses. However, even when it is appropriate, it is sub-optimal due to the skewed degree distribution of social networks. Based on these observations, we propose GraphGrind, a new graph analytics system that addresses the limitations incurred by graph partitioning. We moreover propose a NUMA-aware extension to the Cilk programming language and obtain a scale-free yet NUMA-aware parallel programming environment which underpins NUMA-aware scheduling in GraphGrind. We demonstrate that Graph-Grind outperforms state-of-the-art graph analytics systems for shared memory including Ligra, Polymer and Galois.

References

V. Agarwal, F. Petrini, D. Pasetto, and D. A. Bader. 2010. Scalable Graph Exploration on Multicore Processors. In Proc. of the Intl. Conf. for High Performance Computing, Networking, Storage and Analysis. 1--11. Google ScholarDigital Library
S. Beamer, K. Asanović, and D. Patterson. 2012. Direction-optimizing Breadth-first Search. In Proc. of the Intl. Conf. on High Performance Computing, Networking, Storage and Analysis Article 12, 10 pages. Google ScholarDigital Library
R. D. Blumofe and C. E. Leiserson. 1994. Scheduling multi-threaded computations by work stealing. In Proc. of the Annual Symp. on Foundations of Computer Science. 356--368. Google ScholarDigital Library
B. Florian, L. Marc, and V. Milan. 2014. Balanced graph edge partition. In Proc. of the 20th SIGKDD Intl. Conf. on Knowledge discovery and data mining. 1456--1465. Google ScholarDigital Library
M. Frasca, K. Madduri, and P. Raghavan. 2012. NUMA-aware Graph Mining Techniques for Performance and Energy Efficiency. In Proc. of the Intl. Conf. on High Performance Computing, Networking, Storage and Analysis Article 95, 11 pages. Google ScholarDigital Library
M. Frigo, P. Halpern, C. E Leiserson, and S. Lewin-Berlin. 2009. Reducers and other Cilk++ hyperobjects. In Proc. of the Annual Symp. on Parallelism in algorithms and architectures. 79--90. Google ScholarDigital Library
M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. 1999. Cache-Oblivious Algorithms. In Proc. of the Annual Symp. on Foundations of Computer Science 285--. Google ScholarDigital Library
M. Frigo, C. E. Leiserson, and K. H. Randall. 1998. The Implementation of the Cilk-5 Multithreaded Language. In Proc. of the Conf. on Programming Language Design and Implementation 212--223. Google ScholarDigital Library
J. E Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. 2012. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs.. In Proc. of the Intl. Symp. on Operating System Design and Implementation, Vol. 12. 2. Google ScholarDigital Library
J. E Gonzalez, R. S Xin, A. Dave, D. Crankshaw, M. J Franklin, and I. Stoica. 2014. Graphx: Graph processing in a distributed dataflow framework. In Proc. of the Intl. Symp. on Operating System Design and Implementation. 599--613. Google ScholarDigital Library
S. Hong, T. Oguntebi, and K. Olukotun. 2011. Efficient parallel graph exploration on multi-core CPU and GPU. In Parallel Architectures and Compilation Techniques (PACT), 2011 Intl. Conf. on. 78--88. Google ScholarDigital Library
Intel 2013. Intel Cilk Plus Language Extension Specification (version 1.2. 324396-003us ed.). Intel.Google Scholar
G. Karypis and V. Kumar. 1998. Multilevel k-way Partitioning Scheme for Irregular Graphs. J. Parallel and Distrib. Comput. 48, 1 (1998), 96 -- 129. Google ScholarDigital Library
H. Kwak, C. Lee, H. Park, and S. Moon. 2010. What is Twitter, a social network or a news media?. In Proc. of the international conference on World wide web. 591--600. Google ScholarDigital Library
A. Kyrola, G. E Blelloch, and C. Guestrin. 2012. GraphChi: Large-Scale Graph Computation on Just a PC.. In Proc. of the Intl. Symp. on Operating System Design and Implementation, Vol. 12. 31--46. Google ScholarDigital Library
C. E. Leiserson, T. B. Schardl, and J. Sukha. 2012. Deterministic Parallel Random-number Generation for Dynamic-multithreading Platforms. In Proc. of the Symp. on Principles and Practice of Parallel Programming. 193--204. Google ScholarDigital Library
D. Margo and M. Seltzer. 2015. A Scalable Distributed Graph Partitioner. Proc. VLDB Endow. 8, 12 (Aug. 2015), 1478--1489. Google ScholarDigital Library
A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. 2007. Measurement and Analysis of Online Social Networks. In Proc. of the Internet Measurement Conf. Google ScholarDigital Library
D. Nguyen, A. Lenharth, and K. Pingali. 2013. A lightweight infrastructure for graph analytics. In Proc. of the Symp. on Operating Systems Principles. 456--471. Google ScholarDigital Library
L. Page, S. Brin, R. Motwani, and T. Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999--66. Stanford InfoLab.Google Scholar
A. Roy, I. Mihailovic, and W. Zwaenepoel. 2013. X-stream: Edge-centric graph processing using streaming partitions. In Proc. of the Symp. on Operating Systems Principles. 472--488. Google ScholarDigital Library
Y. Saad. 1990. SPARSKIT: A basic tool for sparse matrix computations. Technical Report NASA-CR-185876. NASA.Google Scholar
J. Shun and G. E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In Proc. of the Symp. on Principles and Practice of Parallel Programming. 135--146. Google ScholarDigital Library
J. Shun, L. Dhulipala, and G. E Blelloch. 2015. Smaller and faster: Parallel processing of compressed graphs with Ligra+. In Data Compression Conf. 403--412. Google ScholarDigital Library
Y. Vigfusson. 2010. Affinity in distributed systems. Ph.D. Dissertation. Cornell University.Google Scholar
J. Yang and J. Leskovec. 2012. Defining and Evaluating Network Communities based on Ground-truth. CoRR abs/1205.6233. Google ScholarDigital Library
K. Yotov, Tom R., K. Pingali, J. Gunnels, and F. Gustavson. 2007. An experimental comparison of cache-oblivious and cache-conscious programs. In Proc. of the Annual Symp. on Parallel algorithms and architectures. 93--104. Google ScholarDigital Library
K. Zhang, R. Chen, and H. Chen. 2015. NUMA-aware graph-structured analytics. In Proc. of the Symp. on Principles and Practice of Parallel Programming. 183--193. Google ScholarDigital Library

Index Terms

GraphGrind: addressing load imbalance of graph partitioning
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Multicore architectures
2. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms
      1. Shared memory algorithms
    2. Parallel programming languages

Recommendations

Partitioning of a graph into induced subgraphs not containing prescribed cliques
Abstract
Let K p be a complete graph of order p ≥ 2. A K p-free k-coloring of a graph H is a partition of V ( H ) into V 1 , V 2 … , V k such that H [ V i ] does not contain K p for each i ≤ k. In 1977 Borodin and Kostochka conjectured that any graph H ...
Read More
Partitioning a Graph into Complementary Subgraphs
WALCOM: Algorithms and Computation
Abstract
In the Partition Into Complementary Subgraphs (Comp-Sub) problem we are given a graph , and an edge set property , and asked whether G can be decomposed into two graphs, H and its complement , for some graph H, in such a way that the edge cut-set (...
Read More
Planar Graph Coloring with Forbidden Subgraphs: Why Trees and Paths Are Dangerous
SWAT '02: Proceedings of the 8th Scandinavian Workshop on Algorithm Theory

We consider the problem of coloring a planar graph with the minimum number of colors such that each color class avoids one or more forbidden graphs as subgraphs. We perform a detailed study of the computational complexity of this problem.We present a ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ICS '17: Proceedings of the International Conference on Supercomputing
June 2017
300 pages
ISBN:9781450350204
DOI:10.1145/3079079
General Chairs:
William D. Gropp
University of Illinois at Urbana-Champaign, Illinois
,
Pete Beckman
Argonne National Laboratory/Northwestern University, Illinois
,
Program Chairs:
Zhiyuan Li
Purdue University, West Lafayette, Indiana
,
Francisco J. Cazorla
IIIA-CSIC and Barcelona Supercomputing Center, Barcelona, Spain
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 14 June 2017
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
graph analytics
graph partitioning
non-uniform memory access (NUMA)
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate584of2,055submissions,28%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 51
  Total Citations
  View Citations
- 697
  Total Downloads
- Downloads (Last 12 months)43
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

GraphGrind: addressing load imbalance of graph partitioning

ICS '17: Proceedings of the International Conference on Supercomputing

ABSTRACT

References

Cited By

Index Terms

Recommendations

Partitioning of a graph into induced subgraphs not containing prescribed cliques

Partitioning a Graph into Complementary Subgraphs

Planar Graph Coloring with Forbidden Subgraphs: Why Trees and Paths Are Dangerous