research-article

Public Access

Architectural implications on the performance and cost of graph analytics systems

Authors:

Purushotham BangaloreAuthors Info & Claims

SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing

Pages 40 - 51

https://doi.org/10.1145/3127479.3128606

Published: 24 September 2017 Publication History

Abstract

Graph analytics systems have gained significant popularity due to the prevalence of graph data. Many of these systems are designed to run in a shared-nothing architecture whereby a cluster of machines can process a large graph in parallel. In more recent proposals, others have argued that a single-machine system can achieve better performance and/or is more cost-effective. There is however no clear consensus which approach is better. In this paper, we classify existing graph analytics systems into four categories based on the architectural differences, i.e., processing infrastructure (centralized vs distributed), and memory consumption (in-memory vs out-of-core). We select eight open-source systems to cover all categories, and perform a comparative measurement study to compare their performance and cost characteristics across a spectrum of input data, applications, and hardware settings. Our results show that the best performing configuration can depend on the type of applications and input graphs, and there is no dominant winner across all categories. Based on our findings, we summarize the trends in performance and cost, and provide several insights that help to illuminate the performance and resource cost tradeoffs across different graph analytics systems and categories.

References

[1]

Microsoft Azure: https://azure.microsoft.com/en-us/pricing/details/virtual-machines/#Windows.

[2]

C. R. Aberger, S. Tu, K. Olukotun, and C. Ré. Emptyheaded: A relational engine for graph processing. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 431--446, 2016.

Digital Library

[3]

Apache Giraph. http://giraph.apache.org/.

[4]

V. R. Borkar, M. J. Carey, R. Grover, N. Onose, and R. Vernica. Hyracks: A flexible and extensible foundation for data-intensive computing. In ICDE 2011, April 11--16, 2011, Hannover, Germany, pages 1151--1162, 2011.

Digital Library

[5]

S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1--7):107--117, 1998.

[6]

Y. Bu, V. R. Borkar, J. Jia, M. J. Carey, and T. Condie. Pregelix: Big(ger) graph analytics on a dataflow engine. PVLDB, 8(2):161--172, 2014.

Digital Library

[7]

M. Capota, T. Hegeman, A. Iosup, A. Prat-Pérez, O. Erling, and P. A. Boncz. Graphalytics: A big data benchmark for graph-processing platforms. In Proceedings of the Third International Workshop on Graph Data Management Experiences and Systems, GRADES 2015, Melbourne, VIC, Australia, May 31 - June 4, 2015, pages 7:1--7:6, 2015.

Digital Library

[8]

R. Chen, J. Shi, Y. Chen, and H. Chen. Powerlyra: differentiated graph computation and partitioning on skewed graphs. In EuroSys, pages 1:1--1:15, 2015.

Digital Library

[9]

J. Cheng, Q. Liu, Z. Li, W. Fan, J. C. S. Lui, and C. He. VENUS: vertex-centric streamlined graph computation on a single PC. In ICDE, pages 1131--1142, 2015.

[10]

B. Elser and A. Montresor. An evaluation study of bigdata frameworks for graph processing. In Proceedings of the 2013 IEEE International Conference on Big Data, 6--9 October 2013, Santa Clara, CA, USA, pages 60--67, 2013.

[11]

Y. Gao, W. Zhou, J. Han, D. Meng, Z. Zhang, and Z. Xu. An evaluation and analysis of graph processing frameworks on five key issues. In Proceedings of the 12th ACM International Conference on Computing Frontiers, CF'15, Ischia, Italy, May 18--21, 2015, pages 11:1--11:8, 2015.

Digital Library

[12]

J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), pages 17--30, Hollywood, CA, 2012. USENIX.

Digital Library

[13]

J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. Graphx: Graph processing in a distributed dataflow framework. In OSDI, pages 599--613, 2014.

Digital Library

[14]

Y. Guo, M. Biczak, A. L. Varbanescu, A. Iosup, C. Martella, and T. L. Willke. How well do graph-processing platforms perform? an empirical performance evaluation and analysis. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ, USA, May 19--23, 2014, pages 395--404, 2014.

Digital Library

[15]

M. Han and K. Daudjee. Giraph unchained: Barrierless asynchronous parallel execution in pregel-like graph processing systems. PVLDB, 8(9):950--961, 2015.

Digital Library

[16]

M. Han, K. Daudjee, K. Ammar, M. T. Özsu, X. Wang, and T. Jin. An experimental comparison of pregel-like graph processing systems. PVLDB, 7(12):1047--1058, 2014.

Digital Library

[17]

Z. Khayyat, K. Awara, A. Alonazi, H. Jamjoom, D. Williams, and P. Kalnis. Mizan: a system for dynamic load balancing in large-scale graph processing. In EuroSys, pages 169--182, 2013.

Digital Library

[18]

A. Kyrola, G. E. Blelloch, and C. Guestrin. Graphchi: Large-scale graph computation on just a PC. In OSDI 2012, Hollywood, CA, USA, October 8--10, 2012, pages 31--46, 2012.

Digital Library

[19]

J. Li, J. Cheng, Y. Zhao, F. Yang, Y. Huang, H. Chen, and R. Zhao. A comparison of general-purpose distributed systems for data processing. In 2016 IEEE International Conference on Big Data, BigData 2016, Washington DC, USA, December 5--8, 2016, pages 378--383, 2016.

[20]

Y. Lu, J. Cheng, D. Yan, and H. Wu. Large-scale distributed graph computing systems: An experimental evaluation. PVLDB, 8(3):281--292, 2014.

Digital Library

[21]

G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6--10, 2010, pages 135--146, 2010.

Digital Library

[22]

F. McSherry, M. Isard, and D. G. Murray. Scalability! but at what cost? In 15th Workshop on Hot Topics in Operating Systems (HotOS XV), Kartause Ittingen, Switzerland, May 2015. USENIX Association.

Digital Library

[23]

D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: a timely dataflow system. In SOSP, pages 439--455, 2013.

Digital Library

[24]

D. Nguyen, A. Lenharth, and K. Pingali. A lightweight infrastructure for graph analytics. In ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP '13, Farmington, PA, USA, November 3--6, 2013, pages 456--471, 2013.

Digital Library

[25]

Y. Perez, R. Sosic, A. Banerjee, R. Puttagunta, M. Raison, P. Shah, and J. Leskovec. Ringo: Interactive graph analytics on big-memory machines. In SIGMOD, pages 1105--1110, 2015.

Digital Library

[26]

K. Pingali, D. Nguyen, M. Kulkarni, M. Burtscher, M. A. Hassaan, R. Kaleem, T. Lee, A. Lenharth, R. Manevich, M. Méndez-Lojo, D. Prountzos, and X. Sui. The tao of parallelism in algorithms. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4--8, 2011, pages 12--25, 2011.

Digital Library

[27]

A. Roy, L. Bindschaedler, J. Malicevic, and W. Zwaenepoel. Chaos: scale-out graph processing from secondary storage. In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015, Monterey, CA, USA, October 4--7, 2015, pages 410--424, 2015.

Digital Library

[28]

A. Roy, I. Mihailovic, and W. Zwaenepoel. X-stream: edge-centric graph processing using streaming partitions. In ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP '13, Farmington, PA, USA, November 3--6, 2013, pages 472--488, 2013.

Digital Library

[29]

S. Salihoglu and J. Widom. GPS: a graph processing system. In Conference on Scientific and Statistical Database Management, SSDBM '13, Baltimore, MD, USA, July 29 -- 31, 2013, pages 22:1--22:12, 2013.

Digital Library

[30]

N. Satish, N. Sundaram, M. M. A. Patwary, J. Seo, J. Park, M. A. Hassaan, S. Sengupta, Z. Yin, and P. Dubey. Navigating the maze of graph analytics frameworks using massive graph datasets. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22--27, 2014, pages 979--990, 2014.

Digital Library

[31]

J. Seo, S. Guo, and M. S. Lam. Socialite: Datalog extensions for efficient social network analysis. In 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8--12, 2013, pages 278--289, 2013.

[32]

Z. Shang and J. X. Yu. Catch the wind: Graph workload balancing on cloud. In ICDE, pages 553--564, 2013.

Digital Library

[33]

J. Shun and G. E. Blelloch. Ligra: a lightweight graph processing framework for shared memory. In PPoPP, pages 135--146, 2013.

Digital Library

[34]

N. Sundaram, N. Satish, M. M. A. Patwary, S. Dulloor, M. J. Anderson, S. G. Vadlamudi, D. Das, and P. Dubey. Graphmat: High performance graph analytics made productive. PVLDB, 8(11):1214--1225, 2015.

Digital Library

[35]

Y. Tian, A. Balmin, S. A. Corsten, S. Tatikonda, and J. McPherson. From "think like a vertex" to "think like a graph". PVLDB, 7(3):193--204, 2013.

Digital Library

[36]

M. Wu, F. Yang, J. Xue, W. Xiao, Y. Miao, L. Wei, H. Lin, Y. Dai, and L. Zhou. Gram: scaling graph computation to the trillions. In SoCC, pages 408--421, 2015.

Digital Library

[37]

C. Xie, R. Chen, H. Guan, B. Zang, and H. Chen. SYNC or ASYNC: time to fuse for distributed graph-parallel computation. In PPoPP, pages 194--204, 2015.

Digital Library

[38]

W. Xie, G. Wang, D. Bindel, A. J. Demers, and J. Gehrke. Fast iterative graph computation with block updates. PVLDB, 6(14):2014--2025, 2013.

Digital Library

[39]

D. Yan, J. Cheng, Y. Lu, and W. Ng. Blogel: A block-centric framework for distributed computation on real-world graphs. PVLDB, 7(14):1981--1992, 2014.

Digital Library

[40]

D. Yan, Y. Huang, J. Cheng, and H. Wu. Efficient processing of very large graphs in a small cluster. CoRR, abs/1601.05590, 2016.

[41]

F. Yang, Y. Huang, Y. Zhao, J. Li, G.Jiang, and J. Cheng. The best of both worlds: Big data programming with both productivity and performance. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14--19, 2017, pages 1619--1622, 2017.

Digital Library

[42]

F. Yang, J. Li, and J. Cheng. Husky: Towards a more efficient and expressive distributed computing framework. PVLDB, 9(5):420--431, 2016.

Digital Library

[43]

M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing, HotCloud'10, Boston, MA, USA, June 22, 2010, 2010.

Digital Library

[44]

C. Zhou, J. Gao, B. Sun, and J. X. Yu. Mocgraph: Scalable distributed graph processing using message online computing. PVLDB, 8(4):377--388, 2014.

Digital Library

[45]

Y. Zhou, L. Liu, K. Lee, and Q. Zhang. Graphtwist: Fast iterative graph computation with two-tier optimizations. PVLDB, 8(11):1262--1273, 2015.

Digital Library

Cited By

Zhao WCao YBuneman PLi JNtarmos N(2024)Automating Vectorized Distributed Graph ComputationProceedings of the ACM on Management of Data10.1145/36988332:6(1-27)Online publication date: 20-Dec-2024
https://dl.acm.org/doi/10.1145/3698833
Chen HLi CZheng CHuang CFang JCheng JZhang J(2022)G-tranProceedings of the VLDB Endowment10.14778/3551793.355181315:11(2545-2558)Online publication date: 29-Sep-2022
https://dl.acm.org/doi/10.14778/3551793.3551813
Weng TZhou XLi KPeng PLi K(2022)Efficient Distributed Approaches to Core Maintenance on Large Dynamic GraphsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.309075933:1(129-143)Online publication date: 1-Jan-2022
https://doi.org/10.1109/TPDS.2021.3090759
Show More Cited By

Recommendations

Scalable Graph Processing Frameworks: A Taxonomy and Open Challenges

The world is becoming a more conjunct place and the number of data sources such as social networks, online transactions, web search engines, and mobile devices is increasing even more than had been predicted. A large percentage of this growing dataset ...
Navigating the maze of graph analytics frameworks using massive graph datasets
SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data

Graph algorithms are becoming increasingly important for analyzing large datasets in many fields. Real-world graph data follows a pattern of sparsity, that is not uniform but highly skewed towards a few items. Implementing graph traversal, statistics and ...
Parallel graph analytics

Data-centric abstractions and execution strategies are needed to exploit parallelism in large-scale graph analytics.

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing

September 2017

672 pages

ISBN:9781450350280

DOI:10.1145/3127479

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 September 2017

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Conference

SoCC '17

Sponsor:

SoCC '17: ACM Symposium on Cloud Computing

September 24 - 27, 2017

California, Santa Clara

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

6
Total Citations
View Citations
467
Total Downloads

Downloads (Last 12 months)40
Downloads (Last 6 weeks)6

Reflects downloads up to 01 Mar 2025

Other Metrics

View Author Metrics

Citations

Cited By

Zhao WCao YBuneman PLi JNtarmos N(2024)Automating Vectorized Distributed Graph ComputationProceedings of the ACM on Management of Data10.1145/36988332:6(1-27)Online publication date: 20-Dec-2024
https://dl.acm.org/doi/10.1145/3698833
Chen HLi CZheng CHuang CFang JCheng JZhang J(2022)G-tranProceedings of the VLDB Endowment10.14778/3551793.355181315:11(2545-2558)Online publication date: 29-Sep-2022
https://dl.acm.org/doi/10.14778/3551793.3551813
Weng TZhou XLi KPeng PLi K(2022)Efficient Distributed Approaches to Core Maintenance on Large Dynamic GraphsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.309075933:1(129-143)Online publication date: 1-Jan-2022
https://doi.org/10.1109/TPDS.2021.3090759
Chen HLi CFang JHuang CCheng JZhang JHou YYan X(2019)GrasperProceedings of the ACM Symposium on Cloud Computing10.1145/3357223.3362715(87-100)Online publication date: 20-Nov-2019
https://dl.acm.org/doi/10.1145/3357223.3362715
Lissandrini MBrugnara MVelegrakis Y(2018)Beyond macrobenchmarksProceedings of the VLDB Endowment10.14778/3297753.329775912:4(390-403)Online publication date: 1-Dec-2018
https://dl.acm.org/doi/10.14778/3297753.3297759
Chen HLiu MZhao YYan XYan DCheng JOliveira RFelber PHu Y(2018)G-MinerProceedings of the Thirteenth EuroSys Conference10.1145/3190508.3190545(1-12)Online publication date: 23-Apr-2018
https://dl.acm.org/doi/10.1145/3190508.3190545

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Figures

Tables

Media

View Table of Conten