skip to main content
10.1145/3127479.3128606acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Public Access

Architectural implications on the performance and cost of graph analytics systems

Published: 24 September 2017 Publication History

Abstract

Graph analytics systems have gained significant popularity due to the prevalence of graph data. Many of these systems are designed to run in a shared-nothing architecture whereby a cluster of machines can process a large graph in parallel. In more recent proposals, others have argued that a single-machine system can achieve better performance and/or is more cost-effective. There is however no clear consensus which approach is better. In this paper, we classify existing graph analytics systems into four categories based on the architectural differences, i.e., processing infrastructure (centralized vs distributed), and memory consumption (in-memory vs out-of-core). We select eight open-source systems to cover all categories, and perform a comparative measurement study to compare their performance and cost characteristics across a spectrum of input data, applications, and hardware settings. Our results show that the best performing configuration can depend on the type of applications and input graphs, and there is no dominant winner across all categories. Based on our findings, we summarize the trends in performance and cost, and provide several insights that help to illuminate the performance and resource cost tradeoffs across different graph analytics systems and categories.

References

[1]
Microsoft Azure: https://azure.microsoft.com/en-us/pricing/details/virtual-machines/#Windows.
[2]
C. R. Aberger, S. Tu, K. Olukotun, and C. Ré. Emptyheaded: A relational engine for graph processing. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, pages 431--446, 2016.
[3]
Apache Giraph. http://giraph.apache.org/.
[4]
V. R. Borkar, M. J. Carey, R. Grover, N. Onose, and R. Vernica. Hyracks: A flexible and extensible foundation for data-intensive computing. In ICDE 2011, April 11--16, 2011, Hannover, Germany, pages 1151--1162, 2011.
[5]
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1--7):107--117, 1998.
[6]
Y. Bu, V. R. Borkar, J. Jia, M. J. Carey, and T. Condie. Pregelix: Big(ger) graph analytics on a dataflow engine. PVLDB, 8(2):161--172, 2014.
[7]
M. Capota, T. Hegeman, A. Iosup, A. Prat-Pérez, O. Erling, and P. A. Boncz. Graphalytics: A big data benchmark for graph-processing platforms. In Proceedings of the Third International Workshop on Graph Data Management Experiences and Systems, GRADES 2015, Melbourne, VIC, Australia, May 31 - June 4, 2015, pages 7:1--7:6, 2015.
[8]
R. Chen, J. Shi, Y. Chen, and H. Chen. Powerlyra: differentiated graph computation and partitioning on skewed graphs. In EuroSys, pages 1:1--1:15, 2015.
[9]
J. Cheng, Q. Liu, Z. Li, W. Fan, J. C. S. Lui, and C. He. VENUS: vertex-centric streamlined graph computation on a single PC. In ICDE, pages 1131--1142, 2015.
[10]
B. Elser and A. Montresor. An evaluation study of bigdata frameworks for graph processing. In Proceedings of the 2013 IEEE International Conference on Big Data, 6--9 October 2013, Santa Clara, CA, USA, pages 60--67, 2013.
[11]
Y. Gao, W. Zhou, J. Han, D. Meng, Z. Zhang, and Z. Xu. An evaluation and analysis of graph processing frameworks on five key issues. In Proceedings of the 12th ACM International Conference on Computing Frontiers, CF'15, Ischia, Italy, May 18--21, 2015, pages 11:1--11:8, 2015.
[12]
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), pages 17--30, Hollywood, CA, 2012. USENIX.
[13]
J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. Graphx: Graph processing in a distributed dataflow framework. In OSDI, pages 599--613, 2014.
[14]
Y. Guo, M. Biczak, A. L. Varbanescu, A. Iosup, C. Martella, and T. L. Willke. How well do graph-processing platforms perform? an empirical performance evaluation and analysis. In 2014 IEEE 28th International Parallel and Distributed Processing Symposium, Phoenix, AZ, USA, May 19--23, 2014, pages 395--404, 2014.
[15]
M. Han and K. Daudjee. Giraph unchained: Barrierless asynchronous parallel execution in pregel-like graph processing systems. PVLDB, 8(9):950--961, 2015.
[16]
M. Han, K. Daudjee, K. Ammar, M. T. Özsu, X. Wang, and T. Jin. An experimental comparison of pregel-like graph processing systems. PVLDB, 7(12):1047--1058, 2014.
[17]
Z. Khayyat, K. Awara, A. Alonazi, H. Jamjoom, D. Williams, and P. Kalnis. Mizan: a system for dynamic load balancing in large-scale graph processing. In EuroSys, pages 169--182, 2013.
[18]
A. Kyrola, G. E. Blelloch, and C. Guestrin. Graphchi: Large-scale graph computation on just a PC. In OSDI 2012, Hollywood, CA, USA, October 8--10, 2012, pages 31--46, 2012.
[19]
J. Li, J. Cheng, Y. Zhao, F. Yang, Y. Huang, H. Chen, and R. Zhao. A comparison of general-purpose distributed systems for data processing. In 2016 IEEE International Conference on Big Data, BigData 2016, Washington DC, USA, December 5--8, 2016, pages 378--383, 2016.
[20]
Y. Lu, J. Cheng, D. Yan, and H. Wu. Large-scale distributed graph computing systems: An experimental evaluation. PVLDB, 8(3):281--292, 2014.
[21]
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2010, Indianapolis, Indiana, USA, June 6--10, 2010, pages 135--146, 2010.
[22]
F. McSherry, M. Isard, and D. G. Murray. Scalability! but at what cost? In 15th Workshop on Hot Topics in Operating Systems (HotOS XV), Kartause Ittingen, Switzerland, May 2015. USENIX Association.
[23]
D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: a timely dataflow system. In SOSP, pages 439--455, 2013.
[24]
D. Nguyen, A. Lenharth, and K. Pingali. A lightweight infrastructure for graph analytics. In ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP '13, Farmington, PA, USA, November 3--6, 2013, pages 456--471, 2013.
[25]
Y. Perez, R. Sosic, A. Banerjee, R. Puttagunta, M. Raison, P. Shah, and J. Leskovec. Ringo: Interactive graph analytics on big-memory machines. In SIGMOD, pages 1105--1110, 2015.
[26]
K. Pingali, D. Nguyen, M. Kulkarni, M. Burtscher, M. A. Hassaan, R. Kaleem, T. Lee, A. Lenharth, R. Manevich, M. Méndez-Lojo, D. Prountzos, and X. Sui. The tao of parallelism in algorithms. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4--8, 2011, pages 12--25, 2011.
[27]
A. Roy, L. Bindschaedler, J. Malicevic, and W. Zwaenepoel. Chaos: scale-out graph processing from secondary storage. In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015, Monterey, CA, USA, October 4--7, 2015, pages 410--424, 2015.
[28]
A. Roy, I. Mihailovic, and W. Zwaenepoel. X-stream: edge-centric graph processing using streaming partitions. In ACM SIGOPS 24th Symposium on Operating Systems Principles, SOSP '13, Farmington, PA, USA, November 3--6, 2013, pages 472--488, 2013.
[29]
S. Salihoglu and J. Widom. GPS: a graph processing system. In Conference on Scientific and Statistical Database Management, SSDBM '13, Baltimore, MD, USA, July 29 -- 31, 2013, pages 22:1--22:12, 2013.
[30]
N. Satish, N. Sundaram, M. M. A. Patwary, J. Seo, J. Park, M. A. Hassaan, S. Sengupta, Z. Yin, and P. Dubey. Navigating the maze of graph analytics frameworks using massive graph datasets. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22--27, 2014, pages 979--990, 2014.
[31]
J. Seo, S. Guo, and M. S. Lam. Socialite: Datalog extensions for efficient social network analysis. In 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8--12, 2013, pages 278--289, 2013.
[32]
Z. Shang and J. X. Yu. Catch the wind: Graph workload balancing on cloud. In ICDE, pages 553--564, 2013.
[33]
J. Shun and G. E. Blelloch. Ligra: a lightweight graph processing framework for shared memory. In PPoPP, pages 135--146, 2013.
[34]
N. Sundaram, N. Satish, M. M. A. Patwary, S. Dulloor, M. J. Anderson, S. G. Vadlamudi, D. Das, and P. Dubey. Graphmat: High performance graph analytics made productive. PVLDB, 8(11):1214--1225, 2015.
[35]
Y. Tian, A. Balmin, S. A. Corsten, S. Tatikonda, and J. McPherson. From "think like a vertex" to "think like a graph". PVLDB, 7(3):193--204, 2013.
[36]
M. Wu, F. Yang, J. Xue, W. Xiao, Y. Miao, L. Wei, H. Lin, Y. Dai, and L. Zhou. Gram: scaling graph computation to the trillions. In SoCC, pages 408--421, 2015.
[37]
C. Xie, R. Chen, H. Guan, B. Zang, and H. Chen. SYNC or ASYNC: time to fuse for distributed graph-parallel computation. In PPoPP, pages 194--204, 2015.
[38]
W. Xie, G. Wang, D. Bindel, A. J. Demers, and J. Gehrke. Fast iterative graph computation with block updates. PVLDB, 6(14):2014--2025, 2013.
[39]
D. Yan, J. Cheng, Y. Lu, and W. Ng. Blogel: A block-centric framework for distributed computation on real-world graphs. PVLDB, 7(14):1981--1992, 2014.
[40]
D. Yan, Y. Huang, J. Cheng, and H. Wu. Efficient processing of very large graphs in a small cluster. CoRR, abs/1601.05590, 2016.
[41]
F. Yang, Y. Huang, Y. Zhao, J. Li, G.Jiang, and J. Cheng. The best of both worlds: Big data programming with both productivity and performance. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14--19, 2017, pages 1619--1622, 2017.
[42]
F. Yang, J. Li, and J. Cheng. Husky: Towards a more efficient and expressive distributed computing framework. PVLDB, 9(5):420--431, 2016.
[43]
M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. In 2nd USENIX Workshop on Hot Topics in Cloud Computing, HotCloud'10, Boston, MA, USA, June 22, 2010, 2010.
[44]
C. Zhou, J. Gao, B. Sun, and J. X. Yu. Mocgraph: Scalable distributed graph processing using message online computing. PVLDB, 8(4):377--388, 2014.
[45]
Y. Zhou, L. Liu, K. Lee, and Q. Zhang. Graphtwist: Fast iterative graph computation with two-tier optimizations. PVLDB, 8(11):1262--1273, 2015.

Cited By

View all
  • (2024)Automating Vectorized Distributed Graph ComputationProceedings of the ACM on Management of Data10.1145/36988332:6(1-27)Online publication date: 20-Dec-2024
  • (2022)G-tranProceedings of the VLDB Endowment10.14778/3551793.355181315:11(2545-2558)Online publication date: 29-Sep-2022
  • (2022)Efficient Distributed Approaches to Core Maintenance on Large Dynamic GraphsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.309075933:1(129-143)Online publication date: 1-Jan-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SoCC '17: Proceedings of the 2017 Symposium on Cloud Computing
September 2017
672 pages
ISBN:9781450350280
DOI:10.1145/3127479
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 September 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. experiments
  2. graph analytics systems
  3. performance
  4. resource cost

Qualifiers

  • Research-article

Funding Sources

Conference

SoCC '17
Sponsor:
SoCC '17: ACM Symposium on Cloud Computing
September 24 - 27, 2017
California, Santa Clara

Acceptance Rates

Overall Acceptance Rate 169 of 722 submissions, 23%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)40
  • Downloads (Last 6 weeks)6
Reflects downloads up to 01 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Automating Vectorized Distributed Graph ComputationProceedings of the ACM on Management of Data10.1145/36988332:6(1-27)Online publication date: 20-Dec-2024
  • (2022)G-tranProceedings of the VLDB Endowment10.14778/3551793.355181315:11(2545-2558)Online publication date: 29-Sep-2022
  • (2022)Efficient Distributed Approaches to Core Maintenance on Large Dynamic GraphsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.309075933:1(129-143)Online publication date: 1-Jan-2022
  • (2019)GrasperProceedings of the ACM Symposium on Cloud Computing10.1145/3357223.3362715(87-100)Online publication date: 20-Nov-2019
  • (2018)Beyond macrobenchmarksProceedings of the VLDB Endowment10.14778/3297753.329775912:4(390-403)Online publication date: 1-Dec-2018
  • (2018)G-MinerProceedings of the Thirteenth EuroSys Conference10.1145/3190508.3190545(1-12)Online publication date: 23-Apr-2018

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Login options

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media