ABSTRACT
The expressiveness of the vertex-centric programming model introduced by Pregel has attracted great attention. Over the years, numerous frameworks have emerged that abide by the same programming model while relying on widely different architectural designs. The vast majority of existing vertex-centric frameworks exploit distributed memory parallelism or out-of-core computation. To our knowledge, only one vertex-centric framework is designed around in-memory storage and shared memory parallelism. Unfortunately, although it is built on a faster architecture than other vertex-centric frameworks, it has not been shown to significantly outperform existing solutions.
In this paper we present iPregel, another in-memory shared memory vertex-centric framework. The optimisations developed and presented in this paper target three hotspots of vertex-centric computation: selecting active vertices, routing messages to their recipients, and updating recipients' inboxes. We compare iPregel against the state-of-the-art in-memory distributed memory framework Pregel+ on three of the most common vertex-centric applications: PageRank, Hashmin and Single-Source Shortest Path. Experiments demonstrate that the single-node framework iPregel is faster than its distributed memory counterpart until at least 11 nodes are used. Further experiments show that iPregel completes a PageRank application with an order of magnitude less memory than popular vertex-centric frameworks.
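To make the three hotspots concrete, the following is a minimal single-threaded sketch of Hashmin (connected components by minimum-label propagation) written in a vertex-centric style. It is an illustration only, not iPregel's implementation or API: the function name `hashmin`, the dictionary-based graph, and the single-slot inbox are all assumptions chosen for brevity, and the graph is assumed to be undirected with every vertex present as a key.

```python
def hashmin(adjacency):
    """Label each vertex with the smallest vertex id in its connected component.

    `adjacency` is a hypothetical representation: a dict mapping each vertex id
    to a list of neighbour ids (undirected, so edges appear in both directions).
    """
    labels = {v: v for v in adjacency}

    # Superstep 0: every vertex is active and sends its own id to its neighbours.
    inbox = {}
    for v, neighbours in adjacency.items():
        for n in neighbours:
            # Updating the inbox with a combiner: one slot per recipient,
            # holding only the minimum message received so far.
            inbox[n] = min(inbox.get(n, labels[v]), labels[v])

    while inbox:
        next_inbox = {}
        # Selecting active vertices: only vertices with a pending message run.
        for v, msg in inbox.items():
            if msg < labels[v]:
                labels[v] = msg
                # Routing messages: forward the improved label to each neighbour.
                for n in adjacency[v]:
                    next_inbox[n] = min(next_inbox.get(n, msg), msg)
        inbox = next_inbox  # Superstep barrier: messages become visible here.

    return labels
```

Because the combiner keeps a single value per recipient, inbox memory in this sketch grows with the number of vertices rather than the number of messages.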
Index Terms
- iPregel: A Combiner-Based In-Memory Shared Memory Vertex-Centric Framework