ABSTRACT
This paper proposes a new algorithmic paradigm, k-level asynchronous (KLA), that bridges the level-synchronous and asynchronous paradigms for processing graphs. KLA enables the level of asynchrony in parallel graph algorithms to be varied parametrically, from none (level-synchronous) to full (asynchronous). The motivation is to improve execution times through an appropriate trade-off between the use of fewer but more expensive global synchronizations, as in level-synchronous algorithms, and more numerous but cheaper local synchronizations (and possibly redundant work), as in asynchronous algorithms. We show how common patterns in graph algorithms can be expressed in the KLA paradigm and provide techniques for determining k, the number of asynchronous steps allowed between global synchronizations. Results of an implementation of KLA in the STAPL Graph Library show excellent scalability on up to 96K cores and improvements of 10x or more over level-synchronous and asynchronous versions for graph algorithms such as breadth-first search, PageRank, k-core decomposition, and others on certain classes of real-world graphs.
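To make the parametric trade-off concrete, here is a minimal single-machine sketch of a KLA-style breadth-first search. This is not the paper's STAPL implementation; all names (`kla_bfs`, `adj`, the hop counter) are illustrative assumptions. Between consecutive global synchronization points, distance updates are allowed to propagate up to k hops asynchronously; k = 1 reduces to a level-synchronous BFS, and a very large k behaves like a fully asynchronous traversal completed in one superstep.

```python
from collections import deque

def kla_bfs(adj, source, k):
    """KLA-style BFS sketch (hypothetical, not the STAPL code).

    adj: dict mapping each vertex to a list of neighbors.
    k:   number of asynchronous hops permitted between global syncs.
    """
    dist = {v: float('inf') for v in adj}
    dist[source] = 0
    frontier = [source]
    while frontier:                       # each iteration = one global sync
        next_frontier = set()
        work = deque((v, 0) for v in frontier)
        while work:                       # asynchronous phase of the superstep
            v, hops = work.popleft()
            for w in adj[v]:
                if dist[v] + 1 < dist[w]: # relax edge; may redo work if
                    dist[w] = dist[v] + 1 # updates arrive out of order
                    if hops + 1 < k:      # keep propagating asynchronously
                        work.append((w, hops + 1))
                    else:                 # hop budget spent: defer vertex
                        next_frontier.add(w)
        frontier = list(next_frontier)    # global synchronization
    return dist
```

Note how the hop budget captures the abstract's trade-off: larger k means fewer trips through the outer loop (fewer global synchronizations) at the cost of possible redundant relaxations inside the asynchronous phase.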