ABSTRACT
The determinant of performance in scale-up graph processing on a single system is the speed at which the graph can be fetched from storage: either from disk into memory or from memory into CPU-cache. Algorithms that follow edges perform random accesses to the storage medium for the graph and this can often be the determinant of performance, regardless of the algorithmic complexity or runtime efficiency of the actual algorithm in use. A storage-centric viewpoint would suggest that the solution to this problem lies in recognizing that graphs represent a unique workload and therefore should be treated as such by adopting novel ways to access graph structured data. We approach this problem from two different aspects and this paper details two different efforts in this direction. One approach is specific to graphs stored on SSDs and accelerates random access using a novel prefetcher called RASP. The second approach takes a fresh look at how graphs are accessed and suggests that trading off the low cost of random access for the approach of sequentially streaming a large set of (potentially unrelated) edges can be a winning proposition under certain circumstances: leading to a system for graphs stored on any medium (main-memory, SSD or magnetic disk) called X-stream. RASP and X-stream therefore take - diametrically opposite - storage centric viewpoints of the graph processing problem. After contrasting the approaches and demonstrating the benefit of each, this paper ends with a description of planned future development of an online algorithm that selects between the two approaches, possibly providing the best of both worlds.
- Aapo Kyrola and Guy Blelloch. Graphchi: Large-scale graph computation on just a PC. In OSDI. USENIX Association, 2012. Google ScholarDigital Library
- Grzegorz Malewicz, Matthew H. Austern, Aart J. C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD, pages 135--146. ACM, 2010. Google ScholarDigital Library
- Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. Powergraph: distributed graph-parallel computation on natural graphs. In OSDI, pages 17--30. USENIX Association, 2012. Google ScholarDigital Library
- Amitabha Roy, Karthik Nilakant, Valentin Dalibard, and Eiko Yoneki. Mitigating I/O latency in SSD-based graph traversal. Technical Report UCAM-CL-TR-823, University of Cambridge, November 2012.Google Scholar
- Philip Stutz, Abraham Bernstein, and William Cohen. Signal/collect: graph algorithms for the (semantic) web. In ISWC - Volume Part I, pages 764--780. Springer-Verlag, 2010. Google ScholarDigital Library
- Sungpack Hong, Tayo Oguntebi, and Kunle Olukotun. Efficient parallel graph exploration on multi-core CPU and GPU. In PACT, pages 78--88. IEEE Computer Society, 2011. Google ScholarDigital Library
- Virat Agarwal, Fabrizio Petrini, Davide Pasetto, and David A. Bader. Scalable graph exploration on multicore processors. In SC, pages 1--11. IEEE Computer Society, 2010. Google ScholarDigital Library
- Roger Pearce, Maya Gokhale, and Nancy M. Amato. Multithreaded asynchronous graph traversal for in-memory and semi-external memory. In SC, pages 1--11. IEEE Computer Society, 2010. Google ScholarDigital Library
- Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. Green-marl: a DSL for easy and efficient graph analysis. In ASPLOS, pages 349--362. ACM, 2012. Google ScholarDigital Library
- http://twitter.mpi-sws.org/.Google Scholar
- http://snap.stanford.edu/.Google Scholar
- http://www.graph500.org/.Google Scholar
- Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford University, 1998.Google Scholar
- Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is twitter, a social network or a news media? In WWW, pages 591--600. ACM, 2010. Google ScholarDigital Library
- http://dimacs.rutgers.edu/Challenges/.Google Scholar
- http://www.cise.ufl.edu/research/sparse/matrices/LAW/sk-2005.html.Google Scholar
- http://webscope.sandbox.yahoo.com/.Google Scholar
- Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graph evolution: Densification and shrinking diameters. ACM Trans. Knowl. Discov. Data, 1(1), March 2007. Google ScholarDigital Library
Recommendations
Scale-up graph processing in the cloud: challenges and solutions
CloudDP '14: Proceedings of the Fourth International Workshop on Cloud Data and PlatformsProcessing large graphs is an important part of the big-data problem. Recently a number of scale-up systems such as X-Stream, Graphchi and Turbograph have been proposed for processing large graphs using secondary storage on a single machine. The design ...
Large-Scale Stream Graph Processing: Doctoral Symposium
DEBS '17: Proceedings of the 11th ACM International Conference on Distributed and Event-based SystemsDynamically changing graphs are a powerful abstraction used to represent temporal relationships and connections occurring between data entities in various real-world organizations, such as social and telecommunication networks. The increasing volume, ...
Collapsible subgraphs of a 4-edge-connected graph
AbstractJaeger in 1979 showed that every 4-edge-connected graph is supereulerian, graphs that have spanning eulerian subgraphs. Catlin in 1988 sharpened Jaeger’s result by showing that every 4-edge-connected graph is collapsible, graphs that ...
Comments