ABSTRACT
We present ZipG, a distributed memory-efficient graph store for serving interactive graph queries. ZipG achieves memory efficiency by storing the input graph data using a compressed representation. What differentiates ZipG from other graph stores is its ability to execute a wide range of graph queries directly on this compressed representation. ZipG can thus execute a larger fraction of queries in main memory, achieving query interactivity. ZipG exposes a minimal API that is functionally rich enough to implement published functionalities from several industrial graph stores. We demonstrate this by implementing and evaluating graph queries from Facebook TAO, LinkBench, Graph Search and several other workloads on top of ZipG. On a single server with 244GB memory, ZipG executes tens of thousands of queries from these workloads for raw graph data over half a TB; this leads to an order of magnitude (sometimes as much as 23×) higher throughput than Neo4j and Titan. We get similar gains in distributed settings compared to Titan.
- Breadth First Search. https://en.wikipedia.org/wiki/Breadth-first_search.Google Scholar
- Building a follower model from scratch. https://engineering.pinterest.com/blog/building-follower-model-scratch.Google Scholar
- Demining the "Join Bomb" with graph queries. http://neo4j.com/blog/demining-the-join-bomb-with-graph-queries/.Google Scholar
- FlockDB. https://github.com/twitter/flockdb.Google Scholar
- Function Shipping: Separating Logical and Physical Tiers. https://docs.oracle.com/cd/A87860_01/doc/appdev.817/a86030/adx16nt4.htm.Google Scholar
- gMark Queries for LDBC Social Network Benchmark. https://github.com/graphMark/gmark/tree/master/demo/social/social-translated.Google Scholar
- Introducing FlockDB. https://blog.twitter.com/2010/introducing-flockdb.Google Scholar
- Introducing Graph Search Beta. http://newsroom.fb.com/news/2013/01/introducing-graph-search-beta/.Google Scholar
- LinkBench. https://github.com/facebookarchive/linkbench.Google Scholar
- Microsoft GraphView. https://github.com/facebookarchive/linkbench.Google Scholar
- Neo4j. http://neo4j.com/.Google Scholar
- Neo4j Pushes Graph DB Limits Past a Quadrillion Nodes. https://www.datanami.com/2016/04/26/neo4j-pushes-graph-db-limits-past-quadrillion-nodes/.Google Scholar
- openCypher. http://www.opencypher.org.Google Scholar
- OrientDB. http://orientdb.com/.Google Scholar
- Property Graph Model. https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model.Google Scholar
- Sparksee by Sparsity Technologies. http://www.sparsity-technologies.com.Google Scholar
- Suffix Array. http://en.wikipedia.org/wiki/Suffix_array.Google Scholar
- Titan. http://thinkaurelius.github.io/titan/.Google Scholar
- Titan Data Model. http://s3.thinkaurelius.com/docs/titan/current/data-model.html.Google Scholar
- Virtuoso Universal Server. http://virtuoso.openlinksw.com.Google Scholar
- R. Agarwal, A. Khandelwal, and I. Stoica. Succinct: Enabling Queries on Compressed Data. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2015. Google ScholarDigital Library
- A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola. Distributed Large-scale Natural Graph Factorization. In ACM International Conference on World Wide Web (WWW), 2013. Google ScholarDigital Library
- A. Anand, C. Muthukrishnan, S. Kappes, A. Akella, and S. Nath. Cheap and large cams for high performance data-intensive networked systems. In NSDI, 2010. Google ScholarDigital Library
- T. G. Armstrong, V. Ponnekanti, D. Borthakur, and M. Callaghan. Linkbench: A database benchmark based on the facebook social graph. In ACM International Conference on Management of Data (SIGMOD), 2013. Google ScholarDigital Library
- G. Bagan, A. Bonifati, R. Ciucanu, G. H. L. Fletcher, A. Lemay, and N. Advokaat. Generating flexible workloads for graph databases. Proceedings of the VLDB Endowment (PVLDB), 9(13):1457--1460, 2016. Google ScholarDigital Library
- P. Barceló Baeza. Querying graph databases. In ACM Symposium on Principles of Database Systems (PODS), pages 175--188, 2013. Google ScholarDigital Library
- Bharat, Krishna and Broder, Andrei and Henzinger, Monika and Kumar, Puneet and Venkatasubramanian, Suresh. The connectivity server: Fast access to linkage information on the web. Computer networks and ISDN Systems, 30, 1998. Google ScholarDigital Library
- P. Boldi and S. Vigna. The Webgraph Framework I: Compression Techniques. In International Conference on World Wide Web (WWW), 2004. Google ScholarDigital Library
- N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, S. Kulkarni, H. Li, et al. TAO: Facebook\textquoterights Distributed Data Store for the Social Graph. In USENIX Annual Technical Conference (ATC), 2013. Google ScholarDigital Library
- D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Y. Vardi. Rewriting of regular expressions and regular path queries. In ACM Symposium on Principles of Database Systems (PODS), pages 194--204, 1999. Google ScholarDigital Library
- F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2):4, 2008. Google ScholarDigital Library
- F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On Compressing Social Networks. In ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2009. Google ScholarDigital Library
- I. F. Cruz, A. O. Mendelzon, and P. T. Wood. A graphical query language supporting recursion. In ACM International Conference on Management of Data (SIGMOD), pages 323--330, 1987. Google ScholarDigital Library
- C. Curino, E. Jones, Y. Zhang, and S. Madden. Schism: a Workload-Driven Approach to Database Replication and Partitioning. Proceedings of the VLDB Endowment, 2010. Google ScholarDigital Library
- A. Dubey, G. D. Hill, R. Escriva, and E. G. Sirer. Weaver: A High-Performance, Transactional Graph Store Based on Refinable Timestamps. CoRR, abs/1509.08443, 2015. Google ScholarDigital Library
- O. Erling, A. Averbuch, J. Larriba-Pey, H. Chafi, A. Gubichev, A. Prat, M.-D. Pham, and P. Boncz. The ldbc social network benchmark: Interactive workload. In ACM International Conference on Management of Data (SIGMOD), pages 619--630, 2015. Google ScholarDigital Library
- W. Fan, J. Li, X. Wang, and Y. Wu. Query Preserving Graph Compression. In ACM International Conference on Management of Data (SIGMOD), 2012. Google ScholarDigital Library
- J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. GraphX: Graph Processing in a Distributed Dataflow Framework. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2014. Google ScholarDigital Library
- C. Hernández and G. Navarro. Compression of Web and Social Graphs supporting Neighbor and Community Queries. In ACM Workshop on Social Network mining and Analysis (SNAKDD), 2011.Google Scholar
- C. Hernández and G. Navarro. Compressed Representations for Web and Social Graphs. Knowledge and Information Systems, 40(2), 2014. Google ScholarDigital Library
- Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data, 2014.Google Scholar
- A. Khandelwal, R. Agarwal, and I. Stoica. BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2016. Google ScholarDigital Library
- A. Kyrola, G. E. Blelloch, and C. Guestrin. GraphChi: Large-Scale Graph Computation on Just a PC. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2012. Google ScholarDigital Library
- Lakshman, Avinash and Malik, Prashant. Cassandra: A Decentralized Structured Storage System. ACM SIGOPS Operating Systems Review, 44, 2010. Google ScholarDigital Library
- K. Lang. Finding good nearly balanced cuts in power law graphs. Preprint, 2004.Google Scholar
- J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Mathematics, 2009.Google ScholarCross Ref
- L. Libkin and D. Vrgo\vc. Regular path queries on graphs with data. In ACM International Conference on Database Theory (ICDT), pages 74--85, 2012. Google ScholarDigital Library
- Y. Low, J. E. Gonzalez, A. Kyrola, D. Bickson, C. E. Guestrin, and J. Hellerstein. GraphLab: A New Framework For Parallel Machine Learning. arXiv preprint arXiv:1408.2041, 2014. Google ScholarDigital Library
- A. Maccioni and D. J. Abadi. Scalable pattern matching over compressed graphs via dedensification. In ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2016. Google ScholarDigital Library
- U. Manber and G. Myers. Suffix Arrays: A new method for on-line string searches. SIAM Journal on Computing (SICOMP), 1993. Google ScholarDigital Library
- N. Martinez-Bazan, S. Gómez-Villamor, and F. Escalé-Claveras. DEX: A high-performance graph database management system. In IEEE Data Engineering Workshops (ICDEW), 2011. Google ScholarDigital Library
- H. Maserrat and J. Pei. Neighbor Query Friendly Compression of Social Networks. In ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2010. Google ScholarDigital Library
- J. C. Mogul, F. Douglis, A. Feldmann, and B. Krishnamurthy. Potential benefits of delta encoding and data compression for HTTP. In ACM SIGCOMM Computer Communication Review, 1997. Google ScholarDigital Library
- B. Shao, H. Wang, and Y. Li. Trinity: A Distributed Graph Engine on a Memory Cloud. In ACM International Conference on Management of Data (SIGMOD). ACM, 2013. Google ScholarDigital Library
- J. Shun and G. E. Blelloch. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoP), 2013. Google ScholarDigital Library
- J. Shun, L. Dhulipala, and G. Blelloch. Smaller and Faster: Parallel Processing of Compressed Graphs with LigraGoogle Scholar
- . In IEEE Data Compression Conference (DCC), 2015.Google Scholar
- Y. J. Song, M. K. Aguilera, R. Kotla, and D. Malkhi. RPC Chains: Efficient Client-server Communication in Geodistributed Systems. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2009. Google ScholarDigital Library
- R. Wang, C. Conrad, and S. Shah. Using Set Cover to Optimize a Large-Scale Low Latency Distributed Graph. In Workshop on Hot Topics in Cloud Computing (HotCloud), 2013.Google Scholar
Index Terms
- ZipG: A Memory-efficient Graph Store for Interactive Queries
Recommendations
Log(graph): a near-optimal high-performance graph representation
PACT '18: Proceedings of the 27th International Conference on Parallel Architectures and Compilation TechniquesToday's graphs used in domains such as machine learning or social network analysis may contain hundreds of billions of edges. Yet, they are not necessarily stored efficiently, and standard graph representations such as adjacency lists waste a significant ...
Betweenness centrality on GPUs and heterogeneous architectures
GPGPU-6: Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing UnitsThe betweenness centrality metric has always been intriguing for graph analyses and used in various applications. Yet, it is one of the most computationally expensive kernels in graph mining. In this work, we investigate a set of techniques to make the ...
GPU-based Graph Traversal on Compressed Graphs
SIGMOD '19: Proceedings of the 2019 International Conference on Management of DataGraph processing on GPUs received much attention in the industry and the academia recently, as the hardware accelerator offers attractive potential for performance boost. However, the high-bandwidth device memory on GPUs has limited capacity that ...
Comments