skip to main content
10.1145/3035918.3064012acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

ZipG: A Memory-efficient Graph Store for Interactive Queries

Authors Info & Claims
Published:09 May 2017Publication History

ABSTRACT

We present ZipG, a distributed memory-efficient graph store for serving interactive graph queries. ZipG achieves memory efficiency by storing the input graph data using a compressed representation. What differentiates ZipG from other graph stores is its ability to execute a wide range of graph queries directly on this compressed representation. ZipG can thus execute a larger fraction of queries in main memory, achieving query interactivity. ZipG exposes a minimal API that is functionally rich enough to implement published functionalities from several industrial graph stores. We demonstrate this by implementing and evaluating graph queries from Facebook TAO, LinkBench, Graph Search and several other workloads on top of ZipG. On a single server with 244GB memory, ZipG executes tens of thousands of queries from these workloads for raw graph data over half a TB; this leads to an order of magnitude (sometimes as much as 23×) higher throughput than Neo4j and Titan. We get similar gains in distributed settings compared to Titan.

References

  1. Breadth First Search. https://en.wikipedia.org/wiki/Breadth-first_search.Google ScholarGoogle Scholar
  2. Building a follower model from scratch. https://engineering.pinterest.com/blog/building-follower-model-scratch.Google ScholarGoogle Scholar
  3. Demining the "Join Bomb" with graph queries. http://neo4j.com/blog/demining-the-join-bomb-with-graph-queries/.Google ScholarGoogle Scholar
  4. FlockDB. https://github.com/twitter/flockdb.Google ScholarGoogle Scholar
  5. Function Shipping: Separating Logical and Physical Tiers. https://docs.oracle.com/cd/A87860_01/doc/appdev.817/a86030/adx16nt4.htm.Google ScholarGoogle Scholar
  6. gMark Queries for LDBC Social Network Benchmark. https://github.com/graphMark/gmark/tree/master/demo/social/social-translated.Google ScholarGoogle Scholar
  7. Introducing FlockDB. https://blog.twitter.com/2010/introducing-flockdb.Google ScholarGoogle Scholar
  8. Introducing Graph Search Beta. http://newsroom.fb.com/news/2013/01/introducing-graph-search-beta/.Google ScholarGoogle Scholar
  9. LinkBench. https://github.com/facebookarchive/linkbench.Google ScholarGoogle Scholar
  10. Microsoft GraphView. https://github.com/facebookarchive/linkbench.Google ScholarGoogle Scholar
  11. Neo4j. http://neo4j.com/.Google ScholarGoogle Scholar
  12. Neo4j Pushes Graph DB Limits Past a Quadrillion Nodes. https://www.datanami.com/2016/04/26/neo4j-pushes-graph-db-limits-past-quadrillion-nodes/.Google ScholarGoogle Scholar
  13. openCypher. http://www.opencypher.org.Google ScholarGoogle Scholar
  14. OrientDB. http://orientdb.com/.Google ScholarGoogle Scholar
  15. Property Graph Model. https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model.Google ScholarGoogle Scholar
  16. Sparksee by Sparsity Technologies. http://www.sparsity-technologies.com.Google ScholarGoogle Scholar
  17. Suffix Array. http://en.wikipedia.org/wiki/Suffix_array.Google ScholarGoogle Scholar
  18. Titan. http://thinkaurelius.github.io/titan/.Google ScholarGoogle Scholar
  19. Titan Data Model. http://s3.thinkaurelius.com/docs/titan/current/data-model.html.Google ScholarGoogle Scholar
  20. Virtuoso Universal Server. http://virtuoso.openlinksw.com.Google ScholarGoogle Scholar
  21. R. Agarwal, A. Khandelwal, and I. Stoica. Succinct: Enabling Queries on Compressed Data. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. Ahmed, N. Shervashidze, S. Narayanamurthy, V. Josifovski, and A. J. Smola. Distributed Large-scale Natural Graph Factorization. In ACM International Conference on World Wide Web (WWW), 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. A. Anand, C. Muthukrishnan, S. Kappes, A. Akella, and S. Nath. Cheap and large cams for high performance data-intensive networked systems. In NSDI, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. T. G. Armstrong, V. Ponnekanti, D. Borthakur, and M. Callaghan. Linkbench: A database benchmark based on the facebook social graph. In ACM International Conference on Management of Data (SIGMOD), 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. G. Bagan, A. Bonifati, R. Ciucanu, G. H. L. Fletcher, A. Lemay, and N. Advokaat. Generating flexible workloads for graph databases. Proceedings of the VLDB Endowment (PVLDB), 9(13):1457--1460, 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. P. Barceló Baeza. Querying graph databases. In ACM Symposium on Principles of Database Systems (PODS), pages 175--188, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Bharat, Krishna and Broder, Andrei and Henzinger, Monika and Kumar, Puneet and Venkatasubramanian, Suresh. The connectivity server: Fast access to linkage information on the web. Computer networks and ISDN Systems, 30, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. P. Boldi and S. Vigna. The Webgraph Framework I: Compression Techniques. In International Conference on World Wide Web (WWW), 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding, J. Ferris, A. Giardullo, S. Kulkarni, H. Li, et al. TAO: Facebook\textquoterights Distributed Data Store for the Social Graph. In USENIX Annual Technical Conference (ATC), 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. D. Calvanese, G. De Giacomo, M. Lenzerini, and M. Y. Vardi. Rewriting of regular expressions and regular path queries. In ACM Symposium on Principles of Database Systems (PODS), pages 194--204, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Transactions on Computer Systems (TOCS), 26(2):4, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. F. Chierichetti, R. Kumar, S. Lattanzi, M. Mitzenmacher, A. Panconesi, and P. Raghavan. On Compressing Social Networks. In ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. I. F. Cruz, A. O. Mendelzon, and P. T. Wood. A graphical query language supporting recursion. In ACM International Conference on Management of Data (SIGMOD), pages 323--330, 1987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. C. Curino, E. Jones, Y. Zhang, and S. Madden. Schism: a Workload-Driven Approach to Database Replication and Partitioning. Proceedings of the VLDB Endowment, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. A. Dubey, G. D. Hill, R. Escriva, and E. G. Sirer. Weaver: A High-Performance, Transactional Graph Store Based on Refinable Timestamps. CoRR, abs/1509.08443, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. O. Erling, A. Averbuch, J. Larriba-Pey, H. Chafi, A. Gubichev, A. Prat, M.-D. Pham, and P. Boncz. The ldbc social network benchmark: Interactive workload. In ACM International Conference on Management of Data (SIGMOD), pages 619--630, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. W. Fan, J. Li, X. Wang, and Y. Wu. Query Preserving Graph Compression. In ACM International Conference on Management of Data (SIGMOD), 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. GraphX: Graph Processing in a Distributed Dataflow Framework. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. C. Hernández and G. Navarro. Compression of Web and Social Graphs supporting Neighbor and Community Queries. In ACM Workshop on Social Network mining and Analysis (SNAKDD), 2011.Google ScholarGoogle Scholar
  40. C. Hernández and G. Navarro. Compressed Representations for Web and Social Graphs. Knowledge and Information Systems, 40(2), 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data, 2014.Google ScholarGoogle Scholar
  42. A. Khandelwal, R. Agarwal, and I. Stoica. BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. A. Kyrola, G. E. Blelloch, and C. Guestrin. GraphChi: Large-Scale Graph Computation on Just a PC. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Lakshman, Avinash and Malik, Prashant. Cassandra: A Decentralized Structured Storage System. ACM SIGOPS Operating Systems Review, 44, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. K. Lang. Finding good nearly balanced cuts in power law graphs. Preprint, 2004.Google ScholarGoogle Scholar
  46. J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Mathematics, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  47. L. Libkin and D. Vrgo\vc. Regular path queries on graphs with data. In ACM International Conference on Database Theory (ICDT), pages 74--85, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Y. Low, J. E. Gonzalez, A. Kyrola, D. Bickson, C. E. Guestrin, and J. Hellerstein. GraphLab: A New Framework For Parallel Machine Learning. arXiv preprint arXiv:1408.2041, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. A. Maccioni and D. J. Abadi. Scalable pattern matching over compressed graphs via dedensification. In ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), 2016. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. U. Manber and G. Myers. Suffix Arrays: A new method for on-line string searches. SIAM Journal on Computing (SICOMP), 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. N. Martinez-Bazan, S. Gómez-Villamor, and F. Escalé-Claveras. DEX: A high-performance graph database management system. In IEEE Data Engineering Workshops (ICDEW), 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. H. Maserrat and J. Pei. Neighbor Query Friendly Compression of Social Networks. In ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. J. C. Mogul, F. Douglis, A. Feldmann, and B. Krishnamurthy. Potential benefits of delta encoding and data compression for HTTP. In ACM SIGCOMM Computer Communication Review, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. B. Shao, H. Wang, and Y. Li. Trinity: A Distributed Graph Engine on a Memory Cloud. In ACM International Conference on Management of Data (SIGMOD). ACM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. J. Shun and G. E. Blelloch. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoP), 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. J. Shun, L. Dhulipala, and G. Blelloch. Smaller and Faster: Parallel Processing of Compressed Graphs with LigraGoogle ScholarGoogle Scholar
  57. . In IEEE Data Compression Conference (DCC), 2015.Google ScholarGoogle Scholar
  58. Y. J. Song, M. K. Aguilera, R. Kotla, and D. Malkhi. RPC Chains: Efficient Client-server Communication in Geodistributed Systems. In USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. R. Wang, C. Conrad, and S. Shah. Using Set Cover to Optimize a Large-Scale Low Latency Distributed Graph. In Workshop on Hot Topics in Cloud Computing (HotCloud), 2013.Google ScholarGoogle Scholar

Index Terms

  1. ZipG: A Memory-efficient Graph Store for Interactive Queries

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
            May 2017
            1810 pages
            ISBN:9781450341974
            DOI:10.1145/3035918

            Copyright © 2017 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 9 May 2017

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate785of4,003submissions,20%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader