skip to main content
10.1145/2806416.2806420acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article
Open Access

Towards Scale-out Capability on Social Graphs

Authors Info & Claims
Published:17 October 2015Publication History

ABSTRACT

The development of cloud storage and computing has facilitated the rise of various big data applications. As a representative high performance computing (HPC) workload, graph processing is becoming a part of cloud computing. However, scalable computing on large graphs is still dominated by HPC solutions, which require high performance all-to-all collective operations over torus (or mesh) networking. Implementing those torus-based algorithms on commodity clusters, e.g., cloud computing infrastructures, can result in great latency due to inefficient communication. Moreover, designing a highly scalable system for large social graphs, is far from being trivial, as intrinsic features of social graphs, e.g., degree skewness and lacking of locality, often profoundly limit the extent of parallelism.

To resolve the challenges, we explore the iceberg of developing a scalable system for processing large social graphs on commodity clusters. In particular, we focus on the scale-out capability of the system. We propose a novel separator-combiner based query processing engine which provides native load-balancing and very low communication overhead, such that increasinglylarger graphs can be simply addressed by adding more computing nodes to the cluster.The proposed system achieves remarkable scale-out capability in processing large social graphs with skew degree distributions, while providing many critical features for big data analytics, such as easy-to-use API, fault-tolerance and recovery. We implement the system as a portable and easily configurable library, and conduct comprehensive experimental studies to demonstrate its effectiveness and efficiency.

References

  1. Apache giraph: http://giraph.apache.org/.Google ScholarGoogle Scholar
  2. The graph 500 list,http://www.graph500.org/.Google ScholarGoogle Scholar
  3. A. Z. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. L. Wiener. Graph structure in the web. Computer Networks 33(1--6):309--320, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. N. Bronson, Z. Amsden, G. Cabrera, P. Chakka, P. Dimov, H. Ding J. Ferris, A. Giardullo, S. Kulkarni, H. Li, M. Marchukov, D. Petrov, L. Puzar, Y. J. Song, and V. Venkataramani. Tao: Facebook's distributed data store for the social graph. In Presented as part of the 2013 USENIX ATC, pages 49--60, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-mat: A recursive model for graph mining. In SDM, pages 442--446, 2004.Google ScholarGoogle ScholarCross RefCross Ref
  6. E. Chan, M. Heimlich, A. Purkayastha, and R. van de Geijn. Collective communication: Theory, practice, and experience: Research articles. Concurr. Comput.: Pract. Exper., 19(13):1749--1783, Sept. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. F. Checconi and F. Petrini. Massive data analytics: The graph 500 on ibm blue gene/q. IBM Journal of Research and Development, 57(1/2):10, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM, pages 251--262, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Conference on OSDI, pages 17--30, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. D. Gregor and A. Lumsdaine. The parallel bgl: A generic library for distributed graph computations. In Parallel Object-Oriented Scientific Computing, 2005.Google ScholarGoogle Scholar
  11. T. Gunarathne, J. Qiu, and D. Gannon. Towards a collective layer in the big data stack. In CCGrid, pages 236--245, 2014.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. M. Han and K. Daudjee. Giraph unchained: Barrierless asynchronous parallel execution in pregel-like graph processing systems. PVLDB, 8(9):950--961, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. U. Kang, C. E. Tsourakakis, and C. Faloutsos. Pegasus: mining peta-scale graphs. Knowl. Inf. Syst., 27(2):303--325, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. K. Lee and L. Liu. Efficient data partitioning model for heterogeneous graphs in the cloud. In SC, page 46, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Lin and C. Dyer. Data-Intensive Text Processing with MapReduce. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Graphlab: A new parallel framework for machine learning. In Conference on Uncertainty in Artificial Intelligence (UAI), 2010.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed graphlab: A framework for machine learning in the cloud. CoRR, abs/1204.6078, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD Conference, pages 135--146, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Pig latin: a not-so-foreign language for data processing. In SIGMOD Conference, pages 1099--1110, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. J. Pjesivac-Grbovic, T. Angskun, G. Bosilca, G. E. Fagg, E. Gabriel, and J. Dongarra. Performance analysis of mpi collective operations. pages 127--143, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. L. Qin, J. Yu, L. Chang, H. Cheng, C. Zhang, and X. Lin. Scalable big graph processing in mapreduce. In SIGMOD Conference, pages 827--838, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. M. Redekopp, Y. Simmhan, and V. K. Prasanna. Optimizations and analysis of bsp graph processing models on public clouds. Parallel and Distributed Processing Symposium, International, 0:203--214, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Redner. How popular is your paper? an empirical study of the citation distribution. European Physical Journal B, 4(2):131--134, Aug. 1998.Google ScholarGoogle ScholarCross RefCross Ref
  24. S. Salihoglu and J. Widom. Gps: A graph processing system. In SSDBM, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S. Salihoglu and J. Widom. Optimizing graph algorithms on pregel-like systems. In PVLDB, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. N. Satish, N. Sundaram, M. M. A. Patwary, J. Seo, J. Park, M. A. Hassaan, S. Sengupta, Z. Yin, and P. Dubey. Navigating the maze of graph analytics frameworks using massive graph datasets. In SIGMOD Conference, pages 979--990, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. Seo, E. J. Yoon, J. Kim, S. Jin, J.-S. Kim, and S. Maeng. Hama: An efficient matrix computation with the mapreduce framework. In CloudCom, pages 721--726, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. H. Shang and M. Kitsuregawa. Efficient breadth-first search on large graphs with skewed degree distributions. In Joint 2013 EDBT/ICDT Conferences, pages 311--322, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. B. Shao, H. Wang, and Y. Li. Trinity: A distributed graph engine on a memory cloud. In SIGMOD Conference, pages 505--516, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. S. Tasci and M. Demirbas. Giraphx: Parallel yet serializable large-scale graph processing. In Proceedings of the 19th International Conference on Parallel Processing, Euro-Par'13, pages 458--469, Berlin, Heidelberg, 2013. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Anthony, H. Liu, and R. Murthy. Hive - a petabyte scale data warehouse using hadoop. In ICDE, pages 996--1005, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  32. Y. Tian, A. Balmin, S. A. Corsten, S. Tatikonda, and J. McPherson. From "think like a vertex" to "think like a graph". PVLDB, 7(3):193--204, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. K. Ueno and T. Suzumura. 2d partitioning based graph search for the graph500 benchmark. In IPDPS Workshops, pages 1925--1931, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103--111, Aug. 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. T. White. Hadoop - The Definitive Guide: Storage and Analysis at Internet Scale. O'Reilly, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. R. S. Xin, J. E. Gonzalez, M. J. Franklin, and I. Stoica. Graphx: a resilient distributed graph system on spark. In GRADES, page 2, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. A. Yoo, E. Chow, K. W. Henderson, W. M. III, B. Hendrickson, and U. V. "Catalyurek. A scalable distributed parallel breadth-first search algorithm on bluegene"l. In SC, page 25, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: Cluster computing with working sets. In Proceedings of the 2Nd USENIX Conference on HotCloud, pages 10--10, 2010 Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Towards Scale-out Capability on Social Graphs

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Article Metrics

      • Downloads (Last 12 months)16
      • Downloads (Last 6 weeks)6

      Other Metrics

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader