skip to main content
research-article
Open Access
Artifacts Available
Artifacts Evaluated & Functional

GraphIt: a high-performance graph DSL

Published:24 October 2018Publication History
Skip Abstract Section

Abstract

The performance bottlenecks of graph applications depend not only on the algorithm and the underlying hardware, but also on the size and structure of the input graph. As a result, programmers must try different combinations of a large set of techniques, which make tradeoffs among locality, work-efficiency, and parallelism, to develop the best implementation for a specific algorithm and type of graph. Existing graph frameworks and domain specific languages (DSLs) lack flexibility, supporting only a limited set of optimizations.

This paper introduces GraphIt, a new DSL for graph computations that generates fast implementations for algorithms with different performance characteristics running on graphs with different sizes and structures. GraphIt separates what is computed (algorithm) from how it is computed (schedule). Programmers specify the algorithm using an algorithm language, and performance optimizations are specified using a separate scheduling language. The algorithm language simplifies expressing the algorithms, while exposing opportunities for optimizations. We formulate graph optimizations, including edge traversal direction, data layout, parallelization, cache, NUMA, and kernel fusion optimizations, as tradeoffs among locality, parallelism, and work-efficiency. The scheduling language enables programmers to easily search through this complicated tradeoff space by composing together a large set of edge traversal, vertex data layout, and program structure optimizations. The separation of algorithm and schedule also enables us to build an autotuner on top of GraphIt to automatically find high-performance schedules. The compiler uses a new scheduling representation, the graph iteration space, to model, compose, and ensure the validity of the large number of optimizations. We evaluate GraphIt’s performance with seven algorithms on graphs with different structures and sizes. GraphIt outperforms the next fastest of six state-of-the-art shared-memory frameworks (Ligra, Green-Marl, GraphMat, Galois, Gemini, and Grazelle) on 24 out of 32 experiments by up to 4.8×, and is never more than 43% slower than the fastest framework on the other experiments. GraphIt also reduces the lines of code by up to an order of magnitude compared to the next fastest framework.

Skip Supplemental Material Section

Supplemental Material

a121-zhang.webm

webm

100.4 MB

References

  1. Christopher R. Aberger, Susan Tu, Kunle Olukotun, and Christopher Ré. 2016. EmptyHeaded: A Relational Engine for Graph Processing. In International Conference on Management of Data (SIGMOD ’16). 431–446. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jeffrey Bosboom, Una-May O’Reilly, and Saman Amarasinghe. 2014. OpenTuner: An Extensible Framework for Program Autotuning. In International Conference on Parallel Architectures and Compilation Techniques. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Scott Beamer, Krste Asanović, and David Patterson. 2012. Direction-optimizing Breadth-first Search. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’12). Article 12, 12:1–12:10 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Scott Beamer, Krste Asanovic, and David Patterson. 2015. Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server. In IEEE International Symposium on Workload Characterization (IISWC). 56–65. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Scott Beamer, Krste Asanovic, and David Patterson. 2017. Reducing Pagerank Communication via Propagation Blocking. In IEEE International Parallel and Distributed Processing Symposium. 820–831.Google ScholarGoogle ScholarCross RefCross Ref
  6. James Bennett, Stan Lanning, and Netflix Netflix. 2007. The Netflix Prize. In In KDD Cup and Workshop in conjunction with KDD.Google ScholarGoogle Scholar
  7. Maciej Besta, Michal Podstawski, Linus Groner, Edgar Solomonik, and Torsten Hoefler. 2017. To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations. In Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’17). 93–104. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Chun Chen, Jacqueline Chame, and Mary Hall. 2008. CHiLL: A framework for composing high-level loop transformations. Technical Report.Google ScholarGoogle Scholar
  9. Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. 2015. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. In Proceedings of the Tenth European Conference on Computer Systems (EuroSys ’15). Article 1, 1:1–1:15 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang-Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, and Keshav Pingali. 2018. Gluon: A Communication-optimizing Substrate for Distributed Heterogeneous Graph Analytics. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018). ACM, New York, NY, USA, 752–768. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw. 38, 1, Article 1 (Dec. 2011), 1:1–1:25 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Zachary DeVito, Niels Joubert, Francisco Palacios, Stephen Oakley, Montserrat Medina, Mike Barrientos, Erich Elsen, Frank Ham, Alex Aiken, Karthik Duraisamy, Eric Darve, Juan Alonso, and Pat Hanrahan. 2011. Liszt: A Domain Specific Language for Building Portable Mesh-based PDE Solvers. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’11). Article 9, 12 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Chantat Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, and Jure Leskovec. 2018. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time. In Proceedings of the 2018 World Wide Web Conference (WWW ’18). 1775–1784. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Zhisong Fu, Zhengwei Wu, Houyu Li, Yize Li, Min Wu, Xiaojie Chen, Xiaomeng Ye, Benquan Yu, and Xi Hu. 2017. GeaBase: A High-Performance Distributed Graph Database for Industry-Scale Applications. In International Conference on Advanced Cloud and Big Data (CBD). 170–175.Google ScholarGoogle Scholar
  15. Gurbinder Gill, Roshan Dathathri, Loc Hoang, Andrew Lenharth, and Keshav Pingali. 2018. Abelian: A Compiler for Graph Analytics on Distributed, Heterogeneous Platforms. In Euro-Par 2018: Parallel Processing - 24th International Conference on Parallel and Distributed Computing, Turin, Italy, August 27-31, 2018, Proceedings. 249–264.Google ScholarGoogle Scholar
  16. Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graphparallel Computation on Natural Graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, Berkeley, CA, USA, 17–30. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Samuel Grossman, Heiner Litz, and Christos Kozyrakis. 2018. Making Pull-based Graph Processing Performant. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’18). 246–260. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. 2012. Green-Marl: A DSL for Easy and Efficient Graph Analysis. SIGARCH Comput. Archit. News 40, 1 (March 2012), 349–362. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Sang Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu, and Arvind. 2018. GraFBoost: Accelerated Flash Storage for External Graph Analytics. In International Symposium on Computer Architecture. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. U. Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. 2011. PEGASUS: mining peta-scale graphs. Knowl. Inf. Syst. 27, 2 (2011). Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Vladimir Kiriansky, Yunming Zhang, and Saman Amarasinghe. 2016. Optimizing Indirect Memory References with Milk. In Proceedings of the 2016 International Conference on Parallel Architectures and Compilation (PACT ’16). 299–312. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Fredrik Kjolstad, Shoaib Kamil, Jonathan Ragan-Kelley, David I. W. Levin, Shinjiro Sueda, Desai Chen, Etienne Vouga, Danny M. Kaufman, Gurtej Kanwar, Wojciech Matusik, and Saman Amarasinghe. 2016. Simit: A Language for Physical Simulation. ACM Trans. Graph. 35, 2, Article 20 (March 2016), 20:1–20:21 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a Social Network or a News Media?. In Proceedings of the 19th International Conference on World Wide Web (WWW ’10). 591–600. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Aapo Kyrola, Guy Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-scale Graph Computation on Just a PC. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). 31–46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Monica S. Lam, Stephen Guo, and Jiwon Seo. 2013. SociaLite: Datalog Extensions for Efficient Social Network Analysis. In ICDE. 278–289. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford. edu/data . (June 2014).Google ScholarGoogle Scholar
  27. Boduo Li, Sandeep Tata, and Yannis Sismanis. 2013. Sparkler: Supporting Large-scale Matrix Factorization. In Proceedings of the 16th International Conference on Extending Database Technology (EDBT ’13). 625–636. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Zhiyuan Li, Pen-Chung Yew, and Chuag-Qi Zhu. 1989. Data dependence analysis on multi-dimensional array references. In Proceedings of the 3rd international conference on Supercomputing. 215–224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein. 2010. GraphLab: A New Framework For Parallel Machine Learning.. In UAI. 340–349. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Adam Lugowski, David Alber, Aydın Buluç, John Gilbert, Steve Reinhardt, Yun Teng, and Andrew Waranis. 2012. A Flexible Open-Source Toolbox for Scalable Complex Graph Analysis. In SDM.Google ScholarGoogle Scholar
  31. Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang, Mohan Kumar, and Taesoo Kim. 2017. Mosaic: Processing a Trillion-Edge Graph on a Single Machine. In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys ’17). 527–543. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Jasmina Malicevic, Baptiste Lepers, and Willy Zwaenepoel. 2017. Everything you always wanted to know about multicore graph processing but were afraid to ask. In USENIX Annual Technical Conference. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Dror E Maydan, John L Hennessy, and Monica S Lam. 1991. Efficient and exact data dependence analysis. In ACM SIGPLAN Notices, Vol. 26. 1–14. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Robert Ryan McCune, Tim Weninger, and Greg Madey. 2015. Thinking Like a Vertex: A Survey of Vertex-Centric Frameworks for Large-Scale Distributed Graph Processing. ACM Comput. Surv. 48, 2, Article 25 (Oct. 2015), 25:1–25:39 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A Lightweight Infrastructure for Graph Analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP ’13). 456–471. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Rajesh Nishtala, Richard W. Vuduc, James W. Demmel, and Katherine A. Yelick. 2007. When cache blocking of sparse matrix vector multiply works and why. Applicable Algebra in Engineering, Communication and Computing 18, 3 (01 May 2007), 297–311.Google ScholarGoogle Scholar
  37. David A Padua and Michael J Wolfe. 1986. Advanced compiler optimizations for supercomputers. Commun. ACM 29, 12 (1986), 1184–1201. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999-66. Stanford InfoLab.Google ScholarGoogle Scholar
  39. Vijayan Prabhakaran, Ming Wu, Xuetian Weng, Frank McSherry, Lidong Zhou, and Maya Haridasan. 2012. Managing Large Graphs on Multi-cores with Graph Awareness. In Proceedings of the 2012 USENIX Conference on Annual Technical Conference (USENIX ATC’12). Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Dimitrios Prountzos, Roman Manevich, and Keshav Pingali. 2012. Elixir: a system for synthesizing concurrent graph programs. In Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications. 375–394. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Dimitrios Prountzos, Roman Manevich, and Keshav Pingali. 2015. Synthesizing parallel graph programs via automated planning. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. 533–544. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Jonathan Ragan-Kelley, Andrew Adams, Dillon Sharlet, Connelly Barnes, Sylvain Paris, Marc Levoy, Saman Amarasinghe, and Frédo Durand. 2017. Halide: Decoupling Algorithms from Schedules for High-performance Image Processing. Commun. ACM 61, 1 (Dec. 2017), 106–115. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Dolbeau Romain, Stephane Bihan, and Francois Bodin. 2007. HMPP: A hybrid multi-core parallel programming environment. In Workshop on general purpose processing on graphics processing units (GPGPU 2007).Google ScholarGoogle Scholar
  44. Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. 2015. Chaos: Scale-out Graph Processing from Secondary Storage. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP ’15). 410–424. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-Stream: Edge-centric Graph Processing Using Streaming Partitions. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP ’13). 472–488. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Sherif Sakr, Faisal Moeen Orakzai, Ibrahim Abdelaziz, and Zuhair Khayyat. 2017. Large-Scale Graph Processing Using Apache Giraph (1st ed.). Springer Publishing Company, Incorporated. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Jiwon Seo, Jongsoo Park, M. Amber Hassaan, Shubho Sengupta, Zhaoming Yin, and Pradeep Dubey. 2014. Navigating the Maze of Graph Analytics Frameworks Using Massive Graph Datasets. In ACM SIGMOD International Conference on Management of Data (SIGMOD ’14). 979–990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Aneesh Sharma, Jerry Jiang, Praveen Bommannavar, Brian Larson, and Jimmy Lin. 2016. GraphJet: Real-time Content Recommendations at Twitter. Proc. VLDB Endow. 9, 13 (Sept. 2016), 1281–1292. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Xuanhua Shi, Zhigao Zheng, Yongluan Zhou, Hai Jin, Ligang He, Bo Liu, and Qiang-Sheng Hua. 2018. Graph Processing on GP Us: A Survey. ACM Comput. Surv. 50, 6, Article 81 (Jan. 2018), 81:1–81:35 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Julian Shun and Guy E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’13). 135–146. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Jiawen Sun, Hans Vandierendonck, and Dimitrios S. Nikolopoulos. 2017. GraphGrind: Addressing Load Imbalance of Graph Partitioning. In Proceedings of the International Conference on Supercomputing (ICS ’17). Article 16, 16:1–16:10 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Narayanan Sundaram, Nadathur Satish, Md Mostofa Ali Patwary, Subramanya R. Dulloor, Michael J. Anderson, Satya Gautam Vadlamudi, Dipankar Das, and Pradeep Dubey. 2015. GraphMat: High Performance Graph Analytics Made Productive. Proc. VLDB Endow. 8, 11 (July 2015), 1214–1225. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Keval Vora, Guoqing Xu, and Rajiv Gupta. 2016. Load the Edges You Need: A Generic I/O Optimization for Disk-based Graph Processing. In 2016 USENIX Annual Technical Conference. 507–522. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Kai Wang, Aftab Hussain, Zhiqiang Zuo, Guoqing Xu, and Ardalan Amiri Sani. 2017. Graspan: A Single-machine Diskbased Graph System for Interprocedural Static Analyses of Large-scale Systems Code. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’17). 389–404. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Kai Wang, Guoqing Xu, Zhendong Su, and Yu David Liu. 2015. GraphQ: Graph Query Processing with Abstraction Refinement—Scalable and Programmable Analytics over Very Large Graphs on a Single PC. In USENIX Annual Technical Conference. 387–401. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A Highperformance Graph Processing Library on the GP U. SIGPLAN Not. 51, 8, Article 11 (Feb. 2016), 12 pages.Google ScholarGoogle Scholar
  57. Michael E Wolf and Monica S Lam. 1991. A loop transformation theory and an algorithm to maximize parallelism. IEEE Transactions on Parallel and Distributed Systems 2, 4 (1991), 452–471. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Da Yan, Yingyi Bu, Yuanyuan Tian, and Amol Deshpande. 2017. Big Graph Analytics Platforms. Foundations and Trends in Databases 7, 1-2 (2017), 1–195. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Kaiyuan Zhang, Rong Chen, and Haibo Chen. 2015. NUMA-aware Graph-structured Analytics. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2015). 183–193. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Yunming Zhang, Vladimir Kiriansky, Charith Mendis, Saman Amarasinghe, and Matei Zaharia. 2017. Making caches work for graph analytics. In 2017 IEEE International Conference on Big Data (Big Data). 293–302.Google ScholarGoogle ScholarCross RefCross Ref
  61. Da Zheng, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E Priebe, and Alexander S Szalay. 2015. Flashgraph: Processing billion-node graphs on an array of commodity SSDs. In 13th USENIX Conference on File and Storage Technologies (FAST 15). 45–58. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A Computation-Centric Distributed Graph Processing System. In 12th USENIX Symposium on Operating Systems Design and Implementation. 301–316. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-scale Graph Processing on a Single Machine Using 2-level Hierarchical Partitioning. In Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC ’15). 375–386. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. GraphIt: a high-performance graph DSL

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader