research-article

Open Access

GraphIt: a high-performance graph DSL

Authors:
Yunming Zhang

Massachusetts Institute of Technology, USA

Massachusetts Institute of Technology, USA
View Profile

,
Mengjiao Yang

Massachusetts Institute of Technology, USA

Massachusetts Institute of Technology, USA
View Profile

,
Riyadh Baghdadi

Massachusetts Institute of Technology, USA

Massachusetts Institute of Technology, USA
View Profile

,
Shoaib Kamil

Adobe Research, USA

Adobe Research, USA
View Profile

,
Julian Shun

Massachusetts Institute of Technology, USA

Massachusetts Institute of Technology, USA
View Profile

,
Saman Amarasinghe

Massachusetts Institute of Technology, USA

Massachusetts Institute of Technology, USA
View Profile

Proceedings of the ACM on Programming Languages Volume 2 Issue OOPSLAArticle No.: 121pp 1–30https://doi.org/10.1145/3276491

Published:24 October 2018Publication History

Related Artifact: GraphIt-DSL/graphit: Initial release to make the artifact available for OOPSLA18 September 2018 software https://doi.org/10.5281/zenodo.1434471

Proceedings of the ACM on Programming Languages

Abstract

The performance bottlenecks of graph applications depend not only on the algorithm and the underlying hardware, but also on the size and structure of the input graph. As a result, programmers must try different combinations of a large set of techniques, which make tradeoffs among locality, work-efficiency, and parallelism, to develop the best implementation for a specific algorithm and type of graph. Existing graph frameworks and domain specific languages (DSLs) lack flexibility, supporting only a limited set of optimizations.

This paper introduces GraphIt, a new DSL for graph computations that generates fast implementations for algorithms with different performance characteristics running on graphs with different sizes and structures. GraphIt separates what is computed (algorithm) from how it is computed (schedule). Programmers specify the algorithm using an algorithm language, and performance optimizations are specified using a separate scheduling language. The algorithm language simplifies expressing the algorithms, while exposing opportunities for optimizations. We formulate graph optimizations, including edge traversal direction, data layout, parallelization, cache, NUMA, and kernel fusion optimizations, as tradeoffs among locality, parallelism, and work-efficiency. The scheduling language enables programmers to easily search through this complicated tradeoff space by composing together a large set of edge traversal, vertex data layout, and program structure optimizations. The separation of algorithm and schedule also enables us to build an autotuner on top of GraphIt to automatically find high-performance schedules. The compiler uses a new scheduling representation, the graph iteration space, to model, compose, and ensure the validity of the large number of optimizations. We evaluate GraphIt’s performance with seven algorithms on graphs with different structures and sizes. GraphIt outperforms the next fastest of six state-of-the-art shared-memory frameworks (Ligra, Green-Marl, GraphMat, Galois, Gemini, and Grazelle) on 24 out of 32 experiments by up to 4.8×, and is never more than 43% slower than the fastest framework on the other experiments. GraphIt also reduces the lines of code by up to an order of magnitude compared to the next fastest framework.

Supplemental Material

a121-zhang.webm

webm

100.4 MB

Download

References

Christopher R. Aberger, Susan Tu, Kunle Olukotun, and Christopher Ré. 2016. EmptyHeaded: A Relational Engine for Graph Processing. In International Conference on Management of Data (SIGMOD ’16). 431–446. Google ScholarDigital Library
Jason Ansel, Shoaib Kamil, Kalyan Veeramachaneni, Jonathan Ragan-Kelley, Jeffrey Bosboom, Una-May O’Reilly, and Saman Amarasinghe. 2014. OpenTuner: An Extensible Framework for Program Autotuning. In International Conference on Parallel Architectures and Compilation Techniques. Google ScholarDigital Library
Scott Beamer, Krste Asanović, and David Patterson. 2012. Direction-optimizing Breadth-first Search. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’12). Article 12, 12:1–12:10 pages. Google ScholarDigital Library
Scott Beamer, Krste Asanovic, and David Patterson. 2015. Locality Exists in Graph Processing: Workload Characterization on an Ivy Bridge Server. In IEEE International Symposium on Workload Characterization (IISWC). 56–65. Google ScholarDigital Library
Scott Beamer, Krste Asanovic, and David Patterson. 2017. Reducing Pagerank Communication via Propagation Blocking. In IEEE International Parallel and Distributed Processing Symposium. 820–831.Google ScholarCross Ref
James Bennett, Stan Lanning, and Netflix Netflix. 2007. The Netflix Prize. In In KDD Cup and Workshop in conjunction with KDD.Google Scholar
Maciej Besta, Michal Podstawski, Linus Groner, Edgar Solomonik, and Torsten Hoefler. 2017. To Push or To Pull: On Reducing Communication and Synchronization in Graph Computations. In Proceedings of the 26th International Symposium on High-Performance Parallel and Distributed Computing (HPDC ’17). 93–104. Google ScholarDigital Library
Chun Chen, Jacqueline Chame, and Mary Hall. 2008. CHiLL: A framework for composing high-level loop transformations. Technical Report.Google Scholar
Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. 2015. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. In Proceedings of the Tenth European Conference on Computer Systems (EuroSys ’15). Article 1, 1:1–1:15 pages. Google ScholarDigital Library
Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang-Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, and Keshav Pingali. 2018. Gluon: A Communication-optimizing Substrate for Distributed Heterogeneous Graph Analytics. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018). ACM, New York, NY, USA, 752–768. Google ScholarDigital Library
Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw. 38, 1, Article 1 (Dec. 2011), 1:1–1:25 pages. Google ScholarDigital Library
Zachary DeVito, Niels Joubert, Francisco Palacios, Stephen Oakley, Montserrat Medina, Mike Barrientos, Erich Elsen, Frank Ham, Alex Aiken, Karthik Duraisamy, Eric Darve, Juan Alonso, and Pat Hanrahan. 2011. Liszt: A Domain Specific Language for Building Portable Mesh-based PDE Solvers. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’11). Article 9, 12 pages. Google ScholarDigital Library
Chantat Eksombatchai, Pranav Jindal, Jerry Zitao Liu, Yuchen Liu, Rahul Sharma, Charles Sugnet, Mark Ulrich, and Jure Leskovec. 2018. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time. In Proceedings of the 2018 World Wide Web Conference (WWW ’18). 1775–1784. Google ScholarDigital Library
Zhisong Fu, Zhengwei Wu, Houyu Li, Yize Li, Min Wu, Xiaojie Chen, Xiaomeng Ye, Benquan Yu, and Xi Hu. 2017. GeaBase: A High-Performance Distributed Graph Database for Industry-Scale Applications. In International Conference on Advanced Cloud and Big Data (CBD). 170–175.Google Scholar
Gurbinder Gill, Roshan Dathathri, Loc Hoang, Andrew Lenharth, and Keshav Pingali. 2018. Abelian: A Compiler for Graph Analytics on Distributed, Heterogeneous Platforms. In Euro-Par 2018: Parallel Processing - 24th International Conference on Parallel and Distributed Computing, Turin, Italy, August 27-31, 2018, Proceedings. 249–264.Google Scholar
Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graphparallel Computation on Natural Graphs. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). USENIX Association, Berkeley, CA, USA, 17–30. Google ScholarDigital Library
Samuel Grossman, Heiner Litz, and Christos Kozyrakis. 2018. Making Pull-based Graph Processing Performant. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’18). 246–260. Google ScholarDigital Library
Sungpack Hong, Hassan Chafi, Edic Sedlar, and Kunle Olukotun. 2012. Green-Marl: A DSL for Easy and Efficient Graph Analysis. SIGARCH Comput. Archit. News 40, 1 (March 2012), 349–362. Google ScholarDigital Library
Sang Woo Jun, Andy Wright, Sizhuo Zhang, Shuotao Xu, and Arvind. 2018. GraFBoost: Accelerated Flash Storage for External Graph Analytics. In International Symposium on Computer Architecture. Google ScholarDigital Library
U. Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. 2011. PEGASUS: mining peta-scale graphs. Knowl. Inf. Syst. 27, 2 (2011). Google ScholarDigital Library
Vladimir Kiriansky, Yunming Zhang, and Saman Amarasinghe. 2016. Optimizing Indirect Memory References with Milk. In Proceedings of the 2016 International Conference on Parallel Architectures and Compilation (PACT ’16). 299–312. Google ScholarDigital Library
Fredrik Kjolstad, Shoaib Kamil, Jonathan Ragan-Kelley, David I. W. Levin, Shinjiro Sueda, Desai Chen, Etienne Vouga, Danny M. Kaufman, Gurtej Kanwar, Wojciech Matusik, and Saman Amarasinghe. 2016. Simit: A Language for Physical Simulation. ACM Trans. Graph. 35, 2, Article 20 (March 2016), 20:1–20:21 pages. Google ScholarDigital Library
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a Social Network or a News Media?. In Proceedings of the 19th International Conference on World Wide Web (WWW ’10). 591–600. Google ScholarDigital Library
Aapo Kyrola, Guy Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-scale Graph Computation on Just a PC. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI’12). 31–46. Google ScholarDigital Library
Monica S. Lam, Stephen Guo, and Jiwon Seo. 2013. SociaLite: Datalog Extensions for Efficient Social Network Analysis. In ICDE. 278–289. Google ScholarDigital Library
Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford. edu/data . (June 2014).Google Scholar
Boduo Li, Sandeep Tata, and Yannis Sismanis. 2013. Sparkler: Supporting Large-scale Matrix Factorization. In Proceedings of the 16th International Conference on Extending Database Technology (EDBT ’13). 625–636. Google ScholarDigital Library
Zhiyuan Li, Pen-Chung Yew, and Chuag-Qi Zhu. 1989. Data dependence analysis on multi-dimensional array references. In Proceedings of the 3rd international conference on Supercomputing. 215–224. Google ScholarDigital Library
Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, and Joseph M. Hellerstein. 2010. GraphLab: A New Framework For Parallel Machine Learning.. In UAI. 340–349. Google ScholarDigital Library
Adam Lugowski, David Alber, Aydın Buluç, John Gilbert, Steve Reinhardt, Yun Teng, and Andrew Waranis. 2012. A Flexible Open-Source Toolbox for Scalable Complex Graph Analysis. In SDM.Google Scholar
Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang, Mohan Kumar, and Taesoo Kim. 2017. Mosaic: Processing a Trillion-Edge Graph on a Single Machine. In Proceedings of the Twelfth European Conference on Computer Systems (EuroSys ’17). 527–543. Google ScholarDigital Library
Jasmina Malicevic, Baptiste Lepers, and Willy Zwaenepoel. 2017. Everything you always wanted to know about multicore graph processing but were afraid to ask. In USENIX Annual Technical Conference. Google ScholarDigital Library
Dror E Maydan, John L Hennessy, and Monica S Lam. 1991. Efficient and exact data dependence analysis. In ACM SIGPLAN Notices, Vol. 26. 1–14. Google ScholarDigital Library
Robert Ryan McCune, Tim Weninger, and Greg Madey. 2015. Thinking Like a Vertex: A Survey of Vertex-Centric Frameworks for Large-Scale Distributed Graph Processing. ACM Comput. Surv. 48, 2, Article 25 (Oct. 2015), 25:1–25:39 pages. Google ScholarDigital Library
Donald Nguyen, Andrew Lenharth, and Keshav Pingali. 2013. A Lightweight Infrastructure for Graph Analytics. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP ’13). 456–471. Google ScholarDigital Library
Rajesh Nishtala, Richard W. Vuduc, James W. Demmel, and Katherine A. Yelick. 2007. When cache blocking of sparse matrix vector multiply works and why. Applicable Algebra in Engineering, Communication and Computing 18, 3 (01 May 2007), 297–311.Google Scholar
David A Padua and Michael J Wolfe. 1986. Advanced compiler optimizations for supercomputers. Commun. ACM 29, 12 (1986), 1184–1201. Google ScholarDigital Library
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. 1999. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999-66. Stanford InfoLab.Google Scholar
Vijayan Prabhakaran, Ming Wu, Xuetian Weng, Frank McSherry, Lidong Zhou, and Maya Haridasan. 2012. Managing Large Graphs on Multi-cores with Graph Awareness. In Proceedings of the 2012 USENIX Conference on Annual Technical Conference (USENIX ATC’12). Google ScholarDigital Library
Dimitrios Prountzos, Roman Manevich, and Keshav Pingali. 2012. Elixir: a system for synthesizing concurrent graph programs. In Proceedings of the 27th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications. 375–394. Google ScholarDigital Library
Dimitrios Prountzos, Roman Manevich, and Keshav Pingali. 2015. Synthesizing parallel graph programs via automated planning. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. 533–544. Google ScholarDigital Library
Jonathan Ragan-Kelley, Andrew Adams, Dillon Sharlet, Connelly Barnes, Sylvain Paris, Marc Levoy, Saman Amarasinghe, and Frédo Durand. 2017. Halide: Decoupling Algorithms from Schedules for High-performance Image Processing. Commun. ACM 61, 1 (Dec. 2017), 106–115. Google ScholarDigital Library
Dolbeau Romain, Stephane Bihan, and Francois Bodin. 2007. HMPP: A hybrid multi-core parallel programming environment. In Workshop on general purpose processing on graphics processing units (GPGPU 2007).Google Scholar
Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel. 2015. Chaos: Scale-out Graph Processing from Secondary Storage. In Proceedings of the 25th Symposium on Operating Systems Principles (SOSP ’15). 410–424. Google ScholarDigital Library
Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-Stream: Edge-centric Graph Processing Using Streaming Partitions. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP ’13). 472–488. Google ScholarDigital Library
Sherif Sakr, Faisal Moeen Orakzai, Ibrahim Abdelaziz, and Zuhair Khayyat. 2017. Large-Scale Graph Processing Using Apache Giraph (1st ed.). Springer Publishing Company, Incorporated. Google ScholarDigital Library
Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Jiwon Seo, Jongsoo Park, M. Amber Hassaan, Shubho Sengupta, Zhaoming Yin, and Pradeep Dubey. 2014. Navigating the Maze of Graph Analytics Frameworks Using Massive Graph Datasets. In ACM SIGMOD International Conference on Management of Data (SIGMOD ’14). 979–990. Google ScholarDigital Library
Aneesh Sharma, Jerry Jiang, Praveen Bommannavar, Brian Larson, and Jimmy Lin. 2016. GraphJet: Real-time Content Recommendations at Twitter. Proc. VLDB Endow. 9, 13 (Sept. 2016), 1281–1292. Google ScholarDigital Library
Xuanhua Shi, Zhigao Zheng, Yongluan Zhou, Hai Jin, Ligang He, Bo Liu, and Qiang-Sheng Hua. 2018. Graph Processing on GP Us: A Survey. ACM Comput. Surv. 50, 6, Article 81 (Jan. 2018), 81:1–81:35 pages. Google ScholarDigital Library
Julian Shun and Guy E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. In Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’13). 135–146. Google ScholarDigital Library
Jiawen Sun, Hans Vandierendonck, and Dimitrios S. Nikolopoulos. 2017. GraphGrind: Addressing Load Imbalance of Graph Partitioning. In Proceedings of the International Conference on Supercomputing (ICS ’17). Article 16, 16:1–16:10 pages. Google ScholarDigital Library
Narayanan Sundaram, Nadathur Satish, Md Mostofa Ali Patwary, Subramanya R. Dulloor, Michael J. Anderson, Satya Gautam Vadlamudi, Dipankar Das, and Pradeep Dubey. 2015. GraphMat: High Performance Graph Analytics Made Productive. Proc. VLDB Endow. 8, 11 (July 2015), 1214–1225. Google ScholarDigital Library
Keval Vora, Guoqing Xu, and Rajiv Gupta. 2016. Load the Edges You Need: A Generic I/O Optimization for Disk-based Graph Processing. In 2016 USENIX Annual Technical Conference. 507–522. Google ScholarDigital Library
Kai Wang, Aftab Hussain, Zhiqiang Zuo, Guoqing Xu, and Ardalan Amiri Sani. 2017. Graspan: A Single-machine Diskbased Graph System for Interprocedural Static Analyses of Large-scale Systems Code. In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’17). 389–404. Google ScholarDigital Library
Kai Wang, Guoqing Xu, Zhendong Su, and Yu David Liu. 2015. GraphQ: Graph Query Processing with Abstraction Refinement—Scalable and Programmable Analytics over Very Large Graphs on a Single PC. In USENIX Annual Technical Conference. 387–401. Google ScholarDigital Library
Yangzihao Wang, Andrew Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, and John D. Owens. 2016. Gunrock: A Highperformance Graph Processing Library on the GP U. SIGPLAN Not. 51, 8, Article 11 (Feb. 2016), 12 pages.Google Scholar
Michael E Wolf and Monica S Lam. 1991. A loop transformation theory and an algorithm to maximize parallelism. IEEE Transactions on Parallel and Distributed Systems 2, 4 (1991), 452–471. Google ScholarDigital Library
Da Yan, Yingyi Bu, Yuanyuan Tian, and Amol Deshpande. 2017. Big Graph Analytics Platforms. Foundations and Trends in Databases 7, 1-2 (2017), 1–195. Google ScholarDigital Library
Kaiyuan Zhang, Rong Chen, and Haibo Chen. 2015. NUMA-aware Graph-structured Analytics. In Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2015). 183–193. Google ScholarDigital Library
Yunming Zhang, Vladimir Kiriansky, Charith Mendis, Saman Amarasinghe, and Matei Zaharia. 2017. Making caches work for graph analytics. In 2017 IEEE International Conference on Big Data (Big Data). 293–302.Google ScholarCross Ref
Da Zheng, Disa Mhembere, Randal Burns, Joshua Vogelstein, Carey E Priebe, and Alexander S Szalay. 2015. Flashgraph: Processing billion-node graphs on an array of commodity SSDs. In 13th USENIX Conference on File and Storage Technologies (FAST 15). 45–58. Google ScholarDigital Library
Xiaowei Zhu, Wenguang Chen, Weimin Zheng, and Xiaosong Ma. 2016. Gemini: A Computation-Centric Distributed Graph Processing System. In 12th USENIX Symposium on Operating Systems Design and Implementation. 301–316. Google ScholarDigital Library
Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-scale Graph Processing on a Single Machine Using 2-level Hierarchical Partitioning. In Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC ’15). 375–386. Google ScholarDigital Library

Index Terms

GraphIt: a high-performance graph DSL
1. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms
2. Software and its engineering
  1. Software notations and tools
    1. Context specific languages
      1. Domain specific languages
    2. General programming languages
      1. Language types
        Parallel programming languages

Recommendations

Optimizing ordered graph algorithms with GraphIt
CGO 2020: Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization

Many graph problems can be solved using ordered parallel graph algorithms that achieve significant speedup over their unordered counterparts by reducing redundant work. This paper introduces a new priority-based extension to GraphIt, a domain-specific ...
Read More
Compiling graph applications for GPUs with graphit
CGO '21: Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization

The performance of graph programs depends highly on the algorithm, the size and structure of the input graphs, as well as the features of the underlying hardware. No single set of optimizations or one hardware platform works well across all settings. To ...
Read More
Finding a chain graph in a bipartite permutation graph

We present a polynomial-time algorithm for solving Subgraph Isomorphism where the base graphs are bipartite permutation graphs and the pattern graphs are chain graphs. Subgraph Isomorphism is studied on graph classes.A polynomial-time algorithm is given ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the ACM on Programming Languages Volume 2, Issue OOPSLA
November 2018
1656 pages
EISSN:2475-1421
DOI:10.1145/3288538
Issue’s Table of Contents

Copyright © 2018 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 October 2018
Published in pacmpl Volume 2, Issue OOPSLA

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Badges
- Artifacts Available
- Artifacts Evaluated & Functional
Author Tags
Big Data
Code Generation
Compiler Optimizations
Domain Specific Languages
Graph Algorithms
Parallel Programming Languages
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 138
  Total Citations
  View Citations
- 4,263
  Total Downloads
- Downloads (Last 12 months)734
- Downloads (Last 6 weeks)75
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

GraphIt: a high-performance graph DSL

Proceedings of the ACM on Programming Languages

Abstract

Supplemental Material

References

Cited By

Index Terms

Recommendations

Optimizing ordered graph algorithms with GraphIt

Compiling graph applications for GPUs with graphit

Finding a chain graph in a bipartite permutation graph