skip to main content
10.1145/3190508.3190545acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

G-Miner: an efficient task-oriented graph mining system

Published: 23 April 2018 Publication History

Abstract

Graph mining is one of the most important areas in data mining. However, scalable solutions for graph mining are still lacking as existing studies focus on sequential algorithms. While many distributed graph processing systems have been proposed in recent years, most of them were designed to parallelize computations such as PageRank and Breadth-First Search that keep states on individual vertices and propagate updates along edges. Graph mining, on the other hand, may generate many subgraphs whose number can far exceed the number of vertices. This inevitably leads to much higher computational and space complexity rendering existing graph systems inefficient. We propose G-Miner, a distributed system with a new architecture designed for general graph mining. G-Miner adopts a unified programming framework for implementing a wide range of graph mining algorithms. We model subgraph processing as independent tasks, and design a novel task pipeline to streamline task processing for better CPU, network and I/O utilization. Our extensive experiments validate the efficiency of G-Miner for a range of graph mining tasks.

References

[1]
James Abello, Mauricio Resende, and Sandra Sudarsky. 2002. Massive quasi-clique detection. LATIN 2002: Theoretical Informatics (2002), 598--612.
[2]
Nesreen K Ahmed, Jennifer Neville, Ryan A Rossi, and Nick Duffield. 2015. Efficient graphlet counting for large networks. In IEEE International Conference on Data Mining. 1--10.
[3]
Ching Avery. 2011. Giraph: Large-scale graph processing infrastructure on hadoop. Proceedings of the Hadoop Summit. Santa Clara 11 (2011).
[4]
Luca Becchetti, Paolo Boldi, Carlos Castillo, and Aristides Gionis. 2008. Efficient semi-streaming algorithms for local triangle counting in massive graphs. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 16--24.
[5]
Immanuel M Bomze, Marco Budinich, Panos M Pardalos, and Marcello Pelillo. 1999. The maximum clique problem. In Handbook of combinatorial optimization. Springer, 1--74.
[6]
Coen Bron and Joep Kerbosch. 1973. Algorithm 457: finding all cliques of an undirected graph. Commun. ACM 16, 9 (1973), 575--577.
[7]
Jeremy Buhler. 2001. Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics 17, 5 (2001), 419--428.
[8]
Rong Chen, Jiaxin Shi, Yanzhe Chen, and Haibo Chen. 2015. Powerlyra: Differentiated graph computation and partitioning on skewed graphs. In Proceedings of the Tenth European Conference on Computer Systems. 1.
[9]
Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S Mirrokni. 2004. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the twentieth annual symposium on Computational geometry. 253--262.
[10]
Uriel Feige, David Peleg, and Guy Kortsarz. 2001. The dense k-subgraph problem. Algorithmica 29, 3 (2001), 410--421.
[11]
Santo Fortunato. 2010. Community detection in graphs. Physics reports 486, 3 (2010), 75--174.
[12]
Joseph E Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. 2012. PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. In OSDI, Vol. 12. 2.
[13]
Joseph E Gonzalez, Reynold S Xin, Ankur Dave, Daniel Crankshaw, Michael J Franklin, and Ion Stoica. 2014. GraphX: Graph Processing in a Distributed Dataflow Framework. In OSDI, Vol. 14. 599--613.
[14]
Aapo Kyrola, Guy E. Blelloch, and Carlos Guestrin. 2012. GraphChi: Large-Scale Graph Computation on Just a PC. In 10th USENIX Symposium on Operating Systems Design and Implementation. 31--46.
[15]
Jinfeng Li, James Cheng, Fan Yang, Yuzhen Huang, Yunjian Zhao, Xiao Yan, and Ruihao Zhao. 2017. LoSHa: A General Framework for Scalable Locality Sensitive Hashing. In SIGIR. 635--644.
[16]
Yi Lu, James Cheng, Da Yan, and Huanhuan Wu. 2014. Large-Scale Distributed Graph Computing Systems: An Experimental Evaluation. PVLDB 8, 3 (2014), 281--292.
[17]
Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woon-Hak Kang, Mohan Kumar, and Taesoo Kim. 2017. Mosaic: Processing a Trillion-Edge Graph on a Single Machine. In Proceedings of the Twelfth European Conference on Computer Systems. 527--543.
[18]
Grzegorz Malewicz, Matthew H Austern, Aart JC Bik, James C Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. 2010. Pregel: a system for large-scale graph processing. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. 135--146.
[19]
Frank McSherry, Michael Isard, and Derek Gordon Murray. 2015. Scalability! But at what COST?. In HotOS.
[20]
Loïc Paulevé, Hervé Jégou, and Laurent Amsaleg. 2010. Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern Recognition Letters 31, 11 (2010), 1348--1358.
[21]
Bryan Perozzi, Leman Akoglu, Patricia Iglesias Sánchez, and Emmanuel Müller. 2014. Focused clustering and outlier detection in large attributed graphs. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining. 1346--1355.
[22]
Abdul Quamar, Amol Deshpande, and Jimmy Lin. 2016. NScale: neighborhood-centric large-scale graph analytics in the cloud. The VLDB Journal 25, 2 (2016), 125--150.
[23]
Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel. 2013. X-stream: Edge-centric graph processing using streaming partitions. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. 472--488.
[24]
Semih Salihoglu and Jennifer Widom. 2013. Gps: A graph processing system. In Proceedings of the 25th International Conference on Scientific and Statistical Database Management. 22.
[25]
Satu Elisa Schaeffer. 2007. Graph clustering. Computer science review 1, 1 (2007), 27--64.
[26]
Thomas Schank and Dorothea Wagner. 2005. Finding, counting and listing all triangles in large graphs, an experimental study. In International Workshop on Experimental and Efficient Algorithms. 606--609.
[27]
Yingxia Shao, Bin Cui, Lei Chen, Lin Ma, Junjie Yao, and Ning Xu. 2014. Parallel subgraph listing in a large-scale graph. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. 625--636.
[28]
Arlei Silva, Wagner Meira Jr, and Mohammed J Zaki. 2012. Mining attribute-structure correlated patterns in large attributed graphs. Proceedings of the VLDB Endowment 5, 5 (2012), 466--477.
[29]
Isabelle Stanton and Gabriel Kliot. 2012. Streaming graph partitioning for large distributed graphs. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. 1222--1230.
[30]
Zhao Sun, Hongzhi Wang, Haixun Wang, Bin Shao, and Jianzhong Li. 2012. Efficient subgraph matching on billion node graphs. Proceedings of the VLDB Endowment 5, 9 (2012), 788--799.
[31]
Carlos HC Teixeira, Alexandre J Fonseca, Marco Serafini, Georgos Siganos, Mohammed J Zaki, and Ashraf Aboulnaga. 2015. Arabesque: a system for distributed graph mining. In Proceedings of the 25th Symposium on Operating Systems Principles. 425--440.
[32]
Yuanyuan Tian, Andrey Balmin, Severin Andreas Corsten, Shirish Tatikonda, and John McPherson. 2013. From think like a vertex to think like a graph. Proceedings of the VLDB Endowment 7, 3 (2013), 193--204.
[33]
Etsuji Tomita and Tomokazu Seki. 2003. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete mathematics and theoretical computer science. Springer, 278--289.
[34]
Ming Wu, Fan Yang, Jilong Xue, Wencong Xiao, Youshan Miao, Lan Wei, Haoxiang Lin, Yafei Dai, and Lidong Zhou. 2015. GraM: scaling graph computation to the trillions. In Proceedings of the Sixth ACM Symposium on Cloud Computing. 408--421.
[35]
Da Yan, Yingyi Bu, Yuanyuan Tian, Amol Deshpande, and James Cheng. 2016. Big Graph Analytics Systems. In Proceedings of the 2016 International Conference on Management of Data. 2241--2243.
[36]
Da Yan, Hongzhi Chen, James Cheng, M. Tamer Özsu, Qizhen Zhang, and John C. S. Lui. 2017. G-thinker: Big Graph Mining Made Easier and Faster. CoRR abs/1709.03110 (2017).
[37]
Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2014. Blogel: A block-centric framework for distributed computation on real-world graphs. Proceedings of the VLDB Endowment 7, 14 (2014), 1981--1992.
[38]
Da Yan, James Cheng, Yi Lu, and Wilfred Ng. 2015. Effective techniques for message reduction and load balancing in distributed graph computation. In Proceedings of the 24th International Conference on World Wide Web. 1307--1317.
[39]
Da Yan, James Cheng, Kai Xing, Yi Lu, Wilfred Ng, and Yingyi Bu. 2014. Pregel algorithms for graph connectivity problems with performance guarantees. Proceedings of the VLDB Endowment 7, 14 (2014), 1821--1832.
[40]
Da Yan, Yuzhen Huang, Miao Liu, Hongzhi Chen, James Cheng, Huanhuan Wu, and Chengcui Zhang. 2018. GraphD: Distributed Vertex-Centric Graph Processing Beyond the Memory Limit. IEEE Trans. Parallel Distrib. Syst. 29, 1 (2018), 99--114.
[41]
Da Yan, Yuanyuan Tian, and James Cheng. 2017. Systems for Big Graph Analytics. Springer.
[42]
Xifeng Yan and Jiawei Han. 2002. gspan: Graph-based substructure pattern mining. In IEEE International Conference on Data Mining. 721--724.
[43]
Xifeng Yan and Jiawei Han. 2003. CloseGraph: mining closed frequent graph patterns. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. 286--295.
[44]
Fan Yang, Yuzhen Huang, Yunjian Zhao, Jinfeng Li, Guanxian Jiang, and James Cheng. 2017. The Best of Both Worlds: Big Data Programming with Both Productivity and Performance. In SIGMOD. 1619--1622.
[45]
Fan Yang, Jinfeng Li, and James Cheng. 2016. Husky: Towards a More Efficient and Expressive Distributed Computing Framework. In PVLDB, Vol. 9. 420--431.
[46]
Jaewon Yang, Julian McAuley, and Jure Leskovec. 2013. Community detection in networks with node attributes. In IEEE 13th international conference on Data Mining. 1151--1156.
[47]
Qizhen Zhang, Hongzhi Chen, Da Yan, James Cheng, Boon Thau Loo, and Purushotham Bangalore. 2017. Architectural implications on the performance and cost of graph analytics systems. In Proceedings of the 2017 Symposium on Cloud Computing. 40--51.
[48]
Yang Zhou, Hong Cheng, and Jeffrey Xu Yu. 2009. Graph clustering based on structural/attribute similarities. Proceedings of the VLDB Endowment 2, 1 (2009), 718--729.
[49]
Xiaowei Zhu, Wentao Han, and Wenguang Chen. 2015. GridGraph: Large-Scale Graph Processing on a Single Machine Using 2-Level Hierarchical Partitioning. In USENIX Annual Technical Conference. 375--386.

Cited By

View all
  • (2025)Two-Dimensional Balanced Partitioning and Efficient Caching for Distributed Graph AnalysisIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.350129236:2(133-149)Online publication date: Feb-2025
  • (2024)A Survey of Distributed Graph Algorithms on Massive GraphsACM Computing Surveys10.1145/3694966Online publication date: 5-Sep-2024
  • (2024)Atom: An Efficient Query Serving System for Embedding-based Knowledge Graph Reasoning with Operator-level BatchingProceedings of the ACM on Management of Data10.1145/36771292:4(1-29)Online publication date: 30-Sep-2024
  • Show More Cited By
  1. G-Miner: an efficient task-oriented graph mining system

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    EuroSys '18: Proceedings of the Thirteenth EuroSys Conference
    April 2018
    631 pages
    ISBN:9781450355841
    DOI:10.1145/3190508
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 23 April 2018

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. distributed system
    2. large-scale graph mining

    Qualifiers

    • Research-article

    Funding Sources

    • Hong Kong RGC

    Conference

    EuroSys '18
    Sponsor:
    EuroSys '18: Thirteenth EuroSys Conference 2018
    April 23 - 26, 2018
    Porto, Portugal

    Acceptance Rates

    EuroSys '18 Paper Acceptance Rate 43 of 262 submissions, 16%;
    Overall Acceptance Rate 241 of 1,308 submissions, 18%

    Upcoming Conference

    EuroSys '25
    Twentieth European Conference on Computer Systems
    March 30 - April 3, 2025
    Rotterdam , Netherlands

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)50
    • Downloads (Last 6 weeks)4
    Reflects downloads up to 01 Mar 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)Two-Dimensional Balanced Partitioning and Efficient Caching for Distributed Graph AnalysisIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2024.350129236:2(133-149)Online publication date: Feb-2025
    • (2024)A Survey of Distributed Graph Algorithms on Massive GraphsACM Computing Surveys10.1145/3694966Online publication date: 5-Sep-2024
    • (2024)Atom: An Efficient Query Serving System for Embedding-based Knowledge Graph Reasoning with Operator-level BatchingProceedings of the ACM on Management of Data10.1145/36771292:4(1-29)Online publication date: 30-Sep-2024
    • (2024)Understanding High-Performance Subgraph Pattern Matching: A Systems PerspectiveProceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)10.1145/3661304.3661897(1-12)Online publication date: 14-Jun-2024
    • (2024)PimPam: Efficient Graph Pattern Matching on Real Processing-in-Memory HardwareProceedings of the ACM on Management of Data10.1145/36549642:3(1-25)Online publication date: 30-May-2024
    • (2024)Systems for Scalable Graph Analytics and Machine Learning: Trends and MethodsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671472(6627-6632)Online publication date: 25-Aug-2024
    • (2024)Expediting Distributed GNN Training with Feature-only Partition and Optimized Communication PlanningIEEE INFOCOM 2024 - IEEE Conference on Computer Communications10.1109/INFOCOM52122.2024.10621137(1321-1330)Online publication date: 20-May-2024
    • (2024)Faster Depth-First Subgraph Matching on GPUs2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00244(3151-3163)Online publication date: 13-May-2024
    • (2024)DuMatoJournal of Parallel and Distributed Computing10.1016/j.jpdc.2024.104903191:COnline publication date: 18-Jul-2024
    • (2024)Balanced parallel triangle enumeration with an adaptive algorithmDistributed and Parallel Databases10.1007/s10619-023-07437-x42:1(103-141)Online publication date: 1-Mar-2024
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media