skip to main content
research-article
Public Access

Scalable and Efficient Flow-Based Community Detection for Large-Scale Graph Analysis

Published:06 March 2017Publication History
Skip Abstract Section

Abstract

Community detection is an increasingly popular approach to uncover important structures in large networks. Flow-based community detection methods rely on communication patterns of the network rather than structural properties to determine communities. The Infomap algorithm in particular optimizes a novel objective function called the map equation and has been shown to outperform other approaches in third-party benchmarks. However, Infomap and its variants are inherently sequential, limiting their use for large-scale graphs.

In this article, we propose a novel algorithm to optimize the map equation called RelaxMap. RelaxMap provides two important improvements over Infomap: parallelization, so that the map equation can be optimized over much larger graphs, and prioritization, so that the most important work occurs first, iterations take less time, and the algorithm converges faster. We implement these techniques using OpenMP on shared-memory multicore systems, and evaluate our approach on a variety of graphs from standard graph clustering benchmarks as well as real graph datasets. Our evaluation shows that both techniques are effective: RelaxMap achieves 70% parallel efficiency on eight cores, and prioritization improves algorithm performance by an additional 20--50% on average, depending on the graph properties. Additionally, RelaxMap converges in the similar number of iterations and provides solutions of equivalent quality as the serial Infomap implementation.

References

  1. Rodrigo Aldecoa and Ignacio Marín. 2013. Exploring the limits of community detection strategies in complex networks. Scientific Reports 3, 2216 (2013).Google ScholarGoogle Scholar
  2. Seung-Hee Bae, Daniel Halperin, Jevin West, Martin Rosvall, and Bill Howe. 2013. Scalable flow-based community detection for large-scale network analysis. In Proceedings of 2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW’13). IEEE, 303--310.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Sanjukta Bhowmick and Sriram Srinivasan. 2013. A template for parallelizing the Louvain method for modularity maximization. In Dynamics On and Of Complex Networks, Volume 2. Animesh Mukherjee, Monojit Choudhury, Fernando Peruani, Niloy Ganguly, and Bivas Mitra (Eds.). Springer, New York, NY, 111--124.Google ScholarGoogle Scholar
  4. Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008, 10 (2008), P10008.Google ScholarGoogle ScholarCross RefCross Ref
  5. Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph framework I: Compression techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW’04). ACM, Manhattan, 595--601. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems 30, 1 (1998), 107--117. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Aaron Clauset, Mark E. J. Newman, and Cristopher Moore. 2004. Finding community structure in very large networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 70, 6 (2004), 066111.Google ScholarGoogle ScholarCross RefCross Ref
  8. Anne Condon and Richard M. Karp. 2001. Algorithms for graph partitioning on the planted partition model. Random Structures and Algorithms 18, 2 (2001), 116--140. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Leon Danon, Albert Diaz-Guilera, Jordi Duch, and Alex Arenas. 2005. Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment 2005, 9 (2005), P09008.Google ScholarGoogle ScholarCross RefCross Ref
  10. Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, and Geoffrey Fox. 2010. Twister: A runtime for iterative mapreduce. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. ACM, 810--818. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Bas Fagginger Auer and Rob H. Bisseling. 2013. Graph coarsening and clustering on the GPU. In Graph Partitioning and Graph Clustering: 10th DIMACS Implementation Challenge Workshop, vol. 588. American Mathematical Society.Google ScholarGoogle Scholar
  12. Santo Fortunato and Marc Barthélemy. 2007. Resolution limit in community detection. Proceedings of the National Academy of Sciences 104, 1 (Jan. 2007), 36--41.Google ScholarGoogle ScholarCross RefCross Ref
  13. Anne-Claude Gavin, Patrick Aloy, Paola Grandi, Roland Krause, Markus Boesche, Martina Marzioch, Christina Rau, Lars Juhl Jensen, Sonja Bastuck, Birgit Dümpelfeld, and others. 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 7084 (2006), 631--636.Google ScholarGoogle ScholarCross RefCross Ref
  14. Michelle Girvan and Mark E. J. Newman. 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 12 (2002), 7821--7826.Google ScholarGoogle ScholarCross RefCross Ref
  15. David F. Gleich and Leonid Zhukov. 2005. Scalable computing with power-law graphs: Experience with parallel PageRank. In Proceedings of 2005 IEEE SuperComputing Conference (Poster).Google ScholarGoogle Scholar
  16. Roger Guimera and Luis A. Nunes Amaral. 2005. Functional cartography of complex metabolic networks. Nature 433, 7028 (2005), 895--900.Google ScholarGoogle ScholarCross RefCross Ref
  17. Roger Guimerà, Marta Sales-Pardo, and L. A. N. Amaral. 2004. Modularity from fluctuations in random graphs and complex networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 70, 2 (2004), 025101(R).Google ScholarGoogle ScholarCross RefCross Ref
  18. Tatsuro Kawamoto and Martin Rosvall. 2015. Estimating the resolution limit of the map equation in community detection. Phys. Rev. E 91, 1 (Jan. 2015), 012809.Google ScholarGoogle ScholarCross RefCross Ref
  19. Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In Proceedings of the 19th International Conference on World Wide Web (WWW’10). ACM, New York, NY, 591--600. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Andrea Lancichinetti and Santo Fortunato. 2009. Community detection algorithms: A comparative analysis. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 80 (2009), 056117.Google ScholarGoogle ScholarCross RefCross Ref
  21. Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi. 2008. Benchmark graphs for testing community detection algorithms. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 78, 4 (2008), 046110.Google ScholarGoogle ScholarCross RefCross Ref
  22. Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. (2014). Retrieved June 2014 from http://snap.stanford.edu/data.Google ScholarGoogle Scholar
  23. Jure Leskovec, Kevin J. Lang, and Michael Mahoney. 2010. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th International Conference on World Wide Web (WWW’10). ACM, New York, NY, 631--640. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Xin Liu and Tsuyoshi Murata. 2010. Advanced modularity-specialized label propagation algorithm for detecting communities in networks. Physica A: Statistical Mechanics and its Applications 389, 7 (2010), 1493--1500.Google ScholarGoogle Scholar
  25. Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: A framework for machine learning and data mining in the cloud. In Proceedings of the VLDB Endowment 5, 8 (Apr. 2012), 716--727. http://dl.acm.org/citation. cfm?id=2212351.2212354 Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Padmanabhan K. Menon, Gregory D. Sweriduk, and Karl D. Bilimoria. 2004. New approach for modeling, analysis, and control of air traffic flow. Journal of Guidance, Control, and Dynamics 27, 5 (2004), 737--744.Google ScholarGoogle ScholarCross RefCross Ref
  27. Mark E. J. Newman and Michelle Girvan. 2004. Finding and evaluating community structure in networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 69, 2 (2004), 026113.Google ScholarGoogle ScholarCross RefCross Ref
  28. Feng Niu, Benjamin Recht, Christopher Ré, and Stephen J. Wright. 2011. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In NIPS. http://books.nips.cc/papers/files/nips24/NIPS2011_ 0485.pdf. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. OpenMP Architecture Review Board. 2008. OpenMP Application Program Interface Version 3.0. (2008). Retrieved May 2008 from http://www.openmp.org/mp-documents/spec30.pdf.Google ScholarGoogle Scholar
  30. E. Jason Riedy, Henning Meyerhenke, David Ediger, and David A Bader. 2011. Parallel community detection for massive graphs. In Proceedings of the 9th International Conference on Parallel Processing and Applied Mathematics -- Volume Part I. Springer-Verlag, Berlin Heidelberg, 286--296. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Jason Riedy, David A. Bader, and Henning Meyerhenke. 2012. Scalable multi-threaded community detection in social networks. In Proceedings of the 2012 IEEE 26th International on Parallel and Distributed Processing Symposium Workshops 8 PhD Forum (IPDPSW’12). IEEE, 1619--1628. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Martin Rosvall, Daniel Axelsson, and Carl T. Bergstrom. 2009. The map equation. The European Physical Journal Special Topics 178, 1 (2009), 13--23.Google ScholarGoogle ScholarCross RefCross Ref
  33. Martin Rosvall and Carl T. Bergstrom. 2011. Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS One 6, 4 (2011), e18209.Google ScholarGoogle ScholarCross RefCross Ref
  34. Martin Rosvall, Alcides V. Esquivel, Andrea Lancichinetti, Jevin D. West, and Renaud Lambiotte. 2014. Memory in network flows and its effects on spreading dynamics and community detection. Nature Communications 5 (2014).Google ScholarGoogle Scholar
  35. Claude Elwood Shannon. 1948a. A mathematical theory of communication. The Bell System Technical Journal 27, 3 (Jul. 1948), 379--423.Google ScholarGoogle ScholarCross RefCross Ref
  36. Claude Elwood Shannon. 1948b. A mathematical theory of communication. The Bell System Technical Journal 27, 4 (Oct. 1948), 623--656.Google ScholarGoogle ScholarCross RefCross Ref
  37. Hiroaki Shiokawa, Yasuhiro Fujiwara, and Makoto Onizuka. 2013. Fast algorithm for modularity-based graph clustering. In Proceedings of the 27th AAAI Conference on Artificial Intelligence. 1170--1176. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. C. Staudt and H. Meyerhenke. 2015. Engineering parallel algorithms for community detection in massive networks. IEEE Transactions on Parallel and Distributed Systems, 27, 1 (2016), 171--184. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Christian L. Staudt, Aleksejs Sazonovs, and Henning Meyerhenke. 2014. NetworKit: A Tool Suite for Large-scale Complex Network Analysis. arXiv:1403.3005, https://arxiv.org/abs/1403.3005.Google ScholarGoogle Scholar
  40. Jevin D. West, Theodore C. Bergstrom, and Carl T. Bergstrom. 2010. The eigenfactor metrics<sup>TM</sup>: A network approach to assessing scholarly journals. College 8 Research Libraries 71, 3 (2010), 236--244.Google ScholarGoogle Scholar
  41. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Yanfeng Zhang, Qixin Gao, Lixin Gao, and Cuirong Wang. 2011. Priter: A distributed framework for prioritized iterative computations. In Proceedings of the 2nd ACM Symposium on Cloud Computing. ACM, 13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Yuzhou Zhang, Jianyong Wang, Yi Wang, and Lizhu Zhou. 2009. Parallel community detection on large networks with propinquity dynamics. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 997--1006. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Scalable and Efficient Flow-Based Community Detection for Large-Scale Graph Analysis

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Knowledge Discovery from Data
        ACM Transactions on Knowledge Discovery from Data  Volume 11, Issue 3
        August 2017
        372 pages
        ISSN:1556-4681
        EISSN:1556-472X
        DOI:10.1145/3058790
        Issue’s Table of Contents

        Copyright © 2017 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 6 March 2017
        • Accepted: 1 August 2016
        • Revised: 1 June 2016
        • Received: 1 December 2014
        Published in tkdd Volume 11, Issue 3

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader