Abstract
Community detection is an increasingly popular approach to uncover important structures in large networks. Flow-based community detection methods rely on communication patterns of the network rather than structural properties to determine communities. The Infomap algorithm in particular optimizes a novel objective function called the map equation and has been shown to outperform other approaches in third-party benchmarks. However, Infomap and its variants are inherently sequential, limiting their use for large-scale graphs.
In this article, we propose a novel algorithm to optimize the map equation called RelaxMap. RelaxMap provides two important improvements over Infomap: parallelization, so that the map equation can be optimized over much larger graphs, and prioritization, so that the most important work occurs first, iterations take less time, and the algorithm converges faster. We implement these techniques using OpenMP on shared-memory multicore systems, and evaluate our approach on a variety of graphs from standard graph clustering benchmarks as well as real graph datasets. Our evaluation shows that both techniques are effective: RelaxMap achieves 70% parallel efficiency on eight cores, and prioritization improves algorithm performance by an additional 20--50% on average, depending on the graph properties. Additionally, RelaxMap converges in the similar number of iterations and provides solutions of equivalent quality as the serial Infomap implementation.
- Rodrigo Aldecoa and Ignacio Marín. 2013. Exploring the limits of community detection strategies in complex networks. Scientific Reports 3, 2216 (2013).Google Scholar
- Seung-Hee Bae, Daniel Halperin, Jevin West, Martin Rosvall, and Bill Howe. 2013. Scalable flow-based community detection for large-scale network analysis. In Proceedings of 2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW’13). IEEE, 303--310.Google ScholarDigital Library
- Sanjukta Bhowmick and Sriram Srinivasan. 2013. A template for parallelizing the Louvain method for modularity maximization. In Dynamics On and Of Complex Networks, Volume 2. Animesh Mukherjee, Monojit Choudhury, Fernando Peruani, Niloy Ganguly, and Bivas Mitra (Eds.). Springer, New York, NY, 111--124.Google Scholar
- Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008, 10 (2008), P10008.Google ScholarCross Ref
- Paolo Boldi and Sebastiano Vigna. 2004. The WebGraph framework I: Compression techniques. In Proc. of the Thirteenth International World Wide Web Conference (WWW’04). ACM, Manhattan, 595--601. Google ScholarDigital Library
- Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual web search engine. Computer Networks and ISDN Systems 30, 1 (1998), 107--117. Google ScholarDigital Library
- Aaron Clauset, Mark E. J. Newman, and Cristopher Moore. 2004. Finding community structure in very large networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 70, 6 (2004), 066111.Google ScholarCross Ref
- Anne Condon and Richard M. Karp. 2001. Algorithms for graph partitioning on the planted partition model. Random Structures and Algorithms 18, 2 (2001), 116--140. Google ScholarDigital Library
- Leon Danon, Albert Diaz-Guilera, Jordi Duch, and Alex Arenas. 2005. Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment 2005, 9 (2005), P09008.Google ScholarCross Ref
- Jaliya Ekanayake, Hui Li, Bingjing Zhang, Thilina Gunarathne, Seung-Hee Bae, Judy Qiu, and Geoffrey Fox. 2010. Twister: A runtime for iterative mapreduce. In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing. ACM, 810--818. Google ScholarDigital Library
- Bas Fagginger Auer and Rob H. Bisseling. 2013. Graph coarsening and clustering on the GPU. In Graph Partitioning and Graph Clustering: 10th DIMACS Implementation Challenge Workshop, vol. 588. American Mathematical Society.Google Scholar
- Santo Fortunato and Marc Barthélemy. 2007. Resolution limit in community detection. Proceedings of the National Academy of Sciences 104, 1 (Jan. 2007), 36--41.Google ScholarCross Ref
- Anne-Claude Gavin, Patrick Aloy, Paola Grandi, Roland Krause, Markus Boesche, Martina Marzioch, Christina Rau, Lars Juhl Jensen, Sonja Bastuck, Birgit Dümpelfeld, and others. 2006. Proteome survey reveals modularity of the yeast cell machinery. Nature 440, 7084 (2006), 631--636.Google ScholarCross Ref
- Michelle Girvan and Mark E. J. Newman. 2002. Community structure in social and biological networks. Proceedings of the National Academy of Sciences 99, 12 (2002), 7821--7826.Google ScholarCross Ref
- David F. Gleich and Leonid Zhukov. 2005. Scalable computing with power-law graphs: Experience with parallel PageRank. In Proceedings of 2005 IEEE SuperComputing Conference (Poster).Google Scholar
- Roger Guimera and Luis A. Nunes Amaral. 2005. Functional cartography of complex metabolic networks. Nature 433, 7028 (2005), 895--900.Google ScholarCross Ref
- Roger Guimerà, Marta Sales-Pardo, and L. A. N. Amaral. 2004. Modularity from fluctuations in random graphs and complex networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 70, 2 (2004), 025101(R).Google ScholarCross Ref
- Tatsuro Kawamoto and Martin Rosvall. 2015. Estimating the resolution limit of the map equation in community detection. Phys. Rev. E 91, 1 (Jan. 2015), 012809.Google ScholarCross Ref
- Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In Proceedings of the 19th International Conference on World Wide Web (WWW’10). ACM, New York, NY, 591--600. Google ScholarDigital Library
- Andrea Lancichinetti and Santo Fortunato. 2009. Community detection algorithms: A comparative analysis. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 80 (2009), 056117.Google ScholarCross Ref
- Andrea Lancichinetti, Santo Fortunato, and Filippo Radicchi. 2008. Benchmark graphs for testing community detection algorithms. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 78, 4 (2008), 046110.Google ScholarCross Ref
- Jure Leskovec and Andrej Krevl. 2014. SNAP Datasets: Stanford Large Network Dataset Collection. (2014). Retrieved June 2014 from http://snap.stanford.edu/data.Google Scholar
- Jure Leskovec, Kevin J. Lang, and Michael Mahoney. 2010. Empirical comparison of algorithms for network community detection. In Proceedings of the 19th International Conference on World Wide Web (WWW’10). ACM, New York, NY, 631--640. Google ScholarDigital Library
- Xin Liu and Tsuyoshi Murata. 2010. Advanced modularity-specialized label propagation algorithm for detecting communities in networks. Physica A: Statistical Mechanics and its Applications 389, 7 (2010), 1493--1500.Google Scholar
- Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: A framework for machine learning and data mining in the cloud. In Proceedings of the VLDB Endowment 5, 8 (Apr. 2012), 716--727. http://dl.acm.org/citation. cfm?id=2212351.2212354 Google ScholarDigital Library
- Padmanabhan K. Menon, Gregory D. Sweriduk, and Karl D. Bilimoria. 2004. New approach for modeling, analysis, and control of air traffic flow. Journal of Guidance, Control, and Dynamics 27, 5 (2004), 737--744.Google ScholarCross Ref
- Mark E. J. Newman and Michelle Girvan. 2004. Finding and evaluating community structure in networks. Physical Review E: Statistical, Nonlinear, and Soft Matter Physics 69, 2 (2004), 026113.Google ScholarCross Ref
- Feng Niu, Benjamin Recht, Christopher Ré, and Stephen J. Wright. 2011. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In NIPS. http://books.nips.cc/papers/files/nips24/NIPS2011_ 0485.pdf. Google ScholarDigital Library
- OpenMP Architecture Review Board. 2008. OpenMP Application Program Interface Version 3.0. (2008). Retrieved May 2008 from http://www.openmp.org/mp-documents/spec30.pdf.Google Scholar
- E. Jason Riedy, Henning Meyerhenke, David Ediger, and David A Bader. 2011. Parallel community detection for massive graphs. In Proceedings of the 9th International Conference on Parallel Processing and Applied Mathematics -- Volume Part I. Springer-Verlag, Berlin Heidelberg, 286--296. Google ScholarDigital Library
- Jason Riedy, David A. Bader, and Henning Meyerhenke. 2012. Scalable multi-threaded community detection in social networks. In Proceedings of the 2012 IEEE 26th International on Parallel and Distributed Processing Symposium Workshops 8 PhD Forum (IPDPSW’12). IEEE, 1619--1628. Google ScholarDigital Library
- Martin Rosvall, Daniel Axelsson, and Carl T. Bergstrom. 2009. The map equation. The European Physical Journal Special Topics 178, 1 (2009), 13--23.Google ScholarCross Ref
- Martin Rosvall and Carl T. Bergstrom. 2011. Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS One 6, 4 (2011), e18209.Google ScholarCross Ref
- Martin Rosvall, Alcides V. Esquivel, Andrea Lancichinetti, Jevin D. West, and Renaud Lambiotte. 2014. Memory in network flows and its effects on spreading dynamics and community detection. Nature Communications 5 (2014).Google Scholar
- Claude Elwood Shannon. 1948a. A mathematical theory of communication. The Bell System Technical Journal 27, 3 (Jul. 1948), 379--423.Google ScholarCross Ref
- Claude Elwood Shannon. 1948b. A mathematical theory of communication. The Bell System Technical Journal 27, 4 (Oct. 1948), 623--656.Google ScholarCross Ref
- Hiroaki Shiokawa, Yasuhiro Fujiwara, and Makoto Onizuka. 2013. Fast algorithm for modularity-based graph clustering. In Proceedings of the 27th AAAI Conference on Artificial Intelligence. 1170--1176. Google ScholarDigital Library
- C. Staudt and H. Meyerhenke. 2015. Engineering parallel algorithms for community detection in massive networks. IEEE Transactions on Parallel and Distributed Systems, 27, 1 (2016), 171--184. Google ScholarDigital Library
- Christian L. Staudt, Aleksejs Sazonovs, and Henning Meyerhenke. 2014. NetworKit: A Tool Suite for Large-scale Complex Network Analysis. arXiv:1403.3005, https://arxiv.org/abs/1403.3005.Google Scholar
- Jevin D. West, Theodore C. Bergstrom, and Carl T. Bergstrom. 2010. The eigenfactor metrics<sup>TM</sup>: A network approach to assessing scholarly journals. College 8 Research Libraries 71, 3 (2010), 236--244.Google Scholar
- Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2. Google ScholarDigital Library
- Yanfeng Zhang, Qixin Gao, Lixin Gao, and Cuirong Wang. 2011. Priter: A distributed framework for prioritized iterative computations. In Proceedings of the 2nd ACM Symposium on Cloud Computing. ACM, 13. Google ScholarDigital Library
- Yuzhou Zhang, Jianyong Wang, Yi Wang, and Lizhu Zhou. 2009. Parallel community detection on large networks with propinquity dynamics. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 997--1006. Google ScholarDigital Library
Index Terms
- Scalable and Efficient Flow-Based Community Detection for Large-Scale Graph Analysis
Recommendations
Parallel community detection for massive graphs
PPAM'11: Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part ITackling the current volume of graph-structured data requires parallel tools. We extend our work on analyzing such massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, scaling to ...
Scalable Flow-Based Community Detection for Large-Scale Network Analysis
ICDMW '13: Proceedings of the 2013 IEEE 13th International Conference on Data Mining WorkshopsCommunity-detection is a powerful approach to uncover important structures in large networks. Since networks often describe flow of some entity, flow-based community-detection methods are particularly interesting. One such algorithm is called Info map, ...
Scalable distributed Louvain algorithm for community detection in large graphs
AbstractCommunity detection (or clustering) in large-scale graphs is an important problem in graph mining. Communities reveal interesting organizational and functional characteristics of a network. Louvain algorithm is an efficient sequential algorithm ...
Comments