ABSTRACT
Networks are a general language for representing relational information among objects. An effective way to model, reason about, and summarize networks, is to discover sets of nodes with common connectivity patterns. Such sets are commonly referred to as network communities. Research on network community detection has predominantly focused on identifying communities of densely connected nodes in undirected networks.
In this paper we develop a novel overlapping community detection method that scales to networks of millions of nodes and edges and advances research along two dimensions: the connectivity structure of communities, and the use of edge directedness for community detection. First, we extend traditional definitions of network communities by building on the observation that nodes can be densely interlinked in two different ways: In cohesive communities nodes link to each other, while in 2-mode communities nodes link in a bipartite fashion, where links predominate between the two partitions rather than inside them. Our method successfully detects both 2-mode as well as cohesive communities, that may also overlap or be hierarchically nested. Second, while most existing community detection methods treat directed edges as though they were undirected, our method accounts for edge directions and is able to identify novel and meaningful community structures in both directed and undirected networks, using data from social, biological, and ecological domains.
- L. Adamic and N. Glance. The political blogosphere and the 2004 U.S. election: divided they blog. In LinkKDD '05, 2005. Google ScholarDigital Library
- Y.-Y. Ahn, J. Bagrow, and S. Lehmann. Link communities reveal multi-scale complexity in networks. Nature, 2010.Google Scholar
- E. Airoldi, D. Blei, S. Fienberg, and E. Xing. Mixed membership stochastic blockmodels. JMLR, 2007. Google ScholarDigital Library
- R. Andersen and K. Lang. Communities from seed sets. In WWW '06, 2006. Google ScholarDigital Library
- R. Balasubramanyan and W. Cohen. Block-LDA: Jointly modeling entity-annotated text and entity-entity links. In SDM '11, 2011.Google ScholarCross Ref
- B. Ball, B. Karrer, and M. Newman. Efficient and principled method for detecting communities in networks. In Phys. Rev. E, 2011.Google Scholar
- E. Boyle et al. GO::TermFinder--open source software for accessing gene ontology information and finding significantly enriched gene ontology terms associated with a list of genes. Bioinformatics, 2004. Google ScholarDigital Library
- L. Breiger. The duality of persons and groups. Social Forces, 1974.Google ScholarCross Ref
- R. Burt. Cohesion versus structural equivalence as a basis for network subgroups. Sociological Methods and Research, 1978.Google ScholarCross Ref
- D. Carney, B. Davies, and B. Horazdovsky. Vps9 domain-containing proteins: activators of Rab5 GTPases from yeast to neurons. Trends in Cell Biology, 2006.Google ScholarCross Ref
- M. Coscia, G. Rossetti, F. Giannotti, and D. Pedreschi. Demon: a local-first discovery method for overlapping communities. In KDD '12, 2012. Google ScholarDigital Library
- L. Danon, J. Duch, A. Diaz-Guilera, and A. Arenas. Comparing community structure identification. J. of Stat. Mech.: Theory and Experiment, 2005.Google Scholar
- I. Dhillon, Y. Guan, and B. Kulis. Weighted graph cuts without eigenvectors: A multilevel approach. IEEE PAMI, 2007. Google ScholarDigital Library
- S. Fortunato. Community detection in graphs. Physics Reports, 2010.Google ScholarCross Ref
- D. Gleich and Seshadhri. Neighborhoods are good communities. In KDD '12, 2012.Google Scholar
- P. Gopalan, D. Mimno, S. Gerrish, M. Freedman, and D. Blei. Scalable inference of overlapping communities. In NIPS '12, 2012.Google Scholar
- K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, C. Faloutsos, and L. Li. Rolx: structural role extraction & mining in large graphs. In KDD, 2012. Google ScholarDigital Library
- P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: First steps. Social Networks, 1983.Google Scholar
- C.-J. Hsieh and I. S. Dhillon. Fast coordinate descent methods with variable selection for non-negative matrix factorization. In KDD '11, 2011. Google ScholarDigital Library
- S. Johnson. Hierarchical clustering schemes. Psychometrika, 1967.Google ScholarCross Ref
- G. Karypis and V. Kumar. Multilevel k-way partitioning scheme for irregular graphs. J. of Parallel and Distributed Computing, 1998. Google ScholarDigital Library
- B. Klimt and Y. Yang. Introducing the enron corpus. In CEAS '04, 2004.Google Scholar
- R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Trawling the web for emerging cyber-communities. Computer Networks, 1999. Google ScholarDigital Library
- H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In WWW '10, 2010. Google ScholarDigital Library
- S. Lattanzi and D. Sivakumar. Affiliation networks. In STOC '09, 2009. Google ScholarDigital Library
- E. Leicht and M. E. Newman. Community structure in directed networks. Phys. Rev. Lett., 2008.Google Scholar
- J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD '05, 2005. Google ScholarDigital Library
- J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 2009.Google ScholarCross Ref
- C-J. Lin. Projected Gradient Methods for Nonnegative Matrix Factorization. Neural Computation, 2007. Google ScholarDigital Library
- F. Malliaros and M. Vazirgiannis. Clustering and Community Detection in Directed Networks: A Survey. Physics Reports, 2013.Google ScholarCross Ref
- J. McAuley and J. Leskovec. Learning to discover social circles in ego networks. In NIPS '12, 2012.Google Scholar
- A. McCallum, X. Wang, and A. Corrada-Emmanuel. Topic and role discovery in social networks with experiments on enron and academic email. JAIR, 2007. Google ScholarDigital Library
- A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In IMC '07, 2007. Google ScholarDigital Library
- G. Palla, I. Derényi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 2005.Google Scholar
- S. Papadopoulos, Y. Kompatsiaris, A. Vakali, and P. Spyridonos. Community detection in social media. DMKD, 2011. Google ScholarDigital Library
- S. Pinkert, J. Schultz, and J. Reichardt. Protein interaction networks---more than mere modules. PLoS CompBio, 2010.Google ScholarCross Ref
- F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi. Defining and identifying communities in networks. PNAS, 2004.Google ScholarCross Ref
- M. Rosvall and C. Bergstrom. An information-theoretic framework for resolving community structure in complex networks. PNAS, 2007.Google ScholarCross Ref
- M. Rosvall and C. T. Bergstrom. Maps of random walks on complex networks reveal community structure. PNAS, 2008.Google ScholarCross Ref
- V. Satuluri and S. Parthasarathy. Scalable Graph Clustering using Stochastic Flows: Applications to Community Discovery. KDD '09, 2009. Google ScholarDigital Library
- R. Ulanowicz, C. Bondavalli, and M. Egnotovich. Network analysis of trophic dynamics in south florida ecosystem, fy 97: The florida bay ecosystem. Ref. CBL98-123. Chesapeake Biological Laboratory, 1998.Google Scholar
- J. Xie, S. Kelley, and B. K. Szymanski. Overlapping community detection in networks: the state of the art and comparative study. ACM Computing Surveys, 2013. Google ScholarDigital Library
- J. Yang and J. Leskovec. Community-affiliation network model for overlapping community detection. In ICDM '12, 2012. Google ScholarDigital Library
- J. Yang and J. Leskovec. Defining and evaluating network communities based on ground-truth. In ICDM '12, 2012. Google ScholarDigital Library
- J. Yang and J. Leskovec. Overlapping community detection at scale: A non-negative factorization approach. In WSDM '13, 2013. Google ScholarDigital Library
- H. Yu, P. Braun, M. A. Yildirim, I. Lemmens, K. Venkatesan, J. Sahalie, T. Hirozane-Kishikawa, F. Gebreab, N. Li, N. Simonis, and et al. High-quality binary protein interaction map of the yeast interactome network. Science, 2008.Google Scholar
- E. Zheleva, H. Sharara, and L. Getoor. Co-evolution of social and affiliation networks. In KDD '09, 2009. Google ScholarDigital Library
Index Terms
- Detecting cohesive and 2-mode communities indirected and undirected networks
Recommendations
Overlapping community detection at scale: a nonnegative matrix factorization approach
WSDM '13: Proceedings of the sixth ACM international conference on Web search and data miningNetwork communities represent basic structures for understanding the organization of real-world networks. A community (also referred to as a module or a cluster) is typically thought of as a group of nodes with more connections amongst its members than ...
Structure and Overlaps of Ground-Truth Communities in Networks
Special Issue on Linking Social Granularity and FunctionsOne of the main organizing principles in real-world networks is that of network communities, where sets of nodes organize into densely linked clusters. Even though detection of such communities is of great interest, understanding the structure ...
Defining and evaluating network communities based on ground-truth
Nodes in real-world networks organize into densely linked communities where edges appear with high concentration among the members of the community. Identifying such communities of nodes has proven to be a challenging task due to a plethora of ...
Comments