skip to main content
10.1145/2783258.2783385acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling

Published: 10 August 2015 Publication History

Abstract

Extracting dense subgraphs from large graphs is a key primitive in a variety of graph mining applications, ranging from mining social networks and the Web graph to bioinformatics [41]. In this paper we focus on a family of poly-time solvable formulations, known as the k-clique densest subgraph problem (k-Clique-DSP) [57]. When k=2, the problem becomes the well-known densest subgraph problem (DSP) [22, 31, 33, 39]. Our main contribution is a sampling scheme that gives densest subgraph sparsifier, yielding a randomized algorithm that produces high-quality approximations while providing significant speedups and improved space complexity. We also extend this family of formulations to bipartite graphs by introducing the (p,q)-biclique densest subgraph problem ((p,q)-Biclique-DSP), and devise an exact algorithm that can treat both clique and biclique densities in a unified way.
As an example of performance, our sparsifying algorithm extracts the 5-clique densest subgraph --which is a large-near clique on 62 vertices-- from a large collaboration network. Our algorithm achieves 100% accuracy over five runs, while achieving an average speedup factor of over 10,000. Specifically, we reduce the running time from ∼2 107 seconds to an average running time of 0.15 seconds. We also use our methods to study how the k-clique densest subgraphs change as a function of time in time-evolving networks for various small values of k. We observe significant deviations between the experimental findings on real-world networks and stochastic Kronecker graphs, a random graph model that mimics real-world networks in certain aspects.
We believe that our work is a significant advance in routines with rigorous theoretical guarantees for scalable extraction of large near-cliques from networks.

Supplementary Material

MP4 File (p815.mp4)

References

[1]
http://www.avglab.com/soft/hipr.tar.
[2]
http://snap.stanford.edu/data/index.html.
[3]
http://grouplens.org/datasets.
[4]
http://research.nii.ac.jp/~uno/codes.htm.
[5]
Large Near-Clique Detection. http://tinyurl.com/o6y33g9.
[6]
J. Abello, M. G. C. Resende, and S. Sudarsky. Massive quasi-clique detection. In LATIN, 2002.
[7]
R. Andersen and K. Chellapilla. Finding dense subgraphs with size bounds. In WAW, 2009.
[8]
A. Andoni, A. Gupta, and R. Krauthgamer. Towards (1+ ε)-approximate flow sparsifiers. In SODA, pages 279--293. SIAM, 2014.
[9]
A. Angel, N. Sarkas, N. Koudas, and D. Srivastava. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. In VLDB, 5(6), pages 574--585, Feb. 2012.
[10]
Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama. Greedily finding a dense subgraph. In Journal of Algorithms, 34(2), 2000.
[11]
G. D. Bader and C. W. Hogue. An automated method for finding molecular complexes in large protein interaction networks. In BMC bioinformatics, 2003.
[12]
B. Bahmani, R. Kumar, and S. Vassilvitskii. Densest subgraph in streaming and mapreduce. In VLDB, 5(5):454--465, 2012.
[13]
O. D. Balalau, F. Bonchi, T. Chan, F. Gullo, and M. Sozio. Finding subgraphs with maximum total density and limited overlap. In WSDM, pages 379--388. ACM, 2015.
[14]
V. Batagelj and M. Zaversnik. An o(m) algorithm for cores decomposition of networks. In Arxiv, arXiv.cs/0310049, 2003.
[15]
A. A. Benczúr and D. R. Karger. Approximating s-t minimum cuts in Õ(n2) time. In STOC, pages 47--55, 1996.
[16]
A. Bhaskara, M. Charikar, E. Chlamtac, U. Feige, and A. Vijayaraghavan. Detecting high log-densities: an o(n1/4) approximation for densest k-subgraph. In STOC, pages 201--210, 2010.
[17]
S. Bhattacharya, M. Henzinger, D. Nanongkai, and C. E. Tsourakakis. Space-and time-efficient algorithm for maintaining dense subgraphs on one-pass dynamic streams. In STOC, 2015 (to appear).
[18]
I. M. Bomze, M. Budinich, P. M. Pardalos, and M. Pelillo. The maximum clique problem. In Handbook of combinatorial optimization, pages 1--74, 1999.
[19]
F. Bonchi, F. Gullo, A. Kaltenbrunner, and Y. Volkovich. Core decomposition of uncertain graphs. In KDD, pages 1316--1325, 2014.
[20]
C. Bron and J. Kerbosch. Algorithm 457: finding all cliques of an undirected graph. In Communications of the ACM, 16(9):575--577, 1973.
[21]
G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. In WSDM, pages 95--106, 2008.
[22]
M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In APPROX, pages 84--95, 2000.
[23]
J. Chen, Y. Saad. Dense Subgraph Extraction with Application to Community Detection. In TKDE, vol. 24, pages 1216--1230, 2012.
[24]
B. V. Cherkassky and A. V. Goldberg. On implementing the push-relabel method for the maximum flow problem. In Algorithmica, 19(4), pages 390--410, 1997.
[25]
N. Chiba and T. Nishizeki. Arboricity and subgraph listing algorithms. In SIAM Journal on Computing, 14(1), pages 210--223, 1985.
[26]
D. Eppstein. Arboricity and bipartite subgraph listing algorithms. In Information Processing Letters, 51(4), pages 207--211, 1994.
[27]
D. Eppstein, M. Löffler, and D. Strash. Listing all maximal cliques in sparse graphs in near-optimal time. In ISAAC, 2010.
[28]
U. Feige, G. Kortsarz, and D. Peleg. The dense k-subgraph problem. In Algorithmica, 29(3), 2001.
[29]
I. Finocchi, M. Finocchi, and E. G. Fusco. Counting small cliques in mapreduce. In ArXiv arXiv:1403.0734, 2014.
[30]
E. Fratkin, B. T. Naughton, D. L. Brutlag, and S. Batzoglou. Motifcut: regulatory motifs finding with maximum density subgraphs. In Bioinformatics, vol. 22(14), pages 150--157, 2006.
[31]
G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. A fast parametric maximum flow algorithm and applications. In Journal of Computing, 18(1), 1989.
[32]
D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB, pages 721--732, 2005.
[33]
A. V. Goldberg. Finding a maximum density subgraph. Tech. report, UC Berkeley, 1984.
[34]
A. V. Goldberg and R. E. Tarjan. A new approach to the maximum-flow problem. In Journal of the ACM (JACM), 35(4), pages 921--940, 1988.
[35]
J. Håstad. Clique is hard to approximate within n1-ε. In Acta Mathematica, 182(1), 1999.
[36]
R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In SIGMOD, 2009.
[37]
D. S. Johnson and M. A. Trick. Cliques, coloring, and satisfiability: second DIMACS implementation challenge American Mathematical Soc., 1996.
[38]
R. Kannan and V. Vinay. Analyzing the structure of large graphs, manuscript, 1999.
[39]
S. Khuller and B. Saha. On finding dense subgraphs. In ICALP, 2009.
[40]
D. E. Knuth. Seminumerical algorithms. 2007.
[41]
V. E. Lee, N. Ruan, R. Jin, and C. C. Aggarwal. A survey of algorithms for dense subgraph discovery. In Managing and Mining Graph Data, pages 303--336, Springer, 2010.
[42]
Y. T. Lee and A. Sidford. Path finding methods for linear programming: Solving linear programs in õ (vrank) iterations and faster algorithms for maximum flow. In FOCS, pages 424--433, 2014.
[43]
F. T. Leighton and A. Moitra. Extensions and limits to vertex sparsification. In STOC, pages 47--56, 2010.
[44]
J. Leskovec, D. Chakrabarti, J. Kleinberg, C. Faloutsos, and Z. Ghahramani. Kronecker graphs: An approach to modeling networks. In The Journal of Machine Learning Research, vol. 11, pages 985--1042, 2010.
[45]
J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD, pages 177--187, 2005.
[46]
K. Makino and T. Uno. New algorithms for enumerating all maximal cliques. In SWAT pages 260--272, 2004.
[47]
M. Mitzenmacher and E. Upfal. Probability and computing: Randomized algorithms and probabilistic analysis. Cambridge University Press, 2005.
[48]
A. Moitra. Approximation algorithms for multicommodity-type problems with guarantees independent of the graph size. In FOCS, pages 3--12, 2009.
[49]
J. B. Orlin. A faster strongly polynomial time algorithm for submodular function minimization. In Mathematical Programming, vol. 118(2), pages 237--251, 2009.
[50]
R. Pagh and C. E. Tsourakakis. Colorful triangle counting and a mapreduce implementation. In Information Processing Letters, vol. 112(7), pages 277--281, 2012.
[51]
A. E. Sariyuce, C. Seshadhri, A. Pinar, and U. V. Catalyurek. Finding the hierarchy of dense subgraphs using nucleus decompositions. In WWW, 2015.
[52]
D. A. Spielman and S.-H. Teng. Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In STOC, pages 81--90, 2004.
[53]
M. Thorup and U. Zwick. Approximate distance oracles. In Journal of the ACM (JACM), vol. 52(1), pages 1--24, 2005.
[54]
M. Thorup and U. Zwick. Spanners and emulators with sublinear distance errors. In SODA, pages 802--809, 2006.
[55]
C.E. Tsourakakis, F. Bonchi, A. Gionis, F. Gullo, and M. Tsiarli. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In KDD, pages 104--112, 2013.
[56]
C. E. Tsourakakis. Mathematical and Algorithmic Analysis of Network and Biological Data. PhD thesis, Carnegie Mellon University, 2013.
[57]
C. E. Tsourakakis. The k-clique densest subgraph problem. In WWW, pages 1122--1132, 2015.
[58]
C. E. Tsourakakis, M. N. Kolountzakis, and G. L. Miller. Triangle sparsifiers. In J. Graph Algorithms Appl., 15(6), pages 703--726, 2011.
[59]
T. Uno. An efficient algorithm for solving pseudo clique enumeration problem. In Algorithmica, 56(1), 2010.
[60]
N. Wang, J. Zhang, K.-L. Tan, and A. K. Tung. On triangulation-based dense neighborhood graph discovery. In VLDB, 4(2), pages 58--68, 2010.

Cited By

View all
  • (2025)Density Decomposition of Bipartite GraphsProceedings of the ACM on Management of Data10.1145/37096803:1(1-25)Online publication date: 11-Feb-2025
  • (2025)Q-DISCO: Query-Centric Densest Subgraphs in Networks with Opinion InformationProceedings of the Eighteenth ACM International Conference on Web Search and Data Mining10.1145/3701551.3703502(194-203)Online publication date: 10-Mar-2025
  • (2025)Colorful Star Motif Counting: Concepts, Algorithms and ApplicationsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.351499737:3(1105-1125)Online publication date: Mar-2025
  • Show More Cited By

Index Terms

  1. Scalable Large Near-Clique Detection in Large-Scale Networks via Sampling

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2015
    2378 pages
    ISBN:9781450336642
    DOI:10.1145/2783258
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 August 2015

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. dense subgraphs
    2. graph mining
    3. near-clique extraction

    Qualifiers

    • Research-article

    Funding Sources

    • NSF

    Conference

    KDD '15
    Sponsor:

    Acceptance Rates

    KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Upcoming Conference

    KDD '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)46
    • Downloads (Last 6 weeks)9
    Reflects downloads up to 03 Mar 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2025)Density Decomposition of Bipartite GraphsProceedings of the ACM on Management of Data10.1145/37096803:1(1-25)Online publication date: 11-Feb-2025
    • (2025)Q-DISCO: Query-Centric Densest Subgraphs in Networks with Opinion InformationProceedings of the Eighteenth ACM International Conference on Web Search and Data Mining10.1145/3701551.3703502(194-203)Online publication date: 10-Mar-2025
    • (2025)Colorful Star Motif Counting: Concepts, Algorithms and ApplicationsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.351499737:3(1105-1125)Online publication date: Mar-2025
    • (2024)Faster streaming and scalable algorithms for finding directed dense subgraphs in large graphsProceedings of the 41st International Conference on Machine Learning10.5555/3692070.3693532(35876-35891)Online publication date: 21-Jul-2024
    • (2024)An Efficient and Exact Algorithm for Locally h-Clique Densest Subgraph DiscoveryProceedings of the ACM on Management of Data10.1145/36988002:6(1-26)Online publication date: 20-Dec-2024
    • (2024)A Counting-based Approach for Efficient k-Clique Densest Subgraph DiscoveryProceedings of the ACM on Management of Data10.1145/36549222:3(1-27)Online publication date: 30-May-2024
    • (2024)A Fast Exact Algorithm to Enumerate Maximal Pseudo-cliques in Large Sparse GraphsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3672066(2479-2490)Online publication date: 25-Aug-2024
    • (2024)Parallel k-Core Decomposition with Batched Updates and Asynchronous ReadsProceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming10.1145/3627535.3638508(286-300)Online publication date: 2-Mar-2024
    • (2024)A Similarity-based Approach for Efficient Large Quasi-clique DetectionProceedings of the ACM Web Conference 202410.1145/3589334.3645374(401-409)Online publication date: 13-May-2024
    • (2024)Unified Dense Subgraph Detection: Fast Spectral Theory Based AlgorithmsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.327257436:3(1356-1370)Online publication date: Mar-2024
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media