ABSTRACT
We introduce and study a new notion of graph partitioning, intimately connected to spectral clustering and k-means clustering. Formally, given a graph G on n vertices, we ask to find a graph H that is the union of k cliques on n vertices, such that LG > λ LH where λ is maximized. Here LG and LH are the (normalized) Laplacians of the graphs G and H respectively. Informally, our graph partitioning objective asks for the optimal spectral simplification of a given graph as a disjoint union of k cliques.
We justify this objective function in several ways. First and foremost, we show that a commonly used spectral clustering algorithm implicitly optimizes this objective function, up to a factor of O(k). Using this connection, we immediately get an O(k)-approximation algorithm to our new objective function by simply using the spectral clustering algorithm on G. Next, we demonstrate another application of our objective function: we use it as a means to proving that simple spectral clustering algorithms can solve some well-studied graph partitioning problems (such as partitioning into expanders). Additionally, we also show that (a relaxation of) this optimization problem naturally arises as the dual problem to the question of finding the worst-case integrality gap instance for the classical k-means SDP. Finally, owing to these close connection between some classical clustering techniques (such as k-means and spectral clustering), we argue that a more complete understanding of this optimization problem could lead to new algorithmic insights and techniques for the area of graph partitioning.
- N. Alon. Eigenvalues and expanders. Combinatorica, 6(2):83--96, 1986. Google ScholarDigital Library
- N. Alon, A. Lubotzky, and A. Wigderson. Semi-direct product in groups and zig-zag product in graphs: connections and applications. In Foundations of Computer Science, 2001. Proceedings. 42nd IEEE Symposium on, pages 630--637. IEEE, 2001. Google ScholarDigital Library
- N. Alon and V. D. Milman. λ1, isoperimetric inequalities for graphs, and superconcentrators. Journal of Combinatorial Theory, Series B, 38(1):73--88, 1985.Google ScholarCross Ref
- S. Arora and S. Kale. A combinatorial, primal-dual approach to semidefinite programs. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 227--236. ACM, 2007. Google ScholarDigital Library
- S. Arora and S. Kale. A combinatorial, primal-dual approach to semidefinite programs. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 227--236. ACM, 2007. Google ScholarDigital Library
- S. Arora, J. Lee, and A. Naor. Euclidean distortion and the sparsest cut. Journal of the American Mathematical Society, 21(1):1--21, 2008.Google ScholarCross Ref
- S. Arora, S. Rao, and U. Vazirani. Expander flows, geometric embeddings and graph partitioning. Journal of the ACM (JACM), 56(2):5, 2009. Google ScholarDigital Library
- G. Arzhantseva and R. Tessera. Relatively expanding box spaces with no expansion. arXiv preprint arXiv:1402.1481, 2014.Google Scholar
- J. Batson, D. A. Spielman, and N. Srivastava. Twice-ramanujan sparsifiers. SIAM Journal on Computing, 41(6):1704--1721, 2012.Google ScholarDigital Library
- P. Biswal, J. R. Lee, and S. Rao. Eigenvalue bounds, spectral partitioning, and metrical deformations via flows. Journal of the ACM (JACM), 57(3):13, 2010. Google ScholarDigital Library
- E. G. Boman and B. Hendrickson. Support theory for preconditioning. SIAM Journal on Matrix Analysis and Applications, 25(3):694--717, 2003. Google ScholarDigital Library
- J. Cheeger. A lower bound for the smallest eigenvalue of the laplacian. Problems in analysis, 625:195--199, 1970.Google Scholar
- E. Dahlhaus, D. S. Johnson, C. H. Papadimitriou, P. D. Seymour, and M. Yannakakis. The complexity of multiterminal cuts. SIAM Journal on Computing, 23(4):864--894, 1994. Google ScholarDigital Library
- I. Dhillon, Y. Guan, and B. Kulis. A unified view of kernel k-means, spectral clustering and graph cuts. Citeseer, 2004.Google Scholar
- I. S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means: spectral clustering and normalized cuts. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 551--556. ACM, 2004. Google ScholarDigital Library
- N. Garg, V. V. Vazirani, and M. Yannakakis. Approximate max-flow min-(multi) cut theorems and their applications. SIAM Journal on Computing, 25(2):235--251, 1996. Google ScholarDigital Library
- S. O. Gharan and L. Trevisan. Partitioning into expanders. In SODA, pages 1256--1266. SIAM, 2014. Google ScholarDigital Library
- O. Goldschmidt and D. S. Hochbaum. Polynomial algorithm for the k-cut problem. In 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, pages 444--451. IEEE, 1988. Google ScholarDigital Library
- N. J. Harvey and N. Olver. Pipage rounding, pessimistic estimators and matrix concentration. In SODA, pages 926--945. SIAM, 2014. Google ScholarDigital Library
- J. Hopcroft and R. Kannan. Foundations of Data Science. 2014.Google Scholar
- K. Jain and V. V. Vazirani. Approximation algorithms for metric facility location and k-median problems using the primal-dual schema and lagrangian relaxation. Journal of the ACM (JACM), 48(2):274--296, 2001. Google ScholarDigital Library
- R. Kannan, S. Vempala, and A. Vetta. On clusterings: Good, bad and spectral. Journal of the ACM (JACM), 51(3):497--515, 2004. Google ScholarDigital Library
- J. A. Kelner. Spectral partitioning, eigenvalue bounds, and circle packings for graphs of bounded genus. SIAM Journal on Computing, 35(4):882--902, 2006. Google ScholarDigital Library
- N. L. D. Khoa and S. Chawla. Large scale spectral clustering using resistance distance and spielman-teng solvers. In Discovery Science, pages 7--21. Springer, 2012.Google ScholarCross Ref
- N. L. D. Khoa and S. Chawla. A scalable approach to spectral clustering with sdd solvers. Journal of Intelligent Information Systems, pages 1--20, 2013. Google ScholarDigital Library
- A. Kumar and R. Kannan. Clustering with spectral norm and the k-means algorithm. In Foundations of Computer Science (FOCS), 2010 51st Annual IEEE Symposium on, pages 299--308. IEEE, 2010. Google ScholarDigital Library
- J. R. Lee, S. Oveis Gharan, and L. Trevisan. Multi-way spectral partitioning and higher-order cheeger inequalities. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 1117--1130. ACM, 2012. Google ScholarDigital Library
- T. Leighton and S. Rao. Multicommodity max-flow min-cut theorems and their use in designing approximation algorithms. Journal of the ACM (JACM), 46(6):787--832, 1999. Google ScholarDigital Library
- M. Lichman. UCI machine learning repository, 2013.Google Scholar
- A. Louis, P. Raghavendra, P. Tetali, and S. Vempala. Many sparse cuts via higher eigenvalues. In Proceedings of the forty-fourth annual ACM symposium on Theory of computing, pages 1131--1140. ACM, 2012. Google ScholarDigital Library
- F. McSherry. Spectral partitioning of random graphs. In Foundations of Computer Science, 2001. Proceedings. 42nd IEEE Symposium on, pages 529--537. IEEE, 2001. Google ScholarDigital Library
- M. Meila and J. Shi. A random walks view of spectral segmentation. 2001.Google Scholar
- A. Y. Ng, M. I. Jordan, Y. Weiss, et al. On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems, 2:849--856, 2002.Google Scholar
- H. Qiu and E. R. Hancock. Clustering and embedding using commute times. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 29(11):1873--1890, 2007. Google ScholarDigital Library
- J. Shi and J. Malik. Normalized cuts and image segmentation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(8):888--905, 2000. Google ScholarDigital Library
- D. A. Spielman and N. Srivastava. Graph sparsification by effective resistances. SIAM Journal on Computing, 40(6):1913--1926, 2011. Google ScholarDigital Library
- D. A. Spielman and S.-H. Teng. Spectral partitioning works: Planar graphs and finite element meshes. Linear Algebra and its Applications, 421(2):284--305, 2007.Google ScholarCross Ref
- D. A. Spielman and S.-H. Teng. Spectral sparsification of graphs. SIAM Journal on Computing, 40(4):981--1025, 2011. Google ScholarDigital Library
- L. Trevisan. Is cheeger-type approximation possible for nonuniform sparsest cut? arXiv preprint arXiv:1303.2730, 2013.Google Scholar
- Y. Weiss. Segmentation using eigenvectors: a unifying view. In Computer vision, 1999. The proceedings of the seventh IEEE international conference on, volume 2, pages 975--982. IEEE, 1999. Google ScholarDigital Library
- E. Xing, A. Ng, M. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In Advances in Neural Information Processing Systems, volume 15, 2003.Google Scholar
- E. Xing, E. P. Xing, M. Jordan, and M. I. Jordan. On semidefinite relaxations for normalized k-cut and connections to spectral clustering. 2003.Google Scholar
- D. Yan, L. Huang, and M. I. Jordan. Fast approximate spectral clustering. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 907--916. ACM, 2009. Google ScholarDigital Library
Index Terms
- Spectral Embedding of k-Cliques, Graph Partitioning and k-Means
Recommendations
Partitioning of a graph into induced subgraphs not containing prescribed cliques
AbstractLet K p be a complete graph of order p ≥ 2. A K p-free k-coloring of a graph H is a partition of V ( H ) into V 1 , V 2 … , V k such that H [ V i ] does not contain K p for each i ≤ k. In 1977 Borodin and Kostochka conjectured that any graph H ...
Partitioning a Graph into Complementary Subgraphs
WALCOM: Algorithms and ComputationAbstractIn the Partition Into Complementary Subgraphs (Comp-Sub) problem we are given a graph , and an edge set property , and asked whether G can be decomposed into two graphs, H and its complement , for some graph H, in such a way that the edge cut-set (...
Weighted Graph Cuts without Eigenvectors A Multilevel Approach
A variety of clustering algorithms have recently been proposed to handle data that is not linearly separable; spectral clustering and kernel k-means are two of the main methods. In this paper, we discuss an equivalence between the objective functions ...
Comments