skip to main content
10.1145/1055558.1055579acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Clustering via matrix powering

Published: 14 June 2004 Publication History

Abstract

Given a set of n points with a matrix of pairwise similarity measures, one would like to partition the points into clusters so that similar points are together and different ones apart. We present an algorithm requiring only matrix powering that performs well in practice and bears an elegant interpretation in terms of random walks on a graph. Under a certain mixture model involving planting a partition via randomized rounding of tailored matrix entries, the algorithm can be proven effective for only a single squaring. It is shown that the clustering performance of the algorithm degrades with larger values of the exponent, thus revealing that a single squaring is optimal.

References

[1]
D. Achlioptas and F. McSherry. Fast computation of low rank matrix approximations. In ACM Symposium on Theory of Computing, 2001.
[2]
S. Arora, P. Raghavan, and S. Rao. Approximation schemes for euclidean k -medians and related problems. In ACM Symposium on Theory of Computing, pages 106--113, 1998.
[3]
Y. Azar, A. Fiat, A. Karlin, and F. McSherry. Data mining through spectral analysis. In IEEE Symposium on Foundations of Computer Science, 2001.
[4]
A. Blum and J. Spencer. Coloring random and semi-random k-colorable graphs. In Journal of Algorithms, 1995.
[5]
R. Boppana. Eigenvalues and graph bisection: An average-case analysis. In IEEE Symposium on Foundations of Computer Science, pages 280--285, 1985.
[6]
M. Charikar and S. Guha. Improved combinatorial algorithms for the facility location and k-median problems. In IEEE Symposium on Foundations of Computer Science, pages 378--388, 1999.
[7]
M. Charikar, S. Guha, E. Tardos, and D. B. Shmoys. A constant-factor approximation algorithm for the k-median problem (extended abstract). In ACM Symposium on Theory of Computing, pages 1--10, 1999.
[8]
A. Condon and R. Karp. Algorithms for graph partitioning on the planted partition model. In Random Structure and Algorithms, 1999.
[9]
D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions, 1990.
[10]
I. S. Dhillon, S. Mallela, and D. S. Modha. Information-theoretic co-clustering. In 9th ACM SIGKDD international conference on Knowledge discovery and data mining, 2003.
[11]
Z. Drezner, Facility Location: A survey of Applications and Methods. Springer-Verlag, 1995.
[12]
P. Drineas, A. Frieze, R. Kannan, S. Vempala, and V. Vinay. Clustering in large graphs and matrices. In ACM-SIAM Symposium on Discrete Algorithms, 1999.
[13]
M. Dyer and A. M. Frieze. A simple heuristic for the p-center problem. Operations Research Letters, 3:285--288, 1985.
[14]
D. S. Hochbaum and D. B. Shmoys. A best possible approximation algorithm for the k-center problem. Math. Oper. Res., 10:180--184, 1985.
[15]
K. Jain and V. V. Vazirani. Primal-dual approximation algorithms for metric facility location and k-median problems. In IEEE Symposium on Foundations of Computer Science, pages 2--13, 1999.
[16]
M. Jambu and M. O. Lebeaux. Cluster Analysis and Data Analysis. North-Holland, New York, 1983.
[17]
M. Jerrum and G. Sorkin. Simulated annealing for graph bisection. In IEEE Symposium on Foundations of Computer Science, 1993.
[18]
R. Kannan, S. Vempala, and A. Vetta. On clusterings: Good, bad, and spectral. In IEEE Symposium on Foundations of Computer Science, 2000.
[19]
J. Kleinberg, C. Papadimitriou, and P. Raghavan. Segmentation problems. In (ACM) Symposium on Theory of Computing, pages 473--482, 1998.
[20]
L. Lovasz. Random walks on graphs: a survey. In Combinatorics, pages 1--46, 1993.
[21]
F. McSherry. Spectral partitioning of random graphs. In IEEE Symposium on Foundations of Computer Science, 2001.
[22]
R. Ostrovsky and Y. Rabani. Polynomial time approximation schemes for geometric k-clustering. In IEEE Symposium on Foundations of Computer Science, pages 349--358, 2000.
[23]
C. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala. Latent semantic indexing: a probabilistic analysis. In ACM Conference on Principles of Database Systems, 1998.
[24]
M. Szummer and T. Jaakkola. Partially labeled classification with markov random walks. In Advances in Neural Information Processing Systems, volume 14, 2001.
[25]
S. Virtanen. Clustering the chilean web. In LA-WEB, 2003.

Cited By

View all
  • (2016)Block models and personalized PageRankProceedings of the National Academy of Sciences10.1073/pnas.1611275114114:1(33-38)Online publication date: 20-Dec-2016
  • (2015)Streaming Graph Partitioning in the Planted Partition ModelProceedings of the 2015 ACM on Conference on Online Social Networks10.1145/2817946.2817950(27-35)Online publication date: 2-Nov-2015
  • (2011)O(N) implicit subspace embedding for unsupervised multi-scale image segmentationProceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition10.1109/CVPR.2011.5995606(2209-2215)Online publication date: 20-Jun-2011
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2004
350 pages
ISBN:158113858X
DOI:10.1145/1055558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2004

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS04

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)2
  • Downloads (Last 6 weeks)0
Reflects downloads up to 30 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2016)Block models and personalized PageRankProceedings of the National Academy of Sciences10.1073/pnas.1611275114114:1(33-38)Online publication date: 20-Dec-2016
  • (2015)Streaming Graph Partitioning in the Planted Partition ModelProceedings of the 2015 ACM on Conference on Online Social Networks10.1145/2817946.2817950(27-35)Online publication date: 2-Nov-2015
  • (2011)O(N) implicit subspace embedding for unsupervised multi-scale image segmentationProceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition10.1109/CVPR.2011.5995606(2209-2215)Online publication date: 20-Jun-2011
  • (2010)Power iteration clusteringProceedings of the 27th International Conference on International Conference on Machine Learning10.5555/3104322.3104406(655-662)Online publication date: 21-Jun-2010
  • (2010)A Very Fast Method for Clustering Big Text DatasetsProceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence10.5555/1860967.1861028(303-308)Online publication date: 4-Aug-2010
  • (2009)On the Design and Applicability of Distance Functions in High-Dimensional Data SpaceIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2008.17821:4(523-536)Online publication date: 1-Apr-2009
  • (2008)Approximation algorithms for co-clusteringProceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems10.1145/1376916.1376945(201-210)Online publication date: 9-Jun-2008

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media