Article

Matrix approximation and projective clustering via volume sampling

Authors: Amit Deshpande, Luis Rademacher, Santosh Vempala, Grant Wang

SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms

Pages 1117 - 1126

Published: 22 January 2006

Abstract

Frieze et al. [17] proved that a small sample of rows of a given matrix A contains a low-rank approximation D that minimizes ||A - D||_F to within a small additive error, and that the sampling can be done efficiently using just two passes over the matrix [12]. In this paper, we generalize this result in two ways. First, we prove that the additive error drops exponentially when the sampling is iterated adaptively. Using this result, we give a pass-efficient algorithm for computing a low-rank approximation with reduced additive error. Second, using a natural distribution on subsets of rows (called volume sampling), we show that there exists a subset of k rows whose span contains a factor (k + 1) relative approximation, and a subset of k + k(k + 1)/ε rows whose span contains a (1 + ε) relative approximation. The existence of such a small certificate for multiplicative low-rank approximation leads to a PTAS for the following projective clustering problem: given a set of points P in R^d and integers k and j, find a set of j subspaces F_1, ..., F_j, each of dimension at most k, that minimizes Σ_{p∈P} min_i d(p, F_i)².
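
The volume sampling claim can be illustrated on a toy matrix. The sketch below is not taken from the paper: it assumes that volume sampling picks a k-subset S of rows with probability proportional to the squared volume of the simplex spanned by those rows and the origin (the constant factor relating simplex and parallelepiped volume cancels on normalization), and it enumerates all subsets by brute force, which is only feasible for tiny inputs. The helper names (`gram_schmidt`, `projection_error`, `volume_sampling_table`) and the example matrix are hypothetical, and the matrix is assumed to have rank at least k so that the total volume is nonzero.

```python
import itertools

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(rows, tol=1e-12):
    """Return (orthonormal basis of span(rows), squared volume of the
    parallelepiped spanned by rows)."""
    basis, vol_sq = [], 1.0
    for r in rows:
        w = list(r)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        n2 = dot(w, w)          # squared length of the orthogonal residual
        vol_sq *= n2
        if n2 > tol:
            basis.append([wi / n2 ** 0.5 for wi in w])
    return basis, vol_sq

def projection_error(A, basis):
    """Squared Frobenius error of projecting every row of A onto span(basis)."""
    return sum(dot(a, a) - sum(dot(a, q) ** 2 for q in basis) for a in A)

def volume_sampling_table(A, k):
    """List (subset, probability, error) for every k-subset of rows,
    with probability proportional to the squared volume of the subset."""
    entries = []
    for S in itertools.combinations(range(len(A)), k):
        basis, vol_sq = gram_schmidt([A[i] for i in S])
        entries.append((S, vol_sq, projection_error(A, basis)))
    total = sum(v for _, v, _ in entries)  # assumed nonzero: rank(A) >= k
    return [(S, v / total, err) for S, v, err in entries]

# Toy matrix with orthogonal columns, so A^T A = diag(6, 2) and the best
# rank-1 squared error equals the smaller eigenvalue, 2.
A = [[1.0, 1.0], [1.0, -1.0], [2.0, 0.0]]
k, opt = 1, 2.0

table = volume_sampling_table(A, k)
expected_err = sum(p * err for _, p, err in table)
best_subset, _, best_err = min(table, key=lambda t: t[2])
print(expected_err, best_err, (k + 1) * opt)
```

On this matrix the average squared error under the sampling distribution is about 3 and the best single row gives about 2, both within the factor (k + 1) bound of 4, so at least one subset of k rows meets the bound here.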

References

[1] D. Achlioptas, F. McSherry, Fast computation of low rank approximations. Proceedings of the 33rd Annual Symposium on Theory of Computing, 2001.

[2] P. Agarwal, C. Procopiuc, K. Varadarajan, Approximation algorithms for k-line center. Proceedings of European Symposium on Algorithms, 2002.

[3] P. Agarwal, S. Har-Peled, K. Varadarajan, Geometric approximations via coresets. Manuscript, 2004. http://valis.cs.uiuc.edu/~sariel/papers/04/survey/.

[4] P. Agarwal, N. Mustafa, k-means projective clustering. Proceedings of PODS, 2004.

[5] R. Agrawal, J. Gehrke, D. Gunopulos, P. Raghavan, Automatic subspace clustering of high dimensional data for data mining applications. Proceedings of SIGMOD, 1998.

[6] C. Aggarwal, C. Procopiuc, J. Wolf, P. Yu, J. Park, Fast algorithms for projected clustering. Proceedings of SIGMOD, 1999.

[7] N. Alon, Y. Matias, M. Szegedy, The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137--147, Feb. 1999.

[8] M. Artin, Algebra. Prentice-Hall, 1991.

[9] M. Bădoiu, S. Har-Peled, P. Indyk, Approximate clustering via core-sets. Proceedings of 34th Annual Symposium on Theory of Computing, 2002.

[10] Z. Bar-Yossef, Sampling lower bounds via information theory. Proceedings of the 35th Annual Symposium on Theory of Computing, 2003.

[11] W.F. de la Vega, M. Karpinski, C. Kenyon, Y. Rabani, Approximation schemes for clustering problems. Proceedings of the 35th Annual ACM Symposium on Theory of Computing, 2003.

[12] P. Drineas, A. Frieze, R. Kannan, S. Vempala, V. Vinay, Clustering in large graphs and matrices. Proceedings of 10th SODA, 1999.

[13] P. Drineas, R. Kannan, Pass efficient algorithms for approximating large matrices. Proceedings of 14th SODA, 2003.

[14] P. Drineas, R. Kannan, M. Mahoney, Fast Monte Carlo algorithms for matrices II: computing a low-rank approximation to a matrix. Yale University Technical Report, YALEU/DCS/TR-1270, 2004.

[15] M. Effros, L. J. Schulman, Deterministic clustering with data nets. ECCC TR04--050, 2004.

[16] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, J. Zhang, On graph problems in a semi-streaming model. Proceedings of the 31st ICALP, 2004.

[17] A. Frieze, R. Kannan, S. Vempala, Fast Monte-Carlo algorithms for finding low-rank approximations. Journal of the ACM, 51(6):1025--1041, 2004.

[18] G. Golub, C. Van Loan, Matrix Computations. Johns Hopkins University Press, third edition, 1996.

[19] S. A. Goreinov, E. E. Tyrtyshnikov, The maximal-volume concept in approximation by low-rank matrices. Contemporary Mathematics, Vol. 280 (2001), 47--51.

[20] S. Guha, N. Koudas, K. Shim, Data-streams and histograms. Proceedings of 33rd ACM Symposium on Theory of Computing, 2001.

[21] S. Har-Peled, S. Mazumdar, Coresets for k-means and k-median clustering and their applications. Proceedings of the 36th Annual Symposium on Theory of Computing, 2004.

[22] S. Har-Peled, K. Varadarajan, Projective clustering in high dimensions using core-sets. Proceedings of Symposium on Computational Geometry, 2002.

[23] M. Henzinger, P. Raghavan, S. Rajagopalan, Computing on data streams. Technical Note 1998--011, Digital Systems Research Center, Palo Alto, CA, May 1998.

[24] D. Kempe, F. McSherry, A decentralized algorithm for spectral analysis. Proceedings of the 36th Annual Symposium on Theory of Computing, 2004.

[25] A. Kumar, Y. Sabharwal, S. Sen, A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. Proceedings of the 45th Annual IEEE Foundations of Computer Science, 2004.

[26] J. Matoušek, On approximate geometric k-clustering. Discrete and Computational Geometry, 61--84, 2000.

[27] N. Megiddo, A. Tamir, On the complexity of locating linear facilities in the plane. Operations Research Letters, 1 (1982), 194--197.

[28] R. Ostrovsky, Y. Rabani, Polynomial time approximation schemes for geometric clustering problems. Journal of the ACM, 49(2):139--156, March 2002.

[29] C. Procopiuc, P. Agarwal, T. Murali, M. Jones, A Monte Carlo algorithm for fast projective clustering. Proceedings of SIGMOD, 2002.

[30] L. Rademacher, S. Vempala, G. Wang, Matrix approximation and projective clustering via iterative sampling. MIT-LCS-TR-983, 2005.

Published In

SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms

January 2006

1261 pages

ISBN: 0898716055

Sponsors

SIAM Activity Group on Discrete Mathematics
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory

Publisher

Society for Industrial and Applied Mathematics

United States

Qualifiers

Article

Acceptance Rates

Overall Acceptance Rate 411 of 1,322 submissions, 31%

Cited By

Epperly E, Moreno E, Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S (2023). Kernel quadrature with randomly pivoted Cholesky. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 65850-65868. Online publication date: 10-Dec-2023.
https://dl.acm.org/doi/10.5555/3666122.3668997

Dan C, Wang H, Zhang H, Zhou Y, Ravikumar P, Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E (2019). Optimal analysis of subset-selection based ℓ low-rank approximation. Proceedings of the 33rd International Conference on Neural Information Processing Systems, pp. 2541-2552. Online publication date: 8-Dec-2019.
https://dl.acm.org/doi/10.5555/3454287.3454515

Dereziński M, Warmuth M, Hsu D (2018). Leveraged volume sampling for linear regression. Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 2510-2519. Online publication date: 3-Dec-2018.
https://dl.acm.org/doi/10.5555/3327144.3327176

Dereziński M, Warmuth M (2018). Reverse iterative volume sampling for linear regression. The Journal of Machine Learning Research, 19:1, pp. 853-891. Online publication date: 1-Jan-2018.
https://dl.acm.org/doi/10.5555/3291125.3291148

Dereziński M, Warmuth M (2017). Unbiased estimates for linear regression via volume sampling. Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 3087-3096. Online publication date: 4-Dec-2017.
https://dl.acm.org/doi/10.5555/3294996.3295068

Cohen M, Musco C, Musco C, Klein P (2017). Input sparsity time low-rank approximation via ridge leverage score sampling. Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 1758-1777. Online publication date: 16-Jan-2017.
https://dl.acm.org/doi/10.5555/3039686.3039801

Paul S, Magdon-Ismail M, Drineas P (2015). Column selection via adaptive sampling. Proceedings of the 29th International Conference on Neural Information Processing Systems - Volume 1, pp. 406-414. Online publication date: 7-Dec-2015.
https://dl.acm.org/doi/10.5555/2969239.2969285

Clarkson K, Woodruff D, Indyk P (2015). Sketching for M-estimators. Proceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms, pp. 921-939. Online publication date: 4-Jan-2015.
https://dl.acm.org/doi/10.5555/2722129.2722192

Boutsidis C, Woodruff D, Shmoys D (2014). Optimal CUR matrix decompositions. Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pp. 353-362. Online publication date: 31-May-2014.
https://dl.acm.org/doi/10.1145/2591796.2591819

Kyrillidis A, Cevher V (2014). Matrix Recipes for Hard Thresholding Methods. Journal of Mathematical Imaging and Vision, 48:2, pp. 235-265. Online publication date: 1-Feb-2014.
https://dl.acm.org/doi/10.1007/s10851-013-0434-7