skip to main content
10.5555/1109557.1109681acmconferencesArticle/Chapter ViewAbstractPublication PagessodaConference Proceedingsconference-collections
Article

Matrix approximation and projective clustering via volume sampling

Published: 22 January 2006 Publication History

Abstract

Frieze et al. [17] proved that a small sample of rows of a given matrix A contains a low-rank approximation D that minimizes ||A - D||F to within small additive error, and the sampling can be done efficiently using just two passes over the matrix [12]. In this paper, we generalize this result in two ways. First, we prove that the additive error drops exponentially by iterating the sampling in an adaptive manner. Using this result, we give a pass-efficient algorithm for computing low-rank approximation with reduced additive error. Our second result is that using a natural distribution on subsets of rows (called volume sampling), there exists a subset of k rows whose span contains a factor (k + 1) relative approximation and a subset of k + k(k + 1)/ε rows whose span contains a 1+ε relative approximation. The existence of such a small certificate for multiplicative low-rank approximation leads to a PTAS for the following projective clustering problem: Given a set of points P in Rd, and integers k, j, find a set of j subspaces F1, . . ., Fj, each of dimension at most k, that minimize Σp∈Pmini d(p, Fi)2.

References

[1]
D. Achlioptas, F. McSherry, Fast computation of low rank approximations, Proceedings of the 33rd Annual Symposium on Theory of Computing, 2001.
[2]
P. Agarwal, C. Procopiuc, K. Varadajan, Approximation algorithms for k-line center. Proceedings of European Symposium on Algorithms, 2002.
[3]
P. Agarwal, S. Har-Peled, K. Varadajan, Geometric approximations via coresets. Manuscript, 2004. http://valis.cs.uiuc.edu/~sariel/papers/04/survey/.
[4]
P. Agarwal, N. Mustafa, k-means projective clustering. Proceedings of PODS, 2004.
[5]
R. Agarwal, J. Gehrke, D. Gunopulos, P. Raghavan. Automatic subspace clustering of high dimensional data for data mining applications. Proceedings of SIGMOD, 1998.
[6]
C. Aggarwal, C. Procopiuc, J. Wolf, P. Yu, J. Park. Fast Algorithms for projected clustering. Proceedings of SIGMOD, 1999.
[7]
N. Alon, Y. Matias, M. Szegedy, The space complexity of approximating the frequency moments. Journal of Computer and System Sciences, 58(1):137--147, Feb. 1999.
[8]
M. Artin. Algebra, Prentice-Hall, 1991.
[9]
M. Bǎdoiu, S. Har-Peled, P. Indyk, Approximate clustering via core-sets. Proceedings of 34th Annual Symposium on Theory of Computing, 2002.
[10]
Z. Bar-Yosseff, Sampling lower bounds via information theory. Proceedings of the 35th Annual Symposium on Theory of Computing, 2003.
[11]
W.F. de la Vega, M. Karpinski, C. Kenyon, Y. Rabani. Approximation schemes for clustering problems. Proceedings of the 35th Annual ACM Symposium on Theory of Computing, 2003.
[12]
P. Drineas, A. Frieze, R. Kannan, S. Vempala, V. Vinay, Clustering in large graphs and matrices. Proceedings of 10th SODA, 1999.
[13]
P. Drineas, R. Kannan, Pass Efficient algorithm for approximating large matrices, Proceedings of 14th SODA, 2003.
[14]
P. Drineas, R. Kannan, M. Maloney, Fast Monte Carlo algorithms for matrices II: computing a low-rank approximation to a matrix. Yale University Technical Report, YALEU/DCS/TR-1270, 2004.
[15]
M. Effros, L. J. Schulman, Deterministic clustering with data nets, ECCC TR04--050, 2004.
[16]
J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, J. Zhang, On graph problems in a semi-streaming model. Proceedings of the 31st ICALP, 2004.
[17]
A. Frieze, R. Kannan, S. Vempala, Fast Monte-Carlo algorithms for finding low-rank approximations. Journal of the ACM, 51(6):1025--1041, 2004.
[18]
G. Golub, C. Van Loan, Matrix Computations. John Hopkins University Press, third edition, 1996.
[19]
S. A. Goreinov, E. E. Tyrtyshnikov, The maximal-volume concept in approximation by low-rank matrices Contemporary Mathematics, Vol. 280 (2001), 47--51.
[20]
S. Guha, N. Koudas, K. Shim, Data-streams and histograms. Proceedings of 33rd ACM Symposium on Theory of Computing, 2001.
[21]
S. Har-Peled, S. Mazumdar, Coresets for k-means and k-median clustering and their applications. Proceedings of the 36th Annual Symposium on Theory of Computing, 2004.
[22]
S. Har-Peled, K. Varadarajan, Projective clustering in high dimensions using core-sets. Proceedings of Symposium on Computational Geometry, 2002.
[23]
M. Henzinger, P. Raghavan, S. Rajagopalan, Computing on data streams. Technical Note 1998--011, Digital Systems Research Center, Palo Alto, CA, May 1998.
[24]
D. Kempe, F. McSherry, A decentralized algorithm for spectral analysis. Proceedings of the 36th Annual Symposium on Theory of Computing, 2004.
[25]
A. Kumar, Y. Sabharwal, S. Sen, A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. Proceedings of the 45th Annual IEEE Foundations of Computer Science, 2004.
[26]
J. Matoušek, On approximate geometric k-clustering. Discrete and Computational Geometry, 61--84, 2000.
[27]
N. Megiddo, A. Tamir, On the complexity of locating linear facilities in the plane. Operations Research Letters, 1 (1982), 194--197.
[28]
R. Ostrovsky, Y. Rabani, Polynomial time approximation schemes for geometric clustering problems. Journal of the ACM, 49(2):139--156, March, 2002.
[29]
C. Procopiuc, P. Agarwal, T. Murali, M. Jones, A Monte Carlo algorithm for fast projective clustering. Proceedings of SIGMOD, 2002.
[30]
L. Rademacher, S. Vempala, G. Wang, Matrix approximation and projective clustering via iterative sampling. MIT-LCS-TR-983, 2005.

Cited By

View all
  • (2023)Kernel quadrature with randomly pivoted choleskyProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3668997(65850-65868)Online publication date: 10-Dec-2023
  • (2019)Optimal analysis of subset-selection based ℓ low-rank approximationProceedings of the 33rd International Conference on Neural Information Processing Systems10.5555/3454287.3454515(2541-2552)Online publication date: 8-Dec-2019
  • (2018)Leveraged volume sampling for linear regressionProceedings of the 32nd International Conference on Neural Information Processing Systems10.5555/3327144.3327176(2510-2519)Online publication date: 3-Dec-2018
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SODA '06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
January 2006
1261 pages
ISBN:0898716055

Sponsors

Publisher

Society for Industrial and Applied Mathematics

United States

Publication History

Published: 22 January 2006

Check for updates

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 411 of 1,322 submissions, 31%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)9
  • Downloads (Last 6 weeks)0
Reflects downloads up to 19 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Kernel quadrature with randomly pivoted choleskyProceedings of the 37th International Conference on Neural Information Processing Systems10.5555/3666122.3668997(65850-65868)Online publication date: 10-Dec-2023
  • (2019)Optimal analysis of subset-selection based ℓ low-rank approximationProceedings of the 33rd International Conference on Neural Information Processing Systems10.5555/3454287.3454515(2541-2552)Online publication date: 8-Dec-2019
  • (2018)Leveraged volume sampling for linear regressionProceedings of the 32nd International Conference on Neural Information Processing Systems10.5555/3327144.3327176(2510-2519)Online publication date: 3-Dec-2018
  • (2018)Reverse iterative volume sampling for linear regressionThe Journal of Machine Learning Research10.5555/3291125.329114819:1(853-891)Online publication date: 1-Jan-2018
  • (2017)Unbiased estimates for linear regression via volume samplingProceedings of the 31st International Conference on Neural Information Processing Systems10.5555/3294996.3295068(3087-3096)Online publication date: 4-Dec-2017
  • (2017)Input sparsity time low-rank approximation via ridge leverage score samplingProceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on Discrete Algorithms10.5555/3039686.3039801(1758-1777)Online publication date: 16-Jan-2017
  • (2015)Column selection via adaptive samplingProceedings of the 29th International Conference on Neural Information Processing Systems - Volume 110.5555/2969239.2969285(406-414)Online publication date: 7-Dec-2015
  • (2015)Sketching for M-estimatorsProceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms10.5555/2722129.2722192(921-939)Online publication date: 4-Jan-2015
  • (2014)Optimal CUR matrix decompositionsProceedings of the forty-sixth annual ACM symposium on Theory of computing10.1145/2591796.2591819(353-362)Online publication date: 31-May-2014
  • (2014)Matrix Recipes for Hard Thresholding MethodsJournal of Mathematical Imaging and Vision10.1007/s10851-013-0434-748:2(235-265)Online publication date: 1-Feb-2014
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media