ABSTRACT
We present multiple-pass streaming algorithms for a basic clustering problem for massive data sets. If our algorithm is allotted 2ℓ passes, it will produce an approximation with error at most ε using Õ(k³/ε^(2/ℓ)) bits of memory, the most critical resource for streaming computation. We demonstrate that this tradeoff between passes and memory allotted is intrinsic to the problem and model of computation by proving lower bounds on the memory requirements of any ℓ-pass randomized algorithm that are nearly matched by our upper bounds. To the best of our knowledge, this is the first time nearly matching bounds have been proved for such an exponential tradeoff for randomized computation. In this problem, we are given a set of n points drawn randomly according to a mixture of k uniform distributions and wish to approximate the density function of the mixture. The points are placed in a datastream (possibly in adversarial order), which may only be read sequentially by the algorithm. We argue that this models, among others, the datastream produced by a national census of the incomes of all citizens.
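To make the exponential pass/memory tradeoff concrete, the following toy sketch (not the paper's algorithm) evaluates the leading term k³/ε^(2/ℓ) of the stated memory bound for a 2ℓ-pass algorithm, ignoring the polylogarithmic factors hidden in the Õ notation; the parameter names are illustrative assumptions.

```python
# Toy illustration of the pass/memory tradeoff from the abstract:
# a 2*ell-pass algorithm uses O~(k^3 / eps^(2/ell)) bits of memory.
# We evaluate only the leading term, dropping polylog factors.

def memory_leading_term(k: int, eps: float, ell: int) -> float:
    """Leading term k^3 / eps^(2/ell) of the memory bound
    when the algorithm is allotted 2*ell passes."""
    return k ** 3 / eps ** (2 / ell)

if __name__ == "__main__":
    k, eps = 10, 0.01
    for ell in (1, 2, 4, 8):
        bits = memory_leading_term(k, eps, ell)
        print(f"passes = {2 * ell:2d}: ~{bits:.0f} bits (up to polylog factors)")
```

Doubling the number of passes halves the exponent on 1/ε, so memory falls off rapidly at first and approaches k³ as ℓ grows, which is the shape of the tradeoff the matching lower bounds show to be intrinsic.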
Index Terms
- The space complexity of pass-efficient algorithms for clustering