ACM Home Page
Please provide us with feedback. Feedback
The DLT priority sampling is essentially optimal
Full text PdfPdf (635 KB)
Source Annual ACM Symposium on Theory of Computing archive
Proceedings of the thirty-eighth annual ACM symposium on Theory of computing table of contents
Seattle, WA, USA
SESSION: Session 4A table of contents
Pages: 150 - 158  
Year of Publication: 2006
ISBN:1-59593-134-1
Author
Mario Szegedy  Rutgers, The State University of NJ, Piscataway, NJ
Sponsors
SIGACT: ACM Special Interest Group on Algorithms and Computation Theory
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 3,   Downloads (12 Months): 48,   Citation Count: 3
Additional Information:

abstract   references   cited by   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1132516.1132539
What is a DOI?

ABSTRACT

The priority sampling procedure of N. Duffield, C. Lund and M. Thorup is not only an exciting new approach to sampling weighted data streams, but it has also proven to be highly successful in a variety of practical applications. We resolve the two major issues related to its performance. First we solve the main conjecture of N. Alon, N. Duffield, C. Lund and M. Thorup in [1], which states that the standard deviation for the subset sum estimator obtained from k priority samples is upper bounded by W/√k-1, where W denotes the actual subset sum that the estimator estimates. Although Alon et al. give an O(W/√k-1) upper bound on the standard deviation of the estimator, their formula cannot be used as a performance guarantee in an applied setting, because the constants coming up in their proof are very large. Our constant cannot be improved. We also resolve the conjecture of Duffield, C. Lund and M. Thorup which states that the variance of the priority sampling procedure is not larger than the variance of the threshold sampling procedure with sample size only one smaller. This is the main conjecture in [7]. The conjecture's significance is that the latter procedure is provably optimal within a very general class of sampling algorithms, but unlike priority sampling, it is not practical. Our result therefore certifies that priority sampling offers the unlikely feat of uniting mathematical elegance, (essential) optimality and applicability. Our proof is based on a new integral formula and on very finely tuned telescopic sums.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
N. Alon, N. Duffield, C. Lund, and M. Thorup. Estimating sums of arbitrary selections with few probes. PODS, pages 317--325, 2005.
 
2
K.R.W. Brewer and M. Hanif. Sampling with unequal probabilities. Springer-Verlag, 1983.
 
3
Chaudhuri, R. Motwani, and V. Narasayya. On random sampling over joins. pages 263--274, 1999.
 
4
 
5
Nick G. Duffield, Carsten Lund, and Mikkel Thorup. Estimating ow distributions from sampled ow statistics. SIGCOMM, pages 325--336, 2003.
 
6
Nick G. Duffield, Carsten Lund, and Mikkel Thorup. Learn more, sample less: control of volume and variance in network measurement. IEEE Transactions in Information Theory, 51(5):1756--1775, 2005.
 
7
Nick G. Duffield, Carsten Lund, and Mikkel Thorup. Sampling to estimate arbitrary subset sums. arXiv.org:cs/0509026, 2005.
 
8
Joseph M. Hellerstein, Peter J. Haas, and Helen J. Wang. Online aggregation. pages 171--182, 1997.
9
 
10
Nicholas C. Weaver. The spread of the sapphire/slammer worm. Web page.