ACM Home Page
Please provide us with feedback. Feedback
Holistic aggregates in a networked world: distributed tracking of approximate quantiles
Full text PdfPdf (592 KB)
Source International Conference on Management of Data archive
Proceedings of the 2005 ACM SIGMOD international conference on Management of data table of contents
Baltimore, Maryland
SESSION: Research papers: streams table of contents
Pages: 25 - 36  
Year of Publication: 2005
ISBN:1-59593-060-4
Authors
Graham Cormode  Bell Laboratories
Minos Garofalakis  Bell Laboratories
S. Muthukrishnan  Rutgers and AT&T
Rajeev Rastogi  Bell Laboratories
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 6,   Downloads (12 Months): 105,   Citation Count: 13
Additional Information:

abstract   references   cited by   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1066157.1066161
What is a DOI?

ABSTRACT

While traditional database systems optimize for performance on one-shot queries, emerging large-scale monitoring applications require continuous tracking of complex aggregates and data-distribution summaries over collections of physically-distributed streams. Thus, effective solutions have to be simultaneously space efficient (at each remote site), communication efficient (across the underlying communication network), and provide continuous, guaranteed-quality estimates. In this paper, we propose novel algorithmic solutions for the problem of continuously tracking complex holistic aggregates in such a distributed-streams setting --- our primary focus is on approximate quantile summaries, but our approach is more broadly applicable and can handle other holistic-aggregate functions (e.g., "heavy-hitters" queries). We present the first known distributed-tracking schemes for maintaining accurate quantile estimates with provable approximation guarantees, while simultaneously optimizing the storage space at each remote site as well as the communication cost across the network. In a nutshell, our algorithms employ a combination of local tracking at remote sites and simple prediction models for local site behavior in order to produce highly communication- and space-efficient solutions. We perform extensive experiments with real and synthetic data to explore the various tradeoffs and understand the role of prediction models in our schemes. The results clearly validate our approach, revealing significant savings over naive solutions as well as our analytical worst-case guarantees.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

1
2
3
 
4
G. Cormode and S. Muthukrishnan. "An improved data stream summary: The count-min sketch and its applications". In Proceedings of Latin American Informatics, 2004.
5
6
 
7
Dartmouth wireless traces. (http://cmc.cs.dartmouth.-edu/data/dartmouth.html).
 
8
A. Das, S. Ganguly, M. Garofalakis, and R. Rastogi. "Distributed Set-Expression Cardinality Estimation". In Proceedings of VLDB, 2004.
 
9
A. Deshpande, C. Guestrin, S. R. Madden, J. M. Hellerstein, and W. Hong. "Model-Driven Data Acquisition in Sensor Networks". In Proceedings of VLDB, 2004.
10
 
11
 
12
 
13
A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. "How to Summarize the Universe: Dynamic Maintenance of Quantiles". In Proceedings of VLDB, 2002.
14
15
 
16
G. H. Hardy, J. E. Littlewood, and G. Pólya. "Inequalities". Cambridge University Press, 1988. (Second Edition).
 
17
Internet traffic archive. (http://ita.ee.lbl.gov/).
18
 
19
20
 
21
G. S. Manku and R. Motwani. "Approximate Frequency Counts over Data Streams". In Proceedings of VLDB, 2002.
22
23

CITED BY  13
 
 
 
 
Collaborative Colleagues:
Graham Cormode: colleagues
Minos Garofalakis: colleagues
S. Muthukrishnan: colleagues
Rajeev Rastogi: colleagues