|
ABSTRACT
While traditional database systems optimize for performance on one-shot queries, emerging large-scale monitoring applications require continuous tracking of complex aggregates and data-distribution summaries over collections of physically-distributed streams. Thus, effective solutions have to be simultaneously space efficient (at each remote site), communication efficient (across the underlying communication network), and provide continuous, guaranteed-quality estimates. In this paper, we propose novel algorithmic solutions for the problem of continuously tracking complex holistic aggregates in such a distributed-streams setting --- our primary focus is on approximate quantile summaries, but our approach is more broadly applicable and can handle other holistic-aggregate functions (e.g., "heavy-hitters" queries). We present the first known distributed-tracking schemes for maintaining accurate quantile estimates with provable approximation guarantees, while simultaneously optimizing the storage space at each remote site as well as the communication cost across the network. In a nutshell, our algorithms employ a combination of local tracking at remote sites and simple prediction models for local site behavior in order to produce highly communication- and space-efficient solutions. We perform extensive experiments with real and synthetic data to explore the various tradeoffs and understand the role of prediction models in our schemes. The results clearly validate our approach, revealing significant savings over naive solutions as well as our analytical worst-case guarantees.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
 |
1
|
Noga Alon , Phillip B. Gibbons , Yossi Matias , Mario Szegedy, Tracking join and self-join sizes in limited storage, Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, p.10-20, May 31-June 03, 1999, Philadelphia, Pennsylvania, United States
[doi> 10.1145/303976.303978]
|
 |
2
|
Noga Alon , Yossi Matias , Mario Szegedy, The space complexity of approximating the frequency moments, Proceedings of the twenty-eighth annual ACM symposium on Theory of computing, p.20-29, May 22-24, 1996, Philadelphia, Pennsylvania, United States
[doi> 10.1145/237814.237823]
|
 |
3
|
|
| |
4
|
G. Cormode and S. Muthukrishnan. "An improved data stream summary: The count-min sketch and its applications". In Proceedings of Latin American Informatics, 2004.
|
 |
5
|
|
 |
6
|
|
| |
7
|
Dartmouth wireless traces. (http://cmc.cs.dartmouth.-edu/data/dartmouth.html).
|
| |
8
|
A. Das, S. Ganguly, M. Garofalakis, and R. Rastogi. "Distributed Set-Expression Cardinality Estimation". In Proceedings of VLDB, 2004.
|
| |
9
|
A. Deshpande, C. Guestrin, S. R. Madden, J. M. Hellerstein, and W. Hong. "Model-Driven Data Acquisition in Sensor Networks". In Proceedings of VLDB, 2004.
|
 |
10
|
|
| |
11
|
|
| |
12
|
|
| |
13
|
A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. "How to Summarize the Universe: Dynamic Maintenance of Quantiles". In Proceedings of VLDB, 2002.
|
 |
14
|
|
 |
15
|
|
| |
16
|
G. H. Hardy, J. E. Littlewood, and G. Pólya. "Inequalities". Cambridge University Press, 1988. (Second Edition).
|
| |
17
|
Internet traffic archive. (http://ita.ee.lbl.gov/).
|
 |
18
|
|
| |
19
|
|
 |
20
|
Gurmeet Singh Manku , Sridhar Rajagopalan , Bruce G. Lindsay, Approximate medians and other quantiles in one pass and with limited memory, Proceedings of the 1998 ACM SIGMOD international conference on Management of data, p.426-435, June 01-04, 1998, Seattle, Washington, United States
|
| |
21
|
G. S. Manku and R. Motwani. "Approximate Frequency Counts over Data Streams". In Proceedings of VLDB, 2002.
|
 |
22
|
|
 |
23
|
Nisheeth Shrivastava , Chiranjeeb Buragohain , Divyakant Agrawal , Subhash Suri, Medians and beyond: new aggregation techniques for sensor networks, Proceedings of the 2nd international conference on Embedded networked sensor systems, November 03-05, 2004, Baltimore, MD, USA
[doi> 10.1145/1031495.1031524]
|
CITED BY 13
|
Jie Gao , Leonidas Guibas , Nikola Milosavljevic , John Hershberger, Sparse data aggregation in sensor networks, Proceedings of the 6th international conference on Information processing in sensor networks, April 25-27, 2007, Cambridge, Massachusetts, USA
|
|
|
|
|
|
|
|
|
|
|
Srinivas Kashyap , Supratim Deb , K. V. M. Naidu , Rajeev Rastogi , Anand Srinivasan, Efficient gossip-based aggregate computation, Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 26-28, 2006, Chicago, IL, USA
|
|
Dimitris Sacharidis , Kostas Patroumpas , Manolis Terrovitis , Verena Kantere , Michalis Potamias , Kyriakos Mouratidis , Timos Sellis, On-line discovery of hot motion paths, Proceedings of the 11th international conference on Extending database technology: Advances in database technology, March 25-29, 2008, Nantes, France
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|