DOI: 10.1145/1146381.1146397
Article

Sketching asynchronous streams over a sliding window

Published: 23 July 2006

Abstract

We study the problem of maintaining sketches of recent elements of a data stream. Motivated by applications involving network data, we consider asynchronous streams, in which the observed order of data is not the same as the time order in which the data was generated. The notion of recent elements of a stream is modeled by the sliding timestamp window, which is the set of elements whose timestamps are close to the current time. We design algorithms for maintaining sketches of all elements within the sliding timestamp window that give provably accurate estimates of two basic aggregates, the sum and the median, of a stream of numbers. The space taken by the sketches, the time needed to query a sketch, and the time needed to insert a new element into a sketch are all polylogarithmic in the maximum window size and in the values of the data items in the window. Our sketches can be easily combined in a lossless and compact way, making them useful for distributed computations over data streams. Previous work on sketching recent elements of a data stream has considered only the more restrictive scenario of synchronous streams, where the observed order of data is the same as the time order in which the data was generated. Our notion of recency of elements is more general than that studied in previous work, and thus our sketches are more robust to network delays and asynchrony.
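
To make the problem setting concrete, the following is a minimal exact baseline, not the sketch from the paper: it stores every element, so its space is linear rather than polylogarithmic. It only illustrates what the sum and median over a sliding timestamp window mean when elements arrive out of timestamp order, and how two summaries can be combined. The class name `ExactTimestampWindow`, its methods, and the choice of the window as the closed interval [now - W, now] are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from statistics import median_low
from typing import List, Tuple


@dataclass
class ExactTimestampWindow:
    """Exact (non-sketch) baseline; every name here is hypothetical."""
    window: int  # W: maximum age of an element, in time units
    items: List[Tuple[int, int]] = field(default_factory=list)  # (timestamp, value)

    def insert(self, timestamp: int, value: int) -> None:
        # Arrival order is irrelevant: only the timestamp decides membership.
        self.items.append((timestamp, value))

    def _live(self, now: int) -> List[int]:
        # Assumed window definition: timestamps in [now - window, now].
        return [v for t, v in self.items if now - self.window <= t <= now]

    def expire(self, now: int) -> None:
        # Drop elements too old to re-enter the window as time moves forward.
        self.items = [(t, v) for t, v in self.items if t >= now - self.window]

    def window_sum(self, now: int) -> int:
        return sum(self._live(now))

    def window_median(self, now: int) -> int:
        live = self._live(now)
        if not live:
            raise ValueError("no elements in the window")
        return median_low(live)

    def merge(self, other: "ExactTimestampWindow") -> "ExactTimestampWindow":
        # Combining two summaries is a plain union of their stored elements.
        if other.window != self.window:
            raise ValueError("window sizes differ")
        return ExactTimestampWindow(self.window, self.items + other.items)


# Two observers see interleaved, out-of-order arrivals.
a = ExactTimestampWindow(window=10)
a.insert(95, 4)
a.insert(91, 7)    # arrives after timestamp 95 but was generated earlier
a.insert(80, 100)  # too old for a query at time 100

b = ExactTimestampWindow(window=10)
b.insert(99, 1)
b.insert(90, 3)

merged = a.merge(b)
print(merged.window_sum(100))     # 15
print(merged.window_median(100))  # 3
```

The abstract's contribution is to obtain approximate answers to the same two queries with space and time polylogarithmic in the window size and the data values, which this baseline does not attempt.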

Information & Contributors

Information

Published In

PODC '06: Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing
July 2006
230 pages
ISBN: 1595933840
DOI: 10.1145/1146381

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. aggregates
  2. asynchronous streams
  3. data stream processing
  4. distributed streams
  5. sketches of streams
  6. sliding windows
  7. union of streams

Qualifiers

  • Article

Conference

PODC06

Acceptance Rates

Overall Acceptance Rate 740 of 2,477 submissions, 30%

Cited By

  • (2022) A Sketching Approach for Obtaining Real-Time Statistics Over Data Streams in Cloud. IEEE Transactions on Cloud Computing 10:2 (1462-1475). DOI: 10.1109/TCC.2020.2987023. Online publication date: 1-Apr-2022
  • (2019) Accelerating Real-Time Tracking Applications over Big Data Stream with Constrained Space. Database Systems for Advanced Applications (3-18). DOI: 10.1007/978-3-030-18576-3_1. Online publication date: 24-Apr-2019
  • (2018) Probabilistic Management of Late Arrival of Events. Proceedings of the 12th ACM International Conference on Distributed and Event-based Systems (52-63). DOI: 10.1145/3210284.3210293. Online publication date: 25-Jun-2018
  • (2018) Recent Advancements in Event Processing. ACM Computing Surveys 51:2 (1-36). DOI: 10.1145/3170432. Online publication date: 13-Feb-2018
  • (2017) Towards an asynchronous aggregation-capable watermark for end-to-end protection of big data streams. Future Generation Computer Systems 72:C (288-304). DOI: 10.1016/j.future.2016.09.001. Online publication date: 1-Jul-2017
  • (2017) Supporting Real-Time Analytic Queries in Big and Fast Data Environments. Database Systems for Advanced Applications (477-493). DOI: 10.1007/978-3-319-55699-4_29. Online publication date: 22-Mar-2017
  • (2016) Dynamic sketching over distributed data streams. 2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) (1055-1056). DOI: 10.1109/INFCOMW.2016.7562250. Online publication date: Apr-2016
  • (2015) Quality-Driven Continuous Query Execution over Out-of-Order Data Streams. Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (889-894). DOI: 10.1145/2723372.2735371. Online publication date: 27-May-2015
  • (2015) Quality-driven processing of sliding window aggregates over out-of-order data streams. Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems (68-79). DOI: 10.1145/2675743.2771828. Online publication date: 24-Jun-2015
  • (2015) Sketching distributed sliding-window data streams. The VLDB Journal — The International Journal on Very Large Data Bases 24:3 (345-368). DOI: 10.1007/s00778-015-0380-7. Online publication date: 1-Jun-2015
