skip to main content
10.1145/1055558.1055598acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Approximate counts and quantiles over sliding windows

Published: 14 June 2004 Publication History

Abstract

We consider the problem of maintaining ε-approximate counts and quantiles over a stream sliding window using limited space. We consider two types of sliding windows depending on whether the number of elements N in the window is fixed (fixed-size sliding window) or variable (variable-size sliding window). In a fixed-size sliding window, both the ends of the window slide synchronously over the stream. In a variable-size sliding window, an adversary slides the window ends independently, and therefore has the ability to vary the number of elements N in the window.We present various deterministic and randomized algorithms for approximate counts and quantiles. All of our algorithms require O(1/ε polylog(1/ε, N)) space. For quantiles, this space requirement is an improvement over the previous best bound of O(1/ε2 polylog(1/ε, N)). We believe that no previous work on space-efficient approximate counts over sliding windows exists.

References

[1]
R. Agrawal, T. Imielinski, and A. N. Swami. Mining association rules between sets of items in large databases. In Proc. of the 1993 ACM SIGMOD Intl. Conf. on Management of Data, pages 207--216, May 1993.
[2]
B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and issues in data stream systems. In Proc. of the 21st ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, pages 1--16, June 2002.
[3]
B. Babcock, M. Datar, and R. Motwani. Sampling from a moving window over streaming data. In Proc. of the 13th Annual ACM-SIAM Symp. on Discrete Algorithms, pages 633--634, Jan. 2002.
[4]
B. Babcock, M. Datar, R. Motwani, and L. O'Callaghan. Maintaining variance and k-medians over data stream windows. In Proc. of the 22nd ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, pages 234--243, June 2003.
[5]
M. Blum, R. W. Floyd, V. R. Pratt, R. L. Rivest, and R. E. Tarjan. Time bounds for selection. Journal of Computer and System Sciences, 7(4):448--461, Aug. 1973.
[6]
M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In Proc. of the 29th Intl. Coll. on Automata, Languages and Programming, pages 693--703, July 2002.
[7]
G. Cormode and S. Muthukrishnan. What's hot and what's not: tracking most frequent items dynamically. In Proc. of the 22nd ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, pages 296--306, June 2003.
[8]
G. Cormode and S. Muthukrishnan. An improved data stream summary: The count-min sketch and its applications. In LATIN 2004, pages 29--38, Apr. 2004.
[9]
M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. In Proc. of the 13th Annual ACM-SIAM Symp. on Discrete Algorithms, pages 635--644, Jan. 2002.
[10]
E. D. Demaine, A. López-Ortiz, and J. I. Munro. Frequency estimation of internet packet streams with limited space. In Proc. of the 10th Annual European Symp. on Algorithms, pages 348--360, Sept. 2002.
[11]
D. J. DeWitt, J. F. Naughton, and D. A. Schneider. Parallel sorting on a shared-nothing architecture using probabilistic splitting. In Proc. of the 1st Intl. Conf. on Parallel and Distributed Information Systems, pages 280--291, Dec. 1991.
[12]
R. W. Floyd and R. L. Rivest. Expected time bounds for selection. Comm. of the ACM, 18(3):165--172, Mar. 1975.
[13]
P. B. Gibbons and Y. Matias. New sampling-based summary statistics for improving approximate query answers. In Proc. of the 1998 ACM SIGMOD Intl. Conf. on Management of Data, pages 331--342, June 1998.
[14]
P. B. Gibbons and S. Tirthapura. Distributed streams algorithms for sliding windows. In Proc. of the 14th Annual ACM Symp. on Parallel Algs. and Architectures, pages 63--72, Aug. 2002.
[15]
M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In Proc. of the 2001 ACM SIGMOD Intl. Conf. on Management of Data, pages 58--66, May 2001.
[16]
R. M. Karp, S. Shenker, and C. H. Papadimitriou. A simple algorithm for finding frequent elements in streams and bags. ACM Trans. on Database Systems, 28(1):51--55, Mar. 2003.
[17]
X. Lin, H. Lu, J. Xu, and J. X. Yu. Continuously maintaining quantile summaries of the most recent N elements over a data stream. In Proc. of the 20th Intl. Conf. on Data Engineering, Mar. 2004. (to appear).
[18]
G. S. Manku and R. Motwani. Approximate frequency counts over data streams. In Proc. of the 28th Intl. Conf. on Very Large Data Bases, pages 356--357, Aug. 2002.
[19]
G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Approximate medians and other quantiles in one pass and with limited memory. In Proc. of the 1998 ACM SIGMOD Intl. Conf. on Management of Data, pages 426--435, June 1998.
[20]
G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. In Proc. of the 1999 ACM SIGMOD Intl. Conf. on Management of Data, pages 251--262, June 1999.
[21]
J. Misra and D. Gries. Finding repeated elements. Sci. Comput. Programming, 2(2):143--152, Nov. 1982.
[22]
R. Motwani, J. Widom, et al. Query processing, approximation, and resource management in a data stream management system. In Proc. of the 1st Conf. on Innovative Data Systems Research, pages 245--256, Jan. 2003.
[23]
J. I. Munro and M. Paterson. Selection and sorting with limited storage. Theoretical Computer Science, pages 315--323, 1980.
[24]
M. Paterson. Progress in selection. In Proc. of the 5th Scandinavian Workshop on Algorithm Theory, pages 368--379, July 1996.
[25]
I. Pohl. A minimum storage algorithm for computing the median. Technical Report IBM Research Report RC 2701 (# 12713), IBM T. J. Watson Center, 1969.
[26]
H. Toivonen. Sampling large databases for association rules. In Proc. of the 22nd Intl. Conf. on Very Large Data Bases, pages 134--145, Sept. 196.
[27]
J. S. Vitter. Random sampling with a reservoir. ACM Trans. on Mathematical Software, 11(1):37--57, Mar. 1985.

Cited By

View all
  • (2024)Validation of SeptiCyte RAPID to Discriminate Sepsis from Non-Infectious Systemic InflammationJournal of Clinical Medicine10.3390/jcm1305119413:5(1194)Online publication date: 20-Feb-2024
  • (2024)Optimal Matrix Sketching over Sliding WindowsProceedings of the VLDB Endowment10.14778/3665844.366584717:9(2149-2161)Online publication date: 1-May-2024
  • (2024)Approximate Matrix Multiplication over Sliding WindowsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671819(3896-3906)Online publication date: 25-Aug-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS '04: Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2004
350 pages
ISBN:158113858X
DOI:10.1145/1055558
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 June 2004

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Article

Conference

SIGMOD/PODS04

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)50
  • Downloads (Last 6 weeks)9
Reflects downloads up to 30 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Validation of SeptiCyte RAPID to Discriminate Sepsis from Non-Infectious Systemic InflammationJournal of Clinical Medicine10.3390/jcm1305119413:5(1194)Online publication date: 20-Feb-2024
  • (2024)Optimal Matrix Sketching over Sliding WindowsProceedings of the VLDB Endowment10.14778/3665844.366584717:9(2149-2161)Online publication date: 1-May-2024
  • (2024)Approximate Matrix Multiplication over Sliding WindowsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671819(3896-3906)Online publication date: 25-Aug-2024
  • (2024)DPSW-Sketch: A Differentially Private Sketch Framework for Frequency Estimation over Sliding WindowsProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3637528.3671694(3255-3266)Online publication date: 25-Aug-2024
  • (2024)FPCS: Feature Preserving Compensated Sampling of Streaming Time Series DataIEEE Transactions on Visualization and Computer Graphics10.1109/TVCG.2024.345637531:1(1333-1342)Online publication date: 9-Sep-2024
  • (2024)HSS: A Memory-Efficient, Accurate, and Fast Network Measurement Framework in Sliding WindowsIEEE Transactions on Network and Service Management10.1109/TNSM.2024.346075121:6(5958-5976)Online publication date: Dec-2024
  • (2024)A Hybrid Method to Interest-informed and Mobility-aware Mobile Service Migration in Edge Computing2024 IEEE International Conference on Web Services (ICWS)10.1109/ICWS62655.2024.00097(778-787)Online publication date: 7-Jul-2024
  • (2024)Ring Sketch:A Generic, Low-Complexity, and Hardware-Friendly Traffic Measurement Framework over Sliding WindowsICC 2024 - IEEE International Conference on Communications10.1109/ICC51166.2024.10622576(1102-1108)Online publication date: 9-Jun-2024
  • (2024)Optimal Quantile Estimation: Beyond the Comparison Model2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS61266.2024.00075(1137-1158)Online publication date: 27-Oct-2024
  • (2024)Enhancing data preparation: insights from a time series case studyJournal of Intelligent Information Systems10.1007/s10844-024-00867-862:6(1503-1530)Online publication date: 1-Dec-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media