skip to main content
10.1145/1142351.1142392acmconferencesArticle/Chapter ViewAbstractPublication PagespodsConference Proceedingsconference-collections
Article

Deterministic k-set structure

Published: 26 June 2006 Publication History

Abstract

A k-set structure over data streams is a bounded-space data structure that supports stream insertion and deletion operations and returns the set of (item, frequency) pairs in the stream, provided, the number of distinct items in the stream does not exceed k; and returns nil otherwise. This is a fundamental problem with applications in data streaming [14], data reconciliation in distributed systems [12] and mobile computing [16], etc. In this paper, we present a deterministic algorithm for the k-set problem that matches the space lower bound to within a logarithmic factor.

References

[1]
"LAPACK: The Linear Algebra Package". Available from "www.netlib.org/LAPACK".
[2]
M. Charikar, K. Chen, and M. Farach-Colton. "Finding frequent items in data streams". In Proc. ICALP, 2002.
[3]
V. Chauhan and A. Trachtenberg. "Reconciliation puzzles". In Proc. IEEE GLOBECOM, 2004.
[4]
G. Cormode and S. Muthukrishnan. "An improved data stream summary: The Count-Min sketch and its applications". In Proc. LATIN, pages 29--38, 2004.
[5]
Graham Cormode and S. Muthukrishnan. "What's Hot and What's Not: Tracking Most Frequent Items Dynamically". In Proc. ACM PODS, pages 296--306, 2003.
[6]
A.J. Demers, D. H. Greene, C. Hause, W. Irish, and J. Larson. "Epidemic algorithms for replicated database maintenance". In Proc. ACM PODC, pages 1--12, August 1987.
[7]
Sumit Ganguly. "Counting distinct items over update streams". In Proc. ISAAC, pages 505--514, 2005.
[8]
L. Gasieniec and S. Muthukrishnan. "Deterministic algorithm for estimating heavy-hitters on Turnstile data streams". Manuscript, 2005.
[9]
G. H. Golub and C. F. Van Loan. "Matrix Computations". J. Hopkins Univ. Press, 3rd ed., 1996.
[10]
K. M. Hoffman and R. Kunze. "Linear Algebra". Prentice-Hall, 2nd Ed., 1971.
[11]
The Mathworks, Natick, MA, USA. "MATLAB 7.1".
[12]
Y. Minsky, A. Trachtenberg, and R. Zippel. "Set Reconciliation with Nearly Optimal Communication Complexity". IEEE Trans. Inf. Theory, 49(9):2213--2218, 2003.
[13]
R. Motwani and P. Raghavan. "Randomized Algorithms". Cambridge University Press, 1995.
[14]
S. Muthukrishnan. "Data Streams: Algorithms and Applications". Foundations and Trends in Theoretical Computer Science, Vol. 1, Issue 2, 2005.
[15]
M. Satyanarayanan. "Scalable, Secure, and Highly Available Distributed File Access". IEEE Computer, 23(5), May 1990.
[16]
D. Starobinski, A. Trachtenberg, and S. Agarwal. "Efficient PDA synchronization". IEEE Trans. on Mob. Comp., 2(1):40--51, 2003.
[17]
Gilbert Strang. "Introduction to Linear Algebra". Wellesley-Cambridge Press, 2nd ed., 1998.

Cited By

View all
  • (2023)Improved algorithms for white-box adversarial streamsProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618807(9962-9975)Online publication date: 23-Jul-2023
  • (2019)L Samplers and Their ApplicationsACM Computing Surveys10.1145/329771552:1(1-31)Online publication date: 13-Feb-2019
  • (2011)Straggler Identification in Round-Trip Data Streams via Newton's Identities and Invertible Bloom FiltersIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2010.13223:2(297-306)Online publication date: 1-Feb-2011
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
PODS '06: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2006
382 pages
ISBN:1595933182
DOI:10.1145/1142351
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 June 2006

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data streams
  2. k-set structure

Qualifiers

  • Article

Conference

SIGMOD/PODS06

Acceptance Rates

PODS '06 Paper Acceptance Rate 35 of 185 submissions, 19%;
Overall Acceptance Rate 642 of 2,707 submissions, 24%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)5
  • Downloads (Last 6 weeks)2
Reflects downloads up to 07 Jan 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Improved algorithms for white-box adversarial streamsProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3618807(9962-9975)Online publication date: 23-Jul-2023
  • (2019)L Samplers and Their ApplicationsACM Computing Surveys10.1145/329771552:1(1-31)Online publication date: 13-Feb-2019
  • (2011)Straggler Identification in Round-Trip Data Streams via Newton's Identities and Invertible Bloom FiltersIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2010.13223:2(297-306)Online publication date: 1-Feb-2011
  • (2008)Lower bounds on frequency estimation of data streamsProceedings of the 3rd international conference on Computer science: theory and applications10.5555/1813695.1813720(204-215)Online publication date: 7-Jun-2008
  • (2008)Data Stream Algorithms via Expander GraphsProceedings of the 19th International Symposium on Algorithms and Computation10.1007/978-3-540-92182-0_8(52-63)Online publication date: 15-Dec-2008
  • (2008)Lower Bounds on Frequency Estimation of Data Streams (Extended Abstract)Computer Science – Theory and Applications10.1007/978-3-540-79709-8_22(204-215)Online publication date: 2008
  • (2008)Finding Frequent Items in a Turnstile Data StreamProceedings of the 14th annual international conference on Computing and Combinatorics10.1007/978-3-540-69733-6_49(498-509)Online publication date: 27-Jun-2008
  • (2007)CR-PRECISProceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies10.5555/2399256.2399261(48-59)Online publication date: 7-Apr-2007
  • (2007)Space-efficient straggler identification in round-trip data streams via newton's identities and invertible bloom filtersProceedings of the 10th international conference on Algorithms and Data Structures10.5555/2394893.2394968(637-648)Online publication date: 15-Aug-2007
  • (2007)Space-Efficient Straggler Identification in Round-Trip Data Streams Via Newton’s Identities and Invertible Bloom FiltersAlgorithms and Data Structures10.1007/978-3-540-73951-7_55(637-648)Online publication date: 2007

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media