Article

Space complexity of hierarchical heavy hitters in multi-dimensional data streams

Authors:
John Hershberger, Mentor Graphics Corp., Wilsonville, OR
Nisheeth Shrivastava, University of California at Santa Barbara, Santa Barbara, CA
Subhash Suri, University of California at Santa Barbara, Santa Barbara, CA
Csaba D. Tóth, MIT, Cambridge, MA

PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 2005, Pages 338–347
https://doi.org/10.1145/1065167.1065211

Published: 13 June 2005

ABSTRACT

Heavy hitters, which are items occurring with frequency above a given threshold, are an important aggregation and summary tool when processing data streams or data warehouses. Hierarchical heavy hitters (HHHs) have been introduced as a natural generalization for hierarchical data domains, including multi-dimensional data. An item x in a hierarchy is called a ϕ-HHH if its frequency after discounting the frequencies of all its descendant hierarchical heavy hitters exceeds ϕn, where ϕ is a user-specified parameter and n is the size of the data set. Recently, single-pass schemes have been proposed for computing ϕ-HHHs using space roughly O((1/ϕ) log(ϕn)). The frequency estimates of these algorithms, however, hold only for the total frequencies of items, and not the discounted frequencies; this leads to false positives because the discounted frequency can be significantly smaller than the total frequency. This paper attempts to explain the difficulty of finding hierarchical heavy hitters with better accuracy. We show that a single-pass deterministic scheme that computes ϕ-HHHs in a d-dimensional hierarchy with any approximation guarantee must use Ω(1/ϕ^(d+1)) space. This bound is tight: in fact, we present a data stream algorithm that can report the ϕ-HHHs without false positives in O(1/ϕ^(d+1)) space.
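
To make the definition of a ϕ-HHH concrete, the following is a minimal sketch of an exact, offline computation in a one-dimensional prefix hierarchy. It is not the streaming algorithm of the paper: it stores every distinct item, so it uses linear space, and it sidesteps the multi-dimensional case, where the hierarchy is a lattice and discounting is more involved. The function name `exact_hhh` and the representation of items as tuples of labels are assumptions made for illustration only.

```python
from collections import Counter


def exact_hhh(stream, phi):
    """Return {prefix: discounted_count} for the phi-HHHs of a 1-D hierarchy.

    Illustrative assumption: each item is a tuple of labels, such as
    ("a", "x"); its ancestors are its proper prefixes, and the empty
    tuple () is the root of the hierarchy.
    """
    n = len(stream)
    threshold = phi * n
    # Occurrences not yet claimed by a descendant that is an HHH.
    residual = Counter(stream)
    hhh = {}
    max_depth = max((len(item) for item in stream), default=0)
    # Visit levels bottom-up so descendants are decided before ancestors.
    for depth in range(max_depth, -1, -1):
        level = [p for p in residual if len(p) == depth]
        for prefix in level:
            count = residual.pop(prefix)
            if count > threshold:
                # The discounted frequency exceeds phi * n: report the node
                # and pass nothing up to its ancestors.
                hhh[prefix] = count
            elif depth > 0:
                # Not an HHH: its residual count is charged to the parent.
                residual[prefix[:-1]] += count
    return hhh


if __name__ == "__main__":
    stream = (
        [("a", "x")] * 5
        + [("a", "y"), ("a", "z")]
        + [("b", "x")] * 2
        + [("c", "x")]
    )
    print(exact_hhh(stream, 0.3))
```

In this example n = 10 and ϕn = 3. The prefix `("a",)` has total frequency 7, but after discounting its descendant HHH `("a", "x")` only 2 occurrences remain, so it is not reported; an algorithm that bounds only total frequencies could report it as a false positive. The root `()` is reported because the residual counts passed up from all of its children sum to 5.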


Published in
PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2005
388 pages
ISBN: 1595930620
DOI: 10.1145/1065167
General Chair: Georg Gottlob, Vienna University of Technology, Austria
Program Chair: Foto Afrati, National Technical University of Athens, Greece
Copyright © 2005 ACM
Publisher
Association for Computing Machinery
New York, NY, United States