Space complexity of hierarchical heavy hitters in multi-dimensional data streams

Published: 13 June 2005

ABSTRACT

Heavy hitters, which are items occurring with frequency above a given threshold, are an important aggregation and summary tool when processing data streams or data warehouses. Hierarchical heavy hitters (HHHs) have been introduced as a natural generalization for hierarchical data domains, including multi-dimensional data. An item x in a hierarchy is called a ϕ-HHH if its frequency after discounting the frequencies of all its descendant hierarchical heavy hitters exceeds ϕn, where ϕ is a user-specified parameter and n is the size of the data set. Recently, single-pass schemes have been proposed for computing ϕ-HHHs using space roughly O(1/ϕ log(ϕn)). The frequency estimates of these algorithms, however, hold only for the total frequencies of items, and not the discounted frequencies; this leads to false positives because the discounted frequency can be significantly smaller than the total frequency. This paper attempts to explain the difficulty of finding hierarchical heavy hitters with better accuracy. We show that a single-pass deterministic scheme that computes ϕ-HHHs in a d-dimensional hierarchy with any approximation guarantee must use Ω(1/ϕ^{d+1}) space. This bound is tight: in fact, we present a data stream algorithm that can report the ϕ-HHHs without false positives in O(1/ϕ^{d+1}) space.
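
To make the definition of a ϕ-HHH concrete, the following is a minimal offline sketch for a one-dimensional hierarchy. It is not the streaming algorithm of the paper and makes no attempt at bounded space: it stores every distinct item exactly. The function name `exact_hhh` and the encoding of an item as a tuple of labels, whose ancestors are its prefixes, are assumptions made only for this illustration.

```python
from collections import Counter


def exact_hhh(items, phi):
    """Exact, offline phi-HHHs of a 1-D hierarchy (illustrative only).

    Each item is a tuple of labels; its ancestors are its proper prefixes,
    and the empty tuple () is the root. Returns {node: discounted_count}.
    """
    n = len(items)
    threshold = phi * n
    # residual[node] = count reaching node that no descendant HHH has claimed
    residual = Counter(items)
    max_depth = max((len(node) for node in residual), default=0)
    hhh = {}
    # Process the hierarchy bottom-up, deepest level first.
    for depth in range(max_depth, -1, -1):
        for node in [p for p in residual if len(p) == depth]:
            count = residual.pop(node)
            if count > threshold:
                # Discounted frequency exceeds phi * n: report the node and
                # do not pass its count up to its ancestors.
                hhh[node] = count
            elif depth > 0:
                # Not an HHH: its count is charged to the parent instead.
                residual[node[:-1]] += count
    return hhh


stream = (
    [("a", "b", "c")] * 5
    + [("a", "b", "d")] * 2
    + [("a", "e")] * 2
    + [("f",)]
)
print(exact_hhh(stream, 0.3))
```

With n = 10 and ϕ = 0.3, the leaf ("a", "b", "c") is reported with count 5 and the node ("a",) with discounted count 4, although its total frequency is 9. The node ("a", "b") has total frequency 7, above ϕn = 3, but discounted frequency only 2, so a scheme that estimates total frequencies alone would report it as a false positive.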

Published in

    PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
    June 2005
    388 pages
    ISBN: 1595930620
    DOI: 10.1145/1065167
    General Chair: Georg Gottlob
    Program Chair: Foto Afrati

    Copyright © 2005 ACM

    Publisher: Association for Computing Machinery, New York, NY, United States

    Qualifiers

    • Article

    Acceptance Rates

    Overall Acceptance Rate: 642 of 2,707 submissions, 24%
