skip to main content
10.1145/509907.509964acmconferencesArticle/Chapter ViewAbstractPublication PagesstocConference Proceedingsconference-collections
Article

Approximate counting of inversions in a data stream

Published:19 May 2002Publication History

ABSTRACT

(MATH) Inversions are used as a fundamental quantity to measure the sortedness of data, to evaluate different ranking methods for databases, and in the context of rank aggregation. Considering the volume of the data sets in these applications, the data stream model {14, 2] is a natural setting to design efficient algorithms.We obtain a suite of space-efficient streaming algorithms for approximating the number of inversions in a permutation. The best space bound we achieve is $O(\log n \log \log n)$ through a deterministic algorithm. In contrast, we derive an $\Omega(n)$ lower bound for randomized exact computation for this problem; thus approximation is essential.(MATH) We also consider two generalizations of this problem: (1) approximating the number of inversions between two permutations, for which we obtain a randomized $O(\sqrt{n} \log n)$-space algorithm, and (2) approximating the number of inversions in a general list, for which we obtain a randomized $O(\sqrt{n} \log^2 n)$-space two-pass algorithm. In contrast, we derive $\Omega(n)$-space lower bounds for deterministic approximate computation for these problems; thus both randomization and approximation are essential.All our algorithms use only O(log n) time per data item.

References

  1. N. Alon, L. Babai, and A. Itai. A fast and simple randomized parallel algorithm for the maximal independent set problem. J. Algorithms, 7(4):567--583, 1986. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. JCSS, 58(1):137--147, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. A. Andersson and O. Petersson. Approximate indexed lists. J. Algorithms, 29(2):256--276, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. WWW7/Computer Networks, 30(1-7):107--117, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. A. Caprara. Sorting permutations by reversals and Eulerian cycle decompositions. SIAM J. Discrete (MATH)., 12:91--110, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Diaconis. Group Representation in Probability and Statistics. IMS Lecture Series 11, IMS, 1988.Google ScholarGoogle Scholar
  7. P. Diaconis and R. Graham. Spearman's footrule as a measure of disarray. J. of the Royal Statistical Society, Series B, 39(2):262--268, 1977.Google ScholarGoogle ScholarCross RefCross Ref
  8. P. F. Dietz. Optimal algorithms for list indexing and subset rank. Proc. WADS, Springer LNCS 382:39--46, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Ranking aggregation methods for the Web. Proc. 10th WWW, pp. 613--622, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. F. Ergun, S. Kannan, R. Kumar. R. Rubinfeld, and M. Viswanathan. Spot-checkers. JCSS, 60(3):717--751, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. V. Estivill-Castro and D. Wood. A survey of adaptive sorting algorithms. ACM Computing Surveys, 24(4):441--476, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. J. Feigenbaum, S. Kannan, M. Strauss, and M. Viswanathan. An approximate {L1}-difference algorithm for massive data streams. Proc. 40th FOCS, pp. 501--511, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. P. Flajolet. Approximate counting: A detailed analysis. BIT, 25:113--134, 1985. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. C. Gilbert, S. Guha, P. Indyk, Y. Kotidis, S. Muthukrishnan, and M. J. Strauss. A few good terms: Efficient streaming computation of wavelet decompositions. Manuscript, 2001.Google ScholarGoogle Scholar
  15. S. Guha, N. Koudas, and K. Shim. Data streams and histograms. Proc. 33rd STOC, pp. 471--475, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. M. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. DIMACS series in Discr. (MATH). & Theor. Comp. Sc., 50:107--118, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. B. Kalyanasundaram and G. Schnitger. The probabilistic communication complexity of set intersection. SIAM J. Discrete (MATH)., 5(4):545--557, 1992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. D. E. Knuth. The Art of Computer Programming III: Sorting and Searching. Addison-Wesley, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. E. Kushilevitz and N. Nisan. Communication complexity. Cambridge University Press, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. C. Levcopoulos and O. Petersson. Exploiting few inversions when sorting: sequential and parallel algorithms. TCS, 163(1&2):211--238, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. R. Morris. Counting large number of events in small registers. C. ACM, 21(10):840--842, 1978. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Approximate counting of inversions in a data stream

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            STOC '02: Proceedings of the thiry-fourth annual ACM symposium on Theory of computing
            May 2002
            840 pages
            ISBN:1581134959
            DOI:10.1145/509907

            Copyright © 2002 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 19 May 2002

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • Article

            Acceptance Rates

            STOC '02 Paper Acceptance Rate91of287submissions,32%Overall Acceptance Rate1,469of4,586submissions,32%

            Upcoming Conference

            STOC '24
            56th Annual ACM Symposium on Theory of Computing (STOC 2024)
            June 24 - 28, 2024
            Vancouver , BC , Canada

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader