skip to main content
research-article

Sparse Text Indexing in Small Space

Authors Info & Claims
Published:25 April 2016Publication History
Skip Abstract Section

Abstract

In this work, we present efficient algorithms for constructing sparse suffix trees, sparse suffix arrays, and sparse position heaps for b arbitrary positions of a text T of length n while using only O(b) words of space during the construction.

Attempts at breaking the naïve bound of Ω(nb) time for constructing sparse suffix trees in O(b) space can be traced back to the origins of string indexing in 1968. First results were not obtained until 1996, but only for the case in which the b suffixes were evenly spaced in T. In this article, there is no constraint on the locations of the suffixes.

Our main contribution is to show that the sparse suffix tree (and array) can be constructed in O(nlog 2b) time. To achieve this, we develop a technique that allows one to efficiently answer b longest common prefix queries on suffixes of T, using only O(b) space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Our first solution is Monte Carlo, and outputs the correct tree with high probability. We then give a Las Vegas algorithm, which also uses O(b) space and runs in the same time bounds with high probability when b = O(√ n). Additional trade-offs between space usage and construction time for the Monte Carlo algorithm are given.

Finally, we show that, at the expense of slower pattern queries, it is possible to construct sparse position heaps in O(n + blog b) time and O(b) space.

References

  1. Miklós Ajtai, János Komlós, and Endre Szemerédi. 1983. An O(nlog n) sorting network. In Proceedings of the 15th STOC. 1--9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Arne Andersson, N. Jesper Larsson, and Kurt Swanson. 1996. Suffix trees on words. In Proceedings of the 7th CPM. Lecture Notes in Computer Science, Vol. 1075. 102--115. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Arne Andersson, N. Jesper Larsson, and Kurt Swanson. 1999. Suffix trees on words. Algorithmica 23, 3, 246--260.Google ScholarGoogle ScholarCross RefCross Ref
  4. Kenneth E. Batcher. 1968. Sorting networks and their applications. In Proceedings of the AFIPS Spring JCC. 307--314. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Jon L. Bentley and Robert Sedgewick. 1997. Fast algorithms for sorting and searching strings. In Proceedings of the 8th SODA. 360--369. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Stefan Burkhardt and Juha Kärkkäinen. 2003. Fast lightweight suffix array construction and checking. In Proceedings of the 14th CPM. Lecture Notes in Computer Science, Vol. 2676. 55--69. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Maxime Crochemore and Wojciech Rytter. 1991. Usefulness of the Karp-Miller-Rosenberg algorithm in parallel computations on strings and arrays. Theoretical Computer Science 88, 1, 59--82. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Andrzej Ehrenfeucht, Ross M. McConnell, Nissa Osheim, and Sung-Whan Woo. 2011. Position heaps: A simple and dynamic text indexing data structure. Journal of Discrete Algorithms 9, 1, 100--121. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Paolo Ferragina and Johannes Fischer. 2007. Suffix arrays on words. In Proceedings of the 18th CPM. Lecture Notes in Computer Science, Vol. 4580. 328--339. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Nathan J. Fine and Herbert S. Wilf. 1965. Uniqueness theorems for periodic functions. Proceedings of the AMS 16, 1, 109--114.Google ScholarGoogle ScholarCross RefCross Ref
  11. Johannes Fischer, Tomohiro I, and Dominik Köppl. 2015. Deterministic sparse suffix sorting on rewritable texts. In arXiv:1509.07417.Google ScholarGoogle Scholar
  12. Shunsuke Inenaga and Masayuki Takeda. 2006. On-line linear-time construction of word suffix trees. In Proceedings of the 17th CPM. Lecture Notes in Computer Science, Vol. 4009. 60--71. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Juha Kärkkäinen and Esko Ukkonen. 1996. Sparse suffix trees. In Proceedings of the 2nd COCOON. Lecture Notes in Computer Science, Vol. 1090. 219--230. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Richard M. Karp, Raymond E. Miller, and Arnold L. Rosenberg. 1972. Rapid identification of repeated patterns in strings, trees and arrays. In Proceedings of the 4th STOC. ACM, 125--136. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Richard M. Karp and Michael O. Rabin. 1987. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development 31, 2, 249--260. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa, and Kunsoo Park. 2001. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proceedings of the 12th CPM. Lecture Notes in Computer Science, Vol. 2089. 181--192. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Roman Kolpakov, Gregory Kucherov, and Tatiana A. Starikovskaya. 2011. Pattern matching on sparse suffix trees. In Proceedings of the 1st CCP. 92--97. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Udi Manber and Gene Myers. 1993. Suffix arrays: A new method for on-line string searches. SIAM Journal on Computing 22, 5, 935--948. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Donald R. Morrison. 1968. PATRICIA—practical algorithm to retrieve information coded in alphanumeric. Journal of the ACM 15, 4, 514--534. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Mike Paterson. 1990. Improved sorting networks with O(log N) depth. Algorithmica 5, 1, 65--92.Google ScholarGoogle ScholarCross RefCross Ref
  21. Tomohiro I, Juha Kärkkäinen, and Dominik Kempa. 2014. Faster sparse suffix sorting. In Proceedings of the 31st STACS. 386--396.Google ScholarGoogle Scholar
  22. Takashi Uemura and Hiroki Arimura. 2011. Sparse and truncated suffix trees on variable-length codes. In Proceedings of the 22nd CPM. Lecture Notes in Computer Science, Vol. 6661. 246--260. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Peter Weiner. 1973. Linear pattern matching algorithms. In Proceedings of the 14th FOCS (SWAT). 1--11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Dan E. Willard. 1983. Log-logarithmic worst-case range queries are possible in space Θ(N). Information Processing Letters 17, 2, 81--84.Google ScholarGoogle ScholarCross RefCross Ref

Index Terms

  1. Sparse Text Indexing in Small Space

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Algorithms
        ACM Transactions on Algorithms  Volume 12, Issue 3
        June 2016
        408 pages
        ISSN:1549-6325
        EISSN:1549-6333
        DOI:10.1145/2930058
        Issue’s Table of Contents

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 25 April 2016
        • Accepted: 1 October 2015
        • Revised: 1 July 2015
        • Received: 1 January 2014
        Published in talg Volume 12, Issue 3

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader