Abstract
In this work, we present efficient algorithms for constructing sparse suffix trees, sparse suffix arrays, and sparse position heaps for b arbitrary positions of a text T of length n while using only O(b) words of space during the construction.
Attempts at breaking the naïve bound of Ω(nb) time for constructing sparse suffix trees in O(b) space can be traced back to the origins of string indexing in 1968. First results were not obtained until 1996, but only for the case in which the b suffixes were evenly spaced in T. In this article, there is no constraint on the locations of the suffixes.
Our main contribution is to show that the sparse suffix tree (and array) can be constructed in O(nlog 2b) time. To achieve this, we develop a technique that allows one to efficiently answer b longest common prefix queries on suffixes of T, using only O(b) space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Our first solution is Monte Carlo, and outputs the correct tree with high probability. We then give a Las Vegas algorithm, which also uses O(b) space and runs in the same time bounds with high probability when b = O(√ n). Additional trade-offs between space usage and construction time for the Monte Carlo algorithm are given.
Finally, we show that, at the expense of slower pattern queries, it is possible to construct sparse position heaps in O(n + blog b) time and O(b) space.
- Miklós Ajtai, János Komlós, and Endre Szemerédi. 1983. An O(nlog n) sorting network. In Proceedings of the 15th STOC. 1--9. Google ScholarDigital Library
- Arne Andersson, N. Jesper Larsson, and Kurt Swanson. 1996. Suffix trees on words. In Proceedings of the 7th CPM. Lecture Notes in Computer Science, Vol. 1075. 102--115. Google ScholarDigital Library
- Arne Andersson, N. Jesper Larsson, and Kurt Swanson. 1999. Suffix trees on words. Algorithmica 23, 3, 246--260.Google ScholarCross Ref
- Kenneth E. Batcher. 1968. Sorting networks and their applications. In Proceedings of the AFIPS Spring JCC. 307--314. Google ScholarDigital Library
- Jon L. Bentley and Robert Sedgewick. 1997. Fast algorithms for sorting and searching strings. In Proceedings of the 8th SODA. 360--369. Google ScholarDigital Library
- Stefan Burkhardt and Juha Kärkkäinen. 2003. Fast lightweight suffix array construction and checking. In Proceedings of the 14th CPM. Lecture Notes in Computer Science, Vol. 2676. 55--69. Google ScholarDigital Library
- Maxime Crochemore and Wojciech Rytter. 1991. Usefulness of the Karp-Miller-Rosenberg algorithm in parallel computations on strings and arrays. Theoretical Computer Science 88, 1, 59--82. Google ScholarDigital Library
- Andrzej Ehrenfeucht, Ross M. McConnell, Nissa Osheim, and Sung-Whan Woo. 2011. Position heaps: A simple and dynamic text indexing data structure. Journal of Discrete Algorithms 9, 1, 100--121. Google ScholarDigital Library
- Paolo Ferragina and Johannes Fischer. 2007. Suffix arrays on words. In Proceedings of the 18th CPM. Lecture Notes in Computer Science, Vol. 4580. 328--339. Google ScholarDigital Library
- Nathan J. Fine and Herbert S. Wilf. 1965. Uniqueness theorems for periodic functions. Proceedings of the AMS 16, 1, 109--114.Google ScholarCross Ref
- Johannes Fischer, Tomohiro I, and Dominik Köppl. 2015. Deterministic sparse suffix sorting on rewritable texts. In arXiv:1509.07417.Google Scholar
- Shunsuke Inenaga and Masayuki Takeda. 2006. On-line linear-time construction of word suffix trees. In Proceedings of the 17th CPM. Lecture Notes in Computer Science, Vol. 4009. 60--71. Google ScholarDigital Library
- Juha Kärkkäinen and Esko Ukkonen. 1996. Sparse suffix trees. In Proceedings of the 2nd COCOON. Lecture Notes in Computer Science, Vol. 1090. 219--230. Google ScholarDigital Library
- Richard M. Karp, Raymond E. Miller, and Arnold L. Rosenberg. 1972. Rapid identification of repeated patterns in strings, trees and arrays. In Proceedings of the 4th STOC. ACM, 125--136. Google ScholarDigital Library
- Richard M. Karp and Michael O. Rabin. 1987. Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development 31, 2, 249--260. Google ScholarDigital Library
- Toru Kasai, Gunho Lee, Hiroki Arimura, Setsuo Arikawa, and Kunsoo Park. 2001. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proceedings of the 12th CPM. Lecture Notes in Computer Science, Vol. 2089. 181--192. Google ScholarDigital Library
- Roman Kolpakov, Gregory Kucherov, and Tatiana A. Starikovskaya. 2011. Pattern matching on sparse suffix trees. In Proceedings of the 1st CCP. 92--97. Google ScholarDigital Library
- Udi Manber and Gene Myers. 1993. Suffix arrays: A new method for on-line string searches. SIAM Journal on Computing 22, 5, 935--948. Google ScholarDigital Library
- Donald R. Morrison. 1968. PATRICIA—practical algorithm to retrieve information coded in alphanumeric. Journal of the ACM 15, 4, 514--534. Google ScholarDigital Library
- Mike Paterson. 1990. Improved sorting networks with O(log N) depth. Algorithmica 5, 1, 65--92.Google ScholarCross Ref
- Tomohiro I, Juha Kärkkäinen, and Dominik Kempa. 2014. Faster sparse suffix sorting. In Proceedings of the 31st STACS. 386--396.Google Scholar
- Takashi Uemura and Hiroki Arimura. 2011. Sparse and truncated suffix trees on variable-length codes. In Proceedings of the 22nd CPM. Lecture Notes in Computer Science, Vol. 6661. 246--260. Google ScholarDigital Library
- Peter Weiner. 1973. Linear pattern matching algorithms. In Proceedings of the 14th FOCS (SWAT). 1--11. Google ScholarDigital Library
- Dan E. Willard. 1983. Log-logarithmic worst-case range queries are possible in space Θ(N). Information Processing Letters 17, 2, 81--84.Google ScholarCross Ref
Index Terms
- Sparse Text Indexing in Small Space
Recommendations
Deterministic Sparse Suffix Sorting in the Restore Model
Given a text T of length n, we propose a deterministic online algorithm computing the sparse suffix array and the sparse longest common prefix array of T in O(c √ lg n + m lg m lg n lg* n) time with O(m) words of space under the premise that the space ...
Sparse Suffix and LCP Array: Simple, Direct, Small, and Fast
LATIN 2024: Theoretical InformaticsAbstractSparse suffix sorting is the problem of sorting suffixes of a string of length n. Efficient sparse suffix sorting algorithms have existed for more than a decade. Despite the multitude of works and their justified claims for applications in ...
Optimal Substring Equality Queries with Applications to Sparse Text Indexing
We consider the problem of encoding a string of length n from an integer alphabet of size σ so access, substring equality, and Longest Common Extension (LCE) queries can be answered efficiently. We describe a new space-optimal data structure supporting ...
Comments