Abstract
We design two compressed data structures for the full-text indexing problem that support efficient substring searches using roughly the space required for storing the text in compressed form. Our first compressed data structure retrieves the occ occurrences of a pattern P[1, p] within a text T[1, n] in O(p + occ log1+ε n) time for any chosen ε, 0 < ε < 1 This data structure uses at most 5nHk(T) + o(n) bits of storage, where Hk(T) is the kth order empirical entropy of T. The space usage is &(n) bits in the worst case and o(n) bits for compressible texts. This data structure exploits the relationship between suffix arrays and the Burrows-Wheeler Transform, and can be regarded as a compressed suffix array. Our second compressed data structure achieves O(p + occ) query time using O(nHk(T)logε n) + o(n) bits of storage for any chosen ε, 0 < ε < 1. Therefore, it provides optimal output-sensitive query time using o(n log n) bits in the worst case. This second data structure builds upon the first one and exploits the interplay between two compressors: the Burrows-Wheeler Transform and the LZ78 algorithm.
Lingua originale | Inglese |
---|---|
pagine (da-a) | 552-581 |
Numero di pagine | 30 |
Rivista | Journal of the ACM |
Volume | 52 |
Numero di pubblicazione | 4 |
DOI | |
Stato di pubblicazione | Pubblicato - 2005 |
Pubblicato esternamente | Sì |