Abstract
We show that, by combining an existing compression boosting technique with the wavelet tree data structure, we are able to design a variant of the FM-index which scales well with the size of the input alphabet ∑. The size of the new index built on a string T[1, n] is bounded by nHκ(T)+O((n log log n)/ log|∑| n) bits, where Hκ(T) is the κ-th order empirical entropy of T. The above bound holds simultaneously for all κ < α log|∑| n and 0 < α < 1. Moreover, the index design does not depend on the parameter κ, which plays a role only in analysis of the space occupancy. Using our index, the counting of the occurrences of an arbitrary pattern P[1, p] as a substring of T takes O(p log |∑|) time. Locating each pattern occurrence takes O(log |∑| (log2 n/ log log n)) time. Reporting a text substring of length ℓ takes O((ℓ + log2 n/ log log n) log |∑|) time.
| Lingua originale | Inglese |
|---|---|
| pagine (da-a) | 150-160 |
| Numero di pagine | 11 |
| Rivista | Lecture Notes in Computer Science |
| Volume | 3246 |
| DOI | |
| Stato di pubblicazione | Pubblicato - 2004 |
| Pubblicato esternamente | Sì |
Fingerprint
Entra nei temi di ricerca di 'An alphabet-friendly FM-index'. Insieme formano una fingerprint unica.Cita questo
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver