An alphabet-friendly FM-index

  • Paolo Ferragina
  • , Giovanni Manzini
  • , Veli Mäkinen
  • , Gonzalo Navarro

Risultato della ricerca: Contributo su rivistaArticolo in rivistapeer review

Abstract

We show that, by combining an existing compression boosting technique with the wavelet tree data structure, we are able to design a variant of the FM-index which scales well with the size of the input alphabet ∑. The size of the new index built on a string T[1, n] is bounded by nHκ(T)+O((n log log n)/ log|∑| n) bits, where Hκ(T) is the κ-th order empirical entropy of T. The above bound holds simultaneously for all κ < α log|∑| n and 0 < α < 1. Moreover, the index design does not depend on the parameter κ, which plays a role only in analysis of the space occupancy. Using our index, the counting of the occurrences of an arbitrary pattern P[1, p] as a substring of T takes O(p log |∑|) time. Locating each pattern occurrence takes O(log |∑| (log2 n/ log log n)) time. Reporting a text substring of length ℓ takes O((ℓ + log2 n/ log log n) log |∑|) time.

Lingua originaleInglese
pagine (da-a)150-160
Numero di pagine11
RivistaLecture Notes in Computer Science
Volume3246
DOI
Stato di pubblicazionePubblicato - 2004
Pubblicato esternamente

Fingerprint

Entra nei temi di ricerca di 'An alphabet-friendly FM-index'. Insieme formano una fingerprint unica.

Cita questo