Move-to-front, distance coding, and inversion frequencies revisited

Travis Gagie, Giovanni Manzini

Risultato della ricerca: Capitolo in libro/report/atti di convegnoContributo a conferenzapeer review

Abstract

Move-to-Front, Distance Coding and Inversion Frequencies are three somewhat related techniques used to process the output of the Burrows-Wheeler Transform. In this paper we analyze these techniques from the point of view of how effective they are in the task of compressing low-entropy strings, that is, strings which have many regularities and are therefore highly compressible. This is a non-trivial task since many compressors have non-constant overheads that become non-negligible when the input string is highly compressible. Because of the properties of the Burrows-Wheeler transform, being locally optimal ensures an algorithm compresses low-entropy strings effectively. Informally, local optimality implies that an algorithm is able to effectively compress an arbitrary partition of the input string. We show that in their original formulation neither Move-to-Front, nor Distance Coding, nor Inversion Frequencies is locally optimal. Then, we describe simple variants of the above algorithms which are locally optimal. To achieve local optimality with Move-to-Front it suffices to combine it with Run Length Encoding. To achieve local optimality with Distance Coding and Inversion Frequencies we use a novel "escape and re-enter" strategy. Since we build on previous results, our analyses are simple and shed new light on the inner workings of the three techniques considered in this paper.

Lingua originaleInglese
Titolo della pubblicazione ospiteCombinatorial Pattern Matching - 18th Annual Symposium, CPM 2007, Proceedings
EditoreSpringer Verlag
Pagine71-82
Numero di pagine12
ISBN (stampa)9783540734369
DOI
Stato di pubblicazionePubblicato - 2007
Pubblicato esternamente
Evento18th Annual Symposium on Combinatorial Pattern Matching, CPM 2007 - London, ON, Canada
Durata: 9 lug 200711 lug 2007

Serie di pubblicazioni

NomeLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume4580 LNCS
ISSN (stampa)0302-9743
ISSN (elettronico)1611-3349

???event.eventtypes.event.conference???

???event.eventtypes.event.conference???18th Annual Symposium on Combinatorial Pattern Matching, CPM 2007
Paese/TerritorioCanada
CittàLondon, ON
Periodo9/07/0711/07/07

Fingerprint

Entra nei temi di ricerca di 'Move-to-front, distance coding, and inversion frequencies revisited'. Insieme formano una fingerprint unica.

Cita questo