Move-to-front, distance coding, and inversion frequencies revisited

Travis Gagie, Giovanni Manzini

Research output: Contribution to journal › Article › peer-review

Abstract

Move-to-Front, Distance Coding and Inversion Frequencies are three simple and effective techniques used to process the output of the Burrows-Wheeler Transform. In this paper we provide the first complete comparative analyses of these techniques, establishing upper and lower bounds on their compression ratios. We describe simple variants of these three techniques that compress any string up to a constant factor of its kth-order empirical entropy for any k≥0. At the same time we prove lower bounds for the compression of arbitrary strings which show these variants to be nearly optimal. The bounds we establish are "entropy-only" bounds in the sense that they do not involve non-constant overheads. Our analyses provide new insights into the inner workings of these techniques, partially explain their good behavior in practice, and suggest strategies for improving their performance.
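
For readers unfamiliar with the first of the three techniques, the following is a minimal Python sketch of textbook move-to-front coding. It is not one of the variants analyzed in the paper, and it omits the integer coder that would normally compress the resulting ranks. The function names `mtf_encode` and `mtf_decode` and the explicit `alphabet` parameter are assumptions made for illustration.

```python
def mtf_encode(text, alphabet):
    """Replace each symbol by its current position in a symbol list,
    then move that symbol to the front of the list."""
    table = list(alphabet)      # hypothetical initial ordering of symbols
    ranks = []
    for symbol in text:
        rank = table.index(symbol)
        ranks.append(rank)
        table.insert(0, table.pop(rank))   # move the symbol to the front
    return ranks


def mtf_decode(ranks, alphabet):
    """Invert mtf_encode by replaying the same list updates."""
    table = list(alphabet)
    symbols = []
    for rank in ranks:
        symbol = table.pop(rank)
        symbols.append(symbol)
        table.insert(0, symbol)
    return "".join(symbols)


if __name__ == "__main__":
    ranks = mtf_encode("aaabbbccc", "abc")
    print(ranks)
    print(mtf_decode(ranks, "abc"))
```

Runs of equal symbols, which are typical of Burrows-Wheeler Transform output, turn into runs of zeros, so the rank sequence is dominated by small integers that a later coding stage can represent cheaply.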

Original language: English
Pages (from-to): 2925-2944
Number of pages: 20
Journal: Theoretical Computer Science
Volume: 411
Issue number: 31-33
Publication status: Published - 28 Jun 2010
Externally published: Yes

Keywords

  • Burrows-Wheeler Transform
  • Data Compression
  • Empirical entropy
