An analysis of the Burrows-Wheeler Transform

Giovanni Manzini

Research output: Contribution to journal, Article, peer-reviewed

Abstract

The Burrows-Wheeler Transform (also known as Block-Sorting) is at the base of compression algorithms that are the state of the art in lossless data compression. In this paper, we analyze two algorithms that use this technique. The first one is the original algorithm described by Burrows and Wheeler, which, despite its simplicity, outperforms the Gzip compressor. The second one uses an additional run-length encoding step to improve compression. We prove that the compression ratio of both algorithms can be bounded in terms of the kth order empirical entropy of the input string for any k ≥ 0. We make no assumptions on the input and we obtain bounds which hold in the worst case, that is, for every possible input string. All previous results for Block-Sorting algorithms were concerned with the average compression ratio and have been established assuming that the input comes from a finite-order Markov source.
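As a rough illustration of the ingredients named in the abstract, the following minimal Python sketch shows a naive Burrows-Wheeler Transform, move-to-front encoding, and a k-th order empirical entropy computation. It is not the paper's implementation: the function names, the `\0` end-of-string sentinel, the initial move-to-front list (the sorted distinct symbols of the input), and the rotation-sorting approach are all assumptions chosen for brevity. The entropy function follows the usual definition, in which each length-k context contributes the zeroth-order entropy of the symbols that follow it, normalized by the string length.

```python
from collections import Counter, defaultdict
from math import log2


def bwt(s: str, sentinel: str = "\0") -> str:
    """Naive BWT: last column of the sorted rotations of s + sentinel.

    The sentinel is an assumption of this sketch; it must not occur in s.
    Sorting full rotations is quadratic or worse in space and time; real
    implementations use suffix arrays instead.
    """
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rotation[-1] for rotation in rotations)


def move_to_front(s: str) -> list[int]:
    """Move-to-front encoding over the sorted distinct symbols of s."""
    symbols = sorted(set(s))  # initial list: an assumption of this sketch
    ranks = []
    for ch in s:
        i = symbols.index(ch)
        ranks.append(i)
        symbols.insert(0, symbols.pop(i))  # move the symbol to the front
    return ranks


def empirical_entropy(s: str, k: int = 0) -> float:
    """k-th order empirical entropy of s, in bits per symbol."""
    n = len(s)
    if n == 0:
        return 0.0
    followers = defaultdict(Counter)  # context -> counts of next symbol
    for i in range(k, n):
        followers[s[i - k:i]][s[i]] += 1
    total_bits = 0.0
    for counts in followers.values():
        m = sum(counts.values())
        total_bits -= sum(c * log2(c / m) for c in counts.values())
    return total_bits / n


if __name__ == "__main__":
    text = "banana"
    transformed = bwt(text)
    print(repr(transformed))
    print(move_to_front(transformed))
    for order in (0, 1, 2):
        print(order, empirical_entropy(text, order))
```

The transform tends to group symbols that share a context, so the move-to-front output is typically dominated by small ranks that an order-0 coder can compress well. The run-length encoding step and the final entropy coder are left out of this sketch.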

Original language: English
Pages (from-to): 407-430
Number of pages: 24
Journal: Journal of the ACM
Volume: 48
Issue number: 3
Publication status: Published - May 2001
Externally published: Yes

Keywords

  • Block sorting
  • Burrows-Wheeler Transform
  • Move-to-front encoding
  • Worst-case analysis of compression
