TY - JOUR
T1 - Compression of low entropy strings with Lempel-Ziv algorithms
AU - Kosaraju, S. Rao
AU - Manzini, Giovanni
PY - 2000
Y1 - 2000
N2 - We compare the compression ratio of the Lempel-Ziv algorithms with the empirical entropy of the input string. This approach makes it possible to analyze the performance of these algorithms without any assumption on the input and to obtain worst case results. We show that in this setting the standard definition of optimal compression algorithm is not satisfactory. In fact, although Lempel-Ziv algorithms are optimal according to the standard definition, there exist families of low entropy strings which are not compressed optimally. More precisely, the compression ratio achieved by LZ78 (resp., L277) can be much higher than the zeroth order entropy H0 (resp., the first order entropy H1). For this reason we introduce the concept of λ-optimal algorithm. An algorithm is λ-optimal with respect to Hk if, loosely speaking, its compression ratio is asymptotically bounded by λ times the kth order empirical entropy Hk. We prove that LZ78 cannot be λ-optimal with respect to any Hk with k ≥ 0. Then, we describe a new algorithm which combines LZ78 with run length encoding (RLE) and is 3-optimal with respect to H0. Finally, we prove that LZ77 is 8-optimal with respect to H0, and that it cannot be λ-optimal with respect to Hk for any k ≥ 1.
AB - We compare the compression ratio of the Lempel-Ziv algorithms with the empirical entropy of the input string. This approach makes it possible to analyze the performance of these algorithms without any assumption on the input and to obtain worst case results. We show that in this setting the standard definition of optimal compression algorithm is not satisfactory. In fact, although Lempel-Ziv algorithms are optimal according to the standard definition, there exist families of low entropy strings which are not compressed optimally. More precisely, the compression ratio achieved by LZ78 (resp., L277) can be much higher than the zeroth order entropy H0 (resp., the first order entropy H1). For this reason we introduce the concept of λ-optimal algorithm. An algorithm is λ-optimal with respect to Hk if, loosely speaking, its compression ratio is asymptotically bounded by λ times the kth order empirical entropy Hk. We prove that LZ78 cannot be λ-optimal with respect to any Hk with k ≥ 0. Then, we describe a new algorithm which combines LZ78 with run length encoding (RLE) and is 3-optimal with respect to H0. Finally, we prove that LZ77 is 8-optimal with respect to H0, and that it cannot be λ-optimal with respect to Hk for any k ≥ 1.
KW - Data compression
KW - Empirical entropy
KW - Lempel-Ziv parsing
UR - http://www.scopus.com/inward/record.url?scp=0033294876&partnerID=8YFLogxK
U2 - 10.1137/S0097539797331105
DO - 10.1137/S0097539797331105
M3 - Article
SN - 0097-5397
VL - 29
SP - 893
EP - 911
JO - SIAM Journal on Computing
JF - SIAM Journal on Computing
IS - 3
ER -