TY - JOUR
T1 - Improving Matrix-Vector Multiplication via Lossless Grammar-Compressed Matrices
AU - Ferragina, Paolo
AU - Gagie, Travis
AU - Köppl, Dominik
AU - Manzini, Giovanni
AU - Navarro, Gonzalo
AU - Striani, Manuel
AU - Tosoni, Francesco
N1 - Publisher Copyright:
© 2022, VLDB Endowment. All rights reserved.
PY - 2022
Y1 - 2022
AB - As Machine Learning (ML) techniques now generate huge data collections, efficiently engineering their storage and operations is becoming of paramount importance. In this article we propose a new lossless compression scheme for real-valued matrices that achieves good performance in both compression ratio and time for linear-algebra operations. Experiments show that, as a compressor, our tool is clearly superior to gzip and is usually within 20% of xz in terms of compression ratio. In addition, our compressed format supports matrix-vector multiplication in time and space proportional to the size of the compressed representation, unlike gzip and xz, which require full decompression of the compressed matrix. To our knowledge, our lossless compressor is the first to achieve time and space complexities matching the theoretical limit expressed by the k-th order statistical entropy of the input. To obtain further time/space reductions, we propose column-reordering algorithms hinging on a novel column-similarity score. Our experiments on various data sets of ML matrices show that our column reordering can yield a further reduction of up to 16% in peak memory usage during matrix-vector multiplication. Finally, we compare our proposal against the state-of-the-art Compressed Linear Algebra (CLA) approach, showing that ours always runs at least twice as fast (in a multi-thread setting) and achieves better compressed space occupancy and peak memory usage. This experimentally confirms the provably effective theoretical bounds we show for our compressed-matrix approach.
UR - http://www.scopus.com/inward/record.url?scp=85137979194&partnerID=8YFLogxK
U2 - 10.14778/3547305.3547321
DO - 10.14778/3547305.3547321
M3 - Conference article
AN - SCOPUS:85137979194
SN - 2150-8097
VL - 15
SP - 2175
EP - 2187
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 10
T2 - 48th International Conference on Very Large Data Bases, VLDB 2022
Y2 - 5 September 2022 through 9 September 2022
ER -