TY - JOUR
T1 - Compressing and indexing labeled trees, with applications
AU - Ferragina, Paolo
AU - Luccio, Fabrizio
AU - Manzini, Giovanni
AU - Muthukrishnan, S.
PY - 2009/11/1
Y1 - 2009/11/1
N2 - Consider an ordered, static tree T where each node has a label from alphabet σ. Tree T may be of arbitrary degree and shape. Our goal is designing a compressed storage scheme of T that supports basic navigational operations among the immediate neighbors of a node (i.e. parent, ith child, or any child with some label, . . .) as well as more sophisticated path-based search operations over its labeled structure. We present a novel approach to this problem by designing what we call the XBW-transform of the tree in the spirit of the well-knownBurrows-Wheeler transform for strings [1994]. TheXBW-transform uses path-sorting to linearize the labeled tree T into two coordinated arrays, one capturing the structure and the other the labels. For the first time, by using the properties of the XBW-transform, our compressed indexes go beyond the information-theoretic lower bound, and support navigational and pathsearch operations over labeled trees within (near-)optimal time bounds and entropy-bounded space. Our XBW-transform is simple and likely to spur new results in the theory of tree compression and indexing, as well as interesting application contexts. As an example, we use the XBW-transform to design and implement a compressed index for XML documents whose compression ratio is signifi-cantly better than the one achievable by state-of-the-art tools, and its query time performance is order of magnitudes faster.
AB - Consider an ordered, static tree T where each node has a label from alphabet σ. Tree T may be of arbitrary degree and shape. Our goal is designing a compressed storage scheme of T that supports basic navigational operations among the immediate neighbors of a node (i.e. parent, ith child, or any child with some label, . . .) as well as more sophisticated path-based search operations over its labeled structure. We present a novel approach to this problem by designing what we call the XBW-transform of the tree in the spirit of the well-knownBurrows-Wheeler transform for strings [1994]. TheXBW-transform uses path-sorting to linearize the labeled tree T into two coordinated arrays, one capturing the structure and the other the labels. For the first time, by using the properties of the XBW-transform, our compressed indexes go beyond the information-theoretic lower bound, and support navigational and pathsearch operations over labeled trees within (near-)optimal time bounds and entropy-bounded space. Our XBW-transform is simple and likely to spur new results in the theory of tree compression and indexing, as well as interesting application contexts. As an example, we use the XBW-transform to design and implement a compressed index for XML documents whose compression ratio is signifi-cantly better than the one achievable by state-of-the-art tools, and its query time performance is order of magnitudes faster.
KW - Burrows-Wheeler transform
KW - Labeled tree compression
KW - Labeled tree indexing
KW - Path searching
KW - Tree navigation
KW - XML compression
KW - XML indexing
UR - http://www.scopus.com/inward/record.url?scp=71449088969&partnerID=8YFLogxK
U2 - 10.1145/1613676.1613680
DO - 10.1145/1613676.1613680
M3 - Article
SN - 0004-5411
VL - 57
JO - Journal of the ACM
JF - Journal of the ACM
IS - 1
M1 - 4
ER -