TY - GEN
T1 - An encoding for order-preserving matching
AU - Gagie, Travis
AU - Manzini, Giovanni
AU - Venturini, Rossano
N1 - Funding Information:
∗ This work was partially supported by Academy of Finland grant 268324, Fondecyt grant 1171058, PRIN grant 201534HNXC, INdAM-GNCS Project Efficient algorithms for the analysis of Big Data.
PY - 2017/9/1
Y1 - 2017/9/1
N2 - Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of order-preserving pattern matching. This problem was introduced only a few years ago but has already attracted significant attention because of its applications in data analysis. Two strings are said to be an order-preserving match if the relative order of their characters is the same: E.g., 4, 1, 3, 2 and 10, 3, 7, 5 are an order preserving match. We show how, given a string S[1..n] over an arbitrary alphabet of size σ and a constant c ≥ 1, we can build an O(n log log n)-bit encoding such that later, given a pattern P[1..m] with m ≤ logcn, we can return the number of order-preserving occurrences of P in S in O(m) time. Within the same time bound we can also return the starting position of some order preserving match for P in S (if such a match exists). We prove that our space bound is within a constant factor of optimal if logσ=ω (log log n); our query time is optimal if logσ= (log n). Our space bound contrasts with the ω (n log n) bits needed in the worst case to store S itself, an index for order-preserving pattern matching with no restrictions on the pattern length, or an index for standard pattern matching even with restrictions on the pattern length. Moreover, we can build our encoding knowing only how each character compares to O(logc n) neighbouring characters.
AB - Encoding data structures store enough information to answer the queries they are meant to support but not enough to recover their underlying datasets. In this paper we give the first encoding data structure for the challenging problem of order-preserving pattern matching. This problem was introduced only a few years ago but has already attracted significant attention because of its applications in data analysis. Two strings are said to be an order-preserving match if the relative order of their characters is the same: E.g., 4, 1, 3, 2 and 10, 3, 7, 5 are an order preserving match. We show how, given a string S[1..n] over an arbitrary alphabet of size σ and a constant c ≥ 1, we can build an O(n log log n)-bit encoding such that later, given a pattern P[1..m] with m ≤ logcn, we can return the number of order-preserving occurrences of P in S in O(m) time. Within the same time bound we can also return the starting position of some order preserving match for P in S (if such a match exists). We prove that our space bound is within a constant factor of optimal if logσ=ω (log log n); our query time is optimal if logσ= (log n). Our space bound contrasts with the ω (n log n) bits needed in the worst case to store S itself, an index for order-preserving pattern matching with no restrictions on the pattern length, or an index for standard pattern matching even with restrictions on the pattern length. Moreover, we can build our encoding knowing only how each character compares to O(logc n) neighbouring characters.
KW - Compact data structures
KW - Encodings
KW - Order-preserving matching
UR - http://www.scopus.com/inward/record.url?scp=85030561024&partnerID=8YFLogxK
U2 - 10.4230/LIPIcs.ESA.2017.38
DO - 10.4230/LIPIcs.ESA.2017.38
M3 - Conference contribution
AN - SCOPUS:85030561024
T3 - Leibniz International Proceedings in Informatics, LIPIcs
BT - 25th European Symposium on Algorithms, ESA 2017
A2 - Sohler, Christian
A2 - Sohler, Christian
A2 - Pruhs, Kirk
PB - Schloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
T2 - 25th European Symposium on Algorithms, ESA 2017
Y2 - 4 September 2017 through 6 September 2017
ER -