TY - GEN
T1 - Learning complex and sparse events in long sequences
AU - Botta, Marco
AU - Galassi, Ugo
AU - Giordana, Attilio
PY - 2004
Y1 - 2004
N2 - The Hierarchical Hidden Markov Model (HHMM) is a well-formalized tool for modeling complex patterns in long temporal or spatial sequences. Although effective algorithms are available for estimating HHMM parameters from sequences, little has been done to automate the construction of the model architecture. The primary focus of this paper is a multi-strategy algorithm for inferring the HHMM structure from a set of sequences in which the events to be captured occur in a significant portion of them. The algorithm follows a bottom-up strategy: elementary facts in the sequences are progressively grouped, building the abstraction hierarchy of an HHMM layer after layer. In this process, clustering algorithms and sequence-alignment algorithms, widely used in domains such as molecular biology, are exploited. The induction strategy has been designed to handle events with a sparse structure, where gaps filled by irrelevant facts can be intermixed with the relevant ones. Irrelevant facts are modeled by "gaps", i.e., HMMs of the noise. A gap is hypothesized when there is no statistically significant evidence for the existence of a specific episode. Moreover, a gap can later be replaced by an episode model once new facts have been acquired. The method is extensively evaluated on artificial datasets.
AB - The Hierarchical Hidden Markov Model (HHMM) is a well-formalized tool for modeling complex patterns in long temporal or spatial sequences. Although effective algorithms are available for estimating HHMM parameters from sequences, little has been done to automate the construction of the model architecture. The primary focus of this paper is a multi-strategy algorithm for inferring the HHMM structure from a set of sequences in which the events to be captured occur in a significant portion of them. The algorithm follows a bottom-up strategy: elementary facts in the sequences are progressively grouped, building the abstraction hierarchy of an HHMM layer after layer. In this process, clustering algorithms and sequence-alignment algorithms, widely used in domains such as molecular biology, are exploited. The induction strategy has been designed to handle events with a sparse structure, where gaps filled by irrelevant facts can be intermixed with the relevant ones. Irrelevant facts are modeled by "gaps", i.e., HMMs of the noise. A gap is hypothesized when there is no statistically significant evidence for the existence of a specific episode. Moreover, a gap can later be replaced by an episode model once new facts have been acquired. The method is extensively evaluated on artificial datasets.
UR - http://www.scopus.com/inward/record.url?scp=85017384302&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85017384302
T3 - Frontiers in Artificial Intelligence and Applications
SP - 425
EP - 429
BT - ECAI 2004 - 16th European Conference on Artificial Intelligence, including Prestigious Applications of Intelligent Systems, PAIS 2004 - Proceedings
A2 - de Mantaras, Ramon Lopez
A2 - Saitta, Lorenza
PB - IOS Press BV
T2 - 16th European Conference on Artificial Intelligence, ECAI 2004
Y2 - 22 August 2004 through 27 August 2004
ER -