Design and analysis of periodic multiple seeds

Lavinia Egidi, Giovanni Manzini

Risultato della ricerca: Contributo su rivistaArticolo in rivistapeer review

Abstract

A wide class of approximate pattern matching algorithms are based on a filtration phase in which spaced seeds are used to discard regions where a match is not likely to occur. The problem of determining the "optimal" shape of a spaced seed in a particular setting is known to be a hard one: in practice spaced seeds are chosen using heuristics or considering a restricted family of seeds with "reasonably good" performances. In this paper we consider the family of spaced seeds with a periodic structure. Such seeds have been already proven valuable both as a theoretical tool and in bioinformatics applications. We show that known combinatorial objects, namely Difference Sets and Families and Steiner Systems, naturally lead to the design of lossless periodic seeds for approximate pattern matching with k=2 and k=3 mismatches. We analyze in depth the properties of the resulting seeds obtaining insights also on seeds without a periodic structure. The results of the analysis are then used to guide an experimental evaluation of the effectiveness of periodic seeds for pattern lengths of practical interest. Our results give a complete picture of strengths and limitations of periodic seeds, and can be used by practitioners for the design of effective approximate pattern matching algorithms.

Lingua originaleInglese
pagine (da-a)62-76
Numero di pagine15
RivistaTheoretical Computer Science
Volume522
DOI
Stato di pubblicazionePubblicato - 20 feb 2014

Fingerprint

Entra nei temi di ricerca di 'Design and analysis of periodic multiple seeds'. Insieme formano una fingerprint unica.

Cita questo