Skip to main navigation Skip to search Skip to main content

Multiple seeds sensitivity using a single seed with threshold

Research output: Contribution to journalArticlepeer-review

Abstract

Spaced seeds are a fundamental tool for similarity search in biosequences. The best sensitivity/selectivity trade-offs are obtained using many seeds simultaneously: This is known as the multiple seed approach. Unfortunately, spaced seeds use a large amount of memory and the available RAM is a practical limit to the number of seeds one can use simultaneously. Inspired by some recent results on lossless seeds, we revisit the approach of using a single spaced seed and considering two regions homologous if the seed hits in at least t sufficiently close positions. We show that by choosing the locations of the don't care symbols in the seed using quadratic residues modulo a prime number, we derive single seeds that when used with a threshold t > 1 have competitive sensitivity/selectivity trade-offs, indeed close to the best multiple seeds known in the literature. In addition, the choice of the threshold t can be adjusted to modify sensitivity and selectivity a posteriori, thus enabling a more accurate search in the specific instance at issue. The seeds we propose also exhibit robustness and allow flexibility in usage.
Original languageEnglish
Number of pages15
JournalJournal of Bioinformatics and Computational Biology
Volume13
Issue number4
DOIs
Publication statusPublished - 2015

Keywords

  • Homology search
  • sensitivity/selectivity tradeoffs
  • spaced seeds

Fingerprint

Dive into the research topics of 'Multiple seeds sensitivity using a single seed with threshold'. Together they form a unique fingerprint.

Cite this