Skip to main navigation Skip to search Skip to main content

Prefix-free parsing for building big BWTs

  • Christina Boucher
  • , Travis Gagie
  • , Alan Kuhnle
  • , Giovanni Manzini

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

High-throughput sequencing technologies have led to explosive growth of genomic databases; one of which will soon reach hundreds of terabytes. For many applications we want to build and store indexes of these databases but constructing such indexes is a challenge. Fortunately, many of these genomic databases are highly-repetitive - a characteristic that can be exploited and enable the computation of the Burrows-Wheeler Transform (BWT), which underlies many popular indexes. In this paper, we introduce a preprocessing algorithm, referred to as prefix-free parsing, that takes a text T as input, and in one-pass generates a dictionary D and a parse P of T with the property that the BWT of T can be constructed from D and P using workspace proportional to their total size and O(|T|)-time. Our experiments show that D and P are significantly smaller than T in practice, and thus, can fit in a reasonable internal memory even when T is very large. Therefore, prefix-free parsing eases BWT construction, which is pertinent to many bioinformatics applications.

Original languageEnglish
Title of host publication18th International Workshop on Algorithms in Bioinformatics, WABI 2018
EditorsLaxmi Parida, Esko Ukkonen
PublisherSchloss Dagstuhl- Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing
ISBN (Print)9783959770828
DOIs
Publication statusPublished - 1 Aug 2018
Externally publishedYes
Event18th International Workshop on Algorithms in Bioinformatics, WABI 2018 - Helsinki, Finland
Duration: 20 Aug 201822 Aug 2018

Publication series

NameLeibniz International Proceedings in Informatics, LIPIcs
Volume113
ISSN (Print)1868-8969

Conference

Conference18th International Workshop on Algorithms in Bioinformatics, WABI 2018
Country/TerritoryFinland
CityHelsinki
Period20/08/1822/08/18

Keywords

  • Burrows-Wheeler Transform
  • Compression-aware algorithms
  • Genomic databases
  • Prefix-free parsing

Fingerprint

Dive into the research topics of 'Prefix-free parsing for building big BWTs'. Together they form a unique fingerprint.

Cite this