Comparative genomics with succinct colored de Bruijn graphs
Published in: | Acta informatica 2025-03, Vol.62 (1), p.1, Article 1 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | DNA technologies have evolved significantly in the past years enabling the sequencing of a large number of genomes in a short time. Nevertheless, the underlying problem of assembling sequence fragments is computationally hard and many technical factors and limitations complicate obtaining the complete sequence of a genome. Many genomes are left in a draft state, in which each chromosome is represented by a set of sequences with partial information on their relative order. Recently, some approaches have been proposed to compare draft genomes by comparing paths in de Bruijn graphs, which are constructed by many practical genome assemblers. In this article we describe in more detail a method for comparing genomes represented as succinct colored de Bruijn graphs directly and without resorting to sequence alignments, called gcBB, that evaluates the entropy and expectation measures based on the Burrows-Wheeler Similarity Distribution. We also introduce an improved version of gcBB, called multi-gcBB, that improves the time and space performance considerably through the selection of different data structures. We have compared phylogenies of 12 Drosophila species obtained by other methods to those obtained with gcBB, achieving promising results. |
---|---|
ISSN: | 0001-5903 1432-0525 |
DOI: | 10.1007/s00236-024-00467-7 |