Comparative genomics with succinct colored de Bruijn graphs

Bibliographic Details
Published in: Acta Informatica, 2025-03, Vol. 62 (1), p. 1, Article 1
Main Authors: Ramos, Lucas P., Louza, Felipe A., Telles, Guilherme P.
Format: Article
Language: English
Description
Summary: DNA technologies have evolved significantly in recent years, enabling a large number of genomes to be sequenced in a short time. Nevertheless, the underlying problem of assembling sequence fragments is computationally hard, and many technical factors and limitations complicate obtaining the complete sequence of a genome. Many genomes are left in a draft state, in which each chromosome is represented by a set of sequences with partial information on their relative order. Recently, some approaches have been proposed to compare draft genomes by comparing paths in de Bruijn graphs, which are constructed by many practical genome assemblers. In this article we describe in more detail gcBB, a method that compares genomes represented as succinct colored de Bruijn graphs directly, without resorting to sequence alignments, by evaluating the entropy and expectation measures based on the Burrows-Wheeler Similarity Distribution. We also introduce an improved version of gcBB, called multi-gcBB, that considerably improves time and space performance through the selection of different data structures. We compared phylogenies of 12 Drosophila species obtained by other methods with those obtained by gcBB, achieving promising results.
ISSN: 0001-5903, 1432-0525
DOI: 10.1007/s00236-024-00467-7
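
Illustration: the summary refers to entropy and expectation measures based on the Burrows-Wheeler Similarity Distribution (BWSD). The sketch below is not the gcBB implementation and does not use de Bruijn graphs. It only illustrates the general BWSD idea on two plain strings: sort all suffixes of both strings together, record which string each suffix came from, and look at the lengths of the runs of equal source labels. The function name `bwsd_measures`, the terminator characters, and the naive suffix sorting are assumptions made for this example; conventions such as how terminators are counted may differ from the article.

```python
from collections import Counter
from itertools import groupby
from math import log2


def bwsd_measures(s1: str, s2: str) -> tuple[float, float]:
    """Return (expectation - 1, entropy) of the run-length distribution.

    Hypothetical helper for illustration only. Assumes the characters
    "\x00" and "\x01" do not occur in either input string.
    """
    # Append distinct terminators so that no two suffixes compare equal.
    text1 = s1 + "\x00"
    text2 = s2 + "\x01"

    # Naive generalized suffix sorting: quadratic space, fine for tiny inputs.
    suffixes = [(text1[i:], 0) for i in range(len(text1))]
    suffixes += [(text2[i:], 1) for i in range(len(text2))]
    suffixes.sort()

    # Source label (0 or 1) of each suffix in sorted order.
    labels = [source for _, source in suffixes]

    # Lengths of maximal runs of identical labels.
    run_lengths = [len(list(group)) for _, group in groupby(labels)]

    # Empirical distribution: P(k) = (number of runs of length k) / (number of runs).
    counts = Counter(run_lengths)
    total_runs = len(run_lengths)
    probs = {k: c / total_runs for k, c in counts.items()}

    expectation = sum(k * p for k, p in probs.items())
    entropy = sum(p * log2(1 / p) for p in probs.values())

    # Both values are 0 when the labels alternate perfectly (very similar inputs).
    return expectation - 1, entropy


if __name__ == "__main__":
    print(bwsd_measures("acgt", "acgt"))
    print(bwsd_measures("aaaa", "cccc"))
```

In this sketch, identical inputs interleave perfectly, so every run has length 1 and both measures are zero. Inputs that share no characters produce long runs and larger values.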