Loading…

WMSA: a novel method for multiple sequence alignment of DNA sequences

Multiple sequence alignment (MSA) is a fundamental problem in bioinformatics. The quality of alignment will affect downstream analysis. MAFFT has adopted the Fast Fourier Transform method for searching the homologous segments and using them as anchors to divide the sequences, then making alignment o...

Full description

Saved in:
Bibliographic Details
Published in:Bioinformatics (Oxford, England) England), 2022-11, Vol.38 (22), p.5019-5025
Main Authors: Wei, Yanming, Zou, Quan, Tang, Furong, Yu, Liang
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Multiple sequence alignment (MSA) is a fundamental problem in bioinformatics. The quality of alignment will affect downstream analysis. MAFFT has adopted the Fast Fourier Transform method for searching the homologous segments and using them as anchors to divide the sequences, then making alignment only on segments, which can save time and memory without overly reducing the sequence alignment quality. MAFFT becomes slow when the dataset is large. We made a software, WMSA, which uses the divide-and-conquer method to split the sequences into clusters, aligns those clusters into profiles with the center star strategy and then makes a progressive profile-profile alignment. The alignment is conducted by the compiled algorithms of MAFFT, K-Band with multithread parallelism. Our method can balance time, space and quality and performs better than MAFFT in test experiments on highly conserved datasets. Source code is freely available at https://github.com/malabz/WMSA/, which is implemented in C/C++ and supported on Linux, and datasets are available at https://github.com/malabz/WMSA-dataset. Supplementary data are available at Bioinformatics online.
ISSN:1367-4803
1367-4811
DOI:10.1093/bioinformatics/btac658