Loading…

MEC: Misassembly Error Correction in Contigs based on Distribution of Paired-End Reads and Statistics of GC-contents

The de novo assembly tools aim at reconstructing genomes from next-generation sequencing (NGS) data. However, the assembly tools usually generate a large amount of contigs containing many misassemblies, which are caused by problems of repetitive regions, chimeric reads, and sequencing errors. As the...

Full description

Saved in:
Bibliographic Details
Published in:IEEE/ACM transactions on computational biology and bioinformatics 2020-05, Vol.17 (3), p.847-857
Main Authors: Wu, Binbin, Li, Min, Liao, Xingyu, Luo, Junwei, Wu, Fang-Xiang, Pan, Yi, Wang, Jianxin
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:The de novo assembly tools aim at reconstructing genomes from next-generation sequencing (NGS) data. However, the assembly tools usually generate a large amount of contigs containing many misassemblies, which are caused by problems of repetitive regions, chimeric reads, and sequencing errors. As they can improve the accuracy of assembly results, detecting and correcting the misassemblies in contigs are appealing, yet challenging. In this study, a novel method, called MEC, is proposed to identify and correct misassemblies in contigs. Based on the insert size distribution of paired-end reads and the statistical analysis of GC-contents, MEC can identify more misassemblies accurately. We evaluate our MEC with the metrics (NA50, NGA50) on four datasets, compared it with the most available misassembly correction tools, and carry out experiments to analyze the influence of MEC on scaffolding results, which shows that MEC can reduce misassemblies effectively and result in quantitative improvements in scaffolding quality. MEC is publicly available at https://github.com/bioinfomaticsCSU/MEC .
ISSN:1545-5963
1557-9964
DOI:10.1109/TCBB.2018.2876855