
Reconstruction of Microbial Haplotypes by Integration of Statistical and Physical Linkage in Scaffolding


Bibliographic Details
Published in: Molecular Biology and Evolution 2021-05, Vol. 38 (6), p. 2660-2672
Main Authors: Cao, Chen; He, Jingni; Mak, Lauren; Perera, Deshan; Kwok, Devin; Wang, Jia; Li, Minghao; Mourier, Tobias; Gavriliuc, Stefan; Greenberg, Matthew; Morrissy, A Sorana; Sycuro, Laura K; Yang, Guang; Jeffares, Daniel C; Long, Quan
Format: Article
Language: English
Description
Summary: DNA sequencing technologies provide unprecedented opportunities to analyze within-host evolution of microorganism populations. Often, within-host populations are analyzed via pooled sequencing of the population, which contains multiple individuals or “haplotypes.” However, current next-generation sequencing instruments, in conjunction with single-molecule barcoded linked-reads, cannot distinguish long haplotypes directly. Computational reconstruction of haplotypes from pooled sequencing has been attempted in virology, bacterial genomics, metagenomics, and human genetics, using algorithms based on either cross-host genetic sharing or within-host genomic reads. Here, we describe PoolHapX, a flexible computational approach that integrates information from both genetic sharing and genomic sequencing. We demonstrated that PoolHapX outperforms state-of-the-art tools tailored to specific organismal systems, and is robust to within-host evolution. Importantly, together with barcoded linked-reads, PoolHapX can infer whole-chromosome-scale haplotypes from 50 pools each containing 12 different haplotypes. By analyzing real data, we uncovered dynamic variations in the evolutionary processes of within-patient HIV populations previously unobserved in single position-based analysis.
ISSN: 1537-1719, 0737-4038
DOI: 10.1093/molbev/msab037