Loading…

HiPhase: jointly phasing small, structural, and tandem repeat variants from HiFi sequencing

Abstract Motivation In diploid organisms, phasing is the problem of assigning the alleles at heterozygous variants to one of two haplotypes. Reads from PacBio HiFi sequencing provide long, accurate observations that can be used as the basis for both calling and phasing variants. HiFi reads also exce...

Full description

Saved in:
Bibliographic Details
Published in:Bioinformatics (Oxford, England) England), 2024-02, Vol.40 (2)
Main Authors: Holt, James M, Saunders, Christopher T, Rowell, William J, Kronenberg, Zev, Wenger, Aaron M, Eberle, Michael
Format: Article
Language:English
Subjects:
Citations: Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract Motivation In diploid organisms, phasing is the problem of assigning the alleles at heterozygous variants to one of two haplotypes. Reads from PacBio HiFi sequencing provide long, accurate observations that can be used as the basis for both calling and phasing variants. HiFi reads also excel at calling larger classes of variation, such as structural or tandem repeat variants. However, current phasing tools typically only phase small variants, leaving larger variants unphased. Results We developed HiPhase, a tool that jointly phases SNVs, indels, structural, and tandem repeat variants. The main benefits of HiPhase are (i) dual mode allele assignment for detecting large variants, (ii) a novel application of the A*-algorithm to phasing, and (iii) logic allowing phase blocks to span breaks caused by alignment issues around reference gaps and homozygous deletions. In our assessment, HiPhase produced an average phase block NG50 of 480 kb with 929 switchflip errors and fully phased 93.8% of genes, improving over the current state of the art. Additionally, HiPhase jointly phases SNVs, indels, structural, and tandem repeat variants and includes innate multi-threading, statistics gathering, and concurrent phased alignment output generation. Availability and implementation HiPhase is available as source code and a pre-compiled Linux binary with a user guide at https://github.com/PacificBiosciences/HiPhase.
ISSN:1367-4803
1367-4811
DOI:10.1093/bioinformatics/btae042