How to handle high subgenome sequence similarity in allopolyploid Fragaria x ananassa: linkage disequilibrium based variant filtering

Bibliographic Details
Published in: BMC Genomics 2024-11, Vol.25 (1), p.1150-15, Article 1150
Main Authors: Koorevaar, Tim, Willemsen, Johan H, Hildebrand, Dominic, Visser, Richard G F, Arens, Paul, Maliepaard, Chris
Format: Article
Language: English
Description
Summary: The allo-octoploid Fragaria x ananassa follows disomic inheritance, yet the high sequence similarity among its subgenomes can lead to misalignment of short sequencing reads (150 bp). This misalignment increases the number of erroneous variants during variant calling. To accurately associate traits with the appropriate subgenome, it is essential to filter out these erroneous variants. By classifying variants into correct (type 1) and erroneous types (homoeologous variants, type 2, and multi-locus variants, type 3), we can improve the reliability of downstream analyses. Our analysis reveals that while erroneous variant types often display skewed average allele balances (AAB) for heterozygous calls, this measure alone is insufficient. To further reduce the erroneous variants, we employed a linkage disequilibrium (LD) based filtering method that correlates highly (99%) with an approach that utilizes a genetic map from a biparental population. This combined filtering strategy, using both LD-based and average allele balance methods, resulted in the lowest switch error rate (0.037). Notably, our best filtering approach decreased phasing switch error rates by 44% and preserved 72% of the original dataset. The results indicate that identifying erroneous variants due to subgenome similarity can be effectively achieved without extensive genotyping of mapping populations. Implementing the LD-based filtering method improved phasing accuracy, which in turn improves the traceability of important alleles in the germplasm, paving the way for better understanding of trait associations in F. x ananassa.
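The two measures named in the summary can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the 0.35 to 0.65 balance window, and the use of squared Pearson correlation on genotype dosages as the LD statistic are all assumptions made for illustration, since the summary does not specify them.

```python
from statistics import mean

# Hypothetical genotype strings treated as heterozygous calls.
HET_GENOTYPES = {"0/1", "0|1", "1|0"}


def average_allele_balance(calls):
    """Mean alt-read fraction over the heterozygous calls of one variant.

    calls: iterable of (genotype, ref_depth, alt_depth), one per sample.
    Returns None when the variant has no usable heterozygous call.
    """
    balances = [
        alt / (ref + alt)
        for gt, ref, alt in calls
        if gt in HET_GENOTYPES and (ref + alt) > 0
    ]
    return mean(balances) if balances else None


def is_skewed(aab, low=0.35, high=0.65):
    """Flag a variant whose AAB falls outside an assumed acceptance window."""
    return aab is not None and not (low <= aab <= high)


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors.

    Used here as a simple stand-in for an LD measure between two variants.
    Returns 0.0 when either variant is monomorphic in the sample.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    balanced = [("0/1", 10, 10), ("0/1", 15, 5), ("0/0", 20, 0)]
    skewed = [("0/1", 18, 2), ("0/1", 16, 4)]
    print(is_skewed(average_allele_balance(balanced)))
    print(is_skewed(average_allele_balance(skewed)))
    print(r_squared([0, 1, 2, 1], [0, 1, 2, 1]))
```

In this sketch a variant would be kept only if its AAB is not skewed and it shows LD with neighbouring variants on the same subgenome. The thresholds actually used in the article should be taken from the full text.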
ISSN: 1471-2164
DOI: 10.1186/s12864-024-10987-8