Telomere-to-Telomere assembly and annotation of Prunus salicina 'Fengtangli'

Bibliographic Details
Main Authors: Yang, Xuanwen; Huang, Siyang; Su, Ying
Format: Dataset
Language: English
Description
Summary: We generated 26.84 Gb (~100× coverage; read N50 of 17,471 bp) of raw PacBio high-fidelity (HiFi) long-read data and 30 Gb (~120× coverage) of chromosome conformation capture (Hi-C) sequencing data to assemble the Prunus salicina ‘Fengtangli’ genome. In the initial assembly, the contig N50 values for haplotype 1 and haplotype 2 were 20,111,805 bp and 19,658,610 bp, respectively, about 14 times that of the ‘Sanyueli’ contig-level assembly, and the largest contig reached 52,178,471 bp. After anchoring and ordering, the scaffold N50 sizes reached 28.08 Mb and 28.37 Mb. Of the 32 expected telomeres (two ends for each of 8 chromosomes in two haplotypes), 24 were identified: 11 in PS_T2T_hap1 and 13 in PS_T2T_hap2. For gene annotation, 28,775 and 28,139 protein-coding genes were predicted for the two haplotypes. BUSCO assessment using the longest transcribed proteins showed that the two haplotypes captured 96.6% and 94.0% of the BUSCO reference gene set, respectively. The Extensive de novo TE Annotator (EDTA) was used to generate a high-quality repetitive sequence library and identified 119,630,196 bp and 121,705,710 bp of repetitive sequences, accounting for 45.41% and 46.19% of PS_T2T_hap1 and PS_T2T_hap2, respectively.
ISSN: 2052-7276
DOI: 10.5281/zenodo.10570998
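
The summary reports contig and scaffold N50 values, a count of expected telomeres, and repeat content as both base pairs and percentages. The sketch below illustrates how such figures are commonly computed or cross-checked. It is not the authors' pipeline: the function names (`n50`, `expected_telomeres`, `implied_assembly_size`) are hypothetical, the toy contig lengths are invented for illustration, and the implied assembly sizes are only approximate because the published percentages are rounded.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def expected_telomeres(chromosomes, haplotypes, ends_per_chromosome=2):
    """Number of telomeres expected in a complete, phased assembly."""
    return chromosomes * haplotypes * ends_per_chromosome


def implied_assembly_size(repeat_bp, repeat_percent):
    """Approximate assembly size implied by a repeat length and its share."""
    return repeat_bp / (repeat_percent / 100)


# Toy contig lengths (hypothetical, not taken from the dataset).
toy_contigs = [2, 3, 4, 5, 6, 10]
print("toy N50:", n50(toy_contigs))

# Figures taken from the summary: 8 chromosomes, two haplotypes.
print("expected telomeres:", expected_telomeres(8, 2))

# Repeat totals and percentages reported for the two haplotypes.
hap1_size = implied_assembly_size(119_630_196, 45.41)
hap2_size = implied_assembly_size(121_705_710, 46.19)
print(f"implied hap1 size: {hap1_size / 1e6:.1f} Mb")
print(f"implied hap2 size: {hap2_size / 1e6:.1f} Mb")
```

With the reported values, the telomere check reproduces the 32 stated in the summary, and both haplotypes come out at roughly 263 Mb, which is broadly in line with 26.84 Gb of HiFi data giving about 100× coverage.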