
Another lesson from unmapped reads: in-depth analysis of RNA-Seq reads from various horse tissues



Bibliographic Details
Published in: Journal of applied genetics 2022-09, Vol. 63 (3), p. 571-581
Main Authors: Gurgul, Artur, Szmatoła, Tomasz, Ocłoń, Ewa, Jasielczuk, Igor, Semik-Gurgul, Ewelina, Finno, Carrie J., Petersen, Jessica L., Bellone, Rebecca, Hales, Erin N., Ząbek, Tomasz, Arent, Zbigniew, Kotula-Balak, Małgorzata, Bugno-Poniewierska, Monika
Format: Article
Language: English
Description
Summary: In recent years, a vast amount of sequencing data has been generated and large improvements have been made to reference genome sequences. Despite these advances, significant portions of reads still do not map to reference genomes, and these reads have been regarded as junk or artificial sequences. Recent studies have shown that these reads can be useful, e.g., for refining reference genomes or detecting contaminating microorganisms present in the analyzed biological samples. A special case is RNA sequencing (RNA-Seq) reads that come from tissue transcriptomes. Unmapped reads from RNA-Seq have received much less attention than those from whole-genome sequencing. In particular, no analysis of unmapped RNA reads has yet been performed in the horse. Thus, in this study, we analyzed the unmapped reads originating from the RNA-Seq performed through the Functional Annotation of Animal Genomes (FAANG) project in the horse, using eight different tissues from two mares. We demonstrated that unmapped reads from RNA-Seq could be easily assembled into transcripts relating to many important genes present in the sequences of other mammals. Large portions of these transcripts did not have coding potential and thus can be considered non-coding RNA. Moreover, reads that were not mapped to the reference genome but aligned to entries in the NCBI database of horse proteins were enriched for biological processes that largely correspond to the functions of the organ from which the RNA was isolated, and thus are presumably true transcripts of genes associated with cell metabolism in those tissues. In addition, a portion of reads aligned to common pathogenic or neutral microbiota, of which the most common was Brucella spp. These data suggest that unmapped reads can be an important target for in-depth analysis that may substantially enrich the results of initial RNA-Seq experiments for various tissues and organs.
ISSN: 1234-1983, 2190-3883
DOI: 10.1007/s13353-022-00705-z