Abstract 2585: viGEN: An open source bioinformatics pipeline for viral RNA detection and quantification in human tumor samples

Bibliographic Details
Published in: Cancer research (Chicago, Ill.), 2017-07, Vol.77 (13_Supplement), p.2585-2585
Main Authors: Bhuvaneshwar, Krithika; Song, Lei; Gusev, Yuriy
Format: Article
Language: English
Description
Summary: We present a novel pipeline for viral RNA detection and quantification in human RNA-seq data. Our pipeline, which has been tested on the TCGA liver cancer cohort, can not only detect the presence of a viral species but also provide gene-level read counts for individual viral species and extract viral variants.

Introduction
Approximately 20% of human cancer types are associated with viral infection that is routinely detected in blood samples. However, the extent and biological significance of viral presence or infection in the tumor samples themselves is generally unknown, although it could be measured using existing human RNA-seq data from tumor samples. We have developed viGEN, a bioinformatics pipeline that combines existing and novel RNA-seq tools to detect and quantify viral RNA in human RNA-seq data.

Methods
The pipeline includes four major modules:
1. The first module aligns reads and filters out human RNA sequences.
2. The second module maps the remaining unaligned reads against the reference genomes of all known and sequenced human viruses and counts them.
3. The third module quantifies read counts at the level of individual viral genes, allowing downstream differential expression analysis of viral genes between experimental and control groups.
4. The fourth module calls variants in these viruses.

To the best of our knowledge, no publicly available pipeline or package provides this type of complete analysis in one package. Customized solutions have been reported in the literature but were not made public.

Results
We used this pipeline to examine viruses present in RNA-seq data from 75 liver cancer patients in the TCGA data collection. The pipeline enables quantitative analysis at the gene level, both for visualization and for detecting statistically significant differentially expressed viral genes between groups of patients known to be infected with both HBV and various subtypes of HCV.
Once the viral genomes are detected at the genome level, we examine the differences between “Dead” and “Alive” samples at the viral-transcript level and at the viral-variant level.

Conclusion
Our results show that it is possible to detect viral sequences in human whole-transcriptome (RNA-seq) data. We were able not only to quantify them at the viral-gene/CDS level but also to extract variants from the 75-sample TCGA dataset. The results presented here are consistent with the published literature and serve as a proof of concept for our pipeline. This pip…
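The Methods section describes the counting steps only in outline. The Python sketch below illustrates the general idea behind modules 1 to 3 on toy, in-memory data: drop reads that aligned to the human genome, count the remaining reads per viral genome, then count reads overlapping each viral gene. It is not viGEN's implementation. The record types (`Alignment`, `GeneFeature`), the function names (`drop_human_reads`, `genome_counts`, `gene_counts`), the overlap rule, and all identifiers and coordinates are hypothetical assumptions. A real pipeline would read BAM alignments and a viral gene annotation with dedicated tools.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """Hypothetical simplified record: one read aligned to a viral reference."""
    read_id: str
    viral_genome: str  # identifier of the viral reference genome
    start: int         # 0-based alignment start (inclusive)
    end: int           # 0-based alignment end (exclusive)


@dataclass(frozen=True)
class GeneFeature:
    """Hypothetical simplified record: one gene/CDS interval on a viral genome."""
    viral_genome: str
    gene: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def drop_human_reads(read_ids, human_aligned_ids):
    """Module 1 (sketch): keep reads that did not align to the human genome."""
    human = set(human_aligned_ids)
    return [r for r in read_ids if r not in human]


def genome_counts(alignments):
    """Module 2 (sketch): count reads mapped to each viral genome."""
    return Counter(a.viral_genome for a in alignments)


def gene_counts(alignments, features):
    """Module 3 (sketch): count reads overlapping each viral gene.

    A read that overlaps two genes is counted once for each; production
    quantifiers apply stricter rules to such ambiguous reads.
    """
    counts = Counter()
    for a in alignments:
        for f in features:
            # Half-open intervals [start, end) overlap if each starts before the other ends.
            if f.viral_genome == a.viral_genome and a.start < f.end and f.start < a.end:
                counts[(f.viral_genome, f.gene)] += 1
    return counts


if __name__ == "__main__":
    # Toy data with made-up identifiers and coordinates, for illustration only.
    reads = ["r1", "r2", "r3", "r4"]
    kept = set(drop_human_reads(reads, human_aligned_ids=["r4"]))
    viral_alignments = [
        Alignment("r1", "virusA", 100, 200),
        Alignment("r2", "virusA", 900, 1000),
        Alignment("r3", "virusB", 50, 150),
        Alignment("r4", "virusB", 10, 60),
    ]
    # Only reads kept by module 1 are carried forward to viral quantification.
    viral_alignments = [a for a in viral_alignments if a.read_id in kept]
    features = [
        GeneFeature("virusA", "geneX", 0, 500),
        GeneFeature("virusA", "geneY", 800, 1200),
        GeneFeature("virusB", "geneZ", 0, 100),
    ]
    print(dict(genome_counts(viral_alignments)))
    # prints {'virusA': 2, 'virusB': 1}
    print(dict(gene_counts(viral_alignments, features)))
    # prints {('virusA', 'geneX'): 1, ('virusA', 'geneY'): 1, ('virusB', 'geneZ'): 1}
```

The nested loop keeps the sketch short; with real viral annotations, an interval index or an established read-counting tool would be the practical choice, and the per-gene table would then feed the differential expression analysis described in module 3.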
ISSN: 0008-5472, 1538-7445
DOI: 10.1158/1538-7445.AM2017-2585