Identification of common and dissimilar biomarkers for different cancer types from gene expressions of RNA-sequencing data

Bibliographic Details
Published in: Gene Reports 2020-06, Vol.19, p.100654, Article 100654
Main Authors: Venkataramana, Lokeswari; Jacob, Shomona Gracia; Saraswathi, S.; Venkata Vara Prasad, D.
Format: Article
Language: English
Description
Summary: Pan-cancer research aims to characterize the genetic alterations common across cancer types in order to identify biomarkers that are shared among, or distinct between, those types. Analyzing RNA-sequencing data could assist in developing predictive models for cancer progression. Because clinical data tends to grow exponentially, applying computational methods to it is computationally expensive. Parallel computational methods were used to reduce this cost on large clinical data. A parallelized Decremental Feature Selection (DFS) method was introduced to select predominant genes from RNA-sequencing gene expression data. The selected genes were evaluated using parallelized classification models and tested using hold-out and 10-fold cross-validation. A computational study was performed on five cancer types: PRAD (prostate adenocarcinoma), LUAD (lung adenocarcinoma), BRCA (breast invasive carcinoma), KIRC (kidney renal clear cell carcinoma) and COAD (colon adenocarcinoma). Samples of the five cancer types were separated into five datasets, and DFS was performed on each to uncover the common genes that play a major role across heterogeneous cancer types. The genes were then investigated to identify inter-cancer similarities and differences. The parallel classification methods yielded classification accuracies of 93% to 98%. Current cancer research focuses on gene expression from RNA sequencing. Because RNA-sequencing data has far more features than instances, vertical partitioning and parallel DFS were applied to RNA-sequencing gene expression data for the first time to select more relevant genes (features) and classify cancer types accurately. The results show higher prediction accuracy with very few predictive genes.
ISSN: 2452-0144
DOI: 10.1016/j.genrep.2020.100654
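
The Summary describes decremental feature selection over vertically partitioned gene features, scored with 10-fold cross-validation, but this record does not give the algorithm's details. The following is only a minimal sketch of one plausible reading: generic backward elimination that drops a gene whenever cross-validated accuracy does not fall, run per feature block and then once more on the survivors. The nearest-centroid classifier, the toy data, and every function name here (`predict`, `cv_accuracy`, `decremental_selection`, `partitioned_selection`, `make_toy_data`) are assumptions for illustration, not the authors' implementation.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def predict(train_X, train_y, test_X, feats):
    """Nearest-centroid classifier restricted to the feature indices in feats."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, t in zip(train_X, train_y) if t == label]
        centroids[label] = [sum(r[f] for r in rows) / len(rows) for f in feats]

    def sq_dist(x, c):
        return sum((x[f] - c[i]) ** 2 for i, f in enumerate(feats))

    return [min(centroids, key=lambda lab: sq_dist(x, centroids[lab]))
            for x in test_X]

def cv_accuracy(X, y, feats, k=10, seed=0):
    """Mean accuracy over k folds; the fixed seed keeps the folds reproducible."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [fold for fold in (idx[i::k] for i in range(k)) if fold]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        preds = predict([X[i] for i in train], [y[i] for i in train],
                        [X[i] for i in fold], feats)
        scores.append(sum(p == y[i] for p, i in zip(preds, fold)) / len(fold))
    return sum(scores) / len(scores)

def decremental_selection(X, y, feats, k=10):
    """Backward elimination: drop a feature if CV accuracy does not decrease."""
    selected = list(feats)
    best = cv_accuracy(X, y, selected, k)
    changed = True
    while changed and len(selected) > 1:
        changed = False
        for f in list(selected):
            if len(selected) == 1:
                break
            trial = [g for g in selected if g != f]
            score = cv_accuracy(X, y, trial, k)
            if score >= best:
                selected, best, changed = trial, score, True
    return selected, best

def partitioned_selection(X, y, feats, n_blocks=2, k=10):
    """Vertical partitioning (assumed scheme): select per feature block, then
    run one more pass over the surviving features of all blocks."""
    blocks = [b for b in (feats[i::n_blocks] for i in range(n_blocks)) if b]
    # Threads only illustrate the structure; CPU-bound Python code would need
    # processes or a cluster framework for a real speed-up.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda b: decremental_selection(X, y, b, k), blocks))
    survivors = sorted(f for sel, _ in results for f in sel)
    return decremental_selection(X, y, survivors, k)

def make_toy_data(n_per_class=20, n_genes=6, seed=1):
    """Synthetic stand-in for expression data: only gene 0 separates the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_genes)]
            row[0] += 10 * label
            X.append(row)
            y.append(label)
    return X, y

X, y = make_toy_data()
genes = list(range(len(X[0])))
selected, accuracy = partitioned_selection(X, y, genes, n_blocks=2, k=10)
print("selected gene indices:", selected, "CV accuracy:", round(accuracy, 3))
```

On the toy data the informative gene (index 0) should survive while most noise genes are dropped. Real RNA-sequencing matrices have thousands of genes, so a production version would need a faster scoring step and true process-level or distributed parallelism.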