Gene Expression-Based Supervised Classification Models for Discriminating Early- and Late-Stage Prostate Cancer

Bibliographic Details
Published in: National Academy of Sciences, India. Proceedings. Section B. Biological Sciences, 2020-09, Vol.90 (3), p.541-565
Main Authors: Kumar, Rajesh, Bhanti, Prateek, Marwal, Avinash, Gaur, R. K.
Format: Article
Language:English
Description
Summary: Prostate cancer is one of the most prominent cancers affecting men worldwide. Detecting the cancer at an early stage is crucial for effective treatment of the disease. Machine learning algorithms can learn from a given dataset and make predictions without being explicitly programmed. Applied to gene expression data, machine learning can discriminate cancer stage without relying on tissue histology and the various other diagnostic methods used in prostate cancer detection. In this study, the authors developed supervised classifiers for distinguishing early- and late-stage prostate cancer using RNA sequencing-based gene expression data collected from The Cancer Genome Atlas. The supervised learning algorithms Naive Bayes, stochastic gradient descent, J48, Random Forest, and Multilayer Perceptron were trained on a subset of the 276 most informative features extracted from the gene expression data. The accuracy of each model was evaluated by tenfold cross-validation. Among all the trained classifiers, the stochastic gradient descent-based classifier performed best, with an accuracy of 86.91%, a sensitivity of 86.9%, and an area under the receiver operating characteristic curve of 0.656. Gene Ontology and KEGG pathway enrichment analyses of the 276 gene features were also performed to categorize these genes functionally.
ISSN:0369-8211
2250-1746
DOI:10.1007/s40011-019-01127-4
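
The evaluation described in the Summary (a stochastic gradient descent-based classifier scored by tenfold cross-validation on accuracy, sensitivity, and area under the ROC curve) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic rather than taken from The Cancer Genome Atlas, the classifier is a plain logistic regression fitted by per-sample gradient steps, sensitivity is assumed to mean recall for the class labelled 1, and every function name (`make_toy_data`, `train_sgd`, `cross_validate`, and so on) is hypothetical.

```python
import math
import random


def make_toy_data(n=200, n_features=5, seed=0):
    """Synthetic stand-in for expression data (NOT real TCGA values)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate classes so both are equally represented
        mean = 1.0 if label else -1.0
        X.append([rng.gauss(mean, 1.0) for _ in range(n_features)])
        y.append(label)
    return X, y


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow on extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_sgd(X, y, epochs=20, lr=0.05, seed=0):
    """Logistic regression fitted by per-sample stochastic gradient descent."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log loss w.r.t. the raw score
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict_scores(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) for row in X]


def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking (ties count half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cross_validate(X, y, k=10, seed=0):
    """Pool out-of-fold predictions over k folds, then compute the metrics."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    y_true, scores = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = train_sgd([X[i] for i in train], [y[i] for i in train])
        scores += predict_scores(model, [X[i] for i in fold])
        y_true += [y[i] for i in fold]
    preds = [1 if s >= 0.5 else 0 for s in scores]
    tp = sum(1 for p, t in zip(preds, y_true) if p == 1 and t == 1)
    correct = sum(1 for p, t in zip(preds, y_true) if p == t)
    return {
        "accuracy": correct / len(y_true),
        "sensitivity": tp / sum(y_true),  # recall for class 1 (assumed definition)
        "auc": auc(y_true, scores),
    }


if __name__ == "__main__":
    X, y = make_toy_data()
    print(cross_validate(X, y))
```

Each sample is scored only by a model that never saw it during training, which is the point of cross-validation. The toy classes here are balanced and well separated, so the resulting numbers say nothing about the figures reported in the study.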