Gene Expression-Based Supervised Classification Models for Discriminating Early- and Late-Stage Prostate Cancer
Published in: | National Academy of Sciences, India. Proceedings. Section B. Biological Sciences, 2020-09, Vol.90 (3), p.541-565 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Summary: | Prostate cancer is one of the most prominent cancers affecting men worldwide. Detecting the cancer at an early stage is crucial for effective treatment. Machine learning comprises algorithms that learn from a given dataset and make predictions without being explicitly programmed. Applied to gene expression data, it can help discriminate cancer stage rather than relying on tissue histology and the various other diagnostic methods used in prostate cancer detection. In this study, the authors developed a supervised classifier for distinguishing early- and late-stage prostate cancer using RNA sequencing-based gene expression data collected from The Cancer Genome Atlas. Five supervised learning algorithms (Naive Bayes, stochastic gradient descent, J48, Random Forest, and Multilayer Perceptron) were trained on a subset of the 276 most informative features extracted from the gene expression data. The accuracy of each model was evaluated using tenfold cross-validation. Among the trained classifiers, the stochastic gradient descent-based classifier performed best, with an accuracy of 86.91%, a sensitivity of 86.9%, and an area under the receiver operating characteristic curve of 0.656. Gene Ontology and KEGG pathway enrichment analyses of these 276 gene features were also performed to functionally categorize the genes. |
---|---|
ISSN: | 0369-8211 2250-1746 |
DOI: | 10.1007/s40011-019-01127-4 |
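
The evaluation scheme named in the summary (a stochastic gradient descent classifier scored by tenfold cross-validation, reporting accuracy and sensitivity) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic rather than TCGA expression values, the model is a plain logistic regression fitted by SGD, sensitivity is taken as recall for the class labeled 1 (standing in for late stage), and every function name and parameter is hypothetical.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_sgd_logistic(X, y, epochs=30, lr=0.05, seed=0):
    """Fit logistic regression by stochastic gradient descent (one sample per step)."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - y[i]
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(model, x):
    """Return 1 if the linear score is positive, else 0."""
    w, b = model
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


def accuracy_and_sensitivity(y_true, y_pred):
    """Accuracy over all samples; sensitivity = TP / (TP + FN) for class 1."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return correct / len(y_true), sensitivity


def cross_validate(X, y, k=10, seed=0):
    """Train on k-1 folds, predict the held-out fold, and pool all predictions."""
    y_pred = [None] * len(y)
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = train_sgd_logistic([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            y_pred[i] = predict(model, X[i])
    return accuracy_and_sensitivity(y, y_pred)


def make_toy_data(n=200, n_features=5, seed=1):
    """Synthetic stand-in data: only the first feature carries class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(2.0 if label else -2.0, 1.0)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_features - 1)]
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    acc, sens = cross_validate(X, y, k=10)
    print(f"accuracy={acc:.3f} sensitivity={sens:.3f}")
```

Pooling the held-out predictions from all ten folds before computing the metrics means each sample is scored exactly once by a model that never saw it during training. Averaging per-fold metrics instead is an equally common choice.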