Feature selection using logistic regression in case–control DNA methylation data of Parkinson's disease: A comparative study

Bibliographic Details
Published in: Journal of Theoretical Biology 2018-11, Vol. 457, p. 14-18
Main Authors: Kakade, Aishwarya; Kumari, Baby; Dholaniya, Pankaj Singh
Format: Article
Language: English
Description
Summary:
• Feature selection from DNA methylation data for Parkinson's disease.
• Feature reduction using logistic regression and random forest.
• Prediction of disease condition using a classifier based on the identified features.
• Features uniquely identified using logistic regression were involved in PD.

Parkinson's disease (PD) is a progressive neurological disorder caused by the degeneration of dopaminergic neurons in the substantia nigra pars compacta. The pathogenesis of the disease is not fully understood, but it has been linked to complex genetic, epigenetic and environmental interactions. A substantial number of studies have shown a role for epigenetic modifications in the progression of PD. In the present study, we analyzed methylation patterns of 1726 transcripts from 66 samples of 450k data, comprising 43 controls and 23 diseased samples. We used Logistic Regression (LR) for feature reduction and built a classifier with higher accuracy than one using all features together. The performance of the classifier was compared with that obtained using other feature reduction approaches, namely Random Forest (RF) and Principal Component Analysis (PCA). Feature reduction with LR and RF performed better than with PCA. Some of the features uniquely identified by LR correspond to genes such as COMT, DCTN1 and PRNP, which are reported to play a significant role in PD.
ISSN: 0022-5193, 1095-8541
DOI: 10.1016/j.jtbi.2018.08.018
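
The summary describes using logistic regression to reduce the feature set before classification, but this record does not state the penalty, solver, or hyperparameters the authors used. As a rough illustration only, the sketch below assumes an L1-penalized logistic regression fitted by proximal gradient descent, and keeps the features whose weights stay nonzero. The data are synthetic: only the sample counts (43 controls, 23 cases) come from the summary, while the feature count, the mean shift, and all function names and parameter values are hypothetical.

```python
import math
import random


def make_synthetic(n_controls=43, n_cases=23, n_features=20,
                   informative=(0, 1, 2), shift=2.0, seed=0):
    """Toy data (assumption): Gaussian noise, mean-shifted in cases on a few features."""
    rng = random.Random(seed)
    X, y = [], []
    for label, count in ((0, n_controls), (1, n_cases)):
        for _ in range(count):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                for j in informative:
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, step=0.1, n_iter=300):
    """Fit weights w and intercept b by proximal gradient descent (ISTA)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        b -= step * grad_b / n  # the intercept is not penalized
        # Gradient step on the mean log-loss, then L1 shrinkage per weight.
        w = [soft_threshold(w[j] - step * grad_w[j] / n, step * lam)
             for j in range(p)]
    return w, b


def select_features(w):
    """Indices of features whose weight was not shrunk to exactly zero."""
    return [j for j, wj in enumerate(w) if wj != 0.0]


if __name__ == "__main__":
    X, y = make_synthetic()
    w, b = fit_l1_logistic(X, y)
    print("selected feature indices:", select_features(w))
```

A downstream classifier would then be trained on the selected columns only and compared against one trained on all features, ideally with cross-validation given the small sample size. A larger `lam` keeps fewer features; the value here is arbitrary.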