
Machine learning and bioinformatic analysis of brain and blood mRNA profiles in major depressive disorder: A case–control study


Bibliographic Details
Published in: American journal of medical genetics. Part B, Neuropsychiatric genetics, 2021-03, Vol.186 (2), p.101-112
Main Authors: Qi, Bill; Ramamurthy, Janani; Bennani, Imane; Trakadis, Yannis J.
Format: Article
Language: English
Description
Summary: This study used supervised machine learning (ML) to analyze messenger RNA gene expression data from cases with major depressive disorder (MDD) and controls. We built on the methodology of prior studies to obtain more generalizable and reproducible results. First, we trained a classifier on gene expression data from the dorsolateral prefrontal cortex (DLPFC) of post‐mortem MDD cases (n = 126) and controls (n = 103). Ten‐fold cross‐validation yielded an average area under the receiver operating characteristic curve (AUC) of 0.72, compared with an average AUC of 0.55 for a baseline classifier (p = .0048). The classifier achieved an AUC of 0.76 on a previously unused testing set. We also performed external validation using DLPFC gene expression values from an independent cohort of matched MDD cases (n = 29) and controls (n = 29), measured on an Affymetrix microarray rather than the Illumina microarray used for the original cohort (AUC: 0.62). We highlighted gene sets differentially expressed in MDD that were enriched for genes identified by the ML algorithm. Next, we assessed ML classification performance on blood‐based microarray gene expression data from MDD cases (n = 1,581) and controls (n = 369). We observed a mean AUC of 0.64 on 10‐fold cross‐validation, which was significantly above baseline (p = .0020). Similar performance was observed on the testing set (AUC: 0.61). Finally, we analyzed classification performance within covariate subgroups. We identified an interaction between smoking and recall in MDD case prediction: 58% of cases who smoke were predicted correctly, versus 43% of cases who do not smoke. Overall, our results suggest that ML in combination with gene expression data and covariates could further our understanding of the pathophysiology of MDD.
ISSN: 1552-4841, 1552-485X
DOI: 10.1002/ajmg.b.32839
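
The summary reports two kinds of metric: AUC for case/control discrimination, and recall among MDD cases split by a covariate (smoking status). As a minimal sketch of how those two quantities are defined, the Python below computes both on a tiny invented dataset. The function names `roc_auc` and `recall_by_subgroup`, the 0.5 decision threshold, and all data values are assumptions made for illustration; they are not the study's pipeline, classifier, or data, none of which this record specifies.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen case scores higher
    than a randomly chosen control (ties count as half a win)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def recall_by_subgroup(labels, predictions, groups):
    """Recall (share of true cases predicted as cases) within each subgroup.
    Controls are ignored because recall is defined over true cases only."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for y, pred, group in zip(labels, predictions, groups):
        if y == 1:
            totals[group] += 1
            hits[group] += int(pred == 1)
    return {group: hits[group] / totals[group] for group in totals}


# Invented toy data: 1 = MDD case, 0 = control (illustration only).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.3, 0.7, 0.2, 0.1, 0.35]
groups = ["smoker", "smoker", "non-smoker", "non-smoker",
          "smoker", "non-smoker", "smoker", "non-smoker"]

# Assumed decision threshold of 0.5 for turning scores into predictions.
predictions = [int(s >= 0.5) for s in scores]

auc = roc_auc(labels, scores)
recalls = recall_by_subgroup(labels, predictions, groups)
print(f"AUC: {auc:.2f}")
for group, value in recalls.items():
    print(f"recall among {group} cases: {value:.0%}")
```

In a cross-validated setting such as the one the summary describes, the AUC would be computed once per held-out fold and then averaged; the sketch shows only the per-fold calculation.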