Machine learning and bioinformatic analysis of brain and blood mRNA profiles in major depressive disorder: A case–control study

This study analyzed gene expression messenger RNA data, from cases with major depressive disorder (MDD) and controls, using supervised machine learning (ML). We built on the methodology of prior studies to obtain more generalizable/reproducible results. First, we obtained a classifier trained on gen...

Bibliographic Details
Published in: American journal of medical genetics. Part B, Neuropsychiatric genetics, 2021-03, Vol.186 (2), p.101-112
Main Authors: Qi, Bill; Ramamurthy, Janani; Bennani, Imane; Trakadis, Yannis J.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c3649-f071b3753b9c010345b197f403b383790a539deebc0813145c50d543af0186823
cites cdi_FETCH-LOGICAL-c3649-f071b3753b9c010345b197f403b383790a539deebc0813145c50d543af0186823
container_end_page 112
container_issue 2
container_start_page 101
container_title American journal of medical genetics. Part B, Neuropsychiatric genetics
container_volume 186
creator Qi, Bill
Ramamurthy, Janani
Bennani, Imane
Trakadis, Yannis J.
description This study analyzed gene expression messenger RNA data, from cases with major depressive disorder (MDD) and controls, using supervised machine learning (ML). We built on the methodology of prior studies to obtain more generalizable/reproducible results. First, we obtained a classifier trained on gene expression data from the dorsolateral prefrontal cortex of post‐mortem MDD cases (n = 126) and controls (n = 103). An average area‐under‐the‐receiver‐operating‐characteristics‐curve (AUC) from 10‐fold cross‐validation of 0.72 was noted, compared to an average AUC of 0.55 for a baseline classifier (p = .0048). The classifier achieved an AUC of 0.76 on a previously unused testing‐set. We also performed external validation using DLPFC gene expression values from an independent cohort of matched MDD cases (n = 29) and controls (n = 29), obtained from Affymetrix microarray (vs. Illumina microarray for the original cohort) (AUC: 0.62). We highlighted gene sets differentially expressed in MDD that were enriched for genes identified by the ML algorithm. Next, we assessed the ML classification performance in blood‐based microarray gene expression data from MDD cases (n = 1,581) and controls (n = 369). We observed a mean AUC of 0.64 on 10‐fold cross‐validation, which was significantly above baseline (p = .0020). Similar performance was observed on the testing‐set (AUC: 0.61). Finally, we analyzed the classification performance in covariates subgroups. We identified an interesting interaction between smoking and recall performance in MDD case prediction (58% accurate predictions in cases who are smokers vs. 43% accurate predictions in cases who are non‐smokers). Overall, our results suggest that ML in combination with gene expression data and covariates could further our understanding of the pathophysiology in MDD.
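The abstract reports two kinds of metric: AUC, and recall among MDD cases split by a covariate such as smoking status. As a minimal sketch of how such numbers can be computed, and not the authors' pipeline, the following uses only the Python standard library. The function names `auc` and `recall_by_subgroup`, the 0.5 decision threshold, and the toy data are all assumptions made for illustration; none of them come from the study.

```python
from collections import defaultdict


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1) receives a
    higher score than a randomly chosen control (label 0); ties count as 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def recall_by_subgroup(labels, predictions, groups):
    """Fraction of true cases predicted as cases, computed per subgroup."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for y, y_hat, g in zip(labels, predictions, groups):
        if y == 1:  # recall only considers true cases
            totals[g] += 1
            hits[g] += int(y_hat == 1)
    return {g: hits[g] / totals[g] for g in totals}


# Toy data (hypothetical, not from the study): 4 cases followed by 4 controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.35, 0.2, 0.1]
groups = ["smoker", "smoker", "non-smoker", "non-smoker",
          "smoker", "non-smoker", "smoker", "non-smoker"]
predictions = [int(s >= 0.5) for s in scores]  # assumed 0.5 threshold

print(auc(labels, scores))                              # 0.8125
print(recall_by_subgroup(labels, predictions, groups))  # {'smoker': 1.0, 'non-smoker': 0.0}
```

In a cross-validation setting, `auc` would be evaluated once per held-out fold and the per-fold values averaged, which is one common way to arrive at a mean AUC of the kind quoted in the abstract.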
doi_str_mv 10.1002/ajmg.b.32839
format article
addtitle Am J Med Genet B Neuropsychiatr Genet
eissn 1552-485X
pmid 33645908
publisher Hoboken, USA: John Wiley & Sons, Inc
rights 2021 Wiley Periodicals LLC.
orcid https://orcid.org/0000-0002-8740-4416
creationdate 2021-03
tpages 12
toplevel peer_reviewed
online_resources
collection PubMed
CrossRef
Neurosciences Abstracts
Technology Research Database
Engineering Research Database
ProQuest Health & Medical Complete (Alumni)
Biotechnology and BioEngineering Abstracts
Genetics Abstracts
MEDLINE - Academic
backlink https://www.ncbi.nlm.nih.gov/pubmed/33645908 (View this record in MEDLINE/PubMed)
fulltext fulltext
identifier ISSN: 1552-4841
ispartof American journal of medical genetics. Part B, Neuropsychiatric genetics, 2021-03, Vol.186 (2), p.101-112
issn 1552-4841
1552-485X
language eng
recordid cdi_proquest_miscellaneous_2494878234
source Wiley-Blackwell Read & Publish Collection
subjects bioinformatics
DNA microarrays
Gene expression
Genetics
Learning algorithms
Machine learning
major depression
Mental depression
Predictions
Prefrontal cortex
transcriptomics
title Machine learning and bioinformatic analysis of brain and blood mRNA profiles in major depressive disorder: A case–control study
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T14%3A54%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Machine%20learning%20and%20bioinformatic%20analysis%20of%20brain%20and%20blood%20mRNA%20profiles%20in%20major%20depressive%20disorder:%20A%20case%E2%80%93control%20study&rft.jtitle=American%20journal%20of%20medical%20genetics.%20Part%20B,%20Neuropsychiatric%20genetics&rft.au=Qi,%20Bill&rft.date=2021-03&rft.volume=186&rft.issue=2&rft.spage=101&rft.epage=112&rft.pages=101-112&rft.issn=1552-4841&rft.eissn=1552-485X&rft_id=info:doi/10.1002/ajmg.b.32839&rft_dat=%3Cproquest_cross%3E2494878234%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c3649-f071b3753b9c010345b197f403b383790a539deebc0813145c50d543af0186823%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2499245145&rft_id=info:pmid/33645908&rfr_iscdi=true