XGBoost-SHAP-based interpretable diagnostic framework for alzheimer's disease

Due to the class imbalance issue faced when Alzheimer's disease (AD) develops from normal cognition (NC) to mild cognitive impairment (MCI), present clinical practice is met with challenges regarding the auxiliary diagnosis of AD using machine learning (ML). This leads to low diagnosis performa...

Bibliographic Details
Published in: BMC medical informatics and decision making 2023-07, Vol.23 (1), p.137, Article 137 (14 pages)
Main Authors: Yi, Fuliang, Yang, Hui, Chen, Durong, Qin, Yao, Han, Hongjuan, Cui, Jing, Bai, Wenlin, Ma, Yifei, Zhang, Rong, Yu, Hongmei
Format: Article
Language:English
cited_by cdi_FETCH-LOGICAL-c564t-aaedd9161d5673f9a6e2a3293e4202ce4160d25a8218b4584a38af2108c1af963
cites cdi_FETCH-LOGICAL-c564t-aaedd9161d5673f9a6e2a3293e4202ce4160d25a8218b4584a38af2108c1af963
container_end_page 14
container_issue 1
container_start_page 137
container_title BMC medical informatics and decision making
container_volume 23
creator Yi, Fuliang
Yang, Hui
Chen, Durong
Qin, Yao
Han, Hongjuan
Cui, Jing
Bai, Wenlin
Ma, Yifei
Zhang, Rong
Yu, Hongmei
description Due to the class imbalance issue faced when Alzheimer's disease (AD) develops from normal cognition (NC) to mild cognitive impairment (MCI), present clinical practice is met with challenges regarding the auxiliary diagnosis of AD using machine learning (ML). This leads to low diagnostic performance. We aimed to construct an interpretable framework, extreme gradient boosting-Shapley additive explanations (XGBoost-SHAP), to handle the imbalance among different AD progression statuses at the algorithmic level. We also sought to achieve multiclassification of NC, MCI, and AD. We obtained patient data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, including clinical information, neuropsychological test results, neuroimaging-derived biomarkers, and APOE-ε4 gene statuses. First, three feature selection algorithms were applied, and they were then included in the XGBoost algorithm. Due to the imbalance among the three classes, we changed the sample weight distribution to achieve multiclassification of NC, MCI, and AD. Then, the SHAP method was linked to XGBoost to form an interpretable framework. This framework utilized attribution ideas that quantified the impacts of model predictions into numerical values and analysed them based on their directions and sizes. Subsequently, the top 10 features (optimal subset) were used to simplify the clinical decision-making process, and their performance was compared with that of a random forest (RF), Bagging, AdaBoost, and a naive Bayes (NB) classifier. Finally, the National Alzheimer's Coordinating Center (NACC) dataset was employed to assess the impact path consistency of the features within the optimal subset. Compared to the RF, Bagging, AdaBoost, NB and XGBoost (unweighted), the interpretable framework had higher classification performance with accuracy improvements of 0.74%, 0.74%, 1.46%, 13.18%, and 0.83%, respectively. The framework achieved high sensitivity (81.21%/74.85%), specificity (92.18%/89.86%), accuracy (87.57%/80.52%), area under the receiver operating characteristic curve (AUC) (0.91/0.88), positive clinical utility index (0.71/0.56), and negative clinical utility index (0.75/0.68) on the ADNI and NACC datasets, respectively. In the ADNI dataset, the top 10 features were found to have varying associations with the risk of AD onset based on their SHAP values. Specifically, the higher SHAP values of CDRSB, ADAS13, ADAS11, ventricle volume, ADASQ4, and FAQ were associated with higher risks of AD onset. Conversely, the higher SHAP values of LDELTOTAL, mPACCdigit, RAVLT_immediate, and MMSE were associated with lower risks of AD onset. Similar results were found for the NACC dataset. The proposed interpretable framework contributes to achieving excellent performance in imbalanced AD multiclassification tasks and provides scientific guidance (optimal subset) for clinical decision-making, thereby facilitating disease management and offering new research ideas for optimizing AD prevention and treatment programs.
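The abstract mentions changing the sample weight distribution to handle class imbalance and reports positive and negative clinical utility indexes, but this record does not give the exact formulas used. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the "balanced" weighting rule (n_samples / (n_classes × class count)), the function names, and the toy label counts are all illustrative. The clinical utility indexes follow the commonly used definition (sensitivity × PPV and specificity × NPV) for a single binary, one-vs-rest contrast. How the authors aggregated the index across NC, MCI, and AD is not stated here.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Return one weight per sample so every class has equal total weight.

    Assumed rule (not taken from the paper):
    weight = n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


def clinical_utility_indexes(tp, fp, tn, fn):
    """Return (CUI+, CUI-) for one binary (one-vs-rest) confusion matrix.

    CUI+ = sensitivity * positive predictive value
    CUI- = specificity * negative predictive value
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return sensitivity * ppv, specificity * npv


# Toy, made-up label distribution in which MCI dominates.
labels = ["NC"] * 2 + ["MCI"] * 6 + ["AD"] * 2
weights = balanced_sample_weights(labels)

# Toy, made-up confusion-matrix counts for a single class-vs-rest contrast.
cui_pos, cui_neg = clinical_utility_indexes(tp=80, fp=10, tn=90, fn=20)
```

Weights of this kind could be passed through the `sample_weight` argument of `XGBClassifier.fit` in the xgboost scikit-learn interface. Whether the study used this rule or a different weighting is an open assumption.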
doi_str_mv 10.1186/s12911-023-02238-9
format article
fullrecord <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_2f7158995ba544838dc87c3f0dbde75f</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A758479112</galeid><doaj_id>oai_doaj_org_article_2f7158995ba544838dc87c3f0dbde75f</doaj_id><sourcerecordid>A758479112</sourcerecordid><originalsourceid>FETCH-LOGICAL-c564t-aaedd9161d5673f9a6e2a3293e4202ce4160d25a8218b4584a38af2108c1af963</originalsourceid><addsrcrecordid>eNptkl9rFDEUxQdRbK1-AR9kwAf7MjV_Z5InWUttCxUFFXwLd5KbbdaZyTaZVfTTm3Zr7YqEkJD8zklyc6rqOSVHlKr2daZMU9oQxktnXDX6QbVPRceaVovu4b35XvUk5xUhtFNcPq72eCc0ZULtV--_nr6NMc_Np7PFx6aHjK4O04xpnXCGfsDaBVhOhQi29glG_BHTt9rHVMPw6xLDiOlVLlDGon1aPfIwZHx2Ox5UX96dfD4-ay4-nJ4fLy4aK1sxNwDonKYtdbLtuNfQIgPONEfBCLMoaEsck6AYVb2QSgBX4BklylLwuuUH1fnW10VYmXUKI6SfJkIwNwsxLQ2kcuMBDfMdlUpr2YMUQnHlrOos98T1Djvpi9ebrdd604_oLE5zgmHHdHdnCpdmGb8bSnirFRHF4fDWIcWrDebZjCFbHAaYMG6yYUowIZmivKAv_0FXcZOmUqtCSUaIaqX8Sy2hvCBMPpaD7bWpWXSlHF35dlaoo_9QpTkcg40T-lDWdwRsK7Ap5pzQ3z2SEnOdKLNNlCmJMjeJMrqIXtwvz53kT4T4b-r6xG8</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2852008655</pqid></control><display><type>article</type><title>XGBoost-SHAP-based interpretable diagnostic framework for alzheimer's disease</title><source>Publicly Available Content Database</source><source>PubMed Central</source><creator>Yi, Fuliang ; Yang, Hui ; Chen, Durong ; Qin, Yao ; Han, Hongjuan ; Cui, Jing ; Bai, Wenlin ; Ma, Yifei ; Zhang, Rong ; Yu, Hongmei</creator><creatorcontrib>Yi, Fuliang ; Yang, Hui ; Chen, Durong ; Qin, Yao ; Han, Hongjuan ; Cui, Jing ; Bai, Wenlin ; Ma, Yifei ; Zhang, Rong ; Yu, Hongmei</creatorcontrib><description>Due to the class imbalance issue faced when Alzheimer's disease (AD) develops from normal cognition (NC) to mild cognitive impairment (MCI), present clinical practice is met with challenges regarding the auxiliary diagnosis of AD using machine learning (ML). 
This leads to low diagnosis performance. We aimed to construct an interpretable framework, extreme gradient boosting-Shapley additive explanations (XGBoost-SHAP), to handle the imbalance among different AD progression statuses at the algorithmic level. We also sought to achieve multiclassification of NC, MCI, and AD. We obtained patient data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, including clinical information, neuropsychological test results, neuroimaging-derived biomarkers, and APOE-ε4 gene statuses. First, three feature selection algorithms were applied, and they were then included in the XGBoost algorithm. Due to the imbalance among the three classes, we changed the sample weight distribution to achieve multiclassification of NC, MCI, and AD. Then, the SHAP method was linked to XGBoost to form an interpretable framework. This framework utilized attribution ideas that quantified the impacts of model predictions into numerical values and analysed them based on their directions and sizes. Subsequently, the top 10 features (optimal subset) were used to simplify the clinical decision-making process, and their performance was compared with that of a random forest (RF), Bagging, AdaBoost, and a naive Bayes (NB) classifier. Finally, the National Alzheimer's Coordinating Center (NACC) dataset was employed to assess the impact path consistency of the features within the optimal subset. Compared to the RF, Bagging, AdaBoost, NB and XGBoost (unweighted), the interpretable framework had higher classification performance with accuracy improvements of 0.74%, 0.74%, 1.46%, 13.18%, and 0.83%, respectively. The framework achieved high sensitivity (81.21%/74.85%), specificity (92.18%/89.86%), accuracy (87.57%/80.52%), area under the receiver operating characteristic curve (AUC) (0.91/0.88), positive clinical utility index (0.71/0.56), and negative clinical utility index (0.75/0.68) on the ADNI and NACC datasets, respectively. 
In the ADNI dataset, the top 10 features were found to have varying associations with the risk of AD onset based on their SHAP values. Specifically, the higher SHAP values of CDRSB, ADAS13, ADAS11, ventricle volume, ADASQ4, and FAQ were associated with higher risks of AD onset. Conversely, the higher SHAP values of LDELTOTAL, mPACCdigit, RAVLT_immediate, and MMSE were associated with lower risks of AD onset. Similar results were found for the NACC dataset. The proposed interpretable framework contributes to achieving excellent performance in imbalanced AD multiclassification tasks and provides scientific guidance (optimal subset) for clinical decision-making, thereby facilitating disease management and offering new research ideas for optimizing AD prevention and treatment programs.</description><identifier>ISSN: 1472-6947</identifier><identifier>EISSN: 1472-6947</identifier><identifier>DOI: 10.1186/s12911-023-02238-9</identifier><identifier>PMID: 37491248</identifier><language>eng</language><publisher>England: BioMed Central Ltd</publisher><subject>Accuracy ; Algorithms ; Alzheimer Disease - diagnostic imaging ; Alzheimer's disease ; Analysis ; Apolipoprotein E ; Bagging ; Bayes Theorem ; Biomarkers ; Brain research ; Care and treatment ; Classification ; Cognition ; Cognition &amp; reasoning ; Cognitive ability ; Cognitive Dysfunction - diagnosis ; Data integrity ; Datasets ; Decision making ; Decision trees ; Deep learning ; Diagnosis ; Diagnostic imaging ; Discriminant analysis ; Evaluation ; Humans ; Imbalanced classes ; Interpretable framework ; Machine Learning ; Magnetic resonance imaging ; Magnetic Resonance Imaging - methods ; Medical diagnosis ; Medical imaging ; Multiclassification ; Neural networks ; Neurodegenerative diseases ; Neuroimaging ; Neuropsychology ; Numerical prediction ; Optimization ; Ventricle ; Ventricles (cerebral) ; XGBoost-SHAP</subject><ispartof>BMC medical informatics and decision making, 2023-07, Vol.23 (1), p.137-14, Article 
137</ispartof><rights>2023. The Author(s).</rights><rights>COPYRIGHT 2023 BioMed Central Ltd.</rights><rights>2023. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>The Author(s) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c564t-aaedd9161d5673f9a6e2a3293e4202ce4160d25a8218b4584a38af2108c1af963</citedby><cites>FETCH-LOGICAL-c564t-aaedd9161d5673f9a6e2a3293e4202ce4160d25a8218b4584a38af2108c1af963</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10369804/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2852008655?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25753,27924,27925,37012,37013,44590,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37491248$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Yi, Fuliang</creatorcontrib><creatorcontrib>Yang, Hui</creatorcontrib><creatorcontrib>Chen, Durong</creatorcontrib><creatorcontrib>Qin, Yao</creatorcontrib><creatorcontrib>Han, Hongjuan</creatorcontrib><creatorcontrib>Cui, Jing</creatorcontrib><creatorcontrib>Bai, Wenlin</creatorcontrib><creatorcontrib>Ma, Yifei</creatorcontrib><creatorcontrib>Zhang, Rong</creatorcontrib><creatorcontrib>Yu, Hongmei</creatorcontrib><title>XGBoost-SHAP-based interpretable diagnostic framework for alzheimer's disease</title><title>BMC medical informatics and decision making</title><addtitle>BMC Med Inform Decis Mak</addtitle><description>Due to the class 
imbalance issue faced when Alzheimer's disease (AD) develops from normal cognition (NC) to mild cognitive impairment (MCI), present clinical practice is met with challenges regarding the auxiliary diagnosis of AD using machine learning (ML). This leads to low diagnosis performance. We aimed to construct an interpretable framework, extreme gradient boosting-Shapley additive explanations (XGBoost-SHAP), to handle the imbalance among different AD progression statuses at the algorithmic level. We also sought to achieve multiclassification of NC, MCI, and AD. We obtained patient data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, including clinical information, neuropsychological test results, neuroimaging-derived biomarkers, and APOE-ε4 gene statuses. First, three feature selection algorithms were applied, and they were then included in the XGBoost algorithm. Due to the imbalance among the three classes, we changed the sample weight distribution to achieve multiclassification of NC, MCI, and AD. Then, the SHAP method was linked to XGBoost to form an interpretable framework. This framework utilized attribution ideas that quantified the impacts of model predictions into numerical values and analysed them based on their directions and sizes. Subsequently, the top 10 features (optimal subset) were used to simplify the clinical decision-making process, and their performance was compared with that of a random forest (RF), Bagging, AdaBoost, and a naive Bayes (NB) classifier. Finally, the National Alzheimer's Coordinating Center (NACC) dataset was employed to assess the impact path consistency of the features within the optimal subset. Compared to the RF, Bagging, AdaBoost, NB and XGBoost (unweighted), the interpretable framework had higher classification performance with accuracy improvements of 0.74%, 0.74%, 1.46%, 13.18%, and 0.83%, respectively. 
The framework achieved high sensitivity (81.21%/74.85%), specificity (92.18%/89.86%), accuracy (87.57%/80.52%), area under the receiver operating characteristic curve (AUC) (0.91/0.88), positive clinical utility index (0.71/0.56), and negative clinical utility index (0.75/0.68) on the ADNI and NACC datasets, respectively. In the ADNI dataset, the top 10 features were found to have varying associations with the risk of AD onset based on their SHAP values. Specifically, the higher SHAP values of CDRSB, ADAS13, ADAS11, ventricle volume, ADASQ4, and FAQ were associated with higher risks of AD onset. Conversely, the higher SHAP values of LDELTOTAL, mPACCdigit, RAVLT_immediate, and MMSE were associated with lower risks of AD onset. Similar results were found for the NACC dataset. The proposed interpretable framework contributes to achieving excellent performance in imbalanced AD multiclassification tasks and provides scientific guidance (optimal subset) for clinical decision-making, thereby facilitating disease management and offering new research ideas for optimizing AD prevention and treatment programs.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Alzheimer Disease - diagnostic imaging</subject><subject>Alzheimer's disease</subject><subject>Analysis</subject><subject>Apolipoprotein E</subject><subject>Bagging</subject><subject>Bayes Theorem</subject><subject>Biomarkers</subject><subject>Brain research</subject><subject>Care and treatment</subject><subject>Classification</subject><subject>Cognition</subject><subject>Cognition &amp; reasoning</subject><subject>Cognitive ability</subject><subject>Cognitive Dysfunction - diagnosis</subject><subject>Data integrity</subject><subject>Datasets</subject><subject>Decision making</subject><subject>Decision trees</subject><subject>Deep learning</subject><subject>Diagnosis</subject><subject>Diagnostic imaging</subject><subject>Discriminant 
analysis</subject><subject>Evaluation</subject><subject>Humans</subject><subject>Imbalanced classes</subject><subject>Interpretable framework</subject><subject>Machine Learning</subject><subject>Magnetic resonance imaging</subject><subject>Magnetic Resonance Imaging - methods</subject><subject>Medical diagnosis</subject><subject>Medical imaging</subject><subject>Multiclassification</subject><subject>Neural networks</subject><subject>Neurodegenerative diseases</subject><subject>Neuroimaging</subject><subject>Neuropsychology</subject><subject>Numerical prediction</subject><subject>Optimization</subject><subject>Ventricle</subject><subject>Ventricles (cerebral)</subject><subject>XGBoost-SHAP</subject><issn>1472-6947</issn><issn>1472-6947</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNptkl9rFDEUxQdRbK1-AR9kwAf7MjV_Z5InWUttCxUFFXwLd5KbbdaZyTaZVfTTm3Zr7YqEkJD8zklyc6rqOSVHlKr2daZMU9oQxktnXDX6QbVPRceaVovu4b35XvUk5xUhtFNcPq72eCc0ZULtV--_nr6NMc_Np7PFx6aHjK4O04xpnXCGfsDaBVhOhQi29glG_BHTt9rHVMPw6xLDiOlVLlDGon1aPfIwZHx2Ox5UX96dfD4-ay4-nJ4fLy4aK1sxNwDonKYtdbLtuNfQIgPONEfBCLMoaEsck6AYVb2QSgBX4BklylLwuuUH1fnW10VYmXUKI6SfJkIwNwsxLQ2kcuMBDfMdlUpr2YMUQnHlrOos98T1Djvpi9ebrdd604_oLE5zgmHHdHdnCpdmGb8bSnirFRHF4fDWIcWrDebZjCFbHAaYMG6yYUowIZmivKAv_0FXcZOmUqtCSUaIaqX8Sy2hvCBMPpaD7bWpWXSlHF35dlaoo_9QpTkcg40T-lDWdwRsK7Ap5pzQ3z2SEnOdKLNNlCmJMjeJMrqIXtwvz53kT4T4b-r6xG8</recordid><startdate>20230725</startdate><enddate>20230725</enddate><creator>Yi, Fuliang</creator><creator>Yang, Hui</creator><creator>Chen, Durong</creator><creator>Qin, Yao</creator><creator>Han, Hongjuan</creator><creator>Cui, Jing</creator><creator>Bai, Wenlin</creator><creator>Ma, Yifei</creator><creator>Zhang, Rong</creator><creator>Yu, Hongmei</creator><general>BioMed Central Ltd</general><general>BioMed 
Central</general><general>BMC</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7QO</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>88C</scope><scope>88E</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M0T</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20230725</creationdate><title>XGBoost-SHAP-based interpretable diagnostic framework for alzheimer's disease</title><author>Yi, Fuliang ; Yang, Hui ; Chen, Durong ; Qin, Yao ; Han, Hongjuan ; Cui, Jing ; Bai, Wenlin ; Ma, Yifei ; Zhang, Rong ; Yu, Hongmei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c564t-aaedd9161d5673f9a6e2a3293e4202ce4160d25a8218b4584a38af2108c1af963</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Alzheimer Disease - diagnostic imaging</topic><topic>Alzheimer's disease</topic><topic>Analysis</topic><topic>Apolipoprotein E</topic><topic>Bagging</topic><topic>Bayes 
Theorem</topic><topic>Biomarkers</topic><topic>Brain research</topic><topic>Care and treatment</topic><topic>Classification</topic><topic>Cognition</topic><topic>Cognition &amp; reasoning</topic><topic>Cognitive ability</topic><topic>Cognitive Dysfunction - diagnosis</topic><topic>Data integrity</topic><topic>Datasets</topic><topic>Decision making</topic><topic>Decision trees</topic><topic>Deep learning</topic><topic>Diagnosis</topic><topic>Diagnostic imaging</topic><topic>Discriminant analysis</topic><topic>Evaluation</topic><topic>Humans</topic><topic>Imbalanced classes</topic><topic>Interpretable framework</topic><topic>Machine Learning</topic><topic>Magnetic resonance imaging</topic><topic>Magnetic Resonance Imaging - methods</topic><topic>Medical diagnosis</topic><topic>Medical imaging</topic><topic>Multiclassification</topic><topic>Neural networks</topic><topic>Neurodegenerative diseases</topic><topic>Neuroimaging</topic><topic>Neuropsychology</topic><topic>Numerical prediction</topic><topic>Optimization</topic><topic>Ventricle</topic><topic>Ventricles (cerebral)</topic><topic>XGBoost-SHAP</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yi, Fuliang</creatorcontrib><creatorcontrib>Yang, Hui</creatorcontrib><creatorcontrib>Chen, Durong</creatorcontrib><creatorcontrib>Qin, Yao</creatorcontrib><creatorcontrib>Han, Hongjuan</creatorcontrib><creatorcontrib>Cui, Jing</creatorcontrib><creatorcontrib>Bai, Wenlin</creatorcontrib><creatorcontrib>Ma, Yifei</creatorcontrib><creatorcontrib>Zhang, Rong</creatorcontrib><creatorcontrib>Yu, Hongmei</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and 
Information Systems Abstracts</collection><collection>Health Medical collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Healthcare Administration Database (Alumni)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection (Proquest) (PQ_SDU_P3)</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Biological Sciences</collection><collection>Computer and Information Systems Abstracts – 
Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Healthcare Administration Database (Proquest)</collection><collection>PML(ProQuest Medical Library)</collection><collection>Biological Science Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>BMC medical informatics and decision making</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yi, Fuliang</au><au>Yang, Hui</au><au>Chen, Durong</au><au>Qin, Yao</au><au>Han, Hongjuan</au><au>Cui, Jing</au><au>Bai, Wenlin</au><au>Ma, Yifei</au><au>Zhang, Rong</au><au>Yu, Hongmei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>XGBoost-SHAP-based interpretable diagnostic framework for alzheimer's disease</atitle><jtitle>BMC medical informatics and decision making</jtitle><addtitle>BMC Med Inform Decis Mak</addtitle><date>2023-07-25</date><risdate>2023</risdate><volume>23</volume><issue>1</issue><spage>137</spage><epage>14</epage><pages>137-14</pages><artnum>137</artnum><issn>1472-6947</issn><eissn>1472-6947</eissn><abstract>Due to the class imbalance issue faced when Alzheimer's disease (AD) 
develops from normal cognition (NC) to mild cognitive impairment (MCI), present clinical practice is met with challenges regarding the auxiliary diagnosis of AD using machine learning (ML). This leads to low diagnosis performance. We aimed to construct an interpretable framework, extreme gradient boosting-Shapley additive explanations (XGBoost-SHAP), to handle the imbalance among different AD progression statuses at the algorithmic level. We also sought to achieve multiclassification of NC, MCI, and AD. We obtained patient data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, including clinical information, neuropsychological test results, neuroimaging-derived biomarkers, and APOE-ε4 gene statuses. First, three feature selection algorithms were applied, and they were then included in the XGBoost algorithm. Due to the imbalance among the three classes, we changed the sample weight distribution to achieve multiclassification of NC, MCI, and AD. Then, the SHAP method was linked to XGBoost to form an interpretable framework. This framework utilized attribution ideas that quantified the impacts of model predictions into numerical values and analysed them based on their directions and sizes. Subsequently, the top 10 features (optimal subset) were used to simplify the clinical decision-making process, and their performance was compared with that of a random forest (RF), Bagging, AdaBoost, and a naive Bayes (NB) classifier. Finally, the National Alzheimer's Coordinating Center (NACC) dataset was employed to assess the impact path consistency of the features within the optimal subset. Compared to the RF, Bagging, AdaBoost, NB and XGBoost (unweighted), the interpretable framework had higher classification performance with accuracy improvements of 0.74%, 0.74%, 1.46%, 13.18%, and 0.83%, respectively. 
The framework achieved high sensitivity (81.21%/74.85%), specificity (92.18%/89.86%), accuracy (87.57%/80.52%), area under the receiver operating characteristic curve (AUC) (0.91/0.88), positive clinical utility index (0.71/0.56), and negative clinical utility index (0.75/0.68) on the ADNI and NACC datasets, respectively. In the ADNI dataset, the top 10 features were found to have varying associations with the risk of AD onset based on their SHAP values. Specifically, the higher SHAP values of CDRSB, ADAS13, ADAS11, ventricle volume, ADASQ4, and FAQ were associated with higher risks of AD onset. Conversely, the higher SHAP values of LDELTOTAL, mPACCdigit, RAVLT_immediate, and MMSE were associated with lower risks of AD onset. Similar results were found for the NACC dataset. The proposed interpretable framework contributes to achieving excellent performance in imbalanced AD multiclassification tasks and provides scientific guidance (optimal subset) for clinical decision-making, thereby facilitating disease management and offering new research ideas for optimizing AD prevention and treatment programs.</abstract><cop>England</cop><pub>BioMed Central Ltd</pub><pmid>37491248</pmid><doi>10.1186/s12911-023-02238-9</doi><tpages>14</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1472-6947
ispartof BMC medical informatics and decision making, 2023-07, Vol.23 (1), p.137, Article 137 (14 pages)
issn 1472-6947
1472-6947
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_2f7158995ba544838dc87c3f0dbde75f
source Publicly Available Content Database; PubMed Central
subjects Accuracy
Algorithms
Alzheimer Disease - diagnostic imaging
Alzheimer's disease
Analysis
Apolipoprotein E
Bagging
Bayes Theorem
Biomarkers
Brain research
Care and treatment
Classification
Cognition
Cognition & reasoning
Cognitive ability
Cognitive Dysfunction - diagnosis
Data integrity
Datasets
Decision making
Decision trees
Deep learning
Diagnosis
Diagnostic imaging
Discriminant analysis
Evaluation
Humans
Imbalanced classes
Interpretable framework
Machine Learning
Magnetic resonance imaging
Magnetic Resonance Imaging - methods
Medical diagnosis
Medical imaging
Multiclassification
Neural networks
Neurodegenerative diseases
Neuroimaging
Neuropsychology
Numerical prediction
Optimization
Ventricle
Ventricles (cerebral)
XGBoost-SHAP
title XGBoost-SHAP-based interpretable diagnostic framework for alzheimer's disease
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T18%3A16%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=XGBoost-SHAP-based%20interpretable%20diagnostic%20framework%20for%20alzheimer's%20disease&rft.jtitle=BMC%20medical%20informatics%20and%20decision%20making&rft.au=Yi,%20Fuliang&rft.date=2023-07-25&rft.volume=23&rft.issue=1&rft.spage=137&rft.epage=14&rft.pages=137-14&rft.artnum=137&rft.issn=1472-6947&rft.eissn=1472-6947&rft_id=info:doi/10.1186/s12911-023-02238-9&rft_dat=%3Cgale_doaj_%3EA758479112%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c564t-aaedd9161d5673f9a6e2a3293e4202ce4160d25a8218b4584a38af2108c1af963%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2852008655&rft_id=info:pmid/37491248&rft_galeid=A758479112&rfr_iscdi=true