XGBoost-SHAP-based interpretable diagnostic framework for Alzheimer's disease
Class imbalance across the stages of Alzheimer's disease (AD) progression, from normal cognition (NC) through mild cognitive impairment (MCI) to AD, makes machine learning (ML)-assisted diagnosis of AD difficult in current clinical practice. This leads to low diagnosis performa...
Published in: | BMC medical informatics and decision making 2023-07, Vol.23 (1), p.137-14, Article 137 |
---|---|
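Main Authors: | Yi, Fuliang; Yang, Hui; Chen, Durong; Qin, Yao; Han, Hongjuan; Cui, Jing; Bai, Wenlin; Ma, Yifei; Zhang, Rong; Yu, Hongmei |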
Format: | Article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c564t-aaedd9161d5673f9a6e2a3293e4202ce4160d25a8218b4584a38af2108c1af963 |
---|---|
cites | cdi_FETCH-LOGICAL-c564t-aaedd9161d5673f9a6e2a3293e4202ce4160d25a8218b4584a38af2108c1af963 |
container_end_page | 14 |
container_issue | 1 |
container_start_page | 137 |
container_title | BMC medical informatics and decision making |
container_volume | 23 |
creator | Yi, Fuliang; Yang, Hui; Chen, Durong; Qin, Yao; Han, Hongjuan; Cui, Jing; Bai, Wenlin; Ma, Yifei; Zhang, Rong; Yu, Hongmei |
description | Class imbalance across the stages of Alzheimer's disease (AD) progression, from normal cognition (NC) through mild cognitive impairment (MCI) to AD, makes machine learning (ML)-assisted diagnosis of AD difficult in current clinical practice and leads to low diagnostic performance. We aimed to construct an interpretable framework, extreme gradient boosting-Shapley additive explanations (XGBoost-SHAP), that handles the imbalance among AD progression statuses at the algorithmic level and performs multiclass classification of NC, MCI, and AD.
We obtained patient data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database, including clinical information, neuropsychological test results, neuroimaging-derived biomarkers, and APOE-ε4 gene status. First, three feature selection algorithms were applied, and the selected features were then used as inputs to the XGBoost algorithm. Because the three classes were imbalanced, we adjusted the sample weight distribution to achieve multiclass classification of NC, MCI, and AD. Then, the SHAP method was coupled with XGBoost to form an interpretable framework. This framework uses feature attribution to quantify each feature's impact on the model's predictions as a numerical value, which is then analysed by direction and magnitude. Subsequently, the top 10 features (the optimal subset) were used to simplify the clinical decision-making process, and the resulting model's performance was compared with that of random forest (RF), Bagging, AdaBoost, and naive Bayes (NB) classifiers. Finally, the National Alzheimer's Coordinating Center (NACC) dataset was employed to assess the impact path consistency of the features within the optimal subset.
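Compared with RF, Bagging, AdaBoost, NB, and unweighted XGBoost, the interpretable framework achieved higher classification performance, with accuracy improvements of 0.74%, 0.74%, 1.46%, 13.18%, and 0.83%, respectively. The framework achieved high sensitivity (81.21%/74.85%), specificity (92.18%/89.86%), accuracy (87.57%/80.52%), area under the receiver operating characteristic curve (AUC) (0.91/0.88), positive clinical utility index (0.71/0.56), and negative clinical utility index (0.75/0.68) on the ADNI and NACC datasets, respectively. In the ADNI dataset, the top 10 features were found to have varying associations with the risk of AD onset based on their SHAP values. Specifically, higher SHAP values of CDRSB, ADAS13, ADAS11, ventricle volume, ADASQ4, and FAQ were associated with higher risks of AD onset. Conversely, higher SHAP values of LDELTOTAL, mPACCdigit, RAVLT_immediate, and MMSE were associated with lower risks of AD onset. Similar results were found for the NACC dataset.
The proposed interpretable framework contributes to achieving excellent performance in imbalanced AD multiclassification tasks and provides scientific guidance (optimal subset) for clinical decision-making, thereby facilitating disease management and offering new research ideas for optimizing AD prevention and treatment programs. |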
doi_str_mv | 10.1186/s12911-023-02238-9 |
format | article |
fullrecord | Record ID: TN_cdi_doaj_primary_oai_doaj_org_article_2f7158995ba544838dc87c3f0dbde75f; Gale ID: A758479112; ProQuest ID: 2852008655; PMID: 37491248; ISSN: 1472-6947; EISSN: 1472-6947; DOI: 10.1186/s12911-023-02238-9; Publisher: BioMed Central Ltd (England); Abbreviated journal title: BMC Med Inform Decis Mak; Publication date: 2023-07-25; Total pages: 14; Peer reviewed; Open access (free to read); Rights: 2023. The Author(s). COPYRIGHT 2023 BioMed Central Ltd. This work is licensed under http://creativecommons.org/licenses/by/4.0/; Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10369804/pdf/; PubMed: https://www.ncbi.nlm.nih.gov/pubmed/37491248 |
fulltext | fulltext |
identifier | ISSN: 1472-6947 |
ispartof | BMC medical informatics and decision making, 2023-07, Vol.23 (1), p.137-14, Article 137 |
issn | 1472-6947 1472-6947 |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_2f7158995ba544838dc87c3f0dbde75f |
source | Publicly Available Content Database; PubMed Central |
subjects | Accuracy; Algorithms; Alzheimer Disease - diagnostic imaging; Alzheimer's disease; Analysis; Apolipoprotein E; Bagging; Bayes Theorem; Biomarkers; Brain research; Care and treatment; Classification; Cognition; Cognition & reasoning; Cognitive ability; Cognitive Dysfunction - diagnosis; Data integrity; Datasets; Decision making; Decision trees; Deep learning; Diagnosis; Diagnostic imaging; Discriminant analysis; Evaluation; Humans; Imbalanced classes; Interpretable framework; Machine Learning; Magnetic resonance imaging; Magnetic Resonance Imaging - methods; Medical diagnosis; Medical imaging; Multiclassification; Neural networks; Neurodegenerative diseases; Neuroimaging; Neuropsychology; Numerical prediction; Optimization; Ventricle; Ventricles (cerebral); XGBoost-SHAP |
title | XGBoost-SHAP-based interpretable diagnostic framework for Alzheimer's disease |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T18%3A16%3A15IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=XGBoost-SHAP-based%20interpretable%20diagnostic%20framework%20for%20alzheimer's%20disease&rft.jtitle=BMC%20medical%20informatics%20and%20decision%20making&rft.au=Yi,%20Fuliang&rft.date=2023-07-25&rft.volume=23&rft.issue=1&rft.spage=137&rft.epage=14&rft.pages=137-14&rft.artnum=137&rft.issn=1472-6947&rft.eissn=1472-6947&rft_id=info:doi/10.1186/s12911-023-02238-9&rft_dat=%3Cgale_doaj_%3EA758479112%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c564t-aaedd9161d5673f9a6e2a3293e4202ce4160d25a8218b4584a38af2108c1af963%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2852008655&rft_id=info:pmid/37491248&rft_galeid=A758479112&rfr_iscdi=true |
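
The abstract in this record mentions two ideas that are easier to follow with a small worked example: reweighting samples to counter class imbalance among NC, MCI, and AD, and reporting positive and negative clinical utility indices. The sketch below is illustrative only. The record does not state which weighting scheme the authors used, so the inverse-class-frequency rule here is an assumption, as are the function names `balanced_sample_weights` and `clinical_utility` and all of the example counts. The clinical utility index is computed as it is commonly defined (sensitivity × PPV for the positive index, specificity × NPV for the negative index); how the authors aggregated it across three classes is not given in this record.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Return one weight per sample, inversely proportional to class frequency.

    Assumed scheme (not confirmed by the record):
        weight_c = n_samples / (n_classes * count_c)
    so that every class contributes the same total weight.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    class_weight = {c: n_samples / (n_classes * k) for c, k in counts.items()}
    return [class_weight[y] for y in labels]


def clinical_utility(tp, fp, fn, tn):
    """Compute sensitivity, specificity, and clinical utility indices
    from the four cells of a binary (one-vs-rest) confusion matrix.

    Assumes every denominator is non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)  # positive predictive value
    npv = tn / (tn + fn)  # negative predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "cui_pos": sensitivity * ppv,  # positive clinical utility index
        "cui_neg": specificity * npv,  # negative clinical utility index
    }


if __name__ == "__main__":
    # Hypothetical, deliberately imbalanced label set.
    labels = ["NC"] * 2 + ["MCI"] * 6 + ["AD"] * 4
    weights = balanced_sample_weights(labels)
    print(sorted(set(zip(labels, weights))))

    # Hypothetical confusion-matrix counts for one class versus the rest.
    print(clinical_utility(tp=80, fp=10, fn=20, tn=90))
```

Weights of this kind could, for instance, be passed as the `sample_weight` argument when fitting a gradient-boosted classifier, so that errors on rarer classes cost more during training. Whether that matches the paper's exact procedure cannot be determined from this record.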