Identification of clinical factors related to prediction of alcohol use disorder from electronic health records using feature selection methods
High dimensionality in electronic health records (EHR) causes a significant computational problem for any systematic search for predictive, diagnostic, or prognostic patterns. Feature selection (FS) methods have been indicated to be effective in feature reduction as well as in identifying risk facto...
Published in: | BMC medical informatics and decision making 2022-11, Vol.22 (1), p.304-25, Article 304 |
---|---|
Main Authors: | Ebrahimi, Ali; Wiil, Uffe Kock; Naemi, Amin; Mansourvar, Marjan; Andersen, Kjeld; Nielsen, Anette Søgaard |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | cdi_FETCH-LOGICAL-c530t-33530bb0c3fad8627aabebf17d9063939dc68d853384ebf6ee56ef431106e53a3 |
---|---|
cites | cdi_FETCH-LOGICAL-c530t-33530bb0c3fad8627aabebf17d9063939dc68d853384ebf6ee56ef431106e53a3 |
container_end_page | 25 |
container_issue | 1 |
container_start_page | 304 |
container_title | BMC medical informatics and decision making |
container_volume | 22 |
creator | Ebrahimi, Ali Wiil, Uffe Kock Naemi, Amin Mansourvar, Marjan Andersen, Kjeld Nielsen, Anette Søgaard |
description | High dimensionality in electronic health records (EHR) causes a significant computational problem for any systematic search for predictive, diagnostic, or prognostic patterns. Feature selection (FS) methods have been indicated to be effective in feature reduction as well as in identifying risk factors related to prediction of clinical disorders. This paper examines the prediction of patients with alcohol use disorder (AUD) using machine learning (ML) and attempts to identify risk factors related to the diagnosis of AUD.
The FS framework consists of two operational levels: base selectors and ensemble selectors. The first level consists of five FS methods: three filter methods, one wrapper method, and one embedded method. Base selector outputs are aggregated to develop four ensemble FS methods. The outputs of the FS methods were then fed into three ML algorithms, support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF), to compare and identify the best feature subset for the prediction of AUD from EHRs.
In terms of feature reduction, the embedded FS method reduced the number of features from 361 to 131. In terms of classification performance, RF based on the 272 features selected by our proposed ensemble method (Union FS) achieved the highest accuracy in predicting patients with AUD (96%) and outperformed all other models in terms of AUROC, AUPRC, Precision, Recall, and F1-Score. Considering the limitations of embedded and wrapper methods, the best overall performance was achieved by our proposed Union Filter FS, which reduced the number of features to 223 and improved Precision, Recall, and F1-Score in RF from 0.77, 0.65, and 0.71 to 0.87, 0.81, and 0.84, respectively. Our findings indicate that, besides gender, age, and length of stay at the hospital, diagnoses related to the digestive organs, bones, muscles and connective tissue, and the nervous system are important clinical factors related to the prediction of patients with AUD.
Our proposed FS method could improve the classification performance significantly. It could identify clinical factors related to prediction of AUD from EHRs, thereby effectively helping clinical staff to identify and treat AUD patients and improving medical knowledge of the AUD condition. Moreover, the diversity of features among female and male patients as well as gender disparity were investigated using FS methods and ML techniques. |
doi_str_mv | 10.1186/s12911-022-02051-w |
format | article |
fullrecord | <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_7c57513cdc254c9a96531e2a3eba456c</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A727807889</galeid><doaj_id>oai_doaj_org_article_7c57513cdc254c9a96531e2a3eba456c</doaj_id><sourcerecordid>A727807889</sourcerecordid><originalsourceid>FETCH-LOGICAL-c530t-33530bb0c3fad8627aabebf17d9063939dc68d853384ebf6ee56ef431106e53a3</originalsourceid><addsrcrecordid>eNptkslqHDEQhpuQEC_JC-QQBDm3raW19CVgTJYBQy7JWail0rSG7tZE0sT4KfLKkWfswQNBiJKq_vrQ8jfNB4KvCFHiOhPaE9JiSuvEnLT3r5pz0knair6Tr1-sz5qLnDcYE6kYf9ucMdHRjvfyvPm7crCU4IM1JcQFRY_sFJa6nZA3tsSUUYLJFHCoRLRN4IJ9VprJxjFOaJcBuZBjcpCQT3FGMIEtKVYOGsFMZawQW-u5asOyRh5M2SVAeS98xM1Qxujyu-aNN1OG90_xsvn19cvP2-_t3Y9vq9ubu9ZyhkvLWA3DgC3zxilBpTEDDJ5I12PBetY7K5RTnDHV1bwA4AJ8xwjBAjgz7LJZHbgumo3epjCb9KCjCXqfiGmtTSrBTqCl5ZITZp2lvLO96QVnBKhhMJiOC1tZnw-s7W6Ywdn6oMlMJ9DTyhJGvY5_dC-UwLKrgE9PgBR_7yAXvYm7tNT7ayo554rL-nFH1drUU4XFxwqzc8hW30gqFZZK9VV19R9VHQ7mYOMCPtT8SQM9NNgUc07gjwcnWD_6TB98pqvP9N5n-r42fXx55WPLs7HYP-8c0W0</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2755585778</pqid></control><display><type>article</type><title>Identification of clinical factors related to prediction of alcohol use disorder from electronic health records using feature selection methods</title><source>Open Access: PubMed Central</source><source>Publicly Available Content (ProQuest)</source><source>Coronavirus Research Database</source><creator>Ebrahimi, Ali ; Wiil, Uffe Kock ; Naemi, Amin ; Mansourvar, Marjan ; Andersen, Kjeld ; Nielsen, Anette Søgaard</creator><creatorcontrib>Ebrahimi, Ali ; Wiil, Uffe Kock ; Naemi, Amin ; Mansourvar, Marjan ; Andersen, Kjeld ; Nielsen, Anette Søgaard</creatorcontrib><description>High dimensionality in electronic health records (EHR) causes a significant computational problem for any systematic search for predictive, diagnostic, or 
prognostic patterns. Feature selection (FS) methods have been indicated to be effective in feature reduction as well as in identifying risk factors related to prediction of clinical disorders. This paper examines the prediction of patients with alcohol use disorder (AUD) using machine learning (ML) and attempts to identify risk factors related to the diagnosis of AUD.
A FS framework consisting of two operational levels, base selectors and ensemble selectors. The first level consists of five FS methods: three filter methods, one wrapper method, and one embedded method. Base selector outputs are aggregated to develop four ensemble FS methods. The outputs of FS method were then fed into three ML algorithms: support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF) to compare and identify the best feature subset for the prediction of AUD from EHRs.
In terms of feature reduction, the embedded FS method could significantly reduce the number of features from 361 to 131. In terms of classification performance, RF based on 272 features selected by our proposed ensemble method (Union FS) with the highest accuracy in predicting patients with AUD, 96%, outperformed all other models in terms of AUROC, AUPRC, Precision, Recall, and F1-Score. Considering the limitations of embedded and wrapper methods, the best overall performance was achieved by our proposed Union Filter FS, which reduced the number of features to 223 and improved Precision, Recall, and F1-Score in RF from 0.77, 0.65, and 0.71 to 0.87, 0.81, and 0.84, respectively. Our findings indicate that, besides gender, age, and length of stay at the hospital, diagnosis related to digestive organs, bones, muscles and connective tissue, and the nervous systems are important clinical factors related to the prediction of patients with AUD.
Our proposed FS method could improve the classification performance significantly. It could identify clinical factors related to prediction of AUD from EHRs, thereby effectively helping clinical staff to identify and treat AUD patients and improving medical knowledge of the AUD condition. Moreover, the diversity of features among female and male patients as well as gender disparity were investigated using FS methods and ML techniques.</description><identifier>ISSN: 1472-6947</identifier><identifier>EISSN: 1472-6947</identifier><identifier>DOI: 10.1186/s12911-022-02051-w</identifier><identifier>PMID: 36424597</identifier><language>eng</language><publisher>England: BioMed Central Ltd</publisher><subject>Alcohol use ; Alcohol use disorder ; Alcoholism ; Alcoholism - diagnosis ; Algorithms ; Audits ; Bones ; Classification ; Clinical factor identification ; Cluster Analysis ; Computer applications ; Connective tissues ; Datasets ; Diagnosis ; Electronic Health Records ; Electronic medical records ; Electronic records ; Feature selection ; Female ; Gender ; Gender disparity ; Health informatics ; Hospitals ; Humans ; Identification ; Machine Learning ; Male ; Medical informatics ; Medical records ; Methods ; Muscles ; Orthopedics ; Patients ; Performance evaluation ; Predictions ; Recall ; Reduction ; Risk analysis ; Risk factors ; Selectors ; Support Vector Machine ; Support vector machines</subject><ispartof>BMC medical informatics and decision making, 2022-11, Vol.22 (1), p.304-25, Article 304</ispartof><rights>2022. The Author(s).</rights><rights>COPYRIGHT 2022 BioMed Central Ltd.</rights><rights>2022. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>The Author(s) 2022</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c530t-33530bb0c3fad8627aabebf17d9063939dc68d853384ebf6ee56ef431106e53a3</citedby><cites>FETCH-LOGICAL-c530t-33530bb0c3fad8627aabebf17d9063939dc68d853384ebf6ee56ef431106e53a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC9686074/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2755585778?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25753,27924,27925,37012,38516,43895,44590,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/36424597$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Ebrahimi, Ali</creatorcontrib><creatorcontrib>Wiil, Uffe Kock</creatorcontrib><creatorcontrib>Naemi, Amin</creatorcontrib><creatorcontrib>Mansourvar, Marjan</creatorcontrib><creatorcontrib>Andersen, Kjeld</creatorcontrib><creatorcontrib>Nielsen, Anette Søgaard</creatorcontrib><title>Identification of clinical factors related to prediction of alcohol use disorder from electronic health records using feature selection methods</title><title>BMC medical informatics and decision making</title><addtitle>BMC Med Inform Decis Mak</addtitle><description>High dimensionality in electronic health records (EHR) causes a significant computational problem for any systematic search for predictive, diagnostic, or prognostic patterns. 
Feature selection (FS) methods have been indicated to be effective in feature reduction as well as in identifying risk factors related to prediction of clinical disorders. This paper examines the prediction of patients with alcohol use disorder (AUD) using machine learning (ML) and attempts to identify risk factors related to the diagnosis of AUD.
A FS framework consisting of two operational levels, base selectors and ensemble selectors. The first level consists of five FS methods: three filter methods, one wrapper method, and one embedded method. Base selector outputs are aggregated to develop four ensemble FS methods. The outputs of FS method were then fed into three ML algorithms: support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF) to compare and identify the best feature subset for the prediction of AUD from EHRs.
In terms of feature reduction, the embedded FS method could significantly reduce the number of features from 361 to 131. In terms of classification performance, RF based on 272 features selected by our proposed ensemble method (Union FS) with the highest accuracy in predicting patients with AUD, 96%, outperformed all other models in terms of AUROC, AUPRC, Precision, Recall, and F1-Score. Considering the limitations of embedded and wrapper methods, the best overall performance was achieved by our proposed Union Filter FS, which reduced the number of features to 223 and improved Precision, Recall, and F1-Score in RF from 0.77, 0.65, and 0.71 to 0.87, 0.81, and 0.84, respectively. Our findings indicate that, besides gender, age, and length of stay at the hospital, diagnosis related to digestive organs, bones, muscles and connective tissue, and the nervous systems are important clinical factors related to the prediction of patients with AUD.
Our proposed FS method could improve the classification performance significantly. It could identify clinical factors related to prediction of AUD from EHRs, thereby effectively helping clinical staff to identify and treat AUD patients and improving medical knowledge of the AUD condition. Moreover, the diversity of features among female and male patients as well as gender disparity were investigated using FS methods and ML techniques.</description><subject>Alcohol use</subject><subject>Alcohol use disorder</subject><subject>Alcoholism</subject><subject>Alcoholism - diagnosis</subject><subject>Algorithms</subject><subject>Audits</subject><subject>Bones</subject><subject>Classification</subject><subject>Clinical factor identification</subject><subject>Cluster Analysis</subject><subject>Computer applications</subject><subject>Connective tissues</subject><subject>Datasets</subject><subject>Diagnosis</subject><subject>Electronic Health Records</subject><subject>Electronic medical records</subject><subject>Electronic records</subject><subject>Feature selection</subject><subject>Female</subject><subject>Gender</subject><subject>Gender disparity</subject><subject>Health informatics</subject><subject>Hospitals</subject><subject>Humans</subject><subject>Identification</subject><subject>Machine Learning</subject><subject>Male</subject><subject>Medical informatics</subject><subject>Medical records</subject><subject>Methods</subject><subject>Muscles</subject><subject>Orthopedics</subject><subject>Patients</subject><subject>Performance evaluation</subject><subject>Predictions</subject><subject>Recall</subject><subject>Reduction</subject><subject>Risk analysis</subject><subject>Risk factors</subject><subject>Selectors</subject><subject>Support Vector Machine</subject><subject>Support vector 
machines</subject><issn>1472-6947</issn><issn>1472-6947</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>COVID</sourceid><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNptkslqHDEQhpuQEC_JC-QQBDm3raW19CVgTJYBQy7JWail0rSG7tZE0sT4KfLKkWfswQNBiJKq_vrQ8jfNB4KvCFHiOhPaE9JiSuvEnLT3r5pz0knair6Tr1-sz5qLnDcYE6kYf9ucMdHRjvfyvPm7crCU4IM1JcQFRY_sFJa6nZA3tsSUUYLJFHCoRLRN4IJ9VprJxjFOaJcBuZBjcpCQT3FGMIEtKVYOGsFMZawQW-u5asOyRh5M2SVAeS98xM1Qxujyu-aNN1OG90_xsvn19cvP2-_t3Y9vq9ubu9ZyhkvLWA3DgC3zxilBpTEDDJ5I12PBetY7K5RTnDHV1bwA4AJ8xwjBAjgz7LJZHbgumo3epjCb9KCjCXqfiGmtTSrBTqCl5ZITZp2lvLO96QVnBKhhMJiOC1tZnw-s7W6Ywdn6oMlMJ9DTyhJGvY5_dC-UwLKrgE9PgBR_7yAXvYm7tNT7ayo554rL-nFH1drUU4XFxwqzc8hW30gqFZZK9VV19R9VHQ7mYOMCPtT8SQM9NNgUc07gjwcnWD_6TB98pqvP9N5n-r42fXx55WPLs7HYP-8c0W0</recordid><startdate>20221123</startdate><enddate>20221123</enddate><creator>Ebrahimi, Ali</creator><creator>Wiil, Uffe Kock</creator><creator>Naemi, Amin</creator><creator>Mansourvar, Marjan</creator><creator>Andersen, Kjeld</creator><creator>Nielsen, Anette Søgaard</creator><general>BioMed Central Ltd</general><general>BioMed 
Central</general><general>BMC</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7QO</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>88C</scope><scope>88E</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>COVID</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M0T</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20221123</creationdate><title>Identification of clinical factors related to prediction of alcohol use disorder from electronic health records using feature selection methods</title><author>Ebrahimi, Ali ; Wiil, Uffe Kock ; Naemi, Amin ; Mansourvar, Marjan ; Andersen, Kjeld ; Nielsen, Anette Søgaard</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c530t-33530bb0c3fad8627aabebf17d9063939dc68d853384ebf6ee56ef431106e53a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Alcohol use</topic><topic>Alcohol use disorder</topic><topic>Alcoholism</topic><topic>Alcoholism - 
diagnosis</topic><topic>Algorithms</topic><topic>Audits</topic><topic>Bones</topic><topic>Classification</topic><topic>Clinical factor identification</topic><topic>Cluster Analysis</topic><topic>Computer applications</topic><topic>Connective tissues</topic><topic>Datasets</topic><topic>Diagnosis</topic><topic>Electronic Health Records</topic><topic>Electronic medical records</topic><topic>Electronic records</topic><topic>Feature selection</topic><topic>Female</topic><topic>Gender</topic><topic>Gender disparity</topic><topic>Health informatics</topic><topic>Hospitals</topic><topic>Humans</topic><topic>Identification</topic><topic>Machine Learning</topic><topic>Male</topic><topic>Medical informatics</topic><topic>Medical records</topic><topic>Methods</topic><topic>Muscles</topic><topic>Orthopedics</topic><topic>Patients</topic><topic>Performance evaluation</topic><topic>Predictions</topic><topic>Recall</topic><topic>Reduction</topic><topic>Risk analysis</topic><topic>Risk factors</topic><topic>Selectors</topic><topic>Support Vector Machine</topic><topic>Support vector machines</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ebrahimi, Ali</creatorcontrib><creatorcontrib>Wiil, Uffe Kock</creatorcontrib><creatorcontrib>Naemi, Amin</creatorcontrib><creatorcontrib>Mansourvar, Marjan</creatorcontrib><creatorcontrib>Andersen, Kjeld</creatorcontrib><creatorcontrib>Nielsen, Anette Søgaard</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest_Health & Medical Collection</collection><collection>ProQuest Central (purchase pre-March 
2016)</collection><collection>Healthcare Administration Database (Alumni)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>Coronavirus Research Database</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ProQuest Biological Science Collection</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing 
Database</collection><collection>Health & Medical Collection (Alumni Edition)</collection><collection>Healthcare Administration Database</collection><collection>Medical Database</collection><collection>ProQuest Biological Science Journals</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content (ProQuest)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>BMC medical informatics and decision making</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ebrahimi, Ali</au><au>Wiil, Uffe Kock</au><au>Naemi, Amin</au><au>Mansourvar, Marjan</au><au>Andersen, Kjeld</au><au>Nielsen, Anette Søgaard</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Identification of clinical factors related to prediction of alcohol use disorder from electronic health records using feature selection methods</atitle><jtitle>BMC medical informatics and decision making</jtitle><addtitle>BMC Med Inform Decis Mak</addtitle><date>2022-11-23</date><risdate>2022</risdate><volume>22</volume><issue>1</issue><spage>304</spage><epage>25</epage><pages>304-25</pages><artnum>304</artnum><issn>1472-6947</issn><eissn>1472-6947</eissn><abstract>High dimensionality in electronic health records (EHR) causes a significant computational problem for any systematic search for predictive, diagnostic, or prognostic patterns. 
Feature selection (FS) methods have been indicated to be effective in feature reduction as well as in identifying risk factors related to prediction of clinical disorders. This paper examines the prediction of patients with alcohol use disorder (AUD) using machine learning (ML) and attempts to identify risk factors related to the diagnosis of AUD.
A FS framework consisting of two operational levels, base selectors and ensemble selectors. The first level consists of five FS methods: three filter methods, one wrapper method, and one embedded method. Base selector outputs are aggregated to develop four ensemble FS methods. The outputs of FS method were then fed into three ML algorithms: support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF) to compare and identify the best feature subset for the prediction of AUD from EHRs.
In terms of feature reduction, the embedded FS method could significantly reduce the number of features from 361 to 131. In terms of classification performance, RF based on 272 features selected by our proposed ensemble method (Union FS) with the highest accuracy in predicting patients with AUD, 96%, outperformed all other models in terms of AUROC, AUPRC, Precision, Recall, and F1-Score. Considering the limitations of embedded and wrapper methods, the best overall performance was achieved by our proposed Union Filter FS, which reduced the number of features to 223 and improved Precision, Recall, and F1-Score in RF from 0.77, 0.65, and 0.71 to 0.87, 0.81, and 0.84, respectively. Our findings indicate that, besides gender, age, and length of stay at the hospital, diagnosis related to digestive organs, bones, muscles and connective tissue, and the nervous systems are important clinical factors related to the prediction of patients with AUD.
Our proposed FS method could improve the classification performance significantly. It could identify clinical factors related to prediction of AUD from EHRs, thereby effectively helping clinical staff to identify and treat AUD patients and improving medical knowledge of the AUD condition. Moreover, the diversity of features among female and male patients as well as gender disparity were investigated using FS methods and ML techniques.</abstract><cop>England</cop><pub>BioMed Central Ltd</pub><pmid>36424597</pmid><doi>10.1186/s12911-022-02051-w</doi><tpages>25</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1472-6947 |
ispartof | BMC medical informatics and decision making, 2022-11, Vol.22 (1), p.304-25, Article 304 |
issn | 1472-6947 1472-6947 |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_7c57513cdc254c9a96531e2a3eba456c |
source | Open Access: PubMed Central; Publicly Available Content (ProQuest); Coronavirus Research Database |
subjects | Alcohol use Alcohol use disorder Alcoholism Alcoholism - diagnosis Algorithms Audits Bones Classification Clinical factor identification Cluster Analysis Computer applications Connective tissues Datasets Diagnosis Electronic Health Records Electronic medical records Electronic records Feature selection Female Gender Gender disparity Health informatics Hospitals Humans Identification Machine Learning Male Medical informatics Medical records Methods Muscles Orthopedics Patients Performance evaluation Predictions Recall Reduction Risk analysis Risk factors Selectors Support Vector Machine Support vector machines |
title | Identification of clinical factors related to prediction of alcohol use disorder from electronic health records using feature selection methods |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T02%3A15%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Identification%20of%20clinical%20factors%20related%20to%20prediction%20of%20alcohol%20use%20disorder%20from%20electronic%20health%20records%20using%20feature%20selection%20methods&rft.jtitle=BMC%20medical%20informatics%20and%20decision%20making&rft.au=Ebrahimi,%20Ali&rft.date=2022-11-23&rft.volume=22&rft.issue=1&rft.spage=304&rft.epage=25&rft.pages=304-25&rft.artnum=304&rft.issn=1472-6947&rft.eissn=1472-6947&rft_id=info:doi/10.1186/s12911-022-02051-w&rft_dat=%3Cgale_doaj_%3EA727807889%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c530t-33530bb0c3fad8627aabebf17d9063939dc68d853384ebf6ee56ef431106e53a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2755585778&rft_id=info:pmid/36424597&rft_galeid=A727807889&rfr_iscdi=true |
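
The description field above outlines a two-level scheme in which the outputs of several base selectors are aggregated into ensemble feature subsets, with "Union FS" named as one ensemble. The following is a minimal sketch of that aggregation step only, not the authors' implementation. It assumes each base selector returns a set of feature names. The feature names, the function names, and the intersection and majority-vote variants are illustrative assumptions; the record itself names only the Union variants.

```python
from collections import Counter


def union_fs(selections):
    """Union ensemble: keep a feature if at least one base selector chose it."""
    return set().union(*selections)


def intersection_fs(selections):
    """Intersection ensemble (assumed variant): keep features chosen by every selector.

    Expects a non-empty list of selections.
    """
    return set.intersection(*map(set, selections))


def majority_fs(selections):
    """Majority-vote ensemble (assumed variant): keep features chosen by more than half."""
    counts = Counter(feature for sel in selections for feature in set(sel))
    return {feature for feature, n in counts.items() if n > len(selections) / 2}


# Hypothetical outputs of three base filter selectors (names are illustrative only).
filter_a = {"age", "gender", "length_of_stay"}
filter_b = {"age", "digestive_dx"}
filter_c = {"age", "gender", "nervous_system_dx"}
base_selections = [filter_a, filter_b, filter_c]

union_features = union_fs(base_selections)
common_features = intersection_fs(base_selections)
majority_features = majority_fs(base_selections)

print(sorted(union_features))
print(sorted(common_features))
print(sorted(majority_features))
```

A union keeps every feature that any selector considers useful, so it reduces dimensionality less aggressively than an intersection; this is in line with the description, where Union FS retains 272 of 361 features while the embedded method alone retains 131.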