Comparison of different machine learning algorithms to classify patients suspected of having sepsis infection in the intensive care unit

Sepsis is a life-threatening condition caused by the body's response to an infection. This study aims to develop a classification model that predicts which patients are at risk of sepsis from clinical findings and demographic information. The study used the MIMIC-III dataset...

Bibliographic Details
Published in: Informatics in Medicine Unlocked, 2023, Vol. 38, p. 101236, Article 101236
Main Authors: Gholamzadeh, Marsa; Abtahi, Hamidreza; Safdari, Reza
Format: Article
Language: English
Subjects: Classification; ICU; Machine learning; Sepsis
cited_by cdi_FETCH-LOGICAL-c3216-f3a77de9da9b048d35759c0c10bbbf1e6dbcd785e143fe8e6856433545c08ab3
cites cdi_FETCH-LOGICAL-c3216-f3a77de9da9b048d35759c0c10bbbf1e6dbcd785e143fe8e6856433545c08ab3
container_end_page
container_issue
container_start_page 101236
container_title Informatics in medicine unlocked
container_volume 38
creator Gholamzadeh, Marsa
Abtahi, Hamidreza
Safdari, Reza
description Sepsis is a life-threatening condition caused by the body's response to an infection. This study aims to develop a classification model that predicts which patients are at risk of sepsis from clinical findings and demographic information. The study used the MIMIC-III dataset, which is freely available as open-access data. The synthetic minority oversampling technique (SMOTE) was applied to address class imbalance in the dataset. During preprocessing, the dataset was cleaned and missing values were imputed. For split validation, the dataset was divided into training and test sets used to develop the classification models. Six classifiers were developed: Gaussian Naïve Bayes (NB), decision tree (DT), random forest (RF), logistic regression (LR), k-nearest neighbors (KNN), and XGBoost. Several evaluation metrics were combined to assess the performance of the proposed models. The dataset includes 1,552,210 entries with 44 features for critically ill patients admitted to the ICU. Comparing the models across these metrics showed that the RF model performed best in terms of F-measure and area under the ROC curve. The 20 most important features were determined from the RF model. The analysis showed that, on the MIMIC-III dataset, the RF model predicted sepsis with significantly higher performance than the other classification models. Given the high mortality of sepsis, studies of this kind could help prevent adverse effects of the condition and reduce the risk of mortality in hospitalized patients by providing early sepsis prediction.
doi_str_mv 10.1016/j.imu.2023.101236
format article
fullrecord <record><control><sourceid>elsevier_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_482aee4a704c41b49d3a796e5a667d20</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S2352914823000783</els_id><doaj_id>oai_doaj_org_article_482aee4a704c41b49d3a796e5a667d20</doaj_id><sourcerecordid>S2352914823000783</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3216-f3a77de9da9b048d35759c0c10bbbf1e6dbcd785e143fe8e6856433545c08ab3</originalsourceid><addsrcrecordid>eNp9kc-O3CAMxqOqlbra7gP0xgvMFAIhiXqqRv2z0kq97B05YGY8SiACZqR9gz52Sada9dSTDeb7Gftrmo-C7wUX-tN5T8tl3_JWbudW6jfNXSu7djcKNbz9J3_fPOR85pyLXsuu7-6aX4e4rJAox8CiZ468x4ShsAXsiQKyGSEFCkcG8zEmKqclsxKZnSFn8i9shUL1fWb5kle0Bd3GOcF102RcM2VGwdcK1RYUWDlhDQVDpisyCwnZJVD50LzzMGd8-Bvvm-dvX58PP3ZPP78_Hr487axshd55CX3vcHQwTlwNbptitNwKPk2TF6jdZF0_dCiU9DigHjqtpOxUZ_kAk7xvHm9YF-Fs1kQLpBcTgcyfi5iOBlIhO6NRQwuICnqurBKTGl3tPWrsQOvetbyyxI1lU8w5oX_lCW42Y8zZVGPMZoy5GVM1n28arDNeCZPJtu7PoqNUd1R_Qf9R_wZvxZkG</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Comparison of different machine learning algorithms to classify patients suspected of having sepsis infection in the intensive care unit</title><source>ScienceDirect Journals</source><creator>Gholamzadeh, Marsa ; Abtahi, Hamidreza ; Safdari, Reza</creator><creatorcontrib>Gholamzadeh, Marsa ; Abtahi, Hamidreza ; Safdari, Reza</creatorcontrib><description>Sepsis is a life-threatening disease that occurs as a result of the body's response to an infection. This study aims to develop a classification model for predicting patients at risk of sepsis using clinical findings and demographic information. The study was conducted using a MIMICIII dataset which is freely available as open-access data. The synthetic minority oversampling technique (SMOTE) was applied to address the imbalanced data problem in our dataset. 
Through preprocessing, the dataset was cleaned and missing values were imputed. Split validation was done by dividing the dataset into training and test data for developing classification models. Six algorithms including Gaussian Naïve Bayes (NB), decision tree (DT), random forest (RF), logistic regression (LR), KNN algorithm, and XGBoost classifier were developed. A combination of evaluation metrics was employed to evaluate the performance of the proposed models. Our dataset includes 1,552,210 entries with 44 features of critically ill patients who were admitted to the ICU. Comparing the performance of developed models using different metrics showed that the RF model had the best performance in terms of F-Measure and the area under the ROC curve. The 20 top features with high importance were determined based on the RF model. Our analysis showed that the RF model predicted sepsis with significantly higher performance in comparison to other classification models using the MIMICIII dataset. 
Due to the high mortality of sepsis, these kinds of studies could be supportive to prevent the side effects of the disease and lessen the risk of mortality in hospitalized patients by providing early sepsis prediction.</description><identifier>ISSN: 2352-9148</identifier><identifier>EISSN: 2352-9148</identifier><identifier>DOI: 10.1016/j.imu.2023.101236</identifier><language>eng</language><publisher>Elsevier Ltd</publisher><subject>Classification ; ICU ; Machine learning ; Sepsis</subject><ispartof>Informatics in medicine unlocked, 2023, Vol.38, p.101236, Article 101236</ispartof><rights>2023 The Authors</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3216-f3a77de9da9b048d35759c0c10bbbf1e6dbcd785e143fe8e6856433545c08ab3</citedby><cites>FETCH-LOGICAL-c3216-f3a77de9da9b048d35759c0c10bbbf1e6dbcd785e143fe8e6856433545c08ab3</cites><orcidid>0000-0001-6781-9342 ; 0000-0002-4982-337X ; 0000-0002-1111-0497</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.sciencedirect.com/science/article/pii/S2352914823000783$$EHTML$$P50$$Gelsevier$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,3549,4024,27923,27924,27925,45780</link.rule.ids></links><search><creatorcontrib>Gholamzadeh, Marsa</creatorcontrib><creatorcontrib>Abtahi, Hamidreza</creatorcontrib><creatorcontrib>Safdari, Reza</creatorcontrib><title>Comparison of different machine learning algorithms to classify patients suspected of having sepsis infection in the intensive care unit</title><title>Informatics in medicine unlocked</title><description>Sepsis is a life-threatening disease that occurs as a result of the body's response to an infection. 
This study aims to develop a classification model for predicting patients at risk of sepsis using clinical findings and demographic information. The study was conducted using a MIMICIII dataset which is freely available as open-access data. The synthetic minority oversampling technique (SMOTE) was applied to address the imbalanced data problem in our dataset. Through preprocessing, the dataset was cleaned and missing values were imputed. Split validation was done by dividing the dataset into training and test data for developing classification models. Six algorithms including Gaussian Naïve Bayes (NB), decision tree (DT), random forest (RF), logistic regression (LR), KNN algorithm, and XGBoost classifier were developed. A combination of evaluation metrics was employed to evaluate the performance of the proposed models. Our dataset includes 1,552,210 entries with 44 features of critically ill patients who were admitted to the ICU. Comparing the performance of developed models using different metrics showed that the RF model had the best performance in terms of F-Measure and the area under the ROC curve. The 20 top features with high importance were determined based on the RF model. Our analysis showed that the RF model predicted sepsis with significantly higher performance in comparison to other classification models using the MIMICIII dataset. 
Due to the high mortality of sepsis, these kinds of studies could be supportive to prevent the side effects of the disease and lessen the risk of mortality in hospitalized patients by providing early sepsis prediction.</description><subject>Classification</subject><subject>ICU</subject><subject>Machine learning</subject><subject>Sepsis</subject><issn>2352-9148</issn><issn>2352-9148</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>DOA</sourceid><recordid>eNp9kc-O3CAMxqOqlbra7gP0xgvMFAIhiXqqRv2z0kq97B05YGY8SiACZqR9gz52Sada9dSTDeb7Gftrmo-C7wUX-tN5T8tl3_JWbudW6jfNXSu7djcKNbz9J3_fPOR85pyLXsuu7-6aX4e4rJAox8CiZ468x4ShsAXsiQKyGSEFCkcG8zEmKqclsxKZnSFn8i9shUL1fWb5kle0Bd3GOcF102RcM2VGwdcK1RYUWDlhDQVDpisyCwnZJVD50LzzMGd8-Bvvm-dvX58PP3ZPP78_Hr487axshd55CX3vcHQwTlwNbptitNwKPk2TF6jdZF0_dCiU9DigHjqtpOxUZ_kAk7xvHm9YF-Fs1kQLpBcTgcyfi5iOBlIhO6NRQwuICnqurBKTGl3tPWrsQOvetbyyxI1lU8w5oX_lCW42Y8zZVGPMZoy5GVM1n28arDNeCZPJtu7PoqNUd1R_Qf9R_wZvxZkG</recordid><startdate>2023</startdate><enddate>2023</enddate><creator>Gholamzadeh, Marsa</creator><creator>Abtahi, Hamidreza</creator><creator>Safdari, Reza</creator><general>Elsevier Ltd</general><general>Elsevier</general><scope>6I.</scope><scope>AAFTH</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0001-6781-9342</orcidid><orcidid>https://orcid.org/0000-0002-4982-337X</orcidid><orcidid>https://orcid.org/0000-0002-1111-0497</orcidid></search><sort><creationdate>2023</creationdate><title>Comparison of different machine learning algorithms to classify patients suspected of having sepsis infection in the intensive care unit</title><author>Gholamzadeh, Marsa ; Abtahi, Hamidreza ; Safdari, 
Reza</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3216-f3a77de9da9b048d35759c0c10bbbf1e6dbcd785e143fe8e6856433545c08ab3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Classification</topic><topic>ICU</topic><topic>Machine learning</topic><topic>Sepsis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gholamzadeh, Marsa</creatorcontrib><creatorcontrib>Abtahi, Hamidreza</creatorcontrib><creatorcontrib>Safdari, Reza</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>CrossRef</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>Informatics in medicine unlocked</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Gholamzadeh, Marsa</au><au>Abtahi, Hamidreza</au><au>Safdari, Reza</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Comparison of different machine learning algorithms to classify patients suspected of having sepsis infection in the intensive care unit</atitle><jtitle>Informatics in medicine unlocked</jtitle><date>2023</date><risdate>2023</risdate><volume>38</volume><spage>101236</spage><pages>101236-</pages><artnum>101236</artnum><issn>2352-9148</issn><eissn>2352-9148</eissn><abstract>Sepsis is a life-threatening disease that occurs as a result of the body's response to an infection. This study aims to develop a classification model for predicting patients at risk of sepsis using clinical findings and demographic information. The study was conducted using a MIMICIII dataset which is freely available as open-access data. The synthetic minority oversampling technique (SMOTE) was applied to address the imbalanced data problem in our dataset. 
Through preprocessing, the dataset was cleaned and missing values were imputed. Split validation was done by dividing the dataset into training and test data for developing classification models. Six algorithms including Gaussian Naïve Bayes (NB), decision tree (DT), random forest (RF), logistic regression (LR), KNN algorithm, and XGBoost classifier were developed. A combination of evaluation metrics was employed to evaluate the performance of the proposed models. Our dataset includes 1,552,210 entries with 44 features of critically ill patients who were admitted to the ICU. Comparing the performance of developed models using different metrics showed that the RF model had the best performance in terms of F-Measure and the area under the ROC curve. The 20 top features with high importance were determined based on the RF model. Our analysis showed that the RF model predicted sepsis with significantly higher performance in comparison to other classification models using the MIMICIII dataset. Due to the high mortality of sepsis, these kinds of studies could be supportive to prevent the side effects of the disease and lessen the risk of mortality in hospitalized patients by providing early sepsis prediction.</abstract><pub>Elsevier Ltd</pub><doi>10.1016/j.imu.2023.101236</doi><orcidid>https://orcid.org/0000-0001-6781-9342</orcidid><orcidid>https://orcid.org/0000-0002-4982-337X</orcidid><orcidid>https://orcid.org/0000-0002-1111-0497</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2352-9148
ispartof Informatics in medicine unlocked, 2023, Vol.38, p.101236, Article 101236
issn 2352-9148
2352-9148
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_482aee4a704c41b49d3a796e5a667d20
source ScienceDirect Journals
subjects Classification
ICU
Machine learning
Sepsis
title Comparison of different machine learning algorithms to classify patients suspected of having sepsis infection in the intensive care unit
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T06%3A39%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Comparison%20of%20different%20machine%20learning%20algorithms%20to%20classify%20patients%20suspected%20of%20having%20sepsis%20infection%20in%20the%20intensive%20care%20unit&rft.jtitle=Informatics%20in%20medicine%20unlocked&rft.au=Gholamzadeh,%20Marsa&rft.date=2023&rft.volume=38&rft.spage=101236&rft.pages=101236-&rft.artnum=101236&rft.issn=2352-9148&rft.eissn=2352-9148&rft_id=info:doi/10.1016/j.imu.2023.101236&rft_dat=%3Celsevier_doaj_%3ES2352914823000783%3C/elsevier_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c3216-f3a77de9da9b048d35759c0c10bbbf1e6dbcd785e143fe8e6856433545c08ab3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
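
The description above names two techniques that a short example may make more concrete: SMOTE oversampling and the F-measure. The following is a minimal sketch using only the Python standard library, not the authors' implementation. The function names (smote, f_measure), the Euclidean distance, the default k=5, and the toy data are all assumptions made for illustration; the record does not state the parameters or libraries the study used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority-class points by SMOTE-style interpolation.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def f_measure(y_true, y_pred, positive=1):
    """F-measure (F1): harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: three minority-class samples with two features each.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2]]
synthetic = smote(minority, n_synthetic=4, k=2)

# Hypothetical labels and predictions (1 = sepsis, 0 = no sepsis).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
score = f_measure(y_true, y_pred)
print(len(synthetic), round(score, 3))
```

Because every synthetic point is an interpolation between two real minority samples, it always falls within the range the minority class already spans. A common precaution, which the record neither confirms nor rules out for this study, is to oversample only the training split so that synthetic points do not leak into the test data.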