Massive external validation of a machine learning algorithm to predict pulmonary embolism in hospitalized patients

Pulmonary embolism (PE) is a life-threatening condition associated with ~10% of deaths of hospitalized patients. Machine learning algorithms (MLAs) which predict the onset of pulmonary embolism (PE) could enable earlier treatment and improve patient outcomes. However, the extent to which they genera...

Bibliographic Details
Published in: Thrombosis research 2022-08, Vol.216, p.14-21
Main Authors: Shen, Jieru ; Casie Chetty, Satish ; Shokouhi, Sepideh ; Maharjan, Jenish ; Chuba, Yevheniy ; Calvert, Jacob ; Mao, Qingqing
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c416t-928de273be2ce9584f82762fe18d14297ef02c1df76a4989483097a56cf003b63
cites cdi_FETCH-LOGICAL-c416t-928de273be2ce9584f82762fe18d14297ef02c1df76a4989483097a56cf003b63
container_end_page 21
container_issue
container_start_page 14
container_title Thrombosis research
container_volume 216
creator Shen, Jieru
Casie Chetty, Satish
Shokouhi, Sepideh
Maharjan, Jenish
Chuba, Yevheniy
Calvert, Jacob
Mao, Qingqing
description Pulmonary embolism (PE) is a life-threatening condition associated with ~10% of deaths of hospitalized patients. Machine learning algorithms (MLAs) which predict the onset of PE could enable earlier treatment and improve patient outcomes. However, the extent to which they generalize to broader patient populations impacts their clinical utility. The objective was to conduct the first large-scale external validation of a machine learning–based PE prediction model which uses EHR data from the first three hours of a patient's hospital stay to predict the occurrence of PE within the next 10 days of the inpatient stay. This retrospective study included approximately two million adult hospital admissions across 44 medical institutions in the US from 2011 to 2017. Demographics, vital signs, and lab tests from adult inpatients at 12 institutions (n = 331,268; 3.3% PE positive) were used for training an XGBoost model. External validation of the model was conducted on patient populations from each of 32 medical institutions (total n = 1,660,715; 3.7% PE positive) without retraining. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC). Backward elimination regression was used to identify correlations between characteristics of the external validation sets and AUROC. The model performed well (AUROC = 0.87) on the 20% hold-out subset of the training set. Despite demographic differences between the 32 external validation populations (percent PE positive: min = 1.54%, max = 6.47%), without retraining, the model had excellent discrimination, with a mean AUROC of 0.88 (min = 0.79, max = 0.93). Fixing sensitivity at 0.80, the model had a mean specificity of 0.85 (min = 0.64, max = 0.93). Backward elimination regression identified a negative association (β = −0.015, p < 0.001) between the percentage of PE positive encounters and AUROC. A PE prediction model performed remarkably well across 32 different external patient populations without retraining and despite significant differences in demographic characteristics, demonstrating its generalizability and potential as a clinical decision support tool to aid PE detection and improve patient outcomes in a clinical setting.
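The two evaluation metrics named in the description, AUROC and specificity at a fixed sensitivity of 0.80, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `auroc` and `specificity_at_sensitivity` and the ten-row toy data are hypothetical, and the pairwise AUROC loop is chosen for readability rather than for the dataset sizes reported in the study.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random
    negative (Mann-Whitney formulation); tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(labels, scores, target_sensitivity=0.80):
    """Find the highest score threshold whose sensitivity reaches the target,
    then report (threshold, sensitivity, specificity) at that threshold.
    A case is predicted positive when its score is >= threshold."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    for threshold in sorted(set(scores), reverse=True):
        sensitivity = sum(s >= threshold for s in pos) / len(pos)
        if sensitivity >= target_sensitivity:
            specificity = sum(s < threshold for s in neg) / len(neg)
            return threshold, sensitivity, specificity
    # Not reached: at the lowest threshold every positive is flagged.
    raise RuntimeError("no threshold reached the target sensitivity")


if __name__ == "__main__":
    # Hypothetical toy data: 1 = PE positive encounter, 0 = PE negative.
    labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.65, 0.5, 0.4, 0.3, 0.1]
    print("AUROC:", auroc(labels, scores))
    print("Operating point:", specificity_at_sensitivity(labels, scores))
```

In an external validation like the one described, these two functions would be applied separately to each site's labels and model scores, and the per-site results summarized as a mean with minimum and maximum.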
doi_str_mv 10.1016/j.thromres.2022.05.016
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2675612417</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0049384822002900</els_id><sourcerecordid>2675612417</sourcerecordid><originalsourceid>FETCH-LOGICAL-c416t-928de273be2ce9584f82762fe18d14297ef02c1df76a4989483097a56cf003b63</originalsourceid><addsrcrecordid>eNqFkMtu1DAUhi0EotPCK1ResklqO44vO1BFaaUiNrC2PM5JxyM7DrZnRHl6XE3Ltqsj_fovOh9Cl5T0lFBxte_rLqeYofSMMNaTsW_yG7ShSuqOccneog0hXHeD4uoMnZeyJ4RKqsf36GwYhdRiGDYof7el-CNg-FMhLzbgow1-stWnBacZWxyt2_kFcACbF788YBseUvZ1F3FNeM0weVfxeggxLTY_YojbFHyJ2C94l8rqayv8CxNeWykstXxA72YbCnx8vhfo183Xn9e33f2Pb3fXX-47x6monWZqAiaHLTAHelR8VkwKNgNVE-VMS5gJc3SapbBcK83VQLS0o3AzIcNWDBfo06l3zen3AUo10RcHIdgF0qEYJuQoKONUNqs4WV1OpWSYzZp9bN8YSswTb7M3L7zNE29DRtPkFrx83jhsI0z_Yy-Am-HzyQDt06OHbIprFFyjlsFVMyX_2sY_gQuXYA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2675612417</pqid></control><display><type>article</type><title>Massive external validation of a machine learning algorithm to predict pulmonary embolism in hospitalized patients</title><source>ScienceDirect Journals</source><creator>Shen, Jieru ; Casie Chetty, Satish ; Shokouhi, Sepideh ; Maharjan, Jenish ; Chuba, Yevheniy ; Calvert, Jacob ; Mao, Qingqing</creator><creatorcontrib>Shen, Jieru ; Casie Chetty, Satish ; Shokouhi, Sepideh ; Maharjan, Jenish ; Chuba, Yevheniy ; Calvert, Jacob ; Mao, Qingqing</creatorcontrib><description>Pulmonary embolism (PE) is a life-threatening condition associated with ~10% of deaths of hospitalized patients. Machine learning algorithms (MLAs) which predict the onset of pulmonary embolism (PE) could enable earlier treatment and improve patient outcomes. However, the extent to which they generalize to broader patient populations impacts their clinical utility. 
To conduct the first large-scale external validation of a machine learning–based PE prediction model which uses EHR data from the first three hours of a patient's hospital stay to predict the occurrence of PE within the next 10 days of the inpatient stay. This retrospective study included approximately two million adult hospital admissions across 44 medical institutions in the US from 2011 to 2017. Demographics, vital signs, and lab tests from adult inpatients at 12 institutions (n = 331,268; 3.3% PE positive) were used for training an XGBoost model. External validation of the model was conducted on patient populations from each of 32 medical institutions (total n = 1,660,715; 3.7% PE positive) without retraining. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC). Backward elimination regression was used to identify correlations between characteristics of the external validation sets and AUROC. The model performed well (AUROC = 0.87) on the 20% hold-out subset of the training set. Despite demographic differences between the 32 external validation populations (percent PE positive: min = 1.54%, max = 6.47%), without retraining, the model had excellent discrimination, with a mean AUROC of 0.88 (min = 0.79, max = 0.93). Fixing sensitivity at 0.80, the model had a mean specificity of 0.85 (min = 0.64, max = 0.93). Backward elimination regression identified a negative association (β = −0.015, p &lt; 0.001) between the percentage of PE positive encounters and AUROC. 
A PE prediction model performed remarkably well across 32 different external patient populations without retraining and despite significant differences in demographic characteristics, demonstrating its generalizability and potential as a clinical decision support tool to aid PE detection and improve patient outcomes in a clinical setting.</description><identifier>ISSN: 0049-3848</identifier><identifier>EISSN: 1879-2472</identifier><identifier>DOI: 10.1016/j.thromres.2022.05.016</identifier><identifier>PMID: 35679633</identifier><language>eng</language><publisher>United States: Elsevier Ltd</publisher><subject>External validation ; Machine learning ; Prophylaxis ; Pulmonary embolism ; Thrombosis</subject><ispartof>Thrombosis research, 2022-08, Vol.216, p.14-21</ispartof><rights>2022 The Authors</rights><rights>Copyright © 2022. Published by Elsevier Ltd.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c416t-928de273be2ce9584f82762fe18d14297ef02c1df76a4989483097a56cf003b63</citedby><cites>FETCH-LOGICAL-c416t-928de273be2ce9584f82762fe18d14297ef02c1df76a4989483097a56cf003b63</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35679633$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Shen, Jieru</creatorcontrib><creatorcontrib>Casie Chetty, Satish</creatorcontrib><creatorcontrib>Shokouhi, Sepideh</creatorcontrib><creatorcontrib>Maharjan, Jenish</creatorcontrib><creatorcontrib>Chuba, Yevheniy</creatorcontrib><creatorcontrib>Calvert, Jacob</creatorcontrib><creatorcontrib>Mao, Qingqing</creatorcontrib><title>Massive external validation of a machine learning algorithm to predict pulmonary embolism in hospitalized 
patients</title><title>Thrombosis research</title><addtitle>Thromb Res</addtitle><description>Pulmonary embolism (PE) is a life-threatening condition associated with ~10% of deaths of hospitalized patients. Machine learning algorithms (MLAs) which predict the onset of pulmonary embolism (PE) could enable earlier treatment and improve patient outcomes. However, the extent to which they generalize to broader patient populations impacts their clinical utility. To conduct the first large-scale external validation of a machine learning–based PE prediction model which uses EHR data from the first three hours of a patient's hospital stay to predict the occurrence of PE within the next 10 days of the inpatient stay. This retrospective study included approximately two million adult hospital admissions across 44 medical institutions in the US from 2011 to 2017. Demographics, vital signs, and lab tests from adult inpatients at 12 institutions (n = 331,268; 3.3% PE positive) were used for training an XGBoost model. External validation of the model was conducted on patient populations from each of 32 medical institutions (total n = 1,660,715; 3.7% PE positive) without retraining. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC). Backward elimination regression was used to identify correlations between characteristics of the external validation sets and AUROC. The model performed well (AUROC = 0.87) on the 20% hold-out subset of the training set. Despite demographic differences between the 32 external validation populations (percent PE positive: min = 1.54%, max = 6.47%), without retraining, the model had excellent discrimination, with a mean AUROC of 0.88 (min = 0.79, max = 0.93). Fixing sensitivity at 0.80, the model had a mean specificity of 0.85 (min = 0.64, max = 0.93). Backward elimination regression identified a negative association (β = −0.015, p &lt; 0.001) between the percentage of PE positive encounters and AUROC. 
A PE prediction model performed remarkably well across 32 different external patient populations without retraining and despite significant differences in demographic characteristics, demonstrating its generalizability and potential as a clinical decision support tool to aid PE detection and improve patient outcomes in a clinical setting.</description><subject>External validation</subject><subject>Machine learning</subject><subject>Prophylaxis</subject><subject>Pulmonary embolism</subject><subject>Thrombosis</subject><issn>0049-3848</issn><issn>1879-2472</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNqFkMtu1DAUhi0EotPCK1ResklqO44vO1BFaaUiNrC2PM5JxyM7DrZnRHl6XE3Ltqsj_fovOh9Cl5T0lFBxte_rLqeYofSMMNaTsW_yG7ShSuqOccneog0hXHeD4uoMnZeyJ4RKqsf36GwYhdRiGDYof7el-CNg-FMhLzbgow1-stWnBacZWxyt2_kFcACbF788YBseUvZ1F3FNeM0weVfxeggxLTY_YojbFHyJ2C94l8rqayv8CxNeWykstXxA72YbCnx8vhfo183Xn9e33f2Pb3fXX-47x6monWZqAiaHLTAHelR8VkwKNgNVE-VMS5gJc3SapbBcK83VQLS0o3AzIcNWDBfo06l3zen3AUo10RcHIdgF0qEYJuQoKONUNqs4WV1OpWSYzZp9bN8YSswTb7M3L7zNE29DRtPkFrx83jhsI0z_Yy-Am-HzyQDt06OHbIprFFyjlsFVMyX_2sY_gQuXYA</recordid><startdate>20220801</startdate><enddate>20220801</enddate><creator>Shen, Jieru</creator><creator>Casie Chetty, Satish</creator><creator>Shokouhi, Sepideh</creator><creator>Maharjan, Jenish</creator><creator>Chuba, Yevheniy</creator><creator>Calvert, Jacob</creator><creator>Mao, Qingqing</creator><general>Elsevier Ltd</general><scope>6I.</scope><scope>AAFTH</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope></search><sort><creationdate>20220801</creationdate><title>Massive external validation of a machine learning algorithm to predict pulmonary embolism in hospitalized patients</title><author>Shen, Jieru ; Casie Chetty, Satish ; Shokouhi, Sepideh ; Maharjan, Jenish ; Chuba, Yevheniy ; Calvert, Jacob ; Mao, 
Qingqing</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c416t-928de273be2ce9584f82762fe18d14297ef02c1df76a4989483097a56cf003b63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>External validation</topic><topic>Machine learning</topic><topic>Prophylaxis</topic><topic>Pulmonary embolism</topic><topic>Thrombosis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Shen, Jieru</creatorcontrib><creatorcontrib>Casie Chetty, Satish</creatorcontrib><creatorcontrib>Shokouhi, Sepideh</creatorcontrib><creatorcontrib>Maharjan, Jenish</creatorcontrib><creatorcontrib>Chuba, Yevheniy</creatorcontrib><creatorcontrib>Calvert, Jacob</creatorcontrib><creatorcontrib>Mao, Qingqing</creatorcontrib><collection>ScienceDirect Open Access Titles</collection><collection>Elsevier:ScienceDirect:Open Access</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Thrombosis research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Shen, Jieru</au><au>Casie Chetty, Satish</au><au>Shokouhi, Sepideh</au><au>Maharjan, Jenish</au><au>Chuba, Yevheniy</au><au>Calvert, Jacob</au><au>Mao, Qingqing</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Massive external validation of a machine learning algorithm to predict pulmonary embolism in hospitalized patients</atitle><jtitle>Thrombosis research</jtitle><addtitle>Thromb Res</addtitle><date>2022-08-01</date><risdate>2022</risdate><volume>216</volume><spage>14</spage><epage>21</epage><pages>14-21</pages><issn>0049-3848</issn><eissn>1879-2472</eissn><abstract>Pulmonary embolism (PE) is a life-threatening condition associated with ~10% of deaths of hospitalized patients. 
Machine learning algorithms (MLAs) which predict the onset of pulmonary embolism (PE) could enable earlier treatment and improve patient outcomes. However, the extent to which they generalize to broader patient populations impacts their clinical utility. To conduct the first large-scale external validation of a machine learning–based PE prediction model which uses EHR data from the first three hours of a patient's hospital stay to predict the occurrence of PE within the next 10 days of the inpatient stay. This retrospective study included approximately two million adult hospital admissions across 44 medical institutions in the US from 2011 to 2017. Demographics, vital signs, and lab tests from adult inpatients at 12 institutions (n = 331,268; 3.3% PE positive) were used for training an XGBoost model. External validation of the model was conducted on patient populations from each of 32 medical institutions (total n = 1,660,715; 3.7% PE positive) without retraining. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC). Backward elimination regression was used to identify correlations between characteristics of the external validation sets and AUROC. The model performed well (AUROC = 0.87) on the 20% hold-out subset of the training set. Despite demographic differences between the 32 external validation populations (percent PE positive: min = 1.54%, max = 6.47%), without retraining, the model had excellent discrimination, with a mean AUROC of 0.88 (min = 0.79, max = 0.93). Fixing sensitivity at 0.80, the model had a mean specificity of 0.85 (min = 0.64, max = 0.93). Backward elimination regression identified a negative association (β = −0.015, p &lt; 0.001) between the percentage of PE positive encounters and AUROC. 
A PE prediction model performed remarkably well across 32 different external patient populations without retraining and despite significant differences in demographic characteristics, demonstrating its generalizability and potential as a clinical decision support tool to aid PE detection and improve patient outcomes in a clinical setting.</abstract><cop>United States</cop><pub>Elsevier Ltd</pub><pmid>35679633</pmid><doi>10.1016/j.thromres.2022.05.016</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0049-3848
ispartof Thrombosis research, 2022-08, Vol.216, p.14-21
issn 0049-3848
1879-2472
language eng
recordid cdi_proquest_miscellaneous_2675612417
source ScienceDirect Journals
subjects External validation
Machine learning
Prophylaxis
Pulmonary embolism
Thrombosis
title Massive external validation of a machine learning algorithm to predict pulmonary embolism in hospitalized patients
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T01%3A21%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Massive%20external%20validation%20of%20a%20machine%20learning%20algorithm%20to%20predict%20pulmonary%20embolism%20in%20hospitalized%20patients&rft.jtitle=Thrombosis%20research&rft.au=Shen,%20Jieru&rft.date=2022-08-01&rft.volume=216&rft.spage=14&rft.epage=21&rft.pages=14-21&rft.issn=0049-3848&rft.eissn=1879-2472&rft_id=info:doi/10.1016/j.thromres.2022.05.016&rft_dat=%3Cproquest_cross%3E2675612417%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c416t-928de273be2ce9584f82762fe18d14297ef02c1df76a4989483097a56cf003b63%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2675612417&rft_id=info:pmid/35679633&rfr_iscdi=true