Imputation of missing values for electronic health record laboratory data
Laboratory data from Electronic Health Records (EHR) are often used in prediction models where estimation bias and model performance from missingness can be mitigated using imputation methods. We demonstrate the utility of imputation in two real-world EHR-derived cohorts of ischemic stroke from Geis...
Published in: | NPJ digital medicine 2021-10, Vol.4 (1), p.147-147, Article 147 |
---|---|
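Main Authors: | Li, Jiang; Yan, Xiaowei S.; Chaudhary, Durgesh; Avula, Venkatesh; Mudiganti, Satish; Husby, Hannah; Shahjouei, Shima; Afshar, Ardavan; Stewart, Walter F.; Yeasin, Mohammed; Zand, Ramin; Abedi, Vida |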
Format: | Article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c517t-57acdd73d611f7e3ab97b09af29a6ceba42fe5fa1bee446516078a498969f9093 |
---|---|
cites | cdi_FETCH-LOGICAL-c517t-57acdd73d611f7e3ab97b09af29a6ceba42fe5fa1bee446516078a498969f9093 |
container_end_page | 147 |
container_issue | 1 |
container_start_page | 147 |
container_title | NPJ digital medicine |
container_volume | 4 |
creator | Li, Jiang; Yan, Xiaowei S.; Chaudhary, Durgesh; Avula, Venkatesh; Mudiganti, Satish; Husby, Hannah; Shahjouei, Shima; Afshar, Ardavan; Stewart, Walter F.; Yeasin, Mohammed; Zand, Ramin; Abedi, Vida |
description | Laboratory data from Electronic Health Records (EHR) are often used in prediction models where estimation bias and model performance from missingness can be mitigated using imputation methods. We demonstrate the utility of imputation in two real-world EHR-derived cohorts of ischemic stroke from Geisinger and of heart failure from Sutter Health to: (1) characterize the patterns of missingness in laboratory variables; (2) simulate two missing mechanisms, arbitrary and monotone; (3) compare cross-sectional and multi-level multivariate missing imputation algorithms applied to laboratory data; (4) assess whether incorporation of latent information, derived from comorbidity data, can improve the performance of the algorithms. The latter was based on a case study of hemoglobin A1c under a univariate missing imputation framework. Overall, the pattern of missingness in EHR laboratory variables was not at random and was highly associated with patients’ comorbidity data; and the multi-level imputation algorithm showed smaller imputation error than the cross-sectional method. |
doi_str_mv | 10.1038/s41746-021-00518-0 |
format | article |
stitle | npj Digit. Med |
date | 2021-10-11 |
eissn | 2398-6352 |
pmid | 34635760 |
cop | London |
pub | Nature Publishing Group UK |
rights | The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
orcidid | https://orcid.org/0000-0001-8809-5582; https://orcid.org/0000-0001-5141-1502; https://orcid.org/0000-0001-7689-933X; https://orcid.org/0000-0003-2776-8952; https://orcid.org/0000-0002-9477-0094; https://orcid.org/0000-0002-7006-1285 |
oa | free_for_read |
toplevel | peer_reviewed; online_resources |
fulltext | fulltext |
identifier | ISSN: 2398-6352 |
ispartof | NPJ digital medicine, 2021-10, Vol.4 (1), p.147-147, Article 147 |
issn | 2398-6352 2398-6352 |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_4e6264ab5c08484c90c458f0c54c9f7a |
source | Open Access: PubMed Central; Publicly Available Content (ProQuest); Alma/SFX Local Collection; Coronavirus Research Database; Springer Nature - nature.com Journals - Fully Open Access |
subjects | 631/114/1305; 631/114/1314; 692/699/75; 692/700/139/1420; Algorithms; Biomedicine; Biotechnology; Comorbidity; Data analysis; Digital technology; Electronic health records; Heart failure; Ischemia; Medical laboratories; Medicine; Medicine & Public Health; Stroke; Variables |
title | Imputation of missing values for electronic health record laboratory data |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T10%3A21%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Imputation%20of%20missing%20values%20for%20electronic%20health%20record%20laboratory%20data&rft.jtitle=NPJ%20digital%20medicine&rft.au=Li,%20Jiang&rft.date=2021-10-11&rft.volume=4&rft.issue=1&rft.spage=147&rft.epage=147&rft.pages=147-147&rft.artnum=147&rft.issn=2398-6352&rft.eissn=2398-6352&rft_id=info:doi/10.1038/s41746-021-00518-0&rft_dat=%3Cproquest_doaj_%3E2581285822%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c517t-57acdd73d611f7e3ab97b09af29a6ceba42fe5fa1bee446516078a498969f9093%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2580811832&rft_id=info:pmid/34635760&rfr_iscdi=true |