Imputation of missing values for electronic health record laboratory data

Laboratory data from Electronic Health Records (EHR) are often used in prediction models where estimation bias and model performance from missingness can be mitigated using imputation methods. We demonstrate the utility of imputation in two real-world EHR-derived cohorts of ischemic stroke from Geis...

Bibliographic Details
Published in: NPJ digital medicine 2021-10, Vol.4 (1), p.147-147, Article 147
Main Authors: Li, Jiang; Yan, Xiaowei S.; Chaudhary, Durgesh; Avula, Venkatesh; Mudiganti, Satish; Husby, Hannah; Shahjouei, Shima; Afshar, Ardavan; Stewart, Walter F.; Yeasin, Mohammed; Zand, Ramin; Abedi, Vida
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c517t-57acdd73d611f7e3ab97b09af29a6ceba42fe5fa1bee446516078a498969f9093
cites cdi_FETCH-LOGICAL-c517t-57acdd73d611f7e3ab97b09af29a6ceba42fe5fa1bee446516078a498969f9093
container_end_page 147
container_issue 1
container_start_page 147
container_title NPJ digital medicine
container_volume 4
creator Li, Jiang
Yan, Xiaowei S.
Chaudhary, Durgesh
Avula, Venkatesh
Mudiganti, Satish
Husby, Hannah
Shahjouei, Shima
Afshar, Ardavan
Stewart, Walter F.
Yeasin, Mohammed
Zand, Ramin
Abedi, Vida
description Laboratory data from Electronic Health Records (EHR) are often used in prediction models where estimation bias and model performance from missingness can be mitigated using imputation methods. We demonstrate the utility of imputation in two real-world EHR-derived cohorts of ischemic stroke from Geisinger and of heart failure from Sutter Health to: (1) characterize the patterns of missingness in laboratory variables; (2) simulate two missing mechanisms, arbitrary and monotone; (3) compare cross-sectional and multi-level multivariate missing imputation algorithms applied to laboratory data; (4) assess whether incorporation of latent information, derived from comorbidity data, can improve the performance of the algorithms. The latter was based on a case study of hemoglobin A1c under a univariate missing imputation framework. Overall, the pattern of missingness in EHR laboratory variables was not at random and was highly associated with patients’ comorbidity data; and the multi-level imputation algorithm showed smaller imputation error than the cross-sectional method.
doi_str_mv 10.1038/s41746-021-00518-0
format article
date 2021-10-11
oa free_for_read
orcidid https://orcid.org/0000-0001-8809-5582; https://orcid.org/0000-0001-5141-1502; https://orcid.org/0000-0001-7689-933X; https://orcid.org/0000-0003-2776-8952; https://orcid.org/0000-0002-9477-0094; https://orcid.org/0000-0002-7006-1285
pmid 34635760
publisher London: Nature Publishing Group UK
rights The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
toplevel peer_reviewed; online_resources
fulltext fulltext
identifier ISSN: 2398-6352
ispartof NPJ digital medicine, 2021-10, Vol.4 (1), p.147-147, Article 147
issn 2398-6352
2398-6352
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_4e6264ab5c08484c90c458f0c54c9f7a
source Open Access: PubMed Central; Publicly Available Content (ProQuest); Alma/SFX Local Collection; Coronavirus Research Database; Springer Nature - nature.com Journals - Fully Open Access
subjects 631/114/1305
631/114/1314
692/699/75
692/700/139/1420
Algorithms
Biomedicine
Biotechnology
Comorbidity
Data analysis
Digital technology
Electronic health records
Heart failure
Ischemia
Medical laboratories
Medicine
Medicine & Public Health
Stroke
Variables
title Imputation of missing values for electronic health record laboratory data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T10%3A21%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Imputation%20of%20missing%20values%20for%20electronic%20health%20record%20laboratory%20data&rft.jtitle=NPJ%20digital%20medicine&rft.au=Li,%20Jiang&rft.date=2021-10-11&rft.volume=4&rft.issue=1&rft.spage=147&rft.epage=147&rft.pages=147-147&rft.artnum=147&rft.issn=2398-6352&rft.eissn=2398-6352&rft_id=info:doi/10.1038/s41746-021-00518-0&rft_dat=%3Cproquest_doaj_%3E2581285822%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c517t-57acdd73d611f7e3ab97b09af29a6ceba42fe5fa1bee446516078a498969f9093%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2580811832&rft_id=info:pmid/34635760&rfr_iscdi=true