The impact of different imputation methods on estimates and model performance: an example using a risk prediction model for premature mortality

To compare how different imputation methods affect the estimates and performance of a prediction model for premature mortality. Sex-specific Weibull accelerated failure time survival models were run on four separate datasets using complete case, mode, single and multiple imputation to impute missing...
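The two simplest strategies named above, complete case analysis and mode imputation, can be illustrated on a toy set of categorical survey records. The sketch below is not the authors' code. It assumes missing values are stored as `None`, and the column names (`smoker`, `education`) are hypothetical rather than the study's variables. Single and multiple imputation are model-based and are not shown here.

```python
from collections import Counter
from typing import Optional

# Each record maps a (hypothetical) survey variable to a category, or None if missing.
Record = dict[str, Optional[str]]


def complete_cases(rows: list[Record]) -> list[Record]:
    """Complete case analysis: drop every row with at least one missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def mode_impute(rows: list[Record]) -> list[Record]:
    """Mode imputation: fill each missing value with its column's most frequent observed value."""
    columns = {k for r in rows for k in r}
    modes: dict[str, Optional[str]] = {}
    for col in columns:
        observed = [r[col] for r in rows if r.get(col) is not None]
        # most_common(1) returns [(value, count)]; equal counts keep first-seen order.
        modes[col] = Counter(observed).most_common(1)[0][0] if observed else None
    return [{k: (v if v is not None else modes[k]) for k, v in r.items()} for r in rows]


rows: list[Record] = [
    {"smoker": "no", "education": "high"},
    {"smoker": None, "education": "low"},
    {"smoker": "no", "education": None},
    {"smoker": "yes", "education": "high"},
]
print(len(complete_cases(rows)))  # 2 rows survive; the other two are discarded
print(mode_impute(rows))          # None values replaced by "no" and "high"
```

Complete case analysis shrinks the sample, while mode imputation keeps every row but assigns all missing entries to the majority category, which leaves rankings largely intact but can distort associations.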

Bibliographic Details
Published in:	Population health metrics 2024-06, Vol.22 (1), p.13-13
Main Authors:	Hurst, Mackenzie; O'Neill, Meghan; Pagalan, Lief; Diemert, Lori M; Rosella, Laura C
Format:	Article
Language:	English
Subjects:	Adult; Analysis; Data Interpretation, Statistical; Female; Humans; Imputation methods; Male; Middle Aged; Missing data; Missing observations (Statistics); Models, Statistical; Mortality, Premature; Performance measures; Population health; Prediction model; Prediction models; Risk Assessment - methods

container_end_page	13
container_issue	1
container_start_page	13
container_title	Population health metrics
container_volume	22
creator	Hurst, Mackenzie; O'Neill, Meghan; Pagalan, Lief; Diemert, Lori M; Rosella, Laura C
description	To compare how different imputation methods affect the estimates and performance of a prediction model for premature mortality, sex-specific Weibull accelerated failure time survival models were run on four separate datasets in which missing values were handled by complete case analysis, mode imputation, single imputation and multiple imputation, respectively. Six performance measures were compared to assess predictive accuracy (Nagelkerke R², integrated Brier score), discrimination (Harrell's c-index, discrimination slope) and calibration (calibration-in-the-large, calibration slope). The highest proportion of missingness for any single variable was 10.86% in the female model and 8.24% in the male model. Across complete case, mode, single and multiple imputation, the Nagelkerke R² values for the female model were 0.1084, 0.1116, 0.1120 and 0.111-0.1120, and the male model showed similar variation (0.1050, 0.1078, 0.1078 and 0.1078-0.1081). Harrell's c-index also varied little, with values of 0.8666, 0.8719, 0.8719 and 0.8711-0.8719 for the female model and 0.8549, 0.8548, 0.8550 and 0.8550-0.8553 for the male model. In the scenarios examined in this study, which used a population health survey, mode imputation performed well compared with single and multiple imputation when predictive performance is the main goal of the model. For generating unbiased hazard ratios, multiple imputation methods were superior. This study shows the need to consider the best imputation approach for predictive model development, given the conditions of missing data and the goals of the analysis.
doi_str_mv	10.1186/s12963-024-00331-3
format	article
pmid	38886744
publisher	BioMed Central Ltd (England)
date	2024-06-17
rights	2024. The Author(s). COPYRIGHT 2024 BioMed Central Ltd.
access	free to read; peer reviewed
pdf	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11181525/pdf/
html	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11181525/
pubmed	https://www.ncbi.nlm.nih.gov/pubmed/38886744
fulltext	fulltext
identifier	ISSN: 1478-7954
ispartof	Population health metrics, 2024-06, Vol.22 (1), p.13-13
issn	1478-7954 1478-7954
language	eng
recordid	cdi_doaj_primary_oai_doaj_org_article_cc2d9200c8b048c4ab83dbcbfffa0925
source	Publicly Available Content Database; PubMed Central
subjects	Adult; Analysis; Data Interpretation, Statistical; Female; Humans; Imputation methods; Male; Middle Aged; Missing data; Missing observations (Statistics); Models, Statistical; Mortality, Premature; Performance measures; Population health; Prediction model; Prediction models; Risk Assessment - methods
title	The impact of different imputation methods on estimates and model performance: an example using a risk prediction model for premature mortality
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T20%3A33%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20impact%20of%20different%20imputation%20methods%20on%20estimates%20and%20model%20performance:%20an%20example%20using%20a%20risk%20prediction%20model%20for%20premature%20mortality&rft.jtitle=Population%20health%20metrics&rft.au=Hurst,%20Mackenzie&rft.date=2024-06-17&rft.volume=22&rft.issue=1&rft.spage=13&rft.epage=13&rft.pages=13-13&rft.issn=1478-7954&rft.eissn=1478-7954&rft_id=info:doi/10.1186/s12963-024-00331-3&rft_dat=%3Cgale_doaj_%3EA798107186%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-d408t-93c684b2d3322660f8c373dc2de7028af4441c3f0183dc61563cd12635b7d6123%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3070793528&rft_id=info:pmid/38886744&rft_galeid=A798107186&rfr_iscdi=true
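Harrell's c-index, the discrimination measure reported in the description above, is the share of comparable pairs of individuals that a model ranks correctly. The following is a minimal pure-Python sketch, not the authors' implementation. As a simplifying assumption it skips pairs with tied observed times, and it expects a score where higher means higher risk. A Weibull accelerated failure time model predicts survival time rather than risk, so under that assumption one would pass negated predicted times as `risk`.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int], risk: Sequence[float]) -> float:
    """Simplified Harrell's concordance index for right-censored survival data.

    A pair is comparable when the individual with the shorter observed time
    had the event (was not censored). The pair is concordant when that
    individual also has the higher risk score; tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied times skipped in this simplified version
        # `a` is the individual with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # censored before the other's time: ordering is unknown
        comparable += 1
        if risk[a] > risk[b]:
            concordant += 1.0
        elif risk[a] == risk[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third individual is censored, so pairs where it has the shorter time are skipped.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.4, 0.2]))  # 1.0
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.2, 0.4, 0.7]))  # 0.6
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported values of roughly 0.85 to 0.87 in context.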