A fair comparison of tree‐based and parametric methods in multiple imputation by chained equations

Bibliographic Details
Published in: Statistics in medicine 2020-04, Vol.39 (8), p.1156-1166
Main Authors: Slade, Emily; Naylor, Melissa G.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c4388-9f63bbdf170fd30d2be6715e388f7267cbca7edad88257a58dc090172df8af1c3
cites cdi_FETCH-LOGICAL-c4388-9f63bbdf170fd30d2be6715e388f7267cbca7edad88257a58dc090172df8af1c3
container_end_page 1166
container_issue 8
container_start_page 1156
container_title Statistics in medicine
container_volume 39
creator Slade, Emily
Naylor, Melissa G.
description Multiple imputation by chained equations (MICE) has emerged as a leading strategy for imputing missing epidemiological data due to its ease of implementation and ability to maintain unbiased effect estimates and valid inference. Within the MICE algorithm, imputation can be performed using a variety of parametric or nonparametric methods. Literature has suggested that nonparametric tree‐based imputation methods outperform parametric methods in terms of bias and coverage when there are interactions or other nonlinear effects among the variables. However, these studies fail to provide a fair comparison as they do not follow the well‐established recommendation that any effects in the final analysis model (including interactions) should be included in the parametric imputation model. We show via simulation that properly incorporating interactions in the parametric imputation model leads to much better performance. In fact, correctly specified parametric imputation and tree‐based random forest imputation perform similarly when estimating the interaction effect. Parametric imputation leads to slightly higher coverage for the interaction effect, but it has wider confidence intervals than random forest imputation and requires correct specification of the imputation model. Epidemiologists should take care in specifying MICE imputation models, and this paper assists in that task by providing a fair comparison of parametric and tree‐based imputation in MICE.
doi_str_mv 10.1002/sim.8468
format article
addtitle Stat Med
backlink https://www.ncbi.nlm.nih.gov/pubmed/31997388 (View this record in MEDLINE/PubMed)
date 2020-04-15
oa free_for_read
orcid https://orcid.org/0000-0002-1654-3822
pmid 31997388
pqid 2369183368
publisher Hoboken, USA: John Wiley & Sons, Inc
rights 2020 John Wiley & Sons, Ltd.
sourcerecordid 2348797473
sourcetype Open Access Repository
toplevel peer_reviewed
online_resources
tpages 11
fulltext fulltext
identifier ISSN: 0277-6715
ispartof Statistics in medicine, 2020-04, Vol.39 (8), p.1156-1166
issn 0277-6715
1097-0258
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_9136914
source Wiley-Blackwell Read & Publish Collection
subjects Algorithms
Bias
Computer Simulation
Data Interpretation, Statistical
Epidemiology
imputation
interaction
Medical research
Missing data
regression tree
Statistical inference
title A fair comparison of tree‐based and parametric methods in multiple imputation by chained equations
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T03%3A41%3A11IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20fair%20comparison%20of%20tree%E2%80%90based%20and%20parametric%20methods%20in%20multiple%20imputation%20by%20chained%20equations&rft.jtitle=Statistics%20in%20medicine&rft.au=Slade,%20Emily&rft.date=2020-04-15&rft.volume=39&rft.issue=8&rft.spage=1156&rft.epage=1166&rft.pages=1156-1166&rft.issn=0277-6715&rft.eissn=1097-0258&rft_id=info:doi/10.1002/sim.8468&rft_dat=%3Cproquest_pubme%3E2348797473%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c4388-9f63bbdf170fd30d2be6715e388f7267cbca7edad88257a58dc090172df8af1c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2369183368&rft_id=info:pmid/31997388&rfr_iscdi=true