Review: A gentle introduction to imputation of missing values


Bibliographic Details
Published in: Journal of clinical epidemiology 2006-10, Vol.59 (10), p.1087-1091
Main Authors: Donders, A. Rogier T.; van der Heijden, Geert J.M.G.; Stijnen, Theo; Moons, Karel G.M.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c442t-de3aed9cc453a36e2bda53064a502cb7452e2d514f3f59eb3762ecced37699b73
cites cdi_FETCH-LOGICAL-c442t-de3aed9cc453a36e2bda53064a502cb7452e2d514f3f59eb3762ecced37699b73
container_end_page 1091
container_issue 10
container_start_page 1087
container_title Journal of clinical epidemiology
container_volume 59
creator Donders, A. Rogier T.
van der Heijden, Geert J.M.G.
Stijnen, Theo
Moons, Karel G.M.
description In most situations, simple techniques for handling missing data (such as complete case analysis, overall mean imputation, and the missing-indicator method) produce biased results, whereas imputation techniques yield valid results without complicating the analysis once the imputations are carried out. Imputation techniques are based on the idea that any subject in a study sample can be replaced by a new, randomly chosen subject from the same source population. Imputing a missing value on a variable means replacing it with a value drawn from an estimate of the distribution of that variable. In single imputation, only one estimate is used. In multiple imputation, several estimates are used, reflecting the uncertainty in the estimation of this distribution. Under the general conditions of so-called missing at random and missing completely at random, both single and multiple imputation result in unbiased estimates of study associations. However, single imputation yields estimated standard errors that are too small, whereas multiple imputation yields correctly estimated standard errors and confidence intervals. In this article we explain why this is the case and use a simple simulation study to demonstrate our explanations. We also explain and illustrate why two frequently used methods to handle missing data, i.e., overall mean imputation and the missing-indicator method, almost always result in biased estimates.
doi_str_mv 10.1016/j.jclinepi.2006.01.014
format article
pmid 16980149
publisher United States: Elsevier Inc
rights 2006 Elsevier Inc.
oa free_for_read
fulltext fulltext
identifier ISSN: 0895-4356
ispartof Journal of clinical epidemiology, 2006-10, Vol.59 (10), p.1087-1091
issn 0895-4356
1878-5921
language eng
recordid cdi_proquest_miscellaneous_68861816
source ScienceDirect Freedom Collection
subjects Algorithms
Bias
Data Interpretation, Statistical
Epidemiology
Humans
Indicator method
Logistic Models
Methods
Missing data
Multiple imputation
Precision
R&D
Research & development
Research Design
Single imputation
Studies
title Review: A gentle introduction to imputation of missing values
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T08%3A16%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Review:%20A%20gentle%20introduction%20to%20imputation%20of%20missing%20values&rft.jtitle=Journal%20of%20clinical%20epidemiology&rft.au=Donders,%20A.%20Rogier%20T.&rft.date=2006-10-01&rft.volume=59&rft.issue=10&rft.spage=1087&rft.epage=1091&rft.pages=1087-1091&rft.issn=0895-4356&rft.eissn=1878-5921&rft_id=info:doi/10.1016/j.jclinepi.2006.01.014&rft_dat=%3Cproquest_cross%3E2734461811%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c442t-de3aed9cc453a36e2bda53064a502cb7452e2d514f3f59eb3762ecced37699b73%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1033183670&rft_id=info:pmid/16980149&rfr_iscdi=true
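
The description in this record contrasts overall mean imputation with imputation that draws each missing value from an estimated distribution. The following is a minimal Python sketch of that contrast, not the authors' simulation study. It assumes (hypothetically) that the outcome y is missing completely at random and that the study association is the least-squares slope of y on x; the function names `slope` and `simulate` and all parameter values are illustrative only. It covers a single stochastic imputation and does not attempt multiple imputation or standard errors.

```python
import random
import statistics


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=5000, true_slope=1.0, p_missing=0.5, seed=0):
    """Compare two ways of filling in outcome values that are MCAR.

    All settings are illustrative assumptions, not taken from the article.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0, 1) for x in xs]
    # Missing completely at random: missingness ignores x and y.
    missing = [rng.random() < p_missing for _ in range(n)]

    obs_x = [x for x, m in zip(xs, missing) if not m]
    obs_y = [y for y, m in zip(ys, missing) if not m]

    # Overall mean imputation: every missing y gets the observed mean.
    mean_y = statistics.fmean(obs_y)
    y_mean = [mean_y if m else y for y, m in zip(ys, missing)]

    # Single stochastic imputation: fit y ~ x on complete cases, then draw
    # each missing y from the fitted line plus random residual noise.
    b = slope(obs_x, obs_y)
    a = mean_y - b * statistics.fmean(obs_x)
    residuals = [y - (a + b * x) for x, y in zip(obs_x, obs_y)]
    resid_sd = statistics.stdev(residuals)
    y_draw = [
        a + b * x + rng.gauss(0, resid_sd) if m else y
        for x, y, m in zip(xs, ys, missing)
    ]

    return {
        "complete_case": b,
        "mean_imputation": slope(xs, y_mean),
        "stochastic_imputation": slope(xs, y_draw),
    }


if __name__ == "__main__":
    for method, estimate in simulate().items():
        print(f"{method}: {estimate:.3f}")
```

Under these assumptions, the mean-imputed slope is expected to shrink toward about (1 - p_missing) times the true slope, because the imputed outcomes carry no information about x, while the slope after stochastic imputation should stay close to the true value.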