Review: A gentle introduction to imputation of missing values
In most situations, simple techniques for handling missing data (such as complete case analysis, overall mean imputation, and the missing-indicator method) produce biased results, whereas imputation techniques yield valid results without complicating the analysis once the imputations are carried out...
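The summary's point about the missing-indicator method can be illustrated with a small simulation. The sketch below is a hypothetical setup, not the authors' simulation study. It assumes an exposure `z`, a confounder `x` correlated with `z` that is missing completely at random for about half of the subjects, and a true adjusted effect of `z` equal to 1; the helper `ols` is also illustrative. Under these assumptions complete case analysis stays unbiased (only less precise). The missing-indicator method instead fills missing `x` with a constant and adds a 0/1 indicator, and so it mixes adjusted and unadjusted estimates of the effect of `z`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup (an assumption, not the article's design): z is the exposure
# of interest, x a confounder correlated with z (correlation 0.5), and the true
# effect of z on y after adjusting for x is 1.
n = 4000
z = rng.normal(size=n)
x = 0.5 * z + np.sqrt(0.75) * rng.normal(size=n)
y = x + z + rng.normal(size=n)

# About half of the confounder values are missing completely at random (MCAR).
missing = rng.random(n) < 0.5
obs = ~missing


def ols(columns, target):
    """Least-squares fit with an intercept; returns [intercept, coef_1, coef_2, ...]."""
    design = np.column_stack([np.ones(target.size), *columns])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Complete case analysis: adjust for x using only subjects with x observed.
cc_effect = ols([z[obs], x[obs]], y[obs])[1]

# Missing-indicator method: fill missing x with a constant (here 0) and add
# the missingness indicator as an extra covariate.
indicator = missing.astype(float)
x_filled = np.where(missing, 0.0, x)
mim_effect = ols([z, x_filled, indicator], y)[1]

print(f"complete case:     effect of z = {cc_effect:.3f}")
print(f"missing indicator: effect of z = {mim_effect:.3f}")
```

The fill constant does not matter here: because the indicator is in the model, filling with 0 or with the observed mean of `x` gives the same coefficient for `z`. In rows with `x` missing, `z` is effectively unadjusted (its crude coefficient is 1.5 in this setup), so with roughly equal group sizes the missing-indicator estimate should come out near (0.75 + 1.5) / (0.75 + 1) ≈ 1.29 rather than 1.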
Published in: | Journal of clinical epidemiology 2006-10, Vol.59 (10), p.1087-1091 |
---|---|
Main Authors: | Donders, A. Rogier T.; van der Heijden, Geert J.M.G.; Stijnen, Theo; Moons, Karel G.M. |
Format: | Article |
Language: | English |
container_end_page | 1091 |
---|---|
container_issue | 10 |
container_start_page | 1087 |
container_title | Journal of clinical epidemiology |
container_volume | 59 |
creator | Donders, A. Rogier T.; van der Heijden, Geert J.M.G.; Stijnen, Theo; Moons, Karel G.M. |
description | In most situations, simple techniques for handling missing data (such as complete case analysis, overall mean imputation, and the missing-indicator method) produce biased results, whereas imputation techniques yield valid results without complicating the analysis once the imputations are carried out. Imputation techniques are based on the idea that any subject in a study sample can be replaced by a new randomly chosen subject from the same source population. Imputing missing data on a variable means replacing each missing value with a value drawn from an estimate of the distribution of that variable. In single imputation, only one estimate is used. In multiple imputation, several estimates are used, reflecting the uncertainty in the estimation of this distribution. Under the general conditions of so-called missing at random and missing completely at random, both single and multiple imputation result in unbiased estimates of study associations. However, single imputation results in estimated standard errors that are too small, whereas multiple imputation results in correctly estimated standard errors and confidence intervals. In this article we explain why this is the case and use a simple simulation study to demonstrate our explanations. We also explain and illustrate why two frequently used methods of handling missing data, overall mean imputation and the missing-indicator method, almost always result in biased estimates. |
doi_str_mv | 10.1016/j.jclinepi.2006.01.014 |
format | article |
fullrecord | PMID: 16980149; publisher: Elsevier Inc (United States); rights: © 2006 Elsevier Inc.; peer reviewed; free to read |
identifier | ISSN: 0895-4356 |
ispartof | Journal of clinical epidemiology, 2006-10, Vol.59 (10), p.1087-1091 |
issn | 0895-4356 (print); 1878-5921 (electronic) |
language | eng |
recordid | cdi_proquest_miscellaneous_68861816 |
source | ScienceDirect Freedom Collection |
subjects | Algorithms; Bias; Data Interpretation, Statistical; Epidemiology; Humans; Indicator method; Logistic Models; Methods; Missing data; Multiple imputation; Precision; R&D; Research & development; Research Design; Single imputation; Studies |
title | Review: A gentle introduction to imputation of missing values |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T08%3A16%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Review:%20A%20gentle%20introduction%20to%20imputation%20of%20missing%20values&rft.jtitle=Journal%20of%20clinical%20epidemiology&rft.au=Donders,%20A.%20Rogier%20T.&rft.date=2006-10-01&rft.volume=59&rft.issue=10&rft.spage=1087&rft.epage=1091&rft.pages=1087-1091&rft.issn=0895-4356&rft.eissn=1878-5921&rft_id=info:doi/10.1016/j.jclinepi.2006.01.014&rft_dat=%3Cproquest_cross%3E2734461811%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c442t-de3aed9cc453a36e2bda53064a502cb7452e2d514f3f59eb3762ecced37699b73%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1033183670&rft_id=info:pmid/16980149&rfr_iscdi=true |
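The description in this record contrasts overall mean imputation, single imputation and multiple imputation. The following sketch illustrates those claims on simulated data; it is not the authors' simulation study, and the data-generating model, the missing-at-random mechanism, the target quantity (the mean of `y`), `M = 20` and the helper names `mean_and_se` and `impute_once` are all assumptions chosen for illustration. Single imputation draws each missing `y` from one estimated regression of `y` on `x`. Multiple imputation first draws the regression parameters themselves (a standard normal linear model posterior draw), so that each completed data set reflects the uncertainty in the estimated distribution, and then pools the results with Rubin's rules.

```python
import numpy as np

rng = np.random.default_rng(2006)

# Hypothetical data (an assumption, not the article's design):
# x ~ N(0, 1) and y = 1 + 2x + e with e ~ N(0, 1), so the true mean of y is 1.
n = 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Missing at random (MAR): the chance that y is missing depends only on the observed x.
missing = rng.random(n) < np.where(x > 0, 0.6, 0.1)
obs = ~missing


def mean_and_se(values):
    """Sample mean and its naive standard error, treating every value as observed."""
    return values.mean(), values.std(ddof=1) / np.sqrt(values.size)


# Complete case analysis and overall mean imputation of y.
cc_mean, cc_se = mean_and_se(y[obs])
meanimp_mean, meanimp_se = mean_and_se(np.where(missing, y[obs].mean(), y))

# Imputation model: linear regression of y on x, fitted on the complete cases.
X_obs = np.column_stack([np.ones(obs.sum()), x[obs]])
X_mis = np.column_stack([np.ones(missing.sum()), x[missing]])
beta_hat, *_ = np.linalg.lstsq(X_obs, y[obs], rcond=None)
resid = y[obs] - X_obs @ beta_hat
df = obs.sum() - X_obs.shape[1]
sigma2_hat = resid @ resid / df
xtx_inv = np.linalg.inv(X_obs.T @ X_obs)


def impute_once(draw_parameters):
    """Return a completed copy of y, drawing missing entries from the imputation model."""
    beta, sigma2 = beta_hat, sigma2_hat
    if draw_parameters:
        # Reflect uncertainty in the estimated distribution: draw sigma^2 from a
        # scaled inverse chi-square and beta from its normal approximation.
        sigma2 = resid @ resid / rng.chisquare(df)
        beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    completed = y.copy()
    noise = rng.normal(scale=np.sqrt(sigma2), size=missing.sum())
    completed[missing] = X_mis @ beta + noise
    return completed


# Single imputation: one completed data set, analysed as if it were complete.
si_mean, si_se = mean_and_se(impute_once(draw_parameters=False))

# Multiple imputation: M completed data sets pooled with Rubin's rules.
M = 20
results = [mean_and_se(impute_once(draw_parameters=True)) for _ in range(M)]
estimates = np.array([est for est, _ in results])
within = np.mean([se**2 for _, se in results])  # W: mean within-imputation variance
between = estimates.var(ddof=1)                 # B: between-imputation variance
mi_mean = estimates.mean()
mi_se = np.sqrt(within + (1 + 1 / M) * between)  # T = W + (1 + 1/M) B

for label, est, se in [
    ("complete case", cc_mean, cc_se),
    ("mean imputation", meanimp_mean, meanimp_se),
    ("single imputation", si_mean, si_se),
    ("multiple imputation", mi_mean, mi_se),
]:
    print(f"{label:<20} mean = {est:.3f}   SE = {se:.4f}")
```

With this setup one would expect complete case analysis and mean imputation to give the same, clearly biased mean (well below the true value 1, because subjects with high `x`, and hence high `y`, are more often missing), with mean imputation also reporting a smaller standard error. Single and multiple imputation should both land near 1, and the pooled multiple imputation standard error should exceed the single imputation one, because single imputation treats the imputed values as if they had been observed.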