A Comparison of Imputation Techniques for Handling Missing Data

Researchers are commonly faced with the problem of missing data. This article presents theoretical and empirical information for the selection and application of approaches for handling missing data on a single variable. An actual data set of 492 cases with no missing values was used to create a sim...

Bibliographic Details
Published in: Western journal of nursing research, 2002-11, Vol. 24 (7), p. 815-829
Main Authors: Musil, Carol M.; Warner, Camille B.; Yobas, Piyanee Klainin; Jones, Susan L.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c397t-663a2496c29c0480e2d51ebf7c79370140c4bc8c0d3f6ba2c02ae91b58863ea43
cites cdi_FETCH-LOGICAL-c397t-663a2496c29c0480e2d51ebf7c79370140c4bc8c0d3f6ba2c02ae91b58863ea43
container_end_page 829
container_issue 7
container_start_page 815
container_title Western journal of nursing research
container_volume 24
creator Musil, Carol M.
Warner, Camille B.
Yobas, Piyanee Klainin
Jones, Susan L.
description Researchers are commonly faced with the problem of missing data. This article presents theoretical and empirical information for the selection and application of approaches for handling missing data on a single variable. An actual data set of 492 cases with no missing values was used to create a simulated yet realistic data set with missing at random (MAR) data. The authors compare and contrast five approaches (listwise deletion, mean substitution, simple regression, regression with an error term, and the expectation maximization [EM] algorithm) for dealing with missing data, and compare the effects of each method on descriptive statistics and correlation coefficients for the imputed data (n = 96) and the entire sample (n = 492) when imputed data are included. All methods had limitations, although our findings suggest that mean substitution was the least effective and that regression with an error term and the EM algorithm produced estimates closest to those of the original variables.
doi_str_mv 10.1177/019394502762477004
format article
addtitle West J Nurs Res
date 2002-11-01
eissn 1552-8456
pmid 12428897
publisher Thousand Oaks, CA: Sage Publications
rights Copyright SAGE PUBLICATIONS, INC. Nov 2002
peer_reviewed true
tpages 15
fulltext fulltext
identifier ISSN: 0193-9459
ispartof Western journal of nursing research, 2002-11, Vol.24 (7), p.815-829
issn 0193-9459
1552-8456
language eng
recordid cdi_proquest_miscellaneous_72685135
source Applied Social Sciences Index & Abstracts (ASSIA); SAGE
subjects Algorithms
Clinical Nursing Research - methods
Comparative analysis
Data Collection - statistics & numerical data
Humans
Imputation
Information
Methodology
Missing data
Nursing
Regression Analysis
Statistical analysis
title A Comparison of Imputation Techniques for Handling Missing Data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T13%3A25%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Comparison%20of%20Imputation%20Techniques%20for%20Handling%20Missing%20Data&rft.jtitle=Western%20journal%20of%20nursing%20research&rft.au=Musil,%20Carol%20M.&rft.date=2002-11-01&rft.volume=24&rft.issue=7&rft.spage=815&rft.epage=829&rft.pages=815-829&rft.issn=0193-9459&rft.eissn=1552-8456&rft_id=info:doi/10.1177/019394502762477004&rft_dat=%3Cproquest_cross%3E57684570%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c397t-663a2496c29c0480e2d51ebf7c79370140c4bc8c0d3f6ba2c02ae91b58863ea43%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=214026681&rft_id=info:pmid/12428897&rft_sage_id=10.1177_019394502762477004&rfr_iscdi=true
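
The description above names five approaches to missing values on a single variable. As a rough illustration only, four of them (listwise deletion, mean substitution, simple regression, and regression with an error term) can be sketched in plain Python. This is not the authors' procedure or data: the function names, the use of None to mark a missing value, the single predictor x, and the normally distributed error term are all assumptions made for the example. The EM algorithm is not sketched here.

```python
import random
import statistics


def listwise_delete(x, y):
    """Keep only the (x, y) cases whose y value is observed (not None)."""
    return [(xi, yi) for xi, yi in zip(x, y) if yi is not None]


def mean_substitute(y):
    """Replace each missing y with the mean of the observed y values."""
    observed_mean = statistics.fmean(v for v in y if v is not None)
    return [observed_mean if v is None else v for v in y]


def fit_line(pairs):
    """Ordinary least-squares slope and intercept from complete (x, y) pairs."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def regression_impute(x, y, rng=None):
    """Fill missing y from a line fitted on the complete cases.

    With rng=None this is simple regression imputation. Passing a
    random.Random instance adds a normal error term whose SD is the
    residual SD of the fitted line (an assumption of this sketch).
    Needs at least three complete cases.
    """
    complete = listwise_delete(x, y)
    slope, intercept = fit_line(complete)
    residuals = [yi - (intercept + slope * xi) for xi, yi in complete]
    resid_sd = (sum(r * r for r in residuals) / (len(complete) - 2)) ** 0.5
    filled = []
    for xi, yi in zip(x, y):
        if yi is None:
            yi = intercept + slope * xi
            if rng is not None:
                yi += rng.gauss(0.0, resid_sd)
        filled.append(yi)
    return filled


if __name__ == "__main__":
    # Made-up toy data; the last y value is treated as missing.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, None]
    print(listwise_delete(x, y))
    print(mean_substitute(y))
    print(regression_impute(x, y))
    print(regression_impute(x, y, rng=random.Random(0)))
```

Mean substitution gives every missing case the same value, which shrinks the variable's variance, while the error term in the last variant restores some of that spread. That difference is in keeping with the description's finding that mean substitution was the least effective method.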