A Comparison of Imputation Techniques for Handling Missing Data
Researchers are commonly faced with the problem of missing data. This article presents theoretical and empirical information for the selection and application of approaches for handling missing data on a single variable. An actual data set of 492 cases with no missing values was used to create a sim...
Published in: | Western journal of nursing research 2002-11, Vol.24 (7), p.815-829 |
---|---|
Main Authors: | Musil, Carol M.; Warner, Camille B.; Yobas, Piyanee Klainin; Jones, Susan L. |
Format: | Article |
Language: | English |
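Subjects: | Algorithms; Clinical Nursing Research - methods; Comparative analysis; Data Collection - statistics & numerical data; Humans; Imputation; Information; Methodology; Missing data; Nursing; Regression Analysis; Statistical analysis |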
cited_by | cdi_FETCH-LOGICAL-c397t-663a2496c29c0480e2d51ebf7c79370140c4bc8c0d3f6ba2c02ae91b58863ea43 |
---|---|
cites | cdi_FETCH-LOGICAL-c397t-663a2496c29c0480e2d51ebf7c79370140c4bc8c0d3f6ba2c02ae91b58863ea43 |
container_end_page | 829 |
container_issue | 7 |
container_start_page | 815 |
container_title | Western journal of nursing research |
container_volume | 24 |
creator | Musil, Carol M.; Warner, Camille B.; Yobas, Piyanee Klainin; Jones, Susan L. |
description | Researchers are commonly faced with the problem of missing data. This article presents theoretical and empirical information for the selection and application of approaches for handling missing data on a single variable. An actual data set of 492 cases with no missing values was used to create a simulated yet realistic data set with missing at random (MAR) data. The authors compare and contrast five approaches (listwise deletion, mean substitution, simple regression, regression with an error term, and the expectation maximization [EM] algorithm) for dealing with missing data, and compare the effects of each method on descriptive statistics and correlation coefficients for the imputed data (n = 96) and the entire sample (n = 492) when imputed data are included. All methods had limitations, although our findings suggest that mean substitution was the least effective and that regression with an error term and the EM algorithm produced estimates closest to those of the original variables. |
doi_str_mv | 10.1177/019394502762477004 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_72685135</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sage_id>10.1177_019394502762477004</sage_id><sourcerecordid>57684570</sourcerecordid><originalsourceid>FETCH-LOGICAL-c397t-663a2496c29c0480e2d51ebf7c79370140c4bc8c0d3f6ba2c02ae91b58863ea43</originalsourceid><addsrcrecordid>eNqFkUtLw0AUhQdRbK3-ARcSXLiLnfdjJaU-Wqi4qeswmUxqSpKpM8nCf--EFgoKurpc-M65594LwDWC9wgJMYVIEUUZxIJjKgSE9ASMEWM4lZTxUzAegDQSagQuQthCCDFF-ByMEKZYSiXG4GGWzF2z074Krk1cmSybXd_prord2pqPtvrsbUhK55OFbou6ajfJaxXCUB91py_BWanrYK8OdQLen5_W80W6entZzmer1BAlupRzojFV3GBlIJXQ4oIhm5fCCEUERBQamhtpYEFKnmtsINZWoZxJyYnVlEzA3d53592QqMuaKhhb17q1rg-ZwFwyRNi_IBM8XkfACN7-ALeu921cIsMxD-ZcogjhPWS8C8HbMtv5qtH-K0MwG56Q_X5CFN0cnPu8scVRcrh6BKZ7IOiNPY79w_Ib9CSMnQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>214026681</pqid></control><display><type>article</type><title>A Comparison of Imputation Techniques for Handling Missing Data</title><source>Applied Social Sciences Index & Abstracts (ASSIA)</source><source>SAGE</source><creator>Musil, Carol M. ; Warner, Camille B. ; Yobas, Piyanee Klainin ; Jones, Susan L.</creator><creatorcontrib>Musil, Carol M. ; Warner, Camille B. ; Yobas, Piyanee Klainin ; Jones, Susan L.</creatorcontrib><description>Researchers are commonly faced with the problem of missing data. This article presents theoretical and empirical information for the selection and application of approaches for handling missing data on a single variable. An actual data set of 492 cases with no missing values was used to create a simulated yet realistic data set with missing at random (MAR) data. 
The authors compare and contrast five approaches (listwise deletion, mean substitution, simple regression, regression with an error term, and the expectation maximization [EM] algorithm) for dealing with missing data, and compare the effects of each method on descriptive statistics and correlation coefficients for the imputed data (n = 96) and the entire sample (n = 492) when imputed data are included. All methods had limitations, although our findings suggest that mean substitution was the least effective and that regression with an error term and the EM algorithm produced estimates closest to those of the original variables.</description><identifier>ISSN: 0193-9459</identifier><identifier>EISSN: 1552-8456</identifier><identifier>DOI: 10.1177/019394502762477004</identifier><identifier>PMID: 12428897</identifier><language>eng</language><publisher>Thousand Oaks, CA: Sage Publications</publisher><subject>Algorithms ; Clinical Nursing Research - methods ; Comparative analysis ; Data Collection - statistics & numerical data ; Humans ; Imputation ; Information ; Methodology ; Missing data ; Nursing ; Regression Analysis ; Statistical analysis</subject><ispartof>Western journal of nursing research, 2002-11, Vol.24 (7), p.815-829</ispartof><rights>Copyright SAGE PUBLICATIONS, INC. 
Nov 2002</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c397t-663a2496c29c0480e2d51ebf7c79370140c4bc8c0d3f6ba2c02ae91b58863ea43</citedby><cites>FETCH-LOGICAL-c397t-663a2496c29c0480e2d51ebf7c79370140c4bc8c0d3f6ba2c02ae91b58863ea43</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925,30999,31000,79364</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/12428897$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Musil, Carol M.</creatorcontrib><creatorcontrib>Warner, Camille B.</creatorcontrib><creatorcontrib>Yobas, Piyanee Klainin</creatorcontrib><creatorcontrib>Jones, Susan L.</creatorcontrib><title>A Comparison of Imputation Techniques for Handling Missing Data</title><title>Western journal of nursing research</title><addtitle>West J Nurs Res</addtitle><description>Researchers are commonly faced with the problem of missing data. This article presents theoretical and empirical information for the selection and application of approaches for handling missing data on a single variable. An actual data set of 492 cases with no missing values was used to create a simulated yet realistic data set with missing at random (MAR) data. The authors compare and contrast five approaches (listwise deletion, mean substitution, simple regression, regression with an error term, and the expectation maximization [EM] algorithm) for dealing with missing data, and compare the effects of each method on descriptive statistics and correlation coefficients for the imputed data (n = 96) and the entire sample (n = 492) when imputed data are included. 
All methods had limitations, although our findings suggest that mean substitution was the least effective and that regression with an error term and the EM algorithm produced estimates closest to those of the original variables.</description><subject>Algorithms</subject><subject>Clinical Nursing Research - methods</subject><subject>Comparative analysis</subject><subject>Data Collection - statistics & numerical data</subject><subject>Humans</subject><subject>Imputation</subject><subject>Information</subject><subject>Methodology</subject><subject>Missing data</subject><subject>Nursing</subject><subject>Regression Analysis</subject><subject>Statistical analysis</subject><issn>0193-9459</issn><issn>1552-8456</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2002</creationdate><recordtype>article</recordtype><sourceid>7QJ</sourceid><recordid>eNqFkUtLw0AUhQdRbK3-ARcSXLiLnfdjJaU-Wqi4qeswmUxqSpKpM8nCf--EFgoKurpc-M65594LwDWC9wgJMYVIEUUZxIJjKgSE9ASMEWM4lZTxUzAegDQSagQuQthCCDFF-ByMEKZYSiXG4GGWzF2z074Krk1cmSybXd_prord2pqPtvrsbUhK55OFbou6ajfJaxXCUB91py_BWanrYK8OdQLen5_W80W6entZzmer1BAlupRzojFV3GBlIJXQ4oIhm5fCCEUERBQamhtpYEFKnmtsINZWoZxJyYnVlEzA3d53592QqMuaKhhb17q1rg-ZwFwyRNi_IBM8XkfACN7-ALeu921cIsMxD-ZcogjhPWS8C8HbMtv5qtH-K0MwG56Q_X5CFN0cnPu8scVRcrh6BKZ7IOiNPY79w_Ib9CSMnQ</recordid><startdate>20021101</startdate><enddate>20021101</enddate><creator>Musil, Carol M.</creator><creator>Warner, Camille B.</creator><creator>Yobas, Piyanee Klainin</creator><creator>Jones, Susan L.</creator><general>Sage Publications</general><general>SAGE PUBLICATIONS, INC</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QJ</scope><scope>ASE</scope><scope>FPQ</scope><scope>K6X</scope><scope>K9.</scope><scope>NAPCQ</scope><scope>7X8</scope></search><sort><creationdate>20021101</creationdate><title>A Comparison of Imputation Techniques for Handling Missing 
Data</title><author>Musil, Carol M. ; Warner, Camille B. ; Yobas, Piyanee Klainin ; Jones, Susan L.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c397t-663a2496c29c0480e2d51ebf7c79370140c4bc8c0d3f6ba2c02ae91b58863ea43</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2002</creationdate><topic>Algorithms</topic><topic>Clinical Nursing Research - methods</topic><topic>Comparative analysis</topic><topic>Data Collection - statistics & numerical data</topic><topic>Humans</topic><topic>Imputation</topic><topic>Information</topic><topic>Methodology</topic><topic>Missing data</topic><topic>Nursing</topic><topic>Regression Analysis</topic><topic>Statistical analysis</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Musil, Carol M.</creatorcontrib><creatorcontrib>Warner, Camille B.</creatorcontrib><creatorcontrib>Yobas, Piyanee Klainin</creatorcontrib><creatorcontrib>Jones, Susan L.</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Applied Social Sciences Index & Abstracts (ASSIA)</collection><collection>British Nursing Index</collection><collection>British Nursing Index (BNI) (1985 to Present)</collection><collection>British Nursing Index</collection><collection>ProQuest Health & Medical Complete (Alumni)</collection><collection>Nursing & Allied Health Premium</collection><collection>MEDLINE - Academic</collection><jtitle>Western journal of nursing research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Musil, Carol M.</au><au>Warner, Camille B.</au><au>Yobas, Piyanee Klainin</au><au>Jones, Susan 
L.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Comparison of Imputation Techniques for Handling Missing Data</atitle><jtitle>Western journal of nursing research</jtitle><addtitle>West J Nurs Res</addtitle><date>2002-11-01</date><risdate>2002</risdate><volume>24</volume><issue>7</issue><spage>815</spage><epage>829</epage><pages>815-829</pages><issn>0193-9459</issn><eissn>1552-8456</eissn><abstract>Researchers are commonly faced with the problem of missing data. This article presents theoretical and empirical information for the selection and application of approaches for handling missing data on a single variable. An actual data set of 492 cases with no missing values was used to create a simulated yet realistic data set with missing at random (MAR) data. The authors compare and contrast five approaches (listwise deletion, mean substitution, simple regression, regression with an error term, and the expectation maximization [EM] algorithm) for dealing with missing data, and compare the effects of each method on descriptive statistics and correlation coefficients for the imputed data (n = 96) and the entire sample (n = 492) when imputed data are included. All methods had limitations, although our findings suggest that mean substitution was the least effective and that regression with an error term and the EM algorithm produced estimates closest to those of the original variables.</abstract><cop>Thousand Oaks, CA</cop><pub>Sage Publications</pub><pmid>12428897</pmid><doi>10.1177/019394502762477004</doi><tpages>15</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0193-9459 |
ispartof | Western journal of nursing research, 2002-11, Vol.24 (7), p.815-829 |
issn | 0193-9459; 1552-8456 |
language | eng |
recordid | cdi_proquest_miscellaneous_72685135 |
source | Applied Social Sciences Index & Abstracts (ASSIA); SAGE |
subjects | Algorithms; Clinical Nursing Research - methods; Comparative analysis; Data Collection - statistics & numerical data; Humans; Imputation; Information; Methodology; Missing data; Nursing; Regression Analysis; Statistical analysis |
title | A Comparison of Imputation Techniques for Handling Missing Data |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T13%3A25%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Comparison%20of%20Imputation%20Techniques%20for%20Handling%20Missing%20Data&rft.jtitle=Western%20journal%20of%20nursing%20research&rft.au=Musil,%20Carol%20M.&rft.date=2002-11-01&rft.volume=24&rft.issue=7&rft.spage=815&rft.epage=829&rft.pages=815-829&rft.issn=0193-9459&rft.eissn=1552-8456&rft_id=info:doi/10.1177/019394502762477004&rft_dat=%3Cproquest_cross%3E57684570%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c397t-663a2496c29c0480e2d51ebf7c79370140c4bc8c0d3f6ba2c02ae91b58863ea43%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=214026681&rft_id=info:pmid/12428897&rft_sage_id=10.1177_019394502762477004&rfr_iscdi=true |
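
The description in this record names mean substitution, simple regression, and regression with an error term among the approaches compared. The following is a minimal sketch of how those three differ, not the authors' procedure or data: the values are simulated, the coefficients 0.6 and 0.8 are arbitrary assumptions, and the function names (`simulate_mar`, `fit_line`, `compare_imputations`) are hypothetical. Only the sample sizes (492 cases, 96 missing) are taken from the description.

```python
import random
import statistics


def simulate_mar(n=492, n_missing=96, seed=0):
    """Simulate complete (x, y) pairs, then delete y for n_missing cases."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumed relationship for illustration only: y = 0.6 * x + noise.
    y = [0.6 * xi + rng.gauss(0.0, 0.8) for xi in x]
    # Missingness depends on the observed x plus independent noise,
    # never on y itself, so the mechanism is missing at random (MAR).
    score = [xi + rng.gauss(0.0, 1.0) for xi in x]
    missing = set(sorted(range(n), key=lambda i: score[i])[-n_missing:])
    y_obs = [None if i in missing else yi for i, yi in enumerate(y)]
    return x, y, y_obs


def fit_line(xs, ys):
    """Ordinary least squares; returns slope, intercept, residual SD."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(xs, ys)]
    resid_sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return slope, intercept, resid_sd


def compare_imputations(seed=0):
    """Return the SD of y after each imputation, plus the original SD."""
    x, y_full, y_obs = simulate_mar(seed=seed)
    rng = random.Random(seed + 1)
    pairs = [(a, b) for a, b in zip(x, y_obs) if b is not None]
    xs, ys = zip(*pairs)
    mean_y = statistics.fmean(ys)
    slope, intercept, resid_sd = fit_line(xs, ys)

    # Mean substitution: every missing value gets the observed mean.
    mean_sub = [mean_y if b is None else b for b in y_obs]
    # Simple regression: every missing value gets its predicted value.
    reg = [intercept + slope * a if b is None else b
           for a, b in zip(x, y_obs)]
    # Regression with an error term: prediction plus a random residual.
    reg_err = [intercept + slope * a + rng.gauss(0.0, resid_sd)
               if b is None else b
               for a, b in zip(x, y_obs)]
    return {
        "original": statistics.stdev(y_full),
        "mean_substitution": statistics.stdev(mean_sub),
        "regression": statistics.stdev(reg),
        "regression_with_error": statistics.stdev(reg_err),
    }


if __name__ == "__main__":
    for name, sd in compare_imputations().items():
        print(f"{name:>22}: SD = {sd:.3f}")
```

In this sketch, mean substitution places all 96 imputed values at a single point, which shrinks the standard deviation of the completed variable. Adding a random residual to each regression prediction restores some of that spread, which is one plausible reading of why the description reports regression with an error term as closer to the original variables.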