
Comparison of imputation methods for handling missing categorical data with univariate pattern

Bibliographic Details
Published in: Revista de métodos cuantitativos para la economía y la empresa, 2014, Vol. 17, pp. 101-120
Main Author: Torres Munguia, Juan Armando
Format: Article
Language: English
container_end_page 120
container_start_page 101
container_title Revista de métodos cuantitativos para la economía y la empresa
container_volume 17
creator Torres Munguia, Juan Armando
description This paper examines estimates of sample proportions in the presence of univariate missing categorical data. A database on smoking habits (2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets with 5% and 15% missingness, each under missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) mechanisms. The performance of six methods for handling the missing values is then evaluated: listwise deletion, mode imputation, random imputation, hot-deck, imputation by polytomous regression, and random forests. The results show that, in most of the scenarios assessed, hot-deck and polytomous regression were the most effective methods for dealing with missing categorical data.
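The abstract describes a simulation design: delete values of one categorical variable under MCAR, MAR or MNAR at a chosen rate, fill them in with different methods, and compare the resulting proportion estimates with the complete-data ones. The Python sketch below illustrates that kind of design under stated assumptions; it is not the author's code. The data are synthetic (hypothetical `sex` and `smoker` variables, not the survey), the MAR and MNAR missingness probabilities are arbitrary illustrative choices, hot-deck is implemented as random hot-deck within donor cells defined by `sex` (one common variant, not necessarily the one used in the paper), and polytomous regression and random forests are left out so the example needs only the standard library.

```python
"""Hypothetical sketch: univariate missingness in one categorical variable,
followed by simple imputation methods and a comparison of proportions."""
import random
from collections import Counter

CATEGORIES = ["never", "former", "current"]
# Assumed category weights by sex (illustrative only, not survey figures).
WEIGHTS = {"F": [0.70, 0.20, 0.10], "M": [0.50, 0.30, 0.20]}


def make_data(n, rng):
    """Synthetic complete data: covariate `sex`, categorical target `smoker`."""
    rows = []
    for _ in range(n):
        sex = rng.choice(["F", "M"])
        smoker = rng.choices(CATEGORIES, weights=WEIGHTS[sex])[0]
        rows.append({"sex": sex, "smoker": smoker})
    return rows


def delete_values(rows, rate, mechanism, rng):
    """Copy `rows`, blanking `smoker` with a probability derived from `rate`."""
    out = [dict(r) for r in rows]
    for r in out:
        if mechanism == "MCAR":    # unrelated to any variable
            p = rate
        elif mechanism == "MAR":   # depends only on the observed covariate
            p = rate * (1.5 if r["sex"] == "M" else 0.5)
        elif mechanism == "MNAR":  # depends on the value that goes missing
            p = rate * (2.0 if r["smoker"] == "current" else 0.5)
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() < p:
            r["smoker"] = None
    return out


def proportions(values):
    """Category proportions among the non-missing values."""
    counts = Counter(v for v in values if v is not None)
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}


def listwise(rows, rng):
    """Listwise deletion: keep only rows with an observed value."""
    return [r["smoker"] for r in rows if r["smoker"] is not None]


def mode_impute(rows, rng):
    """Replace every missing value with the most frequent observed category."""
    observed = Counter(r["smoker"] for r in rows if r["smoker"] is not None)
    mode = observed.most_common(1)[0][0]
    return [r["smoker"] if r["smoker"] is not None else mode for r in rows]


def random_impute(rows, rng):
    """Replace each missing value with a random draw from all observed values."""
    observed = [r["smoker"] for r in rows if r["smoker"] is not None]
    return [r["smoker"] if r["smoker"] is not None else rng.choice(observed) for r in rows]


def hot_deck(rows, rng):
    """Random hot-deck: draw a donor value from rows with the same `sex`."""
    donors = {}
    for r in rows:
        if r["smoker"] is not None:
            donors.setdefault(r["sex"], []).append(r["smoker"])
    return [r["smoker"] if r["smoker"] is not None else rng.choice(donors[r["sex"]])
            for r in rows]


if __name__ == "__main__":
    rng = random.Random(2014)
    full = make_data(20_000, rng)
    truth = proportions([r["smoker"] for r in full])
    methods = {"listwise": listwise, "mode": mode_impute,
               "random": random_impute, "hot-deck": hot_deck}
    for mech in ("MCAR", "MAR", "MNAR"):
        for rate in (0.05, 0.15):
            incomplete = delete_values(full, rate, mech, rng)
            for name, impute in methods.items():
                est = proportions(impute(incomplete, rng))
                # Bias per category: estimated minus complete-data proportion.
                bias = {k: round(est.get(k, 0.0) - truth[k], 4) for k in truth}
                print(mech, rate, name, bias)
```

The script prints, for each mechanism, rate and method, the difference between the estimated and the complete-data proportions. In a setup like this one would generally expect listwise deletion and random imputation to stay close to the truth under MCAR, mode imputation to inflate the modal category, and hot-deck within `sex` cells to reduce the bias under MAR (because missingness here depends only on `sex`). None of these methods corrects MNAR, since the missingness depends on the unobserved value itself.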
format article
publisher Universidad Pablo de Olavide, Sevilla
rights Copyright Universidad Pablo de Olavide, Dept de Economia, Metodos Cuantitativos e Historia 2014
status peer reviewed; free to read
fulltext https://www.proquest.com/docview/1908768549?pq-origsite=primo
identifier ISSN: 1886-516X
ispartof Revista de métodos cuantitativos para la economía y la empresa, 2014, Vol.17, p.101-120
issn 1886-516X (print and electronic)
language eng
recordid cdi_proquest_miscellaneous_1803808330
source International Bibliography of the Social Sciences (IBSS); Publicly Available Content Database (Proquest) (PQ_SDU_P3); ABI/INFORM global
subjects Addictions
Data collection
Forests
Habits
Hot-deck
Imputation methods
Missing categorical data
Polytomous regression
Random forests
Regression analysis
Smoking
Smoking habits
title Comparison of imputation methods for handling missing categorical data with univariate pattern
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T12%3A39%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_econi&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Comparison%20of%20imputation%20methods%20for%20handling%20missing%20categorical%20data%20with%20univariate%20pattern&rft.jtitle=Revista%20de%20me%CC%81todos%20cuantitativos%20para%20la%20economi%CC%81a%20y%20la%20empresa&rft.au=Torres%20Munguia,%20Juan%20Armando&rft.date=2014&rft.volume=17&rft.spage=101&rft.epage=120&rft.pages=101-120&rft.issn=1886-516X&rft.eissn=1886-516X&rft_id=info:doi/&rft_dat=%3Cproquest_econi%3E1803808330%3C/proquest_econi%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-e184t-a5f1647a7676557e4f15ef6980a3e81408512607248ed88a01aeac4b459aadf13%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1908768549&rft_id=info:pmid/&rfr_iscdi=true