Comparison of imputation methods for handling missing categorical data with univariate pattern
This paper examines estimates of sample proportions in the presence of univariate missing categorical data. A database on smoking habits (the 2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets with missingness rates of 5% and 15%, each under MCAR, MAR and MNAR...
Published in: | Revista de métodos cuantitativos para la economía y la empresa 2014, Vol.17, p.101-120 |
---|---|
Main Author: | Torres Munguia, Juan Armando |
Format: | Article |
Language: | English |
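Subjects: | Addictions; Data collection; Forests; Habits; Hot-deck; Imputation methods; Missing categorical data; Polytomous regression; Random forests; Regression analysis; Smoking; Smoking habits |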
Online Access: | Get full text |
cited_by | |
---|---|
cites | |
container_end_page | 120 |
container_issue | |
container_start_page | 101 |
container_title | Revista de métodos cuantitativos para la economía y la empresa |
container_volume | 17 |
creator | Torres Munguia, Juan Armando |
description | This paper examines estimates of sample proportions in the presence of univariate missing categorical data. A database on smoking habits (the 2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets with missingness rates of 5% and 15%, each under MCAR, MAR and MNAR mechanisms. The performance of six methods for addressing missingness is then evaluated: listwise deletion, mode imputation, random imputation, hot-deck imputation, imputation by polytomous regression, and random forests. The results show that hot-deck and polytomous regression were the most effective methods for handling missing categorical data in most of the scenarios assessed. |
format | article |
fullrecord | <record><control><sourceid>proquest_econi</sourceid><recordid>TN_cdi_proquest_miscellaneous_1803808330</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1803808330</sourcerecordid><originalsourceid>FETCH-LOGICAL-e184t-a5f1647a7676557e4f15ef6980a3e81408512607248ed88a01aeac4b459aadf13</originalsourceid><addsrcrecordid>eNpdjk1LxDAQhosouKz7E4SAFy-FZPM1PcriFyx4UfBkGdtkN0ub1CTVv28WPYhzeWaYh5f3pFowAFVLpl5P_-zn1SqlAz2OprppFtXbJowTRpeCJ8ESN05zxuzKNZq8D30iNkSyR98Pzu_I6FI6ssNsdiG6DgfSY0by5fKezN59lqzyIxPmbKK_qM4sDsmsfrmsXu5unzcP9fbp_nFzs60NA5FrlJYpoVErraTURlgmjVUNUOQGmKAg2VpRvRZgegCkDA124l3IBrG3jC-r65_cKYaP2aTclqadGQb0JsypZUA5UOCcFvXqn3oIc_SlXcsaClqBFE2xLn8s0wXvUntEyiG2jHHQnH8Dy-5qZg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1908768549</pqid></control><display><type>article</type><title>Comparison of imputation methods for handling missing categorical data with univariate pattern</title><source>International Bibliography of the Social Sciences (IBSS)</source><source>Publicly Available Content Database (Proquest) (PQ_SDU_P3)</source><source>ABI/INFORM global</source><creator>Torres Munguia, Juan Armando</creator><creatorcontrib>Torres Munguia, Juan Armando</creatorcontrib><description>This paper examines the sample proportions estimates in the presence of univariate missing categorical data. A database about smoking habits (2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets at rates 5% and 15% of missingness, each for MCAR, MAR and MNAR mechanisms. Then the performance of six methods for addressing missingness is evaluated: listwise, mode imputation, random imputation, hot-deck, imputation by polytomous regression and random forests. 
Results showed that the most effective methods for dealing with missing categorical data in most of the scenarios assessed in this paper were hot-deck and polytomous regression approaches.</description><identifier>ISSN: 1886-516X</identifier><identifier>EISSN: 1886-516X</identifier><language>eng</language><publisher>Sevilla: Universidad Pablo de Olavide</publisher><subject>Addictions ; Data collection ; Forests ; Habits ; Hot-deck ; Imputation methods ; Missing categorical data ; Polytomous regression ; Random forests ; Regression analysis ; Smoking ; Smoking habits</subject><ispartof>Revista de métodos cuantitativos para la economía y la empresa, 2014, Vol.17, p.101-120</ispartof><rights>Copyright Universidad Pablo de Olavide, Dept de Economia, Metodos Cuantitativos e Historia 2014</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/1908768549?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,4010,11667,12826,25731,33200,33201,36037,36038,36989,36990,44339,44566</link.rule.ids></links><search><creatorcontrib>Torres Munguia, Juan Armando</creatorcontrib><title>Comparison of imputation methods for handling missing categorical data with univariate pattern</title><title>Revista de métodos cuantitativos para la economía y la empresa</title><description>This paper examines the sample proportions estimates in the presence of univariate missing categorical data. A database about smoking habits (2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets at rates 5% and 15% of missingness, each for MCAR, MAR and MNAR mechanisms. 
Then the performance of six methods for addressing missingness is evaluated: listwise, mode imputation, random imputation, hot-deck, imputation by polytomous regression and random forests. Results showed that the most effective methods for dealing with missing categorical data in most of the scenarios assessed in this paper were hot-deck and polytomous regression approaches.</description><subject>Addictions</subject><subject>Data collection</subject><subject>Forests</subject><subject>Habits</subject><subject>Hot-deck</subject><subject>Imputation methods</subject><subject>Missing categorical data</subject><subject>Polytomous regression</subject><subject>Random forests</subject><subject>Regression analysis</subject><subject>Smoking</subject><subject>Smoking habits</subject><issn>1886-516X</issn><issn>1886-516X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2014</creationdate><recordtype>article</recordtype><sourceid>8BJ</sourceid><sourceid>M0C</sourceid><sourceid>PIMPY</sourceid><recordid>eNpdjk1LxDAQhosouKz7E4SAFy-FZPM1PcriFyx4UfBkGdtkN0ub1CTVv28WPYhzeWaYh5f3pFowAFVLpl5P_-zn1SqlAz2OprppFtXbJowTRpeCJ8ESN05zxuzKNZq8D30iNkSyR98Pzu_I6FI6ssNsdiG6DgfSY0by5fKezN59lqzyIxPmbKK_qM4sDsmsfrmsXu5unzcP9fbp_nFzs60NA5FrlJYpoVErraTURlgmjVUNUOQGmKAg2VpRvRZgegCkDA124l3IBrG3jC-r65_cKYaP2aTclqadGQb0JsypZUA5UOCcFvXqn3oIc_SlXcsaClqBFE2xLn8s0wXvUntEyiG2jHHQnH8Dy-5qZg</recordid><startdate>2014</startdate><enddate>2014</enddate><creator>Torres Munguia, Juan Armando</creator><general>Universidad Pablo de Olavide</general><general>Universidad Pablo de Olavide, Dept de Economia, Metodos Cuantitativos e 
Historia</general><scope>OT2</scope><scope>3V.</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8BJ</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FQK</scope><scope>FRNLG</scope><scope>F~G</scope><scope>JBE</scope><scope>K60</scope><scope>K6~</scope><scope>L.-</scope><scope>M0C</scope><scope>PIMPY</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PYYUZ</scope><scope>Q9U</scope></search><sort><creationdate>2014</creationdate><title>Comparison of imputation methods for handling missing categorical data with univariate pattern</title><author>Torres Munguia, Juan Armando</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-e184t-a5f1647a7676557e4f15ef6980a3e81408512607248ed88a01aeac4b459aadf13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2014</creationdate><topic>Addictions</topic><topic>Data collection</topic><topic>Forests</topic><topic>Habits</topic><topic>Hot-deck</topic><topic>Imputation methods</topic><topic>Missing categorical data</topic><topic>Polytomous regression</topic><topic>Random forests</topic><topic>Regression analysis</topic><topic>Smoking</topic><topic>Smoking habits</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Torres Munguia, Juan Armando</creatorcontrib><collection>EconStor</collection><collection>ProQuest Central (Corporate)</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection</collection><collection>International Bibliography of the Social Sciences (IBSS)</collection><collection>ProQuest Central (Alumni) (purchase 
pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>ProQuest Business Premium Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>International Bibliography of the Social Sciences</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ABI/INFORM global</collection><collection>Publicly Available Content Database (Proquest) (PQ_SDU_P3)</collection><collection>One Business (ProQuest)</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ABI/INFORM Collection China</collection><collection>ProQuest Central Basic</collection><jtitle>Revista de métodos cuantitativos para la economía y la empresa</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Torres Munguia, Juan Armando</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Comparison of imputation methods for handling missing categorical data with univariate pattern</atitle><jtitle>Revista de métodos cuantitativos para la economía y la 
empresa</jtitle><date>2014</date><risdate>2014</risdate><volume>17</volume><spage>101</spage><epage>120</epage><pages>101-120</pages><issn>1886-516X</issn><eissn>1886-516X</eissn><abstract>This paper examines the sample proportions estimates in the presence of univariate missing categorical data. A database about smoking habits (2011 National Addiction Survey of Mexico) was used to create simulated yet realistic datasets at rates 5% and 15% of missingness, each for MCAR, MAR and MNAR mechanisms. Then the performance of six methods for addressing missingness is evaluated: listwise, mode imputation, random imputation, hot-deck, imputation by polytomous regression and random forests. Results showed that the most effective methods for dealing with missing categorical data in most of the scenarios assessed in this paper were hot-deck and polytomous regression approaches.</abstract><cop>Sevilla</cop><pub>Universidad Pablo de Olavide</pub><tpages>20</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1886-516X |
ispartof | Revista de métodos cuantitativos para la economía y la empresa, 2014, Vol.17, p.101-120 |
issn | 1886-516X (ISSN); 1886-516X (EISSN) |
language | eng |
recordid | cdi_proquest_miscellaneous_1803808330 |
source | International Bibliography of the Social Sciences (IBSS); Publicly Available Content Database (Proquest) (PQ_SDU_P3); ABI/INFORM global |
subjects | Addictions; Data collection; Forests; Habits; Hot-deck; Imputation methods; Missing categorical data; Polytomous regression; Random forests; Regression analysis; Smoking; Smoking habits |
title | Comparison of imputation methods for handling missing categorical data with univariate pattern |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T12%3A39%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_econi&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Comparison%20of%20imputation%20methods%20for%20handling%20missing%20categorical%20data%20with%20univariate%20pattern&rft.jtitle=Revista%20de%20me%CC%81todos%20cuantitativos%20para%20la%20economi%CC%81a%20y%20la%20empresa&rft.au=Torres%20Munguia,%20Juan%20Armando&rft.date=2014&rft.volume=17&rft.spage=101&rft.epage=120&rft.pages=101-120&rft.issn=1886-516X&rft.eissn=1886-516X&rft_id=info:doi/&rft_dat=%3Cproquest_econi%3E1803808330%3C/proquest_econi%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-e184t-a5f1647a7676557e4f15ef6980a3e81408512607248ed88a01aeac4b459aadf13%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1908768549&rft_id=info:pmid/&rfr_iscdi=true |
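
The description field names several simple ways of handling a missing categorical variable, including listwise deletion, mode imputation and random imputation under an MCAR mechanism. The following is a minimal Python sketch of those three ideas only. It is not the paper's code: the toy category labels and counts, the seed, and all function names (`make_mcar`, `impute_mode`, `impute_random`, `proportions`) are hypothetical and chosen purely for illustration.

```python
import random
from collections import Counter


def make_mcar(values, rate, rng):
    """Blank out a fixed share of entries, chosen independently of the data (MCAR)."""
    n_missing = round(len(values) * rate)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute_mode(values):
    """Replace every missing entry with the most frequent observed category."""
    observed = [v for v in values if v is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def impute_random(values, rng):
    """Replace every missing entry with a random draw from the observed entries."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


def proportions(values):
    """Sample proportion of each category."""
    counts = Counter(values)
    return {k: counts[k] / len(values) for k in sorted(counts)}


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical toy variable (NOT the survey data): smoking status of 1000 people.
complete = ["never"] * 600 + ["former"] * 250 + ["current"] * 150

incomplete = make_mcar(complete, 0.15, rng)            # 15% missingness
listwise = [v for v in incomplete if v is not None]    # listwise deletion

print("complete:", proportions(complete))
print("listwise:", proportions(listwise))
print("mode    :", proportions(impute_mode(incomplete)))
print("random  :", proportions(impute_random(incomplete, rng)))
```

In this toy setup, mode imputation can only push the share of the majority category up, because every blank is filled with that category, while listwise deletion and random imputation keep proportions close to the observed ones under MCAR. Hot-deck, polytomous regression and random forests, also named in the description, need auxiliary covariates and are not sketched here.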