A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency

In statistical modeling, it is crucial to have consistent variables that are the most relevant to the outcome variable(s) of interest in the model. With the increasing richness of data from multiple sources, the size of the pool of potential variables is escalating. Some variables, however, could pr...

Bibliographic Details
Published in: Stats (Basel, Switzerland), 2022-09, Vol.5 (3), p.689-713
Main Authors: Ren, Weijia ; Li, Jianzhu ; Erciulescu, Andreea ; Krenzke, Tom ; Mohadjer, Leyla
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3
cites cdi_FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3
container_end_page 713
container_issue 3
container_start_page 689
container_title Stats (Basel, Switzerland)
container_volume 5
creator Ren, Weijia
Li, Jianzhu
Erciulescu, Andreea
Krenzke, Tom
Mohadjer, Leyla
description In statistical modeling, it is crucial to have consistent variables that are the most relevant to the outcome variable(s) of interest in the model. With the increasing richness of data from multiple sources, the size of the pool of potential variables is escalating. Some variables, however, could provide redundant information, add noise to the estimation, or waste the degrees of freedom in the model. Therefore, variable selection is needed as a parsimonious process that aims to identify a minimal set of covariates for maximum predictive power. This study illustrated the variable selection methods considered and used in the small area estimation (SAE) modeling of measures related to the proficiency of adult competency that were constructed using survey data collected in the first cycle of the PIAAC. The developed variable selection process consisted of two phases: phase 1 identified a small set of variables that were consistently highly correlated with the outcomes through methods such as correlation matrix and multivariate LASSO analysis; phase 2 utilized a k-fold cross-validation process to select a final set of variables to be used in the final SAE models.
doi_str_mv 10.3390/stats5030041
format article
fullrecord <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_675a1bac48b14dbb9720bbe55a0128b7</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A745093763</galeid><doaj_id>oai_doaj_org_article_675a1bac48b14dbb9720bbe55a0128b7</doaj_id><sourcerecordid>A745093763</sourcerecordid><originalsourceid>FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3</originalsourceid><addsrcrecordid>eNpNkUFr3DAQhU1poSHJrT9A0Gs3HVmSLR3NkraBhBbSlt6MNBpttHitraQ95N_XW5cS5jDD483Hg9c07zjcCGHgY6m2FgUCQPJXzUWrer4xoH69fnG_ba5L2QNA23dGannR0MB-2hytm4g90kRYY5rZA9Wn5FlImT0e7DSxIZNlt6XGg10NydMU5x1LgdUnYt9yChEjzfh8lgZ_mirbpsOR6lm7at4EOxW6_rcvmx-fbr9vv2zuv36-2w73G5TQ1Q15FVAZhxKNkcGJDgVSB4Zb4RG49tpT67VT1rRGY0ASCOC84SilCOKyuVu5Ptn9eMxL3Pw8JhvHv0LKu9HmGnGiseuV5c6i1I5L75zpW3COlLLAW-36hfV-ZR1z-n2iUsd9OuV5iT-2Pe-U5i3oxXWzunZ2gcY5pJotLuPpEDHNFOKiD71UYETfieXhw_qAOZWSKfyPyWE8Fzm-LFL8AXI3kYk</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2716581208</pqid></control><display><type>article</type><title>A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency</title><source>Publicly Available Content Database</source><source>ABI/INFORM Global</source><creator>Ren, Weijia ; Li, Jianzhu ; Erciulescu, Andreea ; Krenzke, Tom ; Mohadjer, Leyla</creator><creatorcontrib>Ren, Weijia ; Li, Jianzhu ; Erciulescu, Andreea ; Krenzke, Tom ; Mohadjer, Leyla</creatorcontrib><description>In statistical modeling, it is crucial to have consistent variables that are the most relevant to the outcome variable(s) of interest in the model. With the increasing richness of data from multiple sources, the size of the pool of potential variables is escalating. Some variables, however, could provide redundant information, add noise to the estimation, or waste the degrees of freedom in the model. 
Therefore, variable selection is needed as a parsimonious process that aims to identify a minimal set of covariates for maximum predictive power. This study illustrated the variable selection methods considered and used in the small area estimation (SAE) modeling of measures related to the proficiency of adult competency that were constructed using survey data collected in the first cycle of the PIAAC. The developed variable selection process consisted of two phases: phase 1 identified a small set of variables that were consistently highly correlated with the outcomes through methods such as correlation matrix and multivariate LASSO analysis; phase 2 utilized a k-fold cross-validation process to select a final set of variables to be used in the final SAE models.</description><identifier>ISSN: 2571-905X</identifier><identifier>EISSN: 2571-905X</identifier><identifier>DOI: 10.3390/stats5030041</identifier><language>eng</language><publisher>Basel: MDPI AG</publisher><subject>adult competency ; Adult literacy ; Censuses ; Cognition &amp; reasoning ; cross-validation ; Data entry ; Estimates ; Estimation theory ; Feature selection ; Functional literacy ; Generalized linear models ; Methods ; multiple data sources ; multivariate LASSO ; small area estimation ; Variables</subject><ispartof>Stats (Basel, Switzerland), 2022-09, Vol.5 (3), p.689-713</ispartof><rights>COPYRIGHT 2022 MDPI AG</rights><rights>2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3</citedby><cites>FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3</cites><orcidid>0000-0002-7076-786X ; 0000-0002-0848-1991</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2716581208/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2716581208?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,11687,25752,27923,27924,36059,37011,44362,44589,74666,74897</link.rule.ids></links><search><creatorcontrib>Ren, Weijia</creatorcontrib><creatorcontrib>Li, Jianzhu</creatorcontrib><creatorcontrib>Erciulescu, Andreea</creatorcontrib><creatorcontrib>Krenzke, Tom</creatorcontrib><creatorcontrib>Mohadjer, Leyla</creatorcontrib><title>A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency</title><title>Stats (Basel, Switzerland)</title><description>In statistical modeling, it is crucial to have consistent variables that are the most relevant to the outcome variable(s) of interest in the model. With the increasing richness of data from multiple sources, the size of the pool of potential variables is escalating. Some variables, however, could provide redundant information, add noise to the estimation, or waste the degrees of freedom in the model. 
Therefore, variable selection is needed as a parsimonious process that aims to identify a minimal set of covariates for maximum predictive power. This study illustrated the variable selection methods considered and used in the small area estimation (SAE) modeling of measures related to the proficiency of adult competency that were constructed using survey data collected in the first cycle of the PIAAC. The developed variable selection process consisted of two phases: phase 1 identified a small set of variables that were consistently highly correlated with the outcomes through methods such as correlation matrix and multivariate LASSO analysis; phase 2 utilized a k-fold cross-validation process to select a final set of variables to be used in the final SAE models.</description><subject>adult competency</subject><subject>Adult literacy</subject><subject>Censuses</subject><subject>Cognition &amp; reasoning</subject><subject>cross-validation</subject><subject>Data entry</subject><subject>Estimates</subject><subject>Estimation theory</subject><subject>Feature selection</subject><subject>Functional literacy</subject><subject>Generalized linear models</subject><subject>Methods</subject><subject>multiple data sources</subject><subject>multivariate LASSO</subject><subject>small area 
estimation</subject><subject>Variables</subject><issn>2571-905X</issn><issn>2571-905X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>M0C</sourceid><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNpNkUFr3DAQhU1poSHJrT9A0Gs3HVmSLR3NkraBhBbSlt6MNBpttHitraQ95N_XW5cS5jDD483Hg9c07zjcCGHgY6m2FgUCQPJXzUWrer4xoH69fnG_ba5L2QNA23dGannR0MB-2hytm4g90kRYY5rZA9Wn5FlImT0e7DSxIZNlt6XGg10NydMU5x1LgdUnYt9yChEjzfh8lgZ_mirbpsOR6lm7at4EOxW6_rcvmx-fbr9vv2zuv36-2w73G5TQ1Q15FVAZhxKNkcGJDgVSB4Zb4RG49tpT67VT1rRGY0ASCOC84SilCOKyuVu5Ptn9eMxL3Pw8JhvHv0LKu9HmGnGiseuV5c6i1I5L75zpW3COlLLAW-36hfV-ZR1z-n2iUsd9OuV5iT-2Pe-U5i3oxXWzunZ2gcY5pJotLuPpEDHNFOKiD71UYETfieXhw_qAOZWSKfyPyWE8Fzm-LFL8AXI3kYk</recordid><startdate>20220901</startdate><enddate>20220901</enddate><creator>Ren, Weijia</creator><creator>Li, Jianzhu</creator><creator>Erciulescu, Andreea</creator><creator>Krenzke, Tom</creator><creator>Mohadjer, Leyla</creator><general>MDPI AG</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>K60</scope><scope>K6~</scope><scope>L.-</scope><scope>M0C</scope><scope>PIMPY</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-7076-786X</orcidid><orcidid>https://orcid.org/0000-0002-0848-1991</orcidid></search><sort><creationdate>20220901</creationdate><title>A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency</title><author>Ren, Weijia ; Li, Jianzhu ; Erciulescu, Andreea ; Krenzke, Tom ; Mohadjer, 
Leyla</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>adult competency</topic><topic>Adult literacy</topic><topic>Censuses</topic><topic>Cognition &amp; reasoning</topic><topic>cross-validation</topic><topic>Data entry</topic><topic>Estimates</topic><topic>Estimation theory</topic><topic>Feature selection</topic><topic>Functional literacy</topic><topic>Generalized linear models</topic><topic>Methods</topic><topic>multiple data sources</topic><topic>multivariate LASSO</topic><topic>small area estimation</topic><topic>Variables</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ren, Weijia</creatorcontrib><creatorcontrib>Li, Jianzhu</creatorcontrib><creatorcontrib>Erciulescu, Andreea</creatorcontrib><creatorcontrib>Krenzke, Tom</creatorcontrib><creatorcontrib>Mohadjer, Leyla</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest 
Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ABI/INFORM Global</collection><collection>Publicly Available Content Database</collection><collection>One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>Directory of Open Access Journals</collection><jtitle>Stats (Basel, Switzerland)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ren, Weijia</au><au>Li, Jianzhu</au><au>Erciulescu, Andreea</au><au>Krenzke, Tom</au><au>Mohadjer, Leyla</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency</atitle><jtitle>Stats (Basel, Switzerland)</jtitle><date>2022-09-01</date><risdate>2022</risdate><volume>5</volume><issue>3</issue><spage>689</spage><epage>713</epage><pages>689-713</pages><issn>2571-905X</issn><eissn>2571-905X</eissn><abstract>In statistical modeling, it is crucial to have consistent variables that are the most relevant to the outcome variable(s) of interest in the model. With the increasing richness of data from multiple sources, the size of the pool of potential variables is escalating. Some variables, however, could provide redundant information, add noise to the estimation, or waste the degrees of freedom in the model. Therefore, variable selection is needed as a parsimonious process that aims to identify a minimal set of covariates for maximum predictive power. 
This study illustrated the variable selection methods considered and used in the small area estimation (SAE) modeling of measures related to the proficiency of adult competency that were constructed using survey data collected in the first cycle of the PIAAC. The developed variable selection process consisted of two phases: phase 1 identified a small set of variables that were consistently highly correlated with the outcomes through methods such as correlation matrix and multivariate LASSO analysis; phase 2 utilized a k-fold cross-validation process to select a final set of variables to be used in the final SAE models.</abstract><cop>Basel</cop><pub>MDPI AG</pub><doi>10.3390/stats5030041</doi><tpages>25</tpages><orcidid>https://orcid.org/0000-0002-7076-786X</orcidid><orcidid>https://orcid.org/0000-0002-0848-1991</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2571-905X
ispartof Stats (Basel, Switzerland), 2022-09, Vol.5 (3), p.689-713
issn 2571-905X
2571-905X
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_675a1bac48b14dbb9720bbe55a0128b7
source Publicly Available Content Database; ABI/INFORM Global
subjects adult competency
Adult literacy
Censuses
Cognition & reasoning
cross-validation
Data entry
Estimates
Estimation theory
Feature selection
Functional literacy
Generalized linear models
Methods
multiple data sources
multivariate LASSO
small area estimation
Variables
title A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T23%3A47%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Variable%20Selection%20Method%20for%20Small%20Area%20Estimation%20Modeling%20of%20the%20Proficiency%20of%20Adult%20Competency&rft.jtitle=Stats%20(Basel,%20Switzerland)&rft.au=Ren,%20Weijia&rft.date=2022-09-01&rft.volume=5&rft.issue=3&rft.spage=689&rft.epage=713&rft.pages=689-713&rft.issn=2571-905X&rft.eissn=2571-905X&rft_id=info:doi/10.3390/stats5030041&rft_dat=%3Cgale_doaj_%3EA745093763%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2716581208&rft_id=info:pmid/&rft_galeid=A745093763&rfr_iscdi=true