A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency
In statistical modeling, it is crucial to have consistent variables that are the most relevant to the outcome variable(s) of interest in the model. With the increasing richness of data from multiple sources, the size of the pool of potential variables is escalating. Some variables, however, could pr...
Published in: | Stats (Basel, Switzerland), 2022-09, Vol.5 (3), p.689-713 |
---|---|
Main Authors: | Ren, Weijia; Li, Jianzhu; Erciulescu, Andreea; Krenzke, Tom; Mohadjer, Leyla |
Format: | Article |
Language: | English |
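Subjects: | adult competency; Adult literacy; Censuses; Cognition & reasoning; cross-validation; Data entry; Estimates; Estimation theory; Feature selection; Functional literacy; Generalized linear models; Methods; multiple data sources; multivariate LASSO; small area estimation; Variables |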
cited_by | cdi_FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3 |
---|---|
cites | cdi_FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3 |
container_end_page | 713 |
container_issue | 3 |
container_start_page | 689 |
container_title | Stats (Basel, Switzerland) |
container_volume | 5 |
creator | Ren, Weijia; Li, Jianzhu; Erciulescu, Andreea; Krenzke, Tom; Mohadjer, Leyla |
description | In statistical modeling, it is crucial to have consistent variables that are the most relevant to the outcome variable(s) of interest in the model. With the increasing richness of data from multiple sources, the size of the pool of potential variables is escalating. Some variables, however, could provide redundant information, add noise to the estimation, or waste the degrees of freedom in the model. Therefore, variable selection is needed as a parsimonious process that aims to identify a minimal set of covariates for maximum predictive power. This study illustrated the variable selection methods considered and used in the small area estimation (SAE) modeling of measures related to the proficiency of adult competency that were constructed using survey data collected in the first cycle of the PIAAC. The developed variable selection process consisted of two phases: phase 1 identified a small set of variables that were consistently highly correlated with the outcomes through methods such as correlation matrix and multivariate LASSO analysis; phase 2 utilized a k-fold cross-validation process to select a final set of variables to be used in the final SAE models. |
doi_str_mv | 10.3390/stats5030041 |
format | article |
fullrecord | <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_675a1bac48b14dbb9720bbe55a0128b7</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A745093763</galeid><doaj_id>oai_doaj_org_article_675a1bac48b14dbb9720bbe55a0128b7</doaj_id><sourcerecordid>A745093763</sourcerecordid><originalsourceid>FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3</originalsourceid><addsrcrecordid>eNpNkUFr3DAQhU1poSHJrT9A0Gs3HVmSLR3NkraBhBbSlt6MNBpttHitraQ95N_XW5cS5jDD483Hg9c07zjcCGHgY6m2FgUCQPJXzUWrer4xoH69fnG_ba5L2QNA23dGannR0MB-2hytm4g90kRYY5rZA9Wn5FlImT0e7DSxIZNlt6XGg10NydMU5x1LgdUnYt9yChEjzfh8lgZ_mirbpsOR6lm7at4EOxW6_rcvmx-fbr9vv2zuv36-2w73G5TQ1Q15FVAZhxKNkcGJDgVSB4Zb4RG49tpT67VT1rRGY0ASCOC84SilCOKyuVu5Ptn9eMxL3Pw8JhvHv0LKu9HmGnGiseuV5c6i1I5L75zpW3COlLLAW-36hfV-ZR1z-n2iUsd9OuV5iT-2Pe-U5i3oxXWzunZ2gcY5pJotLuPpEDHNFOKiD71UYETfieXhw_qAOZWSKfyPyWE8Fzm-LFL8AXI3kYk</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2716581208</pqid></control><display><type>article</type><title>A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency</title><source>Publicly Available Content Database</source><source>ABI/INFORM Global</source><creator>Ren, Weijia ; Li, Jianzhu ; Erciulescu, Andreea ; Krenzke, Tom ; Mohadjer, Leyla</creator><creatorcontrib>Ren, Weijia ; Li, Jianzhu ; Erciulescu, Andreea ; Krenzke, Tom ; Mohadjer, Leyla</creatorcontrib><description>In statistical modeling, it is crucial to have consistent variables that are the most relevant to the outcome variable(s) of interest in the model. With the increasing richness of data from multiple sources, the size of the pool of potential variables is escalating. Some variables, however, could provide redundant information, add noise to the estimation, or waste the degrees of freedom in the model. 
Therefore, variable selection is needed as a parsimonious process that aims to identify a minimal set of covariates for maximum predictive power. This study illustrated the variable selection methods considered and used in the small area estimation (SAE) modeling of measures related to the proficiency of adult competency that were constructed using survey data collected in the first cycle of the PIAAC. The developed variable selection process consisted of two phases: phase 1 identified a small set of variables that were consistently highly correlated with the outcomes through methods such as correlation matrix and multivariate LASSO analysis; phase 2 utilized a k-fold cross-validation process to select a final set of variables to be used in the final SAE models.</description><identifier>ISSN: 2571-905X</identifier><identifier>EISSN: 2571-905X</identifier><identifier>DOI: 10.3390/stats5030041</identifier><language>eng</language><publisher>Basel: MDPI AG</publisher><subject>adult competency ; Adult literacy ; Censuses ; Cognition & reasoning ; cross-validation ; Data entry ; Estimates ; Estimation theory ; Feature selection ; Functional literacy ; Generalized linear models ; Methods ; multiple data sources ; multivariate LASSO ; small area estimation ; Variables</subject><ispartof>Stats (Basel, Switzerland), 2022-09, Vol.5 (3), p.689-713</ispartof><rights>COPYRIGHT 2022 MDPI AG</rights><rights>2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3</citedby><cites>FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3</cites><orcidid>0000-0002-7076-786X ; 0000-0002-0848-1991</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2716581208/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2716581208?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>314,780,784,11687,25752,27923,27924,36059,37011,44362,44589,74666,74897</link.rule.ids></links><search><creatorcontrib>Ren, Weijia</creatorcontrib><creatorcontrib>Li, Jianzhu</creatorcontrib><creatorcontrib>Erciulescu, Andreea</creatorcontrib><creatorcontrib>Krenzke, Tom</creatorcontrib><creatorcontrib>Mohadjer, Leyla</creatorcontrib><title>A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency</title><title>Stats (Basel, Switzerland)</title><description>In statistical modeling, it is crucial to have consistent variables that are the most relevant to the outcome variable(s) of interest in the model. With the increasing richness of data from multiple sources, the size of the pool of potential variables is escalating. Some variables, however, could provide redundant information, add noise to the estimation, or waste the degrees of freedom in the model. 
Therefore, variable selection is needed as a parsimonious process that aims to identify a minimal set of covariates for maximum predictive power. This study illustrated the variable selection methods considered and used in the small area estimation (SAE) modeling of measures related to the proficiency of adult competency that were constructed using survey data collected in the first cycle of the PIAAC. The developed variable selection process consisted of two phases: phase 1 identified a small set of variables that were consistently highly correlated with the outcomes through methods such as correlation matrix and multivariate LASSO analysis; phase 2 utilized a k-fold cross-validation process to select a final set of variables to be used in the final SAE models.</description><subject>adult competency</subject><subject>Adult literacy</subject><subject>Censuses</subject><subject>Cognition & reasoning</subject><subject>cross-validation</subject><subject>Data entry</subject><subject>Estimates</subject><subject>Estimation theory</subject><subject>Feature selection</subject><subject>Functional literacy</subject><subject>Generalized linear models</subject><subject>Methods</subject><subject>multiple data sources</subject><subject>multivariate LASSO</subject><subject>small area 
estimation</subject><subject>Variables</subject><issn>2571-905X</issn><issn>2571-905X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>M0C</sourceid><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNpNkUFr3DAQhU1poSHJrT9A0Gs3HVmSLR3NkraBhBbSlt6MNBpttHitraQ95N_XW5cS5jDD483Hg9c07zjcCGHgY6m2FgUCQPJXzUWrer4xoH69fnG_ba5L2QNA23dGannR0MB-2hytm4g90kRYY5rZA9Wn5FlImT0e7DSxIZNlt6XGg10NydMU5x1LgdUnYt9yChEjzfh8lgZ_mirbpsOR6lm7at4EOxW6_rcvmx-fbr9vv2zuv36-2w73G5TQ1Q15FVAZhxKNkcGJDgVSB4Zb4RG49tpT67VT1rRGY0ASCOC84SilCOKyuVu5Ptn9eMxL3Pw8JhvHv0LKu9HmGnGiseuV5c6i1I5L75zpW3COlLLAW-36hfV-ZR1z-n2iUsd9OuV5iT-2Pe-U5i3oxXWzunZ2gcY5pJotLuPpEDHNFOKiD71UYETfieXhw_qAOZWSKfyPyWE8Fzm-LFL8AXI3kYk</recordid><startdate>20220901</startdate><enddate>20220901</enddate><creator>Ren, Weijia</creator><creator>Li, Jianzhu</creator><creator>Erciulescu, Andreea</creator><creator>Krenzke, Tom</creator><creator>Mohadjer, Leyla</creator><general>MDPI AG</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>K60</scope><scope>K6~</scope><scope>L.-</scope><scope>M0C</scope><scope>PIMPY</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-7076-786X</orcidid><orcidid>https://orcid.org/0000-0002-0848-1991</orcidid></search><sort><creationdate>20220901</creationdate><title>A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency</title><author>Ren, Weijia ; Li, Jianzhu ; Erciulescu, Andreea ; Krenzke, Tom ; Mohadjer, 
Leyla</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>adult competency</topic><topic>Adult literacy</topic><topic>Censuses</topic><topic>Cognition & reasoning</topic><topic>cross-validation</topic><topic>Data entry</topic><topic>Estimates</topic><topic>Estimation theory</topic><topic>Feature selection</topic><topic>Functional literacy</topic><topic>Generalized linear models</topic><topic>Methods</topic><topic>multiple data sources</topic><topic>multivariate LASSO</topic><topic>small area estimation</topic><topic>Variables</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ren, Weijia</creatorcontrib><creatorcontrib>Li, Jianzhu</creatorcontrib><creatorcontrib>Erciulescu, Andreea</creatorcontrib><creatorcontrib>Krenzke, Tom</creatorcontrib><creatorcontrib>Mohadjer, Leyla</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>ABI/INFORM Collection</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest 
Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ABI/INFORM Global</collection><collection>Publicly Available Content Database</collection><collection>One Business</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>Directory of Open Access Journals</collection><jtitle>Stats (Basel, Switzerland)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ren, Weijia</au><au>Li, Jianzhu</au><au>Erciulescu, Andreea</au><au>Krenzke, Tom</au><au>Mohadjer, Leyla</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency</atitle><jtitle>Stats (Basel, Switzerland)</jtitle><date>2022-09-01</date><risdate>2022</risdate><volume>5</volume><issue>3</issue><spage>689</spage><epage>713</epage><pages>689-713</pages><issn>2571-905X</issn><eissn>2571-905X</eissn><abstract>In statistical modeling, it is crucial to have consistent variables that are the most relevant to the outcome variable(s) of interest in the model. With the increasing richness of data from multiple sources, the size of the pool of potential variables is escalating. Some variables, however, could provide redundant information, add noise to the estimation, or waste the degrees of freedom in the model. Therefore, variable selection is needed as a parsimonious process that aims to identify a minimal set of covariates for maximum predictive power. 
This study illustrated the variable selection methods considered and used in the small area estimation (SAE) modeling of measures related to the proficiency of adult competency that were constructed using survey data collected in the first cycle of the PIAAC. The developed variable selection process consisted of two phases: phase 1 identified a small set of variables that were consistently highly correlated with the outcomes through methods such as correlation matrix and multivariate LASSO analysis; phase 2 utilized a k-fold cross-validation process to select a final set of variables to be used in the final SAE models.</abstract><cop>Basel</cop><pub>MDPI AG</pub><doi>10.3390/stats5030041</doi><tpages>25</tpages><orcidid>https://orcid.org/0000-0002-7076-786X</orcidid><orcidid>https://orcid.org/0000-0002-0848-1991</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2571-905X |
ispartof | Stats (Basel, Switzerland), 2022-09, Vol.5 (3), p.689-713 |
issn | 2571-905X 2571-905X |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_675a1bac48b14dbb9720bbe55a0128b7 |
source | Publicly Available Content Database; ABI/INFORM Global |
subjects | adult competency; Adult literacy; Censuses; Cognition & reasoning; cross-validation; Data entry; Estimates; Estimation theory; Feature selection; Functional literacy; Generalized linear models; Methods; multiple data sources; multivariate LASSO; small area estimation; Variables |
title | A Variable Selection Method for Small Area Estimation Modeling of the Proficiency of Adult Competency |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T23%3A47%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Variable%20Selection%20Method%20for%20Small%20Area%20Estimation%20Modeling%20of%20the%20Proficiency%20of%20Adult%20Competency&rft.jtitle=Stats%20(Basel,%20Switzerland)&rft.au=Ren,%20Weijia&rft.date=2022-09-01&rft.volume=5&rft.issue=3&rft.spage=689&rft.epage=713&rft.pages=689-713&rft.issn=2571-905X&rft.eissn=2571-905X&rft_id=info:doi/10.3390/stats5030041&rft_dat=%3Cgale_doaj_%3EA745093763%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c406t-ed5fc59bc4c994fb36c3ce6091a3dc018d8de2d8b5a9298cfce3c00bd91c443f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2716581208&rft_id=info:pmid/&rft_galeid=A745093763&rfr_iscdi=true |