
Semi-Supervised Linear Regression

We study a regression problem where for some part of the data we observe both the label variable (Y) and the predictors (X), while for other part of the data only the predictors are given. Such a problem arises, for example, when observations of the label variable are costly and may require a skille...

Bibliographic Details
Published in: Journal of the American Statistical Association 2022-10, Vol.117 (540), p.2238-2251
Main Authors: Azriel, David; Brown, Lawrence D.; Sklar, Michael; Berk, Richard; Buja, Andreas; Zhao, Linda
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c451t-af5c9a71430c551b927eacb1bbc4350a77c9b1b26f832d8717206bd2a0f502c13
cites cdi_FETCH-LOGICAL-c451t-af5c9a71430c551b927eacb1bbc4350a77c9b1b26f832d8717206bd2a0f502c13
container_end_page 2251
container_issue 540
container_start_page 2238
container_title Journal of the American Statistical Association
container_volume 117
creator Azriel, David
Brown, Lawrence D.
Sklar, Michael
Berk, Richard
Buja, Andreas
Zhao, Linda
description We study a regression problem where for some part of the data we observe both the label variable (Y) and the predictors (X), while for other part of the data only the predictors are given. Such a problem arises, for example, when observations of the label variable are costly and may require a skilled human agent. When the conditional expectation is not exactly linear, one can consider the best linear approximation to the conditional expectation, which can be estimated consistently by the least-square estimates (LSE). The latter depends only on the labeled data. We suggest improved alternative estimates to the LSE that use also the unlabeled data. Our estimation method can be easily implemented and has simply described asymptotic properties. The new estimates asymptotically dominate the usual standard procedures under certain non-linearity condition of E(Y|X); otherwise, they are asymptotically equivalent. The performance of the new estimator for small sample size is investigated in an extensive simulation study. A real data example of inferring homeless population is used to illustrate the new methodology.
doi_str_mv 10.1080/01621459.2021.1915320
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1080_01621459_2021_1915320</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2760342998</sourcerecordid><originalsourceid>FETCH-LOGICAL-c451t-af5c9a71430c551b927eacb1bbc4350a77c9b1b26f832d8717206bd2a0f502c13</originalsourceid><addsrcrecordid>eNp9kF1LwzAUhoMoOKc_QZh43XpO0jTNnTL8goHgFLwLaZpIxtbMpFP2723pvPXcvBx43nPgIeQSIUeo4AawpFhwmVOgmKNEzigckUmfIqOi-Dgmk4HJBuiUnKW0gn5EVU3I1dJufLbcbW389sk2s4VvrY6zV_sZbUo-tOfkxOl1sheHnJL3h_u3-VO2eHl8nt8tMlNw7DLtuJFaYMHAcI61pMJqU2Ndm4Jx0EIY2W-0dBWjTSVQUCjrhmpwHKhBNiXX491tDF87mzq1CrvY9i8VFSWwgkpZ9RQfKRNDStE6tY1-o-NeIajBhvqzoQYb6mCj792OPd-6EDf6J8R1ozq9X4foom6NT4r9f-IXke5kFw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2760342998</pqid></control><display><type>article</type><title>Semi-Supervised Linear Regression</title><source>International Bibliography of the Social Sciences (IBSS)</source><source>Taylor and Francis Science and Technology Collection</source><creator>Azriel, David ; Brown, Lawrence D. ; Sklar, Michael ; Berk, Richard ; Buja, Andreas ; Zhao, Linda</creator><creatorcontrib>Azriel, David ; Brown, Lawrence D. ; Sklar, Michael ; Berk, Richard ; Buja, Andreas ; Zhao, Linda</creatorcontrib><description>We study a regression problem where for some part of the data we observe both the label variable (Y) and the predictors ( ), while for other part of the data only the predictors are given. Such a problem arises, for example, when observations of the label variable are costly and may require a skilled human agent. When the conditional expectation is not exactly linear, one can consider the best linear approximation to the conditional expectation, which can be estimated consistently by the least-square estimates (LSE). The latter depends only on the labeled data. 
We suggest improved alternative estimates to the LSE that use also the unlabeled data. Our estimation method can be easily implemented and has simply described asymptotic properties. The new estimates asymptotically dominate the usual standard procedures under certain non-linearity condition of ; otherwise, they are asymptotically equivalent. The performance of the new estimator for small sample size is investigated in an extensive simulation study. A real data example of inferring homeless population is used to illustrate the new methodology.</description><identifier>ISSN: 0162-1459</identifier><identifier>EISSN: 1537-274X</identifier><identifier>DOI: 10.1080/01621459.2021.1915320</identifier><language>eng</language><publisher>Alexandria: Taylor &amp; Francis</publisher><subject>Asymptotic methods ; Asymptotic properties ; Estimates ; Homeless people ; Linear regression ; Misspecified models ; Regression analysis ; Semi-supervised learning ; Simulation ; Statistical methods ; Statistics</subject><ispartof>Journal of the American Statistical Association, 2022-10, Vol.117 (540), p.2238-2251</ispartof><rights>2021 American Statistical Association 2021</rights><rights>2021 American Statistical Association</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c451t-af5c9a71430c551b927eacb1bbc4350a77c9b1b26f832d8717206bd2a0f502c13</citedby><cites>FETCH-LOGICAL-c451t-af5c9a71430c551b927eacb1bbc4350a77c9b1b26f832d8717206bd2a0f502c13</cites><orcidid>0000-0002-9569-576X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925,33223</link.rule.ids></links><search><creatorcontrib>Azriel, David</creatorcontrib><creatorcontrib>Brown, Lawrence D.</creatorcontrib><creatorcontrib>Sklar, Michael</creatorcontrib><creatorcontrib>Berk, 
Richard</creatorcontrib><creatorcontrib>Buja, Andreas</creatorcontrib><creatorcontrib>Zhao, Linda</creatorcontrib><title>Semi-Supervised Linear Regression</title><title>Journal of the American Statistical Association</title><description>We study a regression problem where for some part of the data we observe both the label variable (Y) and the predictors ( ), while for other part of the data only the predictors are given. Such a problem arises, for example, when observations of the label variable are costly and may require a skilled human agent. When the conditional expectation is not exactly linear, one can consider the best linear approximation to the conditional expectation, which can be estimated consistently by the least-square estimates (LSE). The latter depends only on the labeled data. We suggest improved alternative estimates to the LSE that use also the unlabeled data. Our estimation method can be easily implemented and has simply described asymptotic properties. The new estimates asymptotically dominate the usual standard procedures under certain non-linearity condition of ; otherwise, they are asymptotically equivalent. The performance of the new estimator for small sample size is investigated in an extensive simulation study. 
A real data example of inferring homeless population is used to illustrate the new methodology.</description><subject>Asymptotic methods</subject><subject>Asymptotic properties</subject><subject>Estimates</subject><subject>Homeless people</subject><subject>Linear regression</subject><subject>Misspecified models</subject><subject>Regression analysis</subject><subject>Semi-supervised learning</subject><subject>Simulation</subject><subject>Statistical methods</subject><subject>Statistics</subject><issn>0162-1459</issn><issn>1537-274X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>8BJ</sourceid><recordid>eNp9kF1LwzAUhoMoOKc_QZh43XpO0jTNnTL8goHgFLwLaZpIxtbMpFP2723pvPXcvBx43nPgIeQSIUeo4AawpFhwmVOgmKNEzigckUmfIqOi-Dgmk4HJBuiUnKW0gn5EVU3I1dJufLbcbW389sk2s4VvrY6zV_sZbUo-tOfkxOl1sheHnJL3h_u3-VO2eHl8nt8tMlNw7DLtuJFaYMHAcI61pMJqU2Ndm4Jx0EIY2W-0dBWjTSVQUCjrhmpwHKhBNiXX491tDF87mzq1CrvY9i8VFSWwgkpZ9RQfKRNDStE6tY1-o-NeIajBhvqzoQYb6mCj792OPd-6EDf6J8R1ozq9X4foom6NT4r9f-IXke5kFw</recordid><startdate>20221002</startdate><enddate>20221002</enddate><creator>Azriel, David</creator><creator>Brown, Lawrence D.</creator><creator>Sklar, Michael</creator><creator>Berk, Richard</creator><creator>Buja, Andreas</creator><creator>Zhao, Linda</creator><general>Taylor &amp; Francis</general><general>Taylor &amp; Francis Ltd</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8BJ</scope><scope>FQK</scope><scope>JBE</scope><scope>K9.</scope><orcidid>https://orcid.org/0000-0002-9569-576X</orcidid></search><sort><creationdate>20221002</creationdate><title>Semi-Supervised Linear Regression</title><author>Azriel, David ; Brown, Lawrence D. 
; Sklar, Michael ; Berk, Richard ; Buja, Andreas ; Zhao, Linda</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c451t-af5c9a71430c551b927eacb1bbc4350a77c9b1b26f832d8717206bd2a0f502c13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Asymptotic methods</topic><topic>Asymptotic properties</topic><topic>Estimates</topic><topic>Homeless people</topic><topic>Linear regression</topic><topic>Misspecified models</topic><topic>Regression analysis</topic><topic>Semi-supervised learning</topic><topic>Simulation</topic><topic>Statistical methods</topic><topic>Statistics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Azriel, David</creatorcontrib><creatorcontrib>Brown, Lawrence D.</creatorcontrib><creatorcontrib>Sklar, Michael</creatorcontrib><creatorcontrib>Berk, Richard</creatorcontrib><creatorcontrib>Buja, Andreas</creatorcontrib><creatorcontrib>Zhao, Linda</creatorcontrib><collection>CrossRef</collection><collection>International Bibliography of the Social Sciences (IBSS)</collection><collection>International Bibliography of the Social Sciences</collection><collection>International Bibliography of the Social Sciences</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><jtitle>Journal of the American Statistical Association</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Azriel, David</au><au>Brown, Lawrence D.</au><au>Sklar, Michael</au><au>Berk, Richard</au><au>Buja, Andreas</au><au>Zhao, Linda</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Semi-Supervised Linear Regression</atitle><jtitle>Journal of the American Statistical 
Association</jtitle><date>2022-10-02</date><risdate>2022</risdate><volume>117</volume><issue>540</issue><spage>2238</spage><epage>2251</epage><pages>2238-2251</pages><issn>0162-1459</issn><eissn>1537-274X</eissn><abstract>We study a regression problem where for some part of the data we observe both the label variable (Y) and the predictors ( ), while for other part of the data only the predictors are given. Such a problem arises, for example, when observations of the label variable are costly and may require a skilled human agent. When the conditional expectation is not exactly linear, one can consider the best linear approximation to the conditional expectation, which can be estimated consistently by the least-square estimates (LSE). The latter depends only on the labeled data. We suggest improved alternative estimates to the LSE that use also the unlabeled data. Our estimation method can be easily implemented and has simply described asymptotic properties. The new estimates asymptotically dominate the usual standard procedures under certain non-linearity condition of ; otherwise, they are asymptotically equivalent. The performance of the new estimator for small sample size is investigated in an extensive simulation study. A real data example of inferring homeless population is used to illustrate the new methodology.</abstract><cop>Alexandria</cop><pub>Taylor &amp; Francis</pub><doi>10.1080/01621459.2021.1915320</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0002-9569-576X</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0162-1459
ispartof Journal of the American Statistical Association, 2022-10, Vol.117 (540), p.2238-2251
issn 0162-1459
1537-274X
language eng
recordid cdi_crossref_primary_10_1080_01621459_2021_1915320
source International Bibliography of the Social Sciences (IBSS); Taylor and Francis Science and Technology Collection
subjects Asymptotic methods
Asymptotic properties
Estimates
Homeless people
Linear regression
Misspecified models
Regression analysis
Semi-supervised learning
Simulation
Statistical methods
Statistics
title Semi-Supervised Linear Regression
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T14%3A41%3A14IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Semi-Supervised%20Linear%20Regression&rft.jtitle=Journal%20of%20the%20American%20Statistical%20Association&rft.au=Azriel,%20David&rft.date=2022-10-02&rft.volume=117&rft.issue=540&rft.spage=2238&rft.epage=2251&rft.pages=2238-2251&rft.issn=0162-1459&rft.eissn=1537-274X&rft_id=info:doi/10.1080/01621459.2021.1915320&rft_dat=%3Cproquest_cross%3E2760342998%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c451t-af5c9a71430c551b927eacb1bbc4350a77c9b1b26f832d8717206bd2a0f502c13%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2760342998&rft_id=info:pmid/&rfr_iscdi=true
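
Illustrative note on the description field: the abstract says that when the conditional expectation is not exactly linear, the least-squares estimates (LSE) computed from the labeled data alone still consistently estimate the best linear approximation. The sketch below checks this on a toy model and is not the authors' proposed estimator, which the record does not spell out. The model (X uniform on [0, 1], E(Y|X) = X^2, Gaussian noise), the sample sizes, and the names `simulate` and `least_squares_fit` are all assumptions made for illustration. For this model the best linear approximation has slope Cov(X, Y)/Var(X) = 1 and intercept E(Y) - E(X) = -1/6.

```python
import numpy as np


def simulate(n_labeled, n_unlabeled, seed=0):
    """Draw a toy semi-supervised sample (hypothetical model, for illustration).

    Labeled part: pairs (x, y) with E(Y|X) = X^2, which is not linear in X.
    Unlabeled part: predictors only.
    """
    rng = np.random.default_rng(seed)
    x_labeled = rng.uniform(0.0, 1.0, size=n_labeled)
    y_labeled = x_labeled**2 + rng.normal(0.0, 0.1, size=n_labeled)
    x_unlabeled = rng.uniform(0.0, 1.0, size=n_unlabeled)
    return x_labeled, y_labeled, x_unlabeled


def least_squares_fit(x, y):
    """Return (intercept, slope) of the least-squares line fitted to (x, y)."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], coef[1]


# Population best linear approximation for the toy model above:
# slope = Cov(X, Y) / Var(X) = (1/12) / (1/12) = 1,
# intercept = E(Y) - slope * E(X) = 1/3 - 1/2 = -1/6.
TARGET_INTERCEPT, TARGET_SLOPE = -1.0 / 6.0, 1.0

x_lab, y_lab, x_unlab = simulate(n_labeled=50_000, n_unlabeled=500_000)

# The LSE touches only the labeled pairs; x_unlab is never used here.
intercept_hat, slope_hat = least_squares_fit(x_lab, y_lab)

print(f"LSE intercept: {intercept_hat:.4f} (target {TARGET_INTERCEPT:.4f})")
print(f"LSE slope:     {slope_hat:.4f} (target {TARGET_SLOPE:.4f})")
print(f"unlabeled predictors left unused by the LSE: {x_unlab.size}")
```

The unlabeled predictors are generated only to make the point that the plain LSE ignores them; the article's contribution, per the abstract, is an alternative estimate that also uses them.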