
Approximate median regression for complex survey data with skewed response

Bibliographic Details
Published in: Biometrics, 2016-12, Vol. 72 (4), pp. 1336-1347
Main Authors: Fraser, Raphael André, Lipsitz, Stuart R., Sinha, Debajyoti, Fitzmaurice, Garrett M., Pan, Yi
Format: Article
Language:English
description The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or resampling are often not valid for survey data because of complex design features, namely stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS) estimating equations approach to estimate the median regression parameters of a highly skewed response; the DTBS approach applies the same Box-Cox-type transformation twice to both the outcome and the regression function. The usual sandwich variance estimate can be used with our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution and has a much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey.
doi_str_mv 10.1111/biom.12517
format article
pmid 27062562
coden BIOMA5
publisher Blackwell Publishing Ltd, United States
rights Copyright © 2016 International Biometric Society
jstor_id 44695351
linktohtml https://www.jstor.org/stable/44695351
backlink https://www.ncbi.nlm.nih.gov/pubmed/27062562
toplevel peer_reviewed
oa free_for_read
issn 0006-341X
1541-0420
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5055849
source JSTOR Archival Journals and Primary Sources Collection; Oxford Journals Online; SPORTDiscus with Full Text
subjects BIOMETRIC PRACTICE
Cancer
Clinical Laboratory Services - statistics & numerical data
Complex survey
Complex variables
Data processing
Estimation
Genetic transformation
Humans
Iodine
Iodine - urine
Mathematical models
Mean square values
Median regression
Models, Statistical
Nutrition
Parameter estimation
Population (statistical)
Quantile regression
Regression Analysis
Resampling
Risk analysis
Risk factors
Sandwich estimator
Statistical methods
Surveys and Questionnaires
Transform-both-sides
Transformations (mathematics)
Urinalysis
Weighting
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T18%3A33%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Approximate%20median%20regression%20for%20complex%20survey%20data%20with%20skewed%20response&rft.jtitle=Biometrics&rft.au=Fraser,%20Raphael%20Andr%C3%A9&rft.date=2016-12&rft.volume=72&rft.issue=4&rft.spage=1336&rft.epage=1347&rft.pages=1336-1347&rft.issn=0006-341X&rft.eissn=1541-0420&rft.coden=BIOMA5&rft_id=info:doi/10.1111/biom.12517&rft_dat=%3Cjstor_pubme%3E44695351%3C/jstor_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c5367-e899d5a01a43daa82066409388dd5099ccfabecdb48d7c71c2fbdc7390d56bf83%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1853054798&rft_id=info:pmid/27062562&rft_jstor_id=44695351&rfr_iscdi=true
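The description above rests on one property of transform-both-sides (TBS) models. Suppose a monotone Box-Cox transform h_λ is applied to both the response and the regression function, giving h_λ(Y) = h_λ(μ(x; β)) + ε with ε symmetric about zero. Then median(Y | x) = μ(x; β), because monotone transforms preserve medians. Weighted estimating equations on the transformed scale therefore target median-regression parameters and admit an ordinary sandwich variance.

The following is a minimal sketch of that idea only. It is not the authors' DTBS estimator, because it applies the transform once, not twice. It also makes these assumptions:

- λ is held fixed.
- The median is linear, μ = b0 + b1·x, and stays positive.
- Each unit is treated as its own PSU, so strata and clustering are ignored.

All function names and the simulated data are hypothetical.

```python
import math
import random


def box_cox(y, lam):
    """Box-Cox transform h_lam(y), defined for y > 0."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def box_cox_deriv(mu, lam):
    """dh_lam/dmu = mu**(lam - 1); equals 1/mu when lam == 0."""
    return mu ** (lam - 1.0)


def inv2(a):
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]


def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def _terms(beta, x, hy, lam):
    """Gradients D_i = h'(mu_i) * (1, x_i) and residuals h(y_i) - h(mu_i)."""
    d, r = [], []
    for xi, hyi in zip(x, hy):
        mu = beta[0] + beta[1] * xi
        g = box_cox_deriv(mu, lam)
        d.append((g, g * xi))
        r.append(hyi - box_cox(mu, lam))
    return d, r


def _bread(d, w):
    """A = sum_i w_i D_i D_i^T (Gauss-Newton information matrix)."""
    return [[sum(wi * di[j] * di[k] for wi, di in zip(w, d)) for k in range(2)]
            for j in range(2)]


def tbs_median_fit(x, y, w, lam, iters=50, tol=1e-10):
    """Weighted TBS fit of median(Y | x) = b0 + b1 * x with lam held fixed.

    Solves sum_i w_i D_i [h(y_i) - h(mu_i)] = 0 by Gauss-Newton and returns
    (beta, sandwich covariance A^-1 B A^-1).
    """
    n = len(x)
    # Starting values: unweighted OLS on the raw scale.
    xbar, ybar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    beta = [ybar - b1 * xbar, b1]
    if min(beta[0] + beta[1] * xi for xi in x) <= 0:
        raise ValueError("starting values give a non-positive median")
    hy = [box_cox(yi, lam) for yi in y]
    for _ in range(iters):
        d, r = _terms(beta, x, hy, lam)
        score = [sum(wi * di[j] * ri for wi, di, ri in zip(w, d, r))
                 for j in range(2)]
        ai = inv2(_bread(d, w))
        step = [ai[0][0] * score[0] + ai[0][1] * score[1],
                ai[1][0] * score[0] + ai[1][1] * score[1]]
        t = 1.0
        # Halve the step until every fitted median stays positive.
        while min(beta[0] + t * step[0] + (beta[1] + t * step[1]) * xi
                  for xi in x) <= 0:
            t /= 2.0
        beta = [beta[0] + t * step[0], beta[1] + t * step[1]]
        if max(abs(s) * t for s in step) < tol:
            break
    d, r = _terms(beta, x, hy, lam)
    a_inv = inv2(_bread(d, w))
    # Meat B: outer products of weighted score contributions, one unit per
    # PSU (an assumption of this sketch; strata and clusters are ignored).
    u = [(wi * di[0] * ri, wi * di[1] * ri) for wi, di, ri in zip(w, d, r)]
    b = [[sum(ui[j] * ui[k] for ui in u) for k in range(2)] for j in range(2)]
    return beta, matmul2(matmul2(a_inv, b), a_inv)


if __name__ == "__main__":
    rng = random.Random(2016)
    lam = 0.5
    x = [rng.random() for _ in range(500)]
    w = [rng.uniform(1.0, 3.0) for _ in x]  # hypothetical survey weights
    # Symmetric errors on the transformed scale, so median(Y | x) = 2 + 3x.
    z = [box_cox(2.0 + 3.0 * xi, lam) + rng.uniform(-0.3, 0.3) for xi in x]
    y = [(lam * zi + 1.0) ** (1.0 / lam) for zi in z]
    beta, cov = tbs_median_fit(x, y, w, lam)
    print("beta:", beta)
    print("sandwich SEs:", [math.sqrt(cov[j][j]) for j in range(2)])
```

The sandwich variance comes straight from the estimating equations, with no resampling. This mirrors the contrast the description draws with a MAD pseudo-likelihood. A design-based version would also need to sum the score contributions within PSUs and center them within strata, which this sketch omits.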