Approximate median regression for complex survey data with skewed response
The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical metho...
Published in: | Biometrics 2016-12, Vol.72 (4), p.1336-1347 |
---|---|
Main Authors: | Fraser, Raphael André; Lipsitz, Stuart R.; Sinha, Debajyoti; Fitzmaurice, Garrett M.; Pan, Yi |
Format: | Article |
Language: | English |
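Subjects: | BIOMETRIC PRACTICE; Cancer; Clinical Laboratory Services - statistics & numerical data; Complex survey; Complex variables; Data processing; Estimation; Genetic transformation; Humans; Iodine; Iodine - urine; Mathematical models; Mean square values; Median regression; Models, Statistical; Nutrition; Parameter estimation; Population (statistical); Quantile regression; Regression Analysis; Resampling; Risk analysis; Risk factors; Sandwich estimator; Statistical methods; Surveys and Questionnaires; Transform-both-sides; Transformations (mathematics); Urinalysis; Weighting |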
cited_by | cdi_FETCH-LOGICAL-c5367-e899d5a01a43daa82066409388dd5099ccfabecdb48d7c71c2fbdc7390d56bf83 |
---|---|
cites | cdi_FETCH-LOGICAL-c5367-e899d5a01a43daa82066409388dd5099ccfabecdb48d7c71c2fbdc7390d56bf83 |
container_end_page | 1347 |
container_issue | 4 |
container_start_page | 1336 |
container_title | Biometrics |
container_volume | 72 |
creator | Fraser, Raphael André; Lipsitz, Stuart R.; Sinha, Debajyoti; Fitzmaurice, Garrett M.; Pan, Yi |
description | The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features. That is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. |
doi_str_mv | 10.1111/biom.12517 |
format | article |
fullrecord | <record><control><sourceid>jstor_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5055849</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>44695351</jstor_id><sourcerecordid>44695351</sourcerecordid><originalsourceid>FETCH-LOGICAL-c5367-e899d5a01a43daa82066409388dd5099ccfabecdb48d7c71c2fbdc7390d56bf83</originalsourceid><addsrcrecordid>eNp9kUtv1DAUhSMEotPChj0oEhuElOL3Y1OpVG0pKq2EeImN5dhO62kSBzvTmfn3eEg7Aha9G8s63z06V6coXkCwD_O8q33o9iGikD8qZpASWAGCwONiBgBgFSbwx06xm9I8fyUF6GmxgzhgiDI0Kz4eDkMMK9_p0ZWds173ZXRX0aXkQ182IZYmdEPrVmVaxFu3Lq0edbn043WZbtzS2YynIfTJPSueNLpN7vndu1d8PTn-cvShOr88PTs6PK8MxYxXTkhpqQZQE2y1FggwRoDEQlhLgZTGNLp2xtZEWG44NKipreFYAktZ3Qi8VxxMvsOizomN68eoWzXEfERcq6C9-lfp_bW6CreKAkoFkdngzZ1BDL8WLo2q88m4ttW9C4ukoEAsD-Ygo6__Q-dhEft8noISYY5zfPQgJSgGlHC5yf12okwMKUXXbCNDoDZFqk2R6k-RGX7195Fb9L65DMAJWPrWrR-wUu_PLj_dm76cduZpDHG7QwiTFFOY9WrSfRrdaqvreKMYx5yq7xen6pv4fPKT4QtF8W8XUsIK</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1853054798</pqid></control><display><type>article</type><title>Approximate median regression for complex survey data with skewed response</title><source>JSTOR Archival Journals and Primary Sources Collection</source><source>Oxford Journals Online</source><source>SPORTDiscus with Full Text</source><creator>Fraser, Raphael André ; Lipsitz, Stuart R. ; Sinha, Debajyoti ; Fitzmaurice, Garrett M. ; Pan, Yi</creator><creatorcontrib>Fraser, Raphael André ; Lipsitz, Stuart R. ; Sinha, Debajyoti ; Fitzmaurice, Garrett M. ; Pan, Yi</creatorcontrib><description>The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. 
Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features. That is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey.</description><identifier>ISSN: 0006-341X</identifier><identifier>EISSN: 1541-0420</identifier><identifier>DOI: 10.1111/biom.12517</identifier><identifier>PMID: 27062562</identifier><identifier>CODEN: BIOMA5</identifier><language>eng</language><publisher>United States: Blackwell Publishing Ltd</publisher><subject>BIOMETRIC PRACTICE ; Cancer ; Clinical Laboratory Services - statistics & numerical data ; Complex survey ; Complex variables ; Data processing ; Estimation ; Genetic transformation ; Humans ; Iodine ; Iodine - urine ; Mathematical models ; Mean square values ; Median regression ; Models, Statistical ; Nutrition ; Parameter estimation ; Population (statistical) ; Quantile regression ; Regression Analysis ; Resampling ; Risk analysis ; Risk factors ; Sandwich estimator ; Statistical methods ; Surveys and Questionnaires ; 
Transform-both-sides ; Transformations (mathematics) ; Urinalysis ; Weighting</subject><ispartof>Biometrics, 2016-12, Vol.72 (4), p.1336-1347</ispartof><rights>Copyright © 2016 International Biometric Society</rights><rights>2016, The International Biometric Society</rights><rights>2016, The International Biometric Society.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c5367-e899d5a01a43daa82066409388dd5099ccfabecdb48d7c71c2fbdc7390d56bf83</citedby><cites>FETCH-LOGICAL-c5367-e899d5a01a43daa82066409388dd5099ccfabecdb48d7c71c2fbdc7390d56bf83</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/44695351$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/44695351$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>230,314,780,784,885,27924,27925,58238,58471</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/27062562$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Fraser, Raphael André</creatorcontrib><creatorcontrib>Lipsitz, Stuart R.</creatorcontrib><creatorcontrib>Sinha, Debajyoti</creatorcontrib><creatorcontrib>Fitzmaurice, Garrett M.</creatorcontrib><creatorcontrib>Pan, Yi</creatorcontrib><title>Approximate median regression for complex survey data with skewed response</title><title>Biometrics</title><addtitle>Biom</addtitle><description>The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. 
Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features. That is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey.</description><subject>BIOMETRIC PRACTICE</subject><subject>Cancer</subject><subject>Clinical Laboratory Services - statistics & numerical data</subject><subject>Complex survey</subject><subject>Complex variables</subject><subject>Data processing</subject><subject>Estimation</subject><subject>Genetic transformation</subject><subject>Humans</subject><subject>Iodine</subject><subject>Iodine - urine</subject><subject>Mathematical models</subject><subject>Mean square values</subject><subject>Median regression</subject><subject>Models, Statistical</subject><subject>Nutrition</subject><subject>Parameter estimation</subject><subject>Population (statistical)</subject><subject>Quantile regression</subject><subject>Regression Analysis</subject><subject>Resampling</subject><subject>Risk analysis</subject><subject>Risk factors</subject><subject>Sandwich 
estimator</subject><subject>Statistical methods</subject><subject>Surveys and Questionnaires</subject><subject>Transform-both-sides</subject><subject>Transformations (mathematics)</subject><subject>Urinalysis</subject><subject>Weighting</subject><issn>0006-341X</issn><issn>1541-0420</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNp9kUtv1DAUhSMEotPChj0oEhuElOL3Y1OpVG0pKq2EeImN5dhO62kSBzvTmfn3eEg7Aha9G8s63z06V6coXkCwD_O8q33o9iGikD8qZpASWAGCwONiBgBgFSbwx06xm9I8fyUF6GmxgzhgiDI0Kz4eDkMMK9_p0ZWds173ZXRX0aXkQ182IZYmdEPrVmVaxFu3Lq0edbn043WZbtzS2YynIfTJPSueNLpN7vndu1d8PTn-cvShOr88PTs6PK8MxYxXTkhpqQZQE2y1FggwRoDEQlhLgZTGNLp2xtZEWG44NKipreFYAktZ3Qi8VxxMvsOizomN68eoWzXEfERcq6C9-lfp_bW6CreKAkoFkdngzZ1BDL8WLo2q88m4ttW9C4ukoEAsD-Ygo6__Q-dhEft8noISYY5zfPQgJSgGlHC5yf12okwMKUXXbCNDoDZFqk2R6k-RGX7195Fb9L65DMAJWPrWrR-wUu_PLj_dm76cduZpDHG7QwiTFFOY9WrSfRrdaqvreKMYx5yq7xen6pv4fPKT4QtF8W8XUsIK</recordid><startdate>201612</startdate><enddate>201612</enddate><creator>Fraser, Raphael André</creator><creator>Lipsitz, Stuart R.</creator><creator>Sinha, Debajyoti</creator><creator>Fitzmaurice, Garrett M.</creator><creator>Pan, Yi</creator><general>Blackwell Publishing Ltd</general><general>Wiley-Blackwell</general><scope>BSCLL</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>JQ2</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>201612</creationdate><title>Approximate median regression for complex survey data with skewed response</title><author>Fraser, Raphael André ; Lipsitz, Stuart R. ; Sinha, Debajyoti ; Fitzmaurice, Garrett M. 
; Pan, Yi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c5367-e899d5a01a43daa82066409388dd5099ccfabecdb48d7c71c2fbdc7390d56bf83</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>BIOMETRIC PRACTICE</topic><topic>Cancer</topic><topic>Clinical Laboratory Services - statistics & numerical data</topic><topic>Complex survey</topic><topic>Complex variables</topic><topic>Data processing</topic><topic>Estimation</topic><topic>Genetic transformation</topic><topic>Humans</topic><topic>Iodine</topic><topic>Iodine - urine</topic><topic>Mathematical models</topic><topic>Mean square values</topic><topic>Median regression</topic><topic>Models, Statistical</topic><topic>Nutrition</topic><topic>Parameter estimation</topic><topic>Population (statistical)</topic><topic>Quantile regression</topic><topic>Regression Analysis</topic><topic>Resampling</topic><topic>Risk analysis</topic><topic>Risk factors</topic><topic>Sandwich estimator</topic><topic>Statistical methods</topic><topic>Surveys and Questionnaires</topic><topic>Transform-both-sides</topic><topic>Transformations (mathematics)</topic><topic>Urinalysis</topic><topic>Weighting</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Fraser, Raphael André</creatorcontrib><creatorcontrib>Lipsitz, Stuart R.</creatorcontrib><creatorcontrib>Sinha, Debajyoti</creatorcontrib><creatorcontrib>Fitzmaurice, Garrett M.</creatorcontrib><creatorcontrib>Pan, Yi</creatorcontrib><collection>Istex</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Computer Science Collection</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant 
titles)</collection><jtitle>Biometrics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Fraser, Raphael André</au><au>Lipsitz, Stuart R.</au><au>Sinha, Debajyoti</au><au>Fitzmaurice, Garrett M.</au><au>Pan, Yi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Approximate median regression for complex survey data with skewed response</atitle><jtitle>Biometrics</jtitle><addtitle>Biom</addtitle><date>2016-12</date><risdate>2016</risdate><volume>72</volume><issue>4</issue><spage>1336</spage><epage>1347</epage><pages>1336-1347</pages><issn>0006-341X</issn><eissn>1541-0420</eissn><coden>BIOMA5</coden><abstract>The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features. That is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)-based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. 
The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey.</abstract><cop>United States</cop><pub>Blackwell Publishing Ltd</pub><pmid>27062562</pmid><doi>10.1111/biom.12517</doi><tpages>12</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0006-341X |
ispartof | Biometrics, 2016-12, Vol.72 (4), p.1336-1347 |
issn | 0006-341X; 1541-0420 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_5055849 |
source | JSTOR Archival Journals and Primary Sources Collection; Oxford Journals Online; SPORTDiscus with Full Text |
subjects | BIOMETRIC PRACTICE; Cancer; Clinical Laboratory Services - statistics & numerical data; Complex survey; Complex variables; Data processing; Estimation; Genetic transformation; Humans; Iodine; Iodine - urine; Mathematical models; Mean square values; Median regression; Models, Statistical; Nutrition; Parameter estimation; Population (statistical); Quantile regression; Regression Analysis; Resampling; Risk analysis; Risk factors; Sandwich estimator; Statistical methods; Surveys and Questionnaires; Transform-both-sides; Transformations (mathematics); Urinalysis; Weighting |
title | Approximate median regression for complex survey data with skewed response |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T18%3A33%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Approximate%20median%20regression%20for%20complex%20survey%20data%20with%20skewed%20response&rft.jtitle=Biometrics&rft.au=Fraser,%20Raphael%20Andr%C3%A9&rft.date=2016-12&rft.volume=72&rft.issue=4&rft.spage=1336&rft.epage=1347&rft.pages=1336-1347&rft.issn=0006-341X&rft.eissn=1541-0420&rft.coden=BIOMA5&rft_id=info:doi/10.1111/biom.12517&rft_dat=%3Cjstor_pubme%3E44695351%3C/jstor_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c5367-e899d5a01a43daa82066409388dd5099ccfabecdb48d7c71c2fbdc7390d56bf83%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1853054798&rft_id=info:pmid/27062562&rft_jstor_id=44695351&rfr_iscdi=true |