Two-stage sampling from a prediction point of view when the cluster sizes are unknown
We consider the problem of estimating the population total in two-stage cluster sampling when cluster sizes are known only for the sampled clusters, making use of a population model arising from a variance component model. The problem can be considered as one of predicting the unobserved part Z of t...
Published in: | Biometrika 2008-03, Vol.95 (1), p.187-204 |
---|---|
Main Authors: | Bjørnstad, Jan F., Ytterstad, Elinor |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Get full text |
cited_by | |
---|---|
cites | |
container_end_page | 204 |
container_issue | 1 |
container_start_page | 187 |
container_title | Biometrika |
container_volume | 95 |
creator | Bjørnstad, Jan F.; Ytterstad, Elinor |
description | We consider the problem of estimating the population total in two-stage cluster sampling when cluster sizes are known only for the sampled clusters, making use of a population model arising from a variance component model. The problem can be considered as one of predicting the unobserved part Z of the total, and the concept of predictive likelihood is studied. Prediction intervals and a predictor for the population total are derived for the normal case, based on predictive likelihood. For a more general distribution-free model, by application of an analysis of variance approach instead of maximum likelihood for parameter estimation, the predictor obtained from the predictive likelihood is shown to be approximately uniformly optimal for large sample size and large number of clusters, in the sense of uniformly minimizing the mean-squared error in a partially linear class of model-unbiased predictors. Three prediction intervals for Z based on three similar predictive likelihoods are studied. For a small number n0 of sampled clusters, they differ significantly, but for large n0, the three intervals are practically identical. Model-based and design-based coverage properties of the prediction intervals are studied based on a comprehensive simulation study. The simulation study indicates that for large sample sizes, the coverage measures achieve approximately the nominal level 1 − α and are slightly less than 1 − α for moderately large sample sizes. For small sample sizes, the coverage measures are about 1 − 2α, being raised to 1 − α for a modified interval based on the t-distribution. |
doi_str_mv | 10.1093/biomet/asm098 |
format | article |
fullrecord | <record><control><sourceid>jstor_proqu</sourceid><recordid>TN_cdi_proquest_journals_201695704</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><jstor_id>20441451</jstor_id><oup_id>10.1093/biomet/asm098</oup_id><sourcerecordid>20441451</sourcerecordid><originalsourceid>FETCH-LOGICAL-c410t-1f8b4acc7589b7e8ce0b0b5ab2d79660317ebf68420a5e4fc55751f297dee39c3</originalsourceid><addsrcrecordid>eNqFkc9rFDEUx4MouFaPHoUgCF6mTSa_JkcprWspemlRvIRM9k03253JmGS61r_eLLNsvXl4-fLIh-_35QWht5ScUqLZWetDD_nMpp7o5hlaUC55xQQlz9GCECIrxjl_iV6ltNm3UsgFur3ZhSplewc42X7c-uEOdzH02OIxwsq77MOAx-CHjEOHHzzs8G4NA85rwG47pQwRJ_8HErYR8DTcD2E3vEYvOrtN8OagJ-j28uLmfFldf_v85fzTdeU4JbmiXdNy65wSjW4VNA5IS1ph23qltJSEUQVtJxteEyuAd04IJWhXa7UCYNqxE_R-9h1j-DVBymYTpjiUSFMTKrVQhBeomiEXQ0oROjNG39v4aCgx-8WZeXFmXlzhr2Y-wgjuCIdpPHAPhlktyvFYqiakKeJL0VLjXhtV4rlZ576YfThMaJOz2y7awfl0NN1PSSnXhfs4cyXmv_O9m9FNyiH-Y8U55YI-vdeXz_l9vLfx3kjFlDDLHz_N5fcluxJSm6_sL0K4sGY</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>201695704</pqid></control><display><type>article</type><title>Two-stage sampling from a prediction point of view when the cluster sizes are unknown</title><source>Oxford Journals Online</source><source>JSTOR</source><creator>Bjørnstad, Jan F. ; Ytterstad, Elinor</creator><creatorcontrib>Bjørnstad, Jan F. ; Ytterstad, Elinor</creatorcontrib><description>We consider the problem of estimating the population total in two-stage cluster sampling when cluster sizes are known only for the sampled clusters, making use of a population model arising from a variance component model. The problem can be considered as one of predicting the unobserved part Z of the total, and the concept of predictive likelihood is studied. Prediction intervals and a predictor for the population total are derived for the normal case, based on predictive likelihood. 
For a more general distribution-free model, by application of an analysis of variance approach instead of maximum likelihood for parameter estimation, the predictor obtained from the predictive likelihood is shown to be approximately uniformly optimal for large sample size and large number of clusters, in the sense of uniformly minimizing the mean-squared error in a partially linear class of model-unbiased predictors. Three prediction intervals for Z based on three similar predictive likelihoods are studied. For a small number n0 of sampled clusters, they differ significantly, but for large n0, the three intervals are practically identical. Model-based and design-based coverage properties of the prediction intervals are studied based on a comprehensive simulation study. The simulation study indicates that for large sample sizes, the coverage measures achieve approximately the nominal level 1 − α and are slightly less than 1 − α for moderately large sample sizes. For small sample sizes, the coverage measures are about 1 − 2α, being raised to 1 − α for a modified interval based on the distribution.</description><identifier>ISSN: 0006-3444</identifier><identifier>EISSN: 1464-3510</identifier><identifier>DOI: 10.1093/biomet/asm098</identifier><identifier>CODEN: BIOKAX</identifier><language>eng</language><publisher>Oxford: Oxford University Press</publisher><subject>Applications ; Biology, psychology, social sciences ; Estimating techniques ; Estimators ; Exact sciences and technology ; General topics ; Law of likelihood ; Mathematics ; Modeling ; Musical intervals ; Optimal predictor ; Optimization ; Parameter estimation ; Population ; Population model ; Prediction interval ; Predictions ; Predictive likelihood ; Predictive modeling ; Probability and statistics ; Probability theory and stochastic processes ; Random variables ; Sample size ; Sampling theory, sample surveys ; Sciences and techniques of general use ; Simulation ; Simulations ; Statistical methods ; 
Statistical variance ; Statistics ; Stochastic processes ; Studies ; Survey sampling ; Unbiased estimators</subject><ispartof>Biometrika, 2008-03, Vol.95 (1), p.187-204</ispartof><rights>Copyright 2008 Biometrika Trust</rights><rights>Oxford University Press © 2008 Biometrika Trust 2008</rights><rights>2008 INIST-CNRS</rights><rights>2008 Biometrika Trust</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.jstor.org/stable/pdf/20441451$$EPDF$$P50$$Gjstor$$H</linktopdf><linktohtml>$$Uhttps://www.jstor.org/stable/20441451$$EHTML$$P50$$Gjstor$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,58213,58446</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=20161149$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttp://econpapers.repec.org/article/oupbiomet/v_3a95_3ay_3a2008_3ai_3a1_3ap_3a187-204.htm$$DView record in RePEc$$Hfree_for_read</backlink></links><search><creatorcontrib>Bjørnstad, Jan F.</creatorcontrib><creatorcontrib>Ytterstad, Elinor</creatorcontrib><title>Two-stage sampling from a prediction point of view when the cluster sizes are unknown</title><title>Biometrika</title><description>We consider the problem of estimating the population total in two-stage cluster sampling when cluster sizes are known only for the sampled clusters, making use of a population model arising from a variance component model. The problem can be considered as one of predicting the unobserved part Z of the total, and the concept of predictive likelihood is studied. Prediction intervals and a predictor for the population total are derived for the normal case, based on predictive likelihood. 
For a more general distribution-free model, by application of an analysis of variance approach instead of maximum likelihood for parameter estimation, the predictor obtained from the predictive likelihood is shown to be approximately uniformly optimal for large sample size and large number of clusters, in the sense of uniformly minimizing the mean-squared error in a partially linear class of model-unbiased predictors. Three prediction intervals for Z based on three similar predictive likelihoods are studied. For a small number n0 of sampled clusters, they differ significantly, but for large n0, the three intervals are practically identical. Model-based and design-based coverage properties of the prediction intervals are studied based on a comprehensive simulation study. The simulation study indicates that for large sample sizes, the coverage measures achieve approximately the nominal level 1 − α and are slightly less than 1 − α for moderately large sample sizes. For small sample sizes, the coverage measures are about 1 − 2α, being raised to 1 − α for a modified interval based on the distribution.</description><subject>Applications</subject><subject>Biology, psychology, social sciences</subject><subject>Estimating techniques</subject><subject>Estimators</subject><subject>Exact sciences and technology</subject><subject>General topics</subject><subject>Law of likelihood</subject><subject>Mathematics</subject><subject>Modeling</subject><subject>Musical intervals</subject><subject>Optimal predictor</subject><subject>Optimization</subject><subject>Parameter estimation</subject><subject>Population</subject><subject>Population model</subject><subject>Prediction interval</subject><subject>Predictions</subject><subject>Predictive likelihood</subject><subject>Predictive modeling</subject><subject>Probability and statistics</subject><subject>Probability theory and stochastic processes</subject><subject>Random variables</subject><subject>Sample size</subject><subject>Sampling 
theory, sample surveys</subject><subject>Sciences and techniques of general use</subject><subject>Simulation</subject><subject>Simulations</subject><subject>Statistical methods</subject><subject>Statistical variance</subject><subject>Statistics</subject><subject>Stochastic processes</subject><subject>Studies</subject><subject>Survey sampling</subject><subject>Unbiased estimators</subject><issn>0006-3444</issn><issn>1464-3510</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2008</creationdate><recordtype>article</recordtype><recordid>eNqFkc9rFDEUx4MouFaPHoUgCF6mTSa_JkcprWspemlRvIRM9k03253JmGS61r_eLLNsvXl4-fLIh-_35QWht5ScUqLZWetDD_nMpp7o5hlaUC55xQQlz9GCECIrxjl_iV6ltNm3UsgFur3ZhSplewc42X7c-uEOdzH02OIxwsq77MOAx-CHjEOHHzzs8G4NA85rwG47pQwRJ_8HErYR8DTcD2E3vEYvOrtN8OagJ-j28uLmfFldf_v85fzTdeU4JbmiXdNy65wSjW4VNA5IS1ph23qltJSEUQVtJxteEyuAd04IJWhXa7UCYNqxE_R-9h1j-DVBymYTpjiUSFMTKrVQhBeomiEXQ0oROjNG39v4aCgx-8WZeXFmXlzhr2Y-wgjuCIdpPHAPhlktyvFYqiakKeJL0VLjXhtV4rlZ576YfThMaJOz2y7awfl0NN1PSSnXhfs4cyXmv_O9m9FNyiH-Y8U55YI-vdeXz_l9vLfx3kjFlDDLHz_N5fcluxJSm6_sL0K4sGY</recordid><startdate>20080301</startdate><enddate>20080301</enddate><creator>Bjørnstad, Jan F.</creator><creator>Ytterstad, Elinor</creator><general>Oxford University Press</general><general>Biometrika Trust, University College London</general><general>Oxford University Press for Biometrika Trust</general><general>Oxford Publishing Limited (England)</general><scope>BSCLL</scope><scope>IQODW</scope><scope>DKI</scope><scope>X2L</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope></search><sort><creationdate>20080301</creationdate><title>Two-stage sampling from a prediction point of view when the cluster sizes are unknown</title><author>Bjørnstad, Jan F. 
; Ytterstad, Elinor</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c410t-1f8b4acc7589b7e8ce0b0b5ab2d79660317ebf68420a5e4fc55751f297dee39c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2008</creationdate><topic>Applications</topic><topic>Biology, psychology, social sciences</topic><topic>Estimating techniques</topic><topic>Estimators</topic><topic>Exact sciences and technology</topic><topic>General topics</topic><topic>Law of likelihood</topic><topic>Mathematics</topic><topic>Modeling</topic><topic>Musical intervals</topic><topic>Optimal predictor</topic><topic>Optimization</topic><topic>Parameter estimation</topic><topic>Population</topic><topic>Population model</topic><topic>Prediction interval</topic><topic>Predictions</topic><topic>Predictive likelihood</topic><topic>Predictive modeling</topic><topic>Probability and statistics</topic><topic>Probability theory and stochastic processes</topic><topic>Random variables</topic><topic>Sample size</topic><topic>Sampling theory, sample surveys</topic><topic>Sciences and techniques of general use</topic><topic>Simulation</topic><topic>Simulations</topic><topic>Statistical methods</topic><topic>Statistical variance</topic><topic>Statistics</topic><topic>Stochastic processes</topic><topic>Studies</topic><topic>Survey sampling</topic><topic>Unbiased estimators</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bjørnstad, Jan F.</creatorcontrib><creatorcontrib>Ytterstad, Elinor</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>RePEc IDEAS</collection><collection>RePEc</collection><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering 
Abstracts</collection><jtitle>Biometrika</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bjørnstad, Jan F.</au><au>Ytterstad, Elinor</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Two-stage sampling from a prediction point of view when the cluster sizes are unknown</atitle><jtitle>Biometrika</jtitle><date>2008-03-01</date><risdate>2008</risdate><volume>95</volume><issue>1</issue><spage>187</spage><epage>204</epage><pages>187-204</pages><issn>0006-3444</issn><eissn>1464-3510</eissn><coden>BIOKAX</coden><abstract>We consider the problem of estimating the population total in two-stage cluster sampling when cluster sizes are known only for the sampled clusters, making use of a population model arising from a variance component model. The problem can be considered as one of predicting the unobserved part Z of the total, and the concept of predictive likelihood is studied. Prediction intervals and a predictor for the population total are derived for the normal case, based on predictive likelihood. For a more general distribution-free model, by application of an analysis of variance approach instead of maximum likelihood for parameter estimation, the predictor obtained from the predictive likelihood is shown to be approximately uniformly optimal for large sample size and large number of clusters, in the sense of uniformly minimizing the mean-squared error in a partially linear class of model-unbiased predictors. Three prediction intervals for Z based on three similar predictive likelihoods are studied. For a small number n0 of sampled clusters, they differ significantly, but for large n0, the three intervals are practically identical. Model-based and design-based coverage properties of the prediction intervals are studied based on a comprehensive simulation study. 
The simulation study indicates that for large sample sizes, the coverage measures achieve approximately the nominal level 1 − α and are slightly less than 1 − α for moderately large sample sizes. For small sample sizes, the coverage measures are about 1 − 2α, being raised to 1 − α for a modified interval based on the distribution.</abstract><cop>Oxford</cop><pub>Oxford University Press</pub><doi>10.1093/biomet/asm098</doi><tpages>18</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0006-3444 |
ispartof | Biometrika, 2008-03, Vol.95 (1), p.187-204 |
issn | 0006-3444; 1464-3510 |
language | eng |
recordid | cdi_proquest_journals_201695704 |
source | Oxford Journals Online; JSTOR |
subjects | Applications; Biology, psychology, social sciences; Estimating techniques; Estimators; Exact sciences and technology; General topics; Law of likelihood; Mathematics; Modeling; Musical intervals; Optimal predictor; Optimization; Parameter estimation; Population; Population model; Prediction interval; Predictions; Predictive likelihood; Predictive modeling; Probability and statistics; Probability theory and stochastic processes; Random variables; Sample size; Sampling theory, sample surveys; Sciences and techniques of general use; Simulation; Simulations; Statistical methods; Statistical variance; Statistics; Stochastic processes; Studies; Survey sampling; Unbiased estimators |
title | Two-stage sampling from a prediction point of view when the cluster sizes are unknown |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T10%3A41%3A33IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-jstor_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Two-stage%20sampling%20from%20a%20prediction%20point%20of%20view%20when%20the%20cluster%20sizes%20are%20unknown&rft.jtitle=Biometrika&rft.au=Bj%C3%B8rnstad,%20Jan%20F.&rft.date=2008-03-01&rft.volume=95&rft.issue=1&rft.spage=187&rft.epage=204&rft.pages=187-204&rft.issn=0006-3444&rft.eissn=1464-3510&rft.coden=BIOKAX&rft_id=info:doi/10.1093/biomet/asm098&rft_dat=%3Cjstor_proqu%3E20441451%3C/jstor_proqu%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c410t-1f8b4acc7589b7e8ce0b0b5ab2d79660317ebf68420a5e4fc55751f297dee39c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=201695704&rft_id=info:pmid/&rft_jstor_id=20441451&rft_oup_id=10.1093/biomet/asm098&rfr_iscdi=true |