Hello,

Our survey is structured as follows. The study area is divided into 6 regions. Within each region, one urban community and one rural community are randomly selected, and then samples are randomly drawn from each selected urban and rural community.

The problem is that in each urban/rural stratum we have only one sample. In this case, how do we do the bootstrap?

Any comments or hints are greatly appreciated!

Faye
Tim Hesterberg
2010-Nov-04 14:51 UTC
[R] How to do bootstrap for the complex sample design?
Faye wrote:
> Our survey is structured as : To be investigated area is divided into
> 6 regions, within each region, one urban community and one rural
> community are randomly selected, then samples are randomly drawn from
> each selected urban and rural community.
>
> The problem is that in urban/rural stratum, we only have one sample.
> In this case, how to do bootstrap?

You are lucky that your sample size is 1. If it were 2 you would probably have proceeded without realizing that the answers were wrong.

Suppose you had two samples in each stratum. If you proceed naturally, drawing bootstrap samples of size 2 from each stratum, this would underestimate the variance by a factor of 2.

In general, the ordinary nonparametric bootstrap estimates of variance are biased downward by a factor of (n-1)/n: exactly for the mean, approximately for other statistics. In multiple-sample and stratified situations, the bias depends on the stratum sizes.

Three remedies are:
* draw bootstrap samples of size n-1
* "bootknife" sampling: omit one observation (a jackknife sample), then draw a bootstrap sample of size n from the remaining n-1 observations
* bootstrap from a kernel density estimate, with kernel covariance equal to empirical covariance (with divisor n-1) / n.

The latter two are described in
Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs. Smoothing, Proceedings of the Section on Statistics and the Environment, American Statistical Association, 2924-2930.
http://home.comcast.net/~timhesterberg/articles/JSM04-bootknife.pdf

All three are undefined for samples of size 1. You need to go to some other bootstrap, e.g. a parametric bootstrap with variability estimated from other data.

Tim Hesterberg
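To make the (n-1)/n bias and the bootknife remedy from the reply above concrete, here is a minimal sketch in R for the sample mean of a single stratum. The data vector, the seed, the number of replicates B, and the function names boot_means and bootknife_means are all made up for illustration; this is not code from the cited paper.

```r
# Illustrative sketch only: data, seed, B, and function names are assumptions.
set.seed(1)

# Ordinary bootstrap: n draws with replacement from the n observations.
boot_means <- function(x, B = 4000) {
  n <- length(x)
  replicate(B, mean(x[sample.int(n, n, replace = TRUE)]))
}

# Bootknife: drop one randomly chosen observation (a jackknife sample),
# then draw n observations with replacement from the remaining n - 1.
bootknife_means <- function(x, B = 4000) {
  n <- length(x)
  stopifnot(n >= 2)  # undefined for a sample of size 1
  replicate(B, {
    kept <- x[-sample.int(n, 1)]
    mean(kept[sample.int(n - 1, n, replace = TRUE)])
  })
}

x <- c(2.1, 3.4, 1.9, 4.2, 2.8)  # hypothetical stratum with n = 5
n <- length(x)

v_usual <- var(x) / n                # usual estimate of Var(mean), divisor n - 1
v_boot  <- var(boot_means(x))        # expected near (n - 1)/n * v_usual
v_knife <- var(bootknife_means(x))   # expected near v_usual

# Ratios to the usual estimate; Monte Carlo noise means they are approximate.
print(c(ordinary = v_boot / v_usual, bootknife = v_knife / v_usual))
```

Sampling indices with sample.int, rather than calling sample() on the data directly, avoids R's special case where sample(x, ...) on a single number draws from 1:x. For a stratified design the same resampling would be applied within each stratum separately, which is exactly where a stratum of size 1 leaves nothing to resample.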
Robert A LaBudde
2010-Nov-04 15:45 UTC
[R] How to do bootstrap for the complex sample design?
At 01:38 AM 11/4/2010, Fei xu wrote:
> Hello;
>
> Our survey is structured as : To be investigated area is divided
> into 6 regions,
> within each region, one urban community and one rural community are
> randomly selected,
> then samples are randomly drawn from each selected urban and rural community.
>
> The problem is that in urban/rural stratum, we only have one sample.
> In this case, how to do bootstrap?
>
> Any comments or hints are greatly appreciated!
>
> Faye

Just make a table of your data, with each row corresponding to a measurement. Your columns will be Region, UrbanCommunity, RuralCommunity and your response variables.

Bootstrap resampling is just generating random row indices into this table, with replacement. I.e.,

    index <- sample(1:N, N, replace=TRUE)

Then your resample is myTable[index,].

Because you chose UrbanCommunity and RuralCommunity randomly, this shouldn't be a problem.

The fact that you chose a subsample size of 1 means you won't be able to estimate within-region variances unless you make some serious assumptions (e.g., UrbanCommunity effect independent of Region effect).

================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
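The row-resampling recipe in this reply can be sketched end to end as follows. The table layout is an assumption: a single Community column (urban/rural) stands in for the separate UrbanCommunity and RuralCommunity columns, the response y is simulated, and 5 respondents per community is an arbitrary choice.

```r
# Illustrative sketch only: table layout, simulated response, and B are assumptions.
set.seed(42)

# Hypothetical survey table: 6 regions x 2 community types x 5 respondents.
myTable <- expand.grid(Respondent = 1:5,
                       Community  = c("urban", "rural"),
                       Region     = 1:6)
myTable$y <- rnorm(nrow(myTable), mean = 10, sd = 2)  # simulated response

N <- nrow(myTable)
B <- 1000

# One bootstrap replicate = N row indices drawn with replacement,
# followed by recomputing the statistic on the resampled rows.
boot_stat <- replicate(B, {
  index <- sample.int(N, N, replace = TRUE)
  mean(myTable$y[index])
})

se_boot <- sd(boot_stat)  # bootstrap standard error of the overall mean
print(se_boot)
```

Note that this treats every row as exchangeable, so it reflects respondent-level variability only; it carries no information about variability between communities within a region, which is the quantity the one-community-per-stratum design leaves unestimated.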