I have an advanced question about bootstrapping. There are two datasets. In each bootstrap iteration, I would like to sample:

1. One observation per cluster from the first dataset.
2. N observations with replacement from the second dataset.

Right now I am using dplyr::sample_n() for the first dataset, with this sampling embedded in the function that boot() from the boot package runs to resample the second dataset and produce the estimates.

I would prefer to do the entire sampling in boot() rather than embedding the sample_n() call. The reason is so that the "original" results are computed on the full data rather than on one particular sample from the first dataset.

Any thoughts on how to implement this? I think that it involves using strata and weights to "fool" boot() into sampling from a concatenation of the two datasets. The two datasets have entirely different contents (variables and numbers of observations).

MWE follows:

library(boot)
library(car)
library(dplyr)

(first.df <- data.frame(cluster=gl(2,2,4),z=seq(1,2)))
(second.df <- data.frame(y=1:2))

## X is second.df and d holds the row indices drawn by boot();
## first.df is resampled inside the function, one row per cluster
boot_script <- function(X,d) {
    zbar <- mean(sample_n(group_by(first.df,cluster),1)$z)
    return( c(zbar, zbar * mean(X[d,"y"]) ))
}

## Results based on the original data
(original.zbar <- mean(first.df$z))
mean(original.zbar * second.df[,"y"])

## Bootstrapped results
## Problem: "Original" is itself based on a sampling
for( i in c(1:10)) {
    b <- boot(second.df, boot_script, R=100)
    print(summary(b))
}

Thank you very much.

-- 
Michael Ash, Chair, Department of Economics
Professor of Economics and Public Policy
University of Massachusetts Amherst
Email mash at econs.umass.edu
Tel +1-413-545-4815
Twitter https://twitter.com/michaelaoash
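
For illustration only, here is one way the desired behaviour might be sketched. It does not use the strata/weights idea; instead it assumes that boot() can be given a list holding both datasets when sim = "parametric", so that a user-supplied ran.gen function does all of the resampling and the "original" estimate is computed on the full data. The names both, two_sample_stat and resample_both are hypothetical and not part of the MWE above.

```r
library(boot)

first.df  <- data.frame(cluster = gl(2, 2, 4), z = seq(1, 2))
second.df <- data.frame(y = 1:2)

# Bundle both datasets so that boot() sees a single "data" object
# (assumption: a list is acceptable as data when sim = "parametric").
both <- list(first = first.df, second = second.df)

# Statistic: receives the full or the resampled pair of datasets.
two_sample_stat <- function(dat) {
  zbar <- mean(dat$first$z)
  c(zbar, zbar * mean(dat$second$y))
}

# Generator: one row per cluster from the first dataset,
# n rows with replacement from the second dataset.
# The second argument (mle) is required by boot() but unused here.
resample_both <- function(dat, mle) {
  rows_by_cluster <- split(seq_len(nrow(dat$first)), dat$first$cluster)
  pick <- vapply(rows_by_cluster,
                 function(idx) idx[sample.int(length(idx), 1)],
                 integer(1))
  n2 <- nrow(dat$second)
  list(first  = dat$first[pick, , drop = FALSE],
       second = dat$second[sample.int(n2, n2, replace = TRUE), , drop = FALSE])
}

set.seed(1)
b <- boot(both, two_sample_stat, R = 100,
          sim = "parametric", ran.gen = resample_both)

b$t0        # estimates on the full, unsampled data
head(b$t)   # bootstrap replicates
```

With this arrangement b$t0 comes from two_sample_stat(both), i.e. from all rows of first.df, while every replicate in b$t uses a fresh one-per-cluster draw. Whether summary() and boot.ci() behave as wanted for a parametric-style boot object is something to check separately.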