jjh21
2009-Apr-13 00:48 UTC
[R] Clustered data with Design package--bootcov() vs. robcov()
Hi,

I am trying to figure out exactly what the bootcov() function in the Design package is doing within the context of clustered data. From reading the documentation and source code, it appears that using bootcov() with the cluster argument constructs standard errors by resampling whole clusters of observations with replacement, rather than resampling individual observations. Is that right, and is there any more detailed documentation on the math behind this?

Also, what is the difference between these two functions:

    bootcov(my.model, cluster.id)
    robcov(my.model, cluster.id)

Thank you.
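As an illustration of the resampling scheme described in the question (whole clusters drawn with replacement), here is a minimal base R sketch. It is not the Design package's implementation: the simulated data, the helper `boot.coef`, and the choice of `B <- 500` are all assumptions made for this example.

```r
# Illustrative sketch only: base R, simulated data, hypothetical names.
set.seed(1)

# Simulate 40 clusters of 5 observations that share a cluster-level effect.
n.clusters <- 40
cluster.id <- rep(seq_len(n.clusters), each = 5)
x <- rnorm(length(cluster.id))
y <- 1 + 0.5 * x + rnorm(n.clusters)[cluster.id] + rnorm(length(cluster.id))
dat <- data.frame(y, x, cluster.id)

# One bootstrap replicate: draw cluster labels with replacement and keep
# every row of each drawn cluster (a cluster drawn twice appears twice).
boot.coef <- function(dat) {
  rows.by.cluster <- split(seq_len(nrow(dat)), dat$cluster.id)
  drawn <- sample(names(rows.by.cluster), replace = TRUE)
  idx <- unlist(rows.by.cluster[drawn], use.names = FALSE)
  coef(lm(y ~ x, data = dat[idx, ]))
}

B <- 500
boot.est <- t(replicate(B, boot.coef(dat)))  # B x 2 matrix of coefficients

# Bootstrap covariance matrix of the coefficients and its standard errors.
boot.cov <- cov(boot.est)
boot.se <- sqrt(diag(boot.cov))
boot.se
```

Resampling individual rows instead would break up the within-cluster dependence, which is why the unit of resampling here is the cluster.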
Frank E Harrell Jr
2009-Apr-13 12:51 UTC
[R] Clustered data with Design package--bootcov() vs. robcov()
jjh21 wrote:
> Hi,
>
> I am trying to figure out exactly what the bootcov() function in the Design
> package is doing within the context of clustered data. From reading the
> documentation/source code it appears that using bootcov() with the cluster
> argument constructs standard errors by resampling whole clusters of
> observations with replacement rather than resampling individual
> observations. Is that right, and is there any more detailed documentation on
> the math behind this? Also, what is the difference between these two
> functions:

Correct. Did you read the Feng et al. reference in bootcov's help file, or check the book that is related to the package?

>
> bootcov(my.model, cluster.id)
> robcov(my.model, cluster.id)

robcov does not use bootstrapping. It uses the cluster sandwich (Huber-White) variance-covariance estimator, for which there are references in the help file (see especially Lin).

Both robcov and bootcov work best when there is a large number of small clusters. If the clusters are somewhat large and vary greatly in size, expect to be in trouble and consider a full modeling approach (generalized least squares, mixed models, etc.).

One advantage of robcov is that you get the same result every time, unlike bootstrapping. But even in the case of cluster sizes of one, the sandwich estimator can be inefficient (see the Gould paper) or can result in the "right" estimates of the "wrong" quantity (see a paper by Friedman in American Statistician).

Frank

>
> Thank you.

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
Department of Biostatistics   Vanderbilt University
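For readers who want to see the shape of a cluster sandwich estimator, here is a rough base R sketch for an ordinary least squares fit. It uses the textbook form (X'X)^{-1} [sum over clusters of s_g s_g'] (X'X)^{-1}, where s_g is the sum of x_i * e_i within cluster g, with no small-sample correction. The simulated data and all object names are assumptions for this example, and robcov's own computation may differ in its details, so its help file remains the reference.

```r
# Illustrative sketch only: base R, simulated data, hypothetical names.
set.seed(2)

n.clusters <- 40
cluster.id <- rep(seq_len(n.clusters), each = 5)
x <- rnorm(length(cluster.id))
y <- 1 + 0.5 * x + rnorm(n.clusters)[cluster.id] + rnorm(length(cluster.id))

fit <- lm(y ~ x)
X <- model.matrix(fit)   # n x p design matrix
e <- residuals(fit)      # length-n residual vector

# "Bread": (X'X)^{-1}
bread <- solve(crossprod(X))

# Per-observation scores x_i * e_i, summed within each cluster.
scores <- rowsum(X * e, group = cluster.id)   # n.clusters x p

# "Meat": sum over clusters of (cluster score)(cluster score)'
meat <- crossprod(scores)

sandwich.cov <- bread %*% meat %*% bread
sandwich.se <- sqrt(diag(sandwich.cov))

# Side by side with the usual model-based standard errors.
naive.se <- sqrt(diag(vcov(fit)))
rbind(naive = naive.se, cluster.sandwich = sandwich.se)
```

Nothing here involves random draws once the data are fixed, which is the sense in which this kind of estimator gives the same result every time.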
jjh21
2009-May-23 10:35 UTC
[R] Clustered data with Design package--bootcov() vs. robcov()
Another question related to bootcov():

A reviewer is concerned with the fact that bootstrapping the standard errors does not give the same answers each time. What is a good way to address this concern? Could I bootstrap, say, 100 times and report the mean standard error of those 100 estimates? I am already doing 1,000 replications in the bootstrap, but of course the answer is still slightly different each time.
Frank E Harrell Jr
2009-May-23 11:29 UTC
[R] Clustered data with Design package--bootcov() vs. robcov()
jjh21 wrote:
> Another question related to bootcov():
>
> A reviewer is concerned with the fact that bootstrapping the standard errors
> does not give the same answers each time. What is a good way to address this
> concern? Could I bootstrap, say, 100 times and report the mean standard
> error of those 100 estimates? I am already doing 1,000 replications in the
> bootstrap, but of course the answer is still slightly different each time.

First, you can argue that everything we estimate has a margin of error and that the variation across different runs of the bootstrap is within the statistical precision of what can be estimated. Second, run the bootstrap with 10,000 replications and be done with it.

Frank

--
Frank E Harrell Jr   Professor and Chair   School of Medicine
Department of Biostatistics   Vanderbilt University
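The point about run-to-run variation can be checked empirically. The base R sketch below repeats a cluster bootstrap ten times at two different numbers of replications and compares how much the reported standard error moves between runs. The simulated data, the helper `boot.se.slope`, and the particular replication counts are assumptions for this example. The final lines show a fixed seed via set.seed(), which is not something suggested in the thread but is a common way to make a single reported run exactly repeatable.

```r
# Illustrative sketch only: base R, simulated data, hypothetical names.
set.seed(3)

n.clusters <- 40
cluster.id <- rep(seq_len(n.clusters), each = 5)
x <- rnorm(length(cluster.id))
y <- 1 + 0.5 * x + rnorm(n.clusters)[cluster.id] + rnorm(length(cluster.id))
rows.by.cluster <- split(seq_along(y), cluster.id)

# Cluster-bootstrap standard error of the OLS slope from B replicates.
boot.se.slope <- function(B) {
  slopes <- replicate(B, {
    drawn <- sample.int(n.clusters, replace = TRUE)
    idx <- unlist(rows.by.cluster[drawn], use.names = FALSE)
    cov(x[idx], y[idx]) / var(x[idx])   # simple-regression slope
  })
  sd(slopes)
}

# Ten independent runs at each number of replications.
se.small <- replicate(10, boot.se.slope(100))
se.large <- replicate(10, boot.se.slope(2500))

# Run-to-run spread of the reported standard error at each setting.
c(B.100 = sd(se.small), B.2500 = sd(se.large))

# Fixing the seed makes a single run exactly repeatable.
set.seed(42); run1 <- boot.se.slope(1000)
set.seed(42); run2 <- boot.se.slope(1000)
identical(run1, run2)
```

The spread across runs should shrink as the number of replications grows, which is the practical argument for simply using more replications.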