Hi,

I understand the bootstrap can be used to estimate a 95% confidence interval for some statistics, e.g. variance, median, etc. Someone has suggested to me that by resampling a certain proportion of the total sample (e.g. 80%) without replacement, we can also get an estimate of the confidence interval. Here we have an example of 1000 observations, and we would like to estimate a 95% confidence interval for the odds ratio of a diagnostic test. Can I resample 80% of the observations without replacement, instead of bootstrapping, to do this? If not, why is it wrong to do it this way?

Thanks
> I understand bootstrap can be used to estimate 95%
> confidence interval for some statistics, e.g.

There's no such thing as a confidence interval for a statistic. You can estimate 95% CIs on population **parameters**, which is, I assume, what you mean. If you don't know what the difference is, stop here and consult a local statistician, as you are out of your depth.

-----------

If you make it to here, I think you are referring to cross-validation vs. resampling. Typically, cross-validation is used to get an "honest" estimate of prediction error rather than confidence limits for a parameter.

The correctness of bootstrapping for confidence limits is based on asymptotic theory: loosely speaking, the data distribution approximates the population distribution, and appropriate resampling (e.g. maybe stratified, moving blocks, ...) from the data corresponds to iid sampling (or whatever is appropriate) from the population. It is actually a way to approximate the (itself approximate) asymptotic sampling distribution.

AFAIK (experts, please correct) no such asymptotic theory holds for cross-validation, so it would be problematic/wrong for CIs.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA

"The business of the statistician is to catalyze the scientific learning process." - George E. P. Box

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of array chip
> Sent: Wednesday, April 06, 2005 10:19 AM
> To: r-help at stat.math.ethz.ch
> Subject: [R] bootstrap vs. resampleing
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
What you're describing sounds like subsampling, about which John Hartigan has written a few papers.

-roger
On Wed, 6 Apr 2005, array chip wrote:
> can I use resampling 80% of the
> observations without replacement, instead of
> bootstrap, to do this? If not, why is it wrong to do
> it this way?

You can, provided you rescale correctly for the fact that you are working with a smaller sample. This is more like the jackknife, which also resamples a smaller number without replacement.

There is quite a bit of literature on this sort of jackknife/bootstrap variant. One useful book is "The Jackknife and Bootstrap" by Shao and Tu.

-thomas
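To make the rescaling concrete, here is a rough Python sketch (standard library only). Everything in it is an assumption for illustration: the 2x2 counts are made up, the function names are hypothetical, and the factor sqrt(m / (n - m)) is the usual delete-d jackknife scaling, which is one reading of "rescale correctly" rather than something spelled out in the reply above. It compares the raw spread of log odds ratios from 80% subsamples, the rescaled version, and an ordinary bootstrap standard error.

```python
import math
import random
import statistics


def make_example_data():
    """Hypothetical 1000 observations as (diseased, test_positive) pairs."""
    return [(1, 1)] * 240 + [(1, 0)] * 60 + [(0, 1)] * 70 + [(0, 0)] * 630


def log_odds_ratio(obs):
    """Log odds ratio of a 2x2 table; 0.5 is added to each cell so that an
    empty cell in a resample cannot cause a division by zero."""
    tp = fn = fp = tn = 0.5
    for diseased, positive in obs:
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return math.log((tp * tn) / (fn * fp))


def resample_sd(obs, size, replace, reps, rng):
    """Standard deviation of the log odds ratio over `reps` resamples of
    `size` observations, drawn with or without replacement."""
    estimates = []
    for _ in range(reps):
        draw = rng.choices(obs, k=size) if replace else rng.sample(obs, size)
        estimates.append(log_odds_ratio(draw))
    return statistics.stdev(estimates)


if __name__ == "__main__":
    rng = random.Random(2005)
    obs = make_example_data()
    n, m = len(obs), 800  # full sample size and 80% subsample size

    raw_sd = resample_sd(obs, m, replace=False, reps=2000, rng=rng)
    # Assumed delete-d jackknife scaling: subsamples share most of their
    # observations, so their raw spread is too small for a sample of size n.
    se_subsample = raw_sd * math.sqrt(m / (n - m))
    se_bootstrap = resample_sd(obs, n, replace=True, reps=2000, rng=rng)

    theta = log_odds_ratio(obs)
    low = math.exp(theta - 1.96 * se_subsample)
    high = math.exp(theta + 1.96 * se_subsample)
    print("raw subsample SD:      ", round(raw_sd, 3))
    print("rescaled subsample SE: ", round(se_subsample, 3))
    print("bootstrap SE:          ", round(se_bootstrap, 3))
    print("normal-approx 95% CI for the odds ratio:", round(low, 1), round(high, 1))
```

With m = 0.8 n the factor is sqrt(0.8 / 0.2) = 2, so using the subsample spread without rescaling would give an interval about half as wide as it should be. The normal-approximation interval at the end is only one simple way to turn the rescaled standard error into a CI.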
Confidence intervals depend on the sample size: the bigger the sample, the smaller the interval. Subsampling (resampling without replacement) gives smaller samples and underestimates confidence (overestimates confidence interval size) of parameters calculated on the original sample.

Best

Jens Oehlschlägel

P.S.: I guess signing a question with your name makes answers more likely
I may be misunderstanding the question, but I believe you want a pointwise confidence band for the conditional odds function. The issue here is less bootstrap versus some other resampling plan, and more how to do it at all. For example, if no matter what "training" data you feed in, you always get the same conditional odds estimate, no resampling will (by itself) reveal this bias (and you will have a confidence band of width 0).

You could however use resampling together with nonparametric estimation in a variety of ways to address this. If you assume your conditional odds estimator to be unbiased, you could resample and look at the empirical distribution of conditional odds ratio estimates at a given covariate or feature value. You then have to figure out how this is related to the population distribution; this is easiest with the bootstrap, since each resample has the same size as the original sample. In this case the simplest procedure is to treat the bootstrap distribution as the population distribution, but there are many alternatives.

See the book by Jun Shao and Dongsheng Tu that Thomas Lumley recommended. They treat estimation of regression functions in several places; those remarks are relevant for your case as well.

Reid Huntsinger
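The "simplest procedure" mentioned above, treating the bootstrap distribution as if it were the sampling distribution, corresponds to a percentile interval. A minimal Python sketch follows, for a plain (unconditional) odds ratio rather than a conditional odds function; the 2x2 counts and function names are hypothetical and only there for illustration.

```python
import math
import random


def odds_ratio(obs):
    """Odds ratio from (diseased, test_positive) pairs, with 0.5 added to
    each cell to guard against empty cells in a resample."""
    tp = fn = fp = tn = 0.5
    for diseased, positive in obs:
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return (tp * tn) / (fn * fp)


def percentile_interval(obs, reps=2000, level=0.95, seed=0):
    """Percentile bootstrap interval: resample n observations with
    replacement and read off the empirical quantiles of the estimates."""
    rng = random.Random(seed)
    n = len(obs)
    estimates = sorted(odds_ratio(rng.choices(obs, k=n)) for _ in range(reps))
    alpha = (1.0 - level) / 2.0
    lower = estimates[math.floor(alpha * (reps - 1))]
    upper = estimates[math.ceil((1.0 - alpha) * (reps - 1))]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical data: 1000 observations from a made-up diagnostic test.
    data = [(1, 1)] * 240 + [(1, 0)] * 60 + [(0, 1)] * 70 + [(0, 0)] * 630
    print("estimate:", round(odds_ratio(data), 1))
    print("95% percentile interval:", percentile_interval(data))
```

Because each bootstrap resample has the same size as the original data, no rescaling step is needed here, which is the convenience noted above. Other interval constructions (for example BCa) exist and may behave better for skewed statistics such as an odds ratio.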