Hello, I have a question regarding bootstrap confidence intervals. Suppose we have a data set consisting of single measurements, and that the measurements are independent but the distribution is unknown. If we want a confidence interval for the population mean, when should a bootstrap confidence interval be preferred over the elementary t interval?

I was hoping the answer would be "always", but some simple simulations suggest that this is incorrect. I simulated some data and calculated 95% elementary t intervals and 95% bootstrap BCa intervals (with the boot package). I calculated the proportion of confidence intervals lying entirely above the true mean, the proportion entirely below the true mean, and the proportion containing the true mean. I used a normal distribution and a t distribution with 3 df.

library(boot)
samplemean <- function(x, ind) mean(x[ind])

ci.norm <- function(sample.size, n.samples, mu = 0, sigma = 1, boot.reps) {
  t.under <- 0; t.over <- 0
  bca.under <- 0; bca.over <- 0
  for (k in 1:n.samples) {
    x <- rnorm(sample.size, mu, sigma)
    b <- boot(x, samplemean, R = boot.reps)
    bci <- boot.ci(b, type = "bca")
    if (mu < mean(x) - qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
      t.under <- t.under + 1
    if (mu > mean(x) + qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
      t.over <- t.over + 1
    if (mu < bci$bca[4]) bca.under <- bca.under + 1
    if (mu > bci$bca[5]) bca.over <- bca.over + 1
  }
  return(list(t = c(t.under, t.over, n.samples - (t.under + t.over))/n.samples,
              bca = c(bca.under, bca.over, n.samples - (bca.under + bca.over))/n.samples))
}

ci.t <- function(sample.size, n.samples, df, boot.reps) {
  t.under <- 0; t.over <- 0
  bca.under <- 0; bca.over <- 0
  for (k in 1:n.samples) {
    x <- rt(sample.size, df)
    b <- boot(x, samplemean, R = boot.reps)
    bci <- boot.ci(b, type = "bca")
    if (0 < mean(x) - qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
      t.under <- t.under + 1
    if (0 > mean(x) + qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size))
      t.over <- t.over + 1
    if (0 < bci$bca[4]) bca.under <- bca.under + 1
    if (0 > bci$bca[5]) bca.over <- bca.over + 1
  }
  return(list(t = c(t.under, t.over, n.samples - (t.under + t.over))/n.samples,
              bca = c(bca.under, bca.over, n.samples - (bca.under + bca.over))/n.samples))
}

set.seed(1)
ci.norm(sample.size = 10, n.samples = 1000, boot.reps = 1000)
$t
[1] 0.019 0.026 0.955

$bca
[1] 0.049 0.059 0.892

ci.norm(sample.size = 20, n.samples = 1000, boot.reps = 1000)
$t
[1] 0.030 0.024 0.946

$bca
[1] 0.035 0.037 0.928

ci.t(sample.size = 10, n.samples = 1000, df = 3, boot.reps = 1000)
$t
[1] 0.018 0.022 0.960

$bca
[1] 0.055 0.076 0.869

Warning message:
In norm.inter(t, adj.alpha) : extreme order statistics used as endpoints

ci.t(sample.size = 20, n.samples = 1000, df = 3, boot.reps = 1000)
$t
[1] 0.027 0.014 0.959

$bca
[1] 0.054 0.047 0.899

I don't understand the warning message, but for these examples, the ordinary t interval appears to be better than the bootstrap BCa interval. I would really appreciate any recommendations anyone can give on when bootstrap confidence intervals should be used.

Thanks,
Mark
--
Mark Seeto
National Acoustic Laboratories, Australian Hearing
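On the warning: as I understand it, "extreme order statistics used as endpoints" comes from boot's internal norm.inter() and indicates that the adjusted percentile required by the BCa calculation falls at or beyond the most extreme bootstrap replicate, so the smallest or largest replicate itself is used as an interval endpoint; the usual remedy is to increase R. A minimal single-sample sketch of checking this (the seed, sample size, and R values here are illustrative choices, not values from the post, and whether the warning appears depends on the particular sample):

library(boot)
samplemean <- function(x, ind) mean(x[ind])

set.seed(3)
x <- rt(10, df = 3)                      # one heavy-tailed sample

## With few resamples the BCa-adjusted percentile can fall outside the
## range of the replicates, which may trigger the warning.
b.small <- boot(x, samplemean, R = 500)
boot.ci(b.small, type = "bca")

## With many more resamples the required percentile is usually well
## inside the replicates and the warning typically goes away.
b.large <- boot(x, samplemean, R = 20000)
boot.ci(b.large, type = "bca")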
Just based on my limited understanding of bootstrapping and statistics in general: the bootstrap is effective, but it is not magic. You can't reasonably expect reliable inference about the population from a sample of 10 without any distributional assumptions. Your t interval looks good conditional on the fact that you know what distribution you used to simulate the data.

Mark Seeto wrote:
> Hello, I have a question regarding bootstrap confidence intervals.
> Suppose we have a data set consisting of single measurements, and that
> the measurements are independent but the distribution is unknown. If
> we want a confidence interval for the population mean, when should a
> bootstrap confidence interval be preferred over the elementary t
> interval?
> [remainder of the quoted post snipped]
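A minimal sketch of how the sample-size point could be checked, reusing ci.t() from the original post; the sample size of 40, the seed, and boot.reps of 2000 are arbitrary illustrative choices, not values from the thread:

## Assumes library(boot), samplemean() and ci.t() from the original post
## have been run.  Rerun the coverage check with a larger sample to see
## whether the BCa coverage moves closer to the nominal 95%.
set.seed(2)
ci.t(sample.size = 40, n.samples = 1000, df = 3, boot.reps = 2000)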
1. The bootstrap will do poorly for small sample sizes (less than 25 or so). Parametric methods (the t interval) have the advantage of working down to a sample size of less than 5.

2. You need the number of resamples to be reasonably large, say 10,000, for poorly behaved distributions. Otherwise the extreme percentiles of the resampling distribution used in the calculations are too inaccurate.

3. Your examples are pathological. For the normal case, the t C.I. should be optimal, so no surprise there. For the Student t (df = 3) case, you have a nearly singular case in which only the first couple of moments exist. Try something skewed (lognormal) or bimodal (a mixture) as a better example; a sketch along these lines follows below.

4. BCa generally gives the best results, but only when the sample size is moderately large (> 25) and the number of resamples is large (several thousand).

At 07:09 AM 8/16/2010, Mark Seeto wrote:
> Hello, I have a question regarding bootstrap confidence intervals.
> Suppose we have a data set consisting of single measurements, and that
> the measurements are independent but the distribution is unknown. If
> we want a confidence interval for the population mean, when should a
> bootstrap confidence interval be preferred over the elementary t
> interval?
> [rest of the quoted post, code, and output snipped]

===============================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS    e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.              URL: http://lcfltd.com/
824 Timberlake Drive                       Tel: 757-467-0954
Virginia Beach, VA 23464-3239              Fax: 757-467-2947

"Vere scire est per causas scire"
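A minimal sketch of the skewed example suggested in point 3, written in the same style as the ci.norm/ci.t functions from the original post. The true mean of a lognormal(meanlog = 0, sdlog = 1) population is exp(1/2); the sample size of 40 and boot.reps of 5000 are illustrative choices following points 1 and 2, not values taken from the thread.

library(boot)
samplemean <- function(x, ind) mean(x[ind])

## Coverage check for a skewed (lognormal) population, in the same style
## as ci.norm/ci.t above.
ci.lnorm <- function(sample.size, n.samples, boot.reps) {
  true.mean <- exp(1/2)   # population mean of lognormal(0, 1)
  t.under <- 0; t.over <- 0
  bca.under <- 0; bca.over <- 0
  for (k in 1:n.samples) {
    x <- rlnorm(sample.size, meanlog = 0, sdlog = 1)
    b <- boot(x, samplemean, R = boot.reps)
    bci <- boot.ci(b, type = "bca")
    half.width <- qt(0.975, sample.size - 1)*sd(x)/sqrt(sample.size)
    if (true.mean < mean(x) - half.width) t.under <- t.under + 1
    if (true.mean > mean(x) + half.width) t.over <- t.over + 1
    if (true.mean < bci$bca[4]) bca.under <- bca.under + 1
    if (true.mean > bci$bca[5]) bca.over <- bca.over + 1
  }
  list(t   = c(t.under, t.over, n.samples - (t.under + t.over))/n.samples,
       bca = c(bca.under, bca.over, n.samples - (bca.under + bca.over))/n.samples)
}

## Illustrative settings (slow): a moderately large sample and several
## thousand resamples, as suggested in points 1 and 2.
set.seed(4)
ci.lnorm(sample.size = 40, n.samples = 1000, boot.reps = 5000)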