I have a question regarding bootstrap coverage. I am trying to understand the
benefits of using the bootstrap for small samples. To do this I created a
normal population, then drew samples of 10 from it and applied both a
traditional method (the t interval from t.test) and the bootstrap (bcanon,
5000 bootstrap samples) to calculate a 95% confidence interval for the mean.
For each sample I recorded the width of the confidence interval and whether it
missed the population mean, repeated this 1000 times, and output a summary of
the CI widths and the proportion of misses for each method (actual script
below). I had expected about 5% of the CIs to miss the actual mean. For the
traditional method about 6% missed, but for the bootstrap it was over 11%.
Traditional:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.4531  1.1460  1.3550  1.3690  1.5900  2.4330
[1] 0.062
Bootstrap:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.3661  0.9455  1.1290  1.1380  1.3220  2.0160
[1] 0.113
The bootstrap method consistently missed almost twice as many times as the
traditional method.
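As a rough check on how much of this could be simulation noise, here is a
minimal sketch that treats the 1000 repetitions as independent trials with a
nominal 5% miss rate (an assumption) and expresses each observed miss rate as a
distance from 5% in binomial standard errors. The variable names are only
illustrative.

```r
# Monte Carlo noise in a miss rate estimated from `cnt` repetitions.
# Assumption: repetitions are independent, with a true miss rate of 5%.
cnt     <- 1000                                   # repetitions, as in the script
nominal <- 0.05                                   # nominal miss rate of a 95% CI
se      <- sqrt(nominal * (1 - nominal) / cnt)    # binomial standard error

observed <- c(traditional = 0.062, bootstrap = 0.113)
z <- (observed - nominal) / se                    # distance from 5% in standard errors

print(round(se, 4))
print(round(z, 1))
```

On these numbers the traditional miss rate is within about two standard errors
of 5%, while the bootstrap miss rate is roughly nine standard errors away, so
the bootstrap result does not look like simulation noise.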
What am I missing? Is this the best that can be expected when working with a
sample of only 10 observations?
Although this experiment was done using a normal distribution, my real datasets
will be non-normal, with an unknown distribution.
Thanks for any help,
Art Nuzzo
Motorola
***************************************************************************************
library(bootstrap)   # provides bcanon()

# TRUE when the interval [lower, upper] does not contain M
CImiss = function(M, lower, upper) { lower > M || upper < M }
# Width of the interval
CIr = function(lower, upper) { upper - lower }

cnt = 1000                          # reps
C = numeric(cnt); B = numeric(cnt)  # CI widths (t interval, bootstrap)
Ccov = 0; Bcov = 0                  # number of CI misses
x = rnorm(10000)                    # create population
m = mean(x)                         # population mean the CIs should cover

for (i in 1:cnt) {
  s = sample(x, 10, replace = FALSE)   # sample population

  # Traditional 95% t interval
  ci = t.test(s)$conf.int
  C[i] = CIr(ci[1], ci[2])
  if (CImiss(m, ci[1], ci[2])) Ccov = Ccov + 1

  # BCa bootstrap interval; column 2 of confpoints holds the limits
  cp = bcanon(s, 5000, mean, alpha = c(.025, .975))$confpoints
  B[i] = CIr(cp[1, 2], cp[2, 2])
  if (CImiss(m, cp[1, 2], cp[2, 2])) Bcov = Bcov + 1
}

print(summary(C))
print(Ccov / cnt)
print(summary(B))
print(Bcov / cnt)