Robert A. LaBudde
2012-Oct-02  18:54 UTC
[R] Possible error in BCa method for confidence intervals in package 'boot'
I'm using R 2.15.1 on a 64-bit machine with Windows 7 Home Premium.
Sample problem (screwy subscripted syntax is a relic of edited down a 
more complex script):
 > N <- 25
 > s <- rlnorm(N, 0, 1)
 > require("boot")
Loading required package: boot
 > v <- NULL # hold sample variance estimates
 > i <- 1
 > v[i] <- var(s) # get sample variance
 > nReal <- 10
 > varf <- function (x,i) { var(x[i]) }
 > fabc <- function (x, w) { # weighted average (biased) variance
+   sum(x^2 * w) / sum(w) - (sum(x * w) / sum(w))^2
+ }
 > p <- c(.25, .75, .2, .8, .15, .85, .1, .9, .05, .95, .025, .975, 
.005, .995)
 > cl <- c(0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99)
 > b <- boot(s, varf, R = nReal) # bootstrap
 > bv <- NULL # hold bootstrap mean variance estimates
 > bias <- NULL #hold bias estimates
 > bv[i] <- mean(b$t) # bootstrap mean variance
 > bias[i] <- bv[i] - v[i] # bias estimate
 > bCI90 <- boot.ci(b, conf = 0.90)
Error in bca.ci(boot.out, conf, index[1L], L = L, t = t.o, t0 = t0.o,  :
   estimated adjustment 'a' is NA
In addition: Warning messages:
1: In norm.inter(t, (1 + c(conf, -conf))/2) :
   extreme order statistics used as endpoints
2: In boot.ci(b, conf = 0.9) :
   bootstrap variances needed for studentized intervals
3: In norm.inter(t, alpha) : extreme order statistics used as endpoints
 >
 > nReal <- 25
 > b <- boot(s, varf, R = nReal) # bootstrap
 > bv[i] <- mean(b$t) # bootstrap mean variance
 > bias[i] <- bv[i] - v[i] # bias estimate
 > bCI90 <- boot.ci(b, conf = 0.90)
Warning messages:
1: In boot.ci(b, conf = 0.9) :
   bootstrap variances needed for studentized intervals
2: In norm.inter(t, adj.alpha) :
   extreme order statistics used as endpoints
The problem is that doing 10 resamples generates an NA in the 
estimation of the 'acceleration' in the function abc.ci(), but doing 
25 resamples does not. This implies a connection between the number 
of resamples and the 'acceleration' which should not exist. 
('Acceleration' should be obtained from the original sample via 
jackknife as 1/6 the coefficient of skewness.)
Looking at the script for abc.ci(), there is an anomalous reference 
to 'n' in the invocation line, yet 'n' is not an argument, so
must be
defined more globally before the call. Yet 'n' is defined within the 
script as the length of 'data', which is referred to as the 
'bootstrap' vector in the comments, yet should be the original sample 
data. This confusion, plus the use of an argument 'eps' as a default 
0.001/n in the calculations makes me suspect the programming in the script.
The script apparently works correctly if the number of resamples 
equals or exceeds the number of original data, but not otherwise.
===============================================================Robert A.
LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947
"Vere scire est per causas scire"
