Jacob Wegelin
2007-Jan-26 20:26 UTC
[R] bootstrap bca confidence intervals for large number of statistics in one model; library("boot")
Sometimes one might like to obtain pointwise bootstrap bias-corrected, accelerated (BCA) confidence intervals for a large number of statistics computed from a single dataset. For instance, one might like to get (so as to plot graphically) bootstrap confidence bands for the fitted values in a regression model. (Example: Chiu S et al., Early Acceleration of Head Circumference in Children with Fragile X Syndrome and Autism. Journal of Developmental & Behavioral Pediatrics 2007;In press. In this paper we plot the estimated trajectories of head circumference for two different groups, computed by linear mixed-effects models, with confidence bands obtained by bootstrap.) Below is a toy example of how to do this using the "boot" library. We obtain BCA CIs for all three regression parameters and for the fitted values at 50 levels of the predictor. set.seed(1234567) x<-runif(150) y<-2/3 + pi * x^2 + runif(length(x))/2 plot(x,y) DAT<-data.frame(x,y) NEWDATA<-data.frame(x=seq(min(x), max(x), length=50)) library('boot') myfn<-function(data, whichrows) { TheseData<-data[whichrows,] thisLM<-lm( y~poly(x,2), data=TheseData) thisFit<-predict(thisLM, newdata=NEWDATA) c( coef(summary(thisLM))[,"Estimate"] , thisFit) } bootObj<-boot( data=DAT, statistic=myfn, R=1000 ) names(bootObj) dim(bootObj$t) sofar<-t(sapply( 1:ncol(bootObj$t), function(thiscolumn) boot.ci(bootObj, type='bca', index=thiscolumn)$bca[4:5] )) colnames(sofar)<-c("lo", "hi") NEWDATA<-cbind(NEWDATA, sofar[4:nrow(sofar),]) thecoefs<-sofar[1:3,] lines( NEWDATA$x, NEWDATA$lo, col='red') lines( NEWDATA$x, NEWDATA$hi, col='red') Note: To get boot.ci to run with type='bca' it seems necessary to have a large value of R. Otherwise boot.ci returns an error, in my (limited) experience. Thanks in advance for any critiques. (For instance, is there an easier or more elegant way?) Jacob Wegelin
Previous subject: bootstrap bca confidence intervals for large number of statistics in one model; library("boot") Jacob Wegelin asked for an easier way to do many bootstrap confidence intervals for regression output. The syntax would be easier with S+Resample, example below. You create an ordinary lm object, then do e.g. boot <- bootstrap(fit, c(coef(fit), predict(fit, newdata=NEWDATA))) limits.bca(boot) ---- RESAMPLING FOR TEACHING ---- I would encourage people to consider using S+Resample for teaching. There is a free student version of S+, and S+Resample is easier for students -- easier syntax (in general, not just this example), plus a menu interface. There are free chapters and packages you can use to supplement a statistics course with resampling. See: http://www.insightful.com/Hesterberg/bootstrap/#Reference.introStat I'll give a workshop on resampling for teaching at the USCOTS conference (U.S. Conference On Teaching Statistics) on May 16. http://www.causeweb.org/uscots http://www.causeweb.org/workshop/hesterberg/ set.seed(0) x <- runif(150) y <- 2/3 + pi * x^2 + runif(length(x))/2 plot(x,y) DAT <- data.frame(x,y) NEWDATA <- data.frame(x=seq(min(x), max(x), length=50)) library(resample) fit <- lm(y ~ x + x^2, data=DAT) boot <- bootstrap(fit, c(coef(fit), predict(fit, newdata = NEWDATA))) CIs <- limits.bca(boot) lines(NEWDATA$x, CIs[4:53,1], col="red") lines(NEWDATA$x, CIs[4:53,4], col="red") CIs[1:3,] I used x + x^2 in the model rather than poly(x,2), because poly is data dependent, so regression coefficients cannot be used for new data, and confidence intervals for the coefficients are not meaningful. Comment - the default is 1000 bootstrap samples; that isn't really enough for BCa intervals, because the BCa calculations magnify the Monte Carlo standard error by roughly a factor of two. Jacob Wegelin wrote:>Sometimes one might like to obtain pointwise bootstrap bias-corrected, >accelerated (BCA) confidence intervals for a large number of statistics >computed from a single dataset. For instance, one might like to get >(so as to plot graphically) bootstrap confidence bands for the fitted >values in a regression model. > >(Example: Chiu S et al., Early Acceleration of Head Circumference in >Children with Fragile X Syndrome and Autism. Journal of Developmental & >Behavioral Pediatrics 2007;In press. In this paper we plot the >estimated trajectories of head circumference for two different >groups, computed by linear mixed-effects models, with confidence bands >obtained by bootstrap.) > >Below is a toy example of how to do this using the "boot" library. >We obtain BCA CIs for all three regression parameters and for the fitted >values at 50 levels of the predictor. > >set.seed(1234567) >x<-runif(150) >y<-2/3 + pi * x^2 + runif(length(x))/2 >plot(x,y) >DAT<-data.frame(x,y) >NEWDATA<-data.frame(x=seq(min(x), max(x), length=50)) >library('boot') >myfn<-function(data, whichrows) { > TheseData<-data[whichrows,] > thisLM<-lm( y~poly(x,2), data=TheseData) > thisFit<-predict(thisLM, newdata=NEWDATA) > c( > coef(summary(thisLM))[,"Estimate"] > , thisFit) >} >bootObj<-boot( data=DAT, statistic=myfn, R=1000 ) >names(bootObj) >dim(bootObj$t) >sofar<-t(sapply( 1:ncol(bootObj$t), function(thiscolumn) boot.ci(bootObj, type='bca', index=thiscolumn)$bca[4:5] )) >colnames(sofar)<-c("lo", "hi") >NEWDATA<-cbind(NEWDATA, sofar[4:nrow(sofar),]) >thecoefs<-sofar[1:3,] >lines( NEWDATA$x, NEWDATA$lo, col='red') >lines( NEWDATA$x, NEWDATA$hi, col='red') > >Note: To get boot.ci to run with type='bca' it seems necessary to have a >large value of R. Otherwise boot.ci returns an error, in my (limited) >experience. > >Thanks in advance for any critiques. (For instance, is there an easier or more elegant way?)Caveat - I helped create S+Resample. =======================================================| Tim Hesterberg Senior Research Scientist | | timh at insightful.com Insightful Corp. | | (206)802-2319 1700 Westlake Ave. N, Suite 500 | | (206)283-8691 (fax) Seattle, WA 98109-3044, U.S.A. | | www.insightful.com/Hesterberg | =======================================================Download S+Resample from www.insightful.com/downloads/libraries Bootstrap short courses: May 21-22 Philadelphia, Oct 10-11 San Francisco. 2-3 May UK, 3-4 Oct UK. http://www.insightful.com/services/training.asp Workshop on resampling for teaching: May 16 Ohio State http://www.causeweb.org/workshop/hesterberg/