I am attempting to use package boot to summarize and compare the performance of three models. I'm using R 2.13.0 in a Win32 environment.

My statistic function returns a vector of 6 values, 3 of which are error rates for different models, and 3 are pairwise differences between those error rates. It looks like:

multiEst<-function(dat,i)
{
    ....
    c(E1,E2,E3,E2-E1,E3-E1,E3-E2);
}

then I call boot (using R=4 for simplicity of description) with:

multiBoot=boot(data,multiEst,R=4)

which gives reasonable results:

Bootstrap Statistics :
    original     bias    std. error
t1*     0.07   0.3775   0.04193249
t2*     0.08   0.3750   0.04654747
t3*     0.04   0.4200   0.05354126
t4*     0.01  -0.0025   0.00500000
t5*    -0.03   0.0425   0.01500000
t6*    -0.04   0.0450   0.01290994

and the resulting "t0" contains the expected estimates of the statistics,

> multiBoot$t0
[1]  0.07  0.08  0.04  0.01 -0.03 -0.04

however "t", which is supposed to contain bootstrap replicates of the statistic, doesn't. It looks like this:

> multiBoot$t
     [,1] [,2] [,3] [,4] [,5]  [,6]
[1,] 0.46 0.47 0.46 0.01 0.00 -0.01
[2,] 0.39 0.39 0.39 0.00 0.00  0.00
[3,] 0.45 0.46 0.47 0.01 0.02  0.01
[4,] 0.49 0.50 0.52 0.01 0.03  0.02

It is not clear where these columns come from --- they clearly do not resemble the estimates in "t0".

If I define a separate statistic function for each desired estimate, the resulting "t" and "t0" are as expected, however it is important in this case that the separate estimates derive from the same bootstrap replicates.

Any helpful suggestions? Or have I come upon a bug in the implementation?

Note: the documentation provides the following definitions for these returned variables:

t0   The observed value of statistic applied to data.
t    A matrix with R rows each of which is a bootstrap replicate of statistic.

--
View this message in context: http://r.789695.n4.nabble.com/Unexp-behavior-from-boot-with-multiple-statistics-tp3493300p3493300.html
Sent from the R help mailing list archive at Nabble.com.
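For readers following along, here is a minimal sketch of a statistic function of the shape described in this post. It is not the poster's multiEst, whose body is elided above. The data frame toy, its columns y, p1, p2, p3, and the function name errStat are all hypothetical, and the error rate is assumed to be a simple misclassification rate. The point it illustrates is that every returned value is computed from the same resampled rows dat[i, ], so the three rates and their differences come from the same bootstrap replicate.

```r
library(boot)

set.seed(1)
n <- 100

# Hypothetical data: true class labels plus predictions from three models.
toy <- data.frame(y = rbinom(n, 1, 0.5))
toy$p1 <- ifelse(runif(n) < 0.10, 1 - toy$y, toy$y)
toy$p2 <- ifelse(runif(n) < 0.15, 1 - toy$y, toy$y)
toy$p3 <- ifelse(runif(n) < 0.05, 1 - toy$y, toy$y)

# Illustrative statistic: three error rates and their pairwise differences.
errStat <- function(dat, i) {
  d <- dat[i, ]                 # resample the rows once; everything below uses d
  E1 <- mean(d$p1 != d$y)
  E2 <- mean(d$p2 != d$y)
  E3 <- mean(d$p3 != d$y)
  c(E1, E2, E3, E2 - E1, E3 - E1, E3 - E2)
}

toyBoot <- boot(toy, errStat, R = 200)

# t0 is the statistic evaluated on the original, unresampled rows.
print(all.equal(toyBoot$t0, errStat(toy, seq_len(n))))

# Within each replicate, column 4 is column 2 minus column 1.
print(all.equal(toyBoot$t[, 4], toyBoot$t[, 2] - toyBoot$t[, 1]))
```

With a function written this way, a single call to boot yields one row of t per replicate, and the difference columns are consistent with the rate columns row by row. If part of a statistic function reads from dat rather than dat[i, ], that part will not vary with the resample, which is worth ruling out when t and t0 look unrelated.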
Andrew Robinson
2011-May-03 23:11 UTC
[R] Unexp. behavior from boot with multiple statistics
Your interpretation of what the output is supposed to look like is actually correct. Take a look at the estimates of the bias in the Bootstrap Statistics. You will see that they are the same as the difference between colMeans of t and t0. For t1*, for example, mean(c(0.46, 0.39, 0.45, 0.49)) - 0.07 = 0.3775. So t does hold the bootstrap replicates that the summary was computed from; they are simply a long way from t0.

I hope that this helps,

Andrew

On Tue, May 03, 2011 at 12:15:05PM -0700, algorimancer wrote:
> I am attempting to use package boot to summarize and compare the performance
> of three models. I'm using R 2.13.0 in a Win32 environment.
>
> My statistic function returns a vector of 6 values, 3 of which are error
> rates for different models, and 3 are pairwise differences between those
> error rates. It looks like:
>
> multiEst<-function(dat,i)
> {
>     ....
>     c(E1,E2,E3,E2-E1,E3-E1,E3-E2);
> }
>
> then I call boot (using R=4 for simplicity of description) with:
>
> multiBoot=boot(data,multiEst,R=4)
>
> which gives reasonable results:
>
> Bootstrap Statistics :
>     original     bias    std. error
> t1*     0.07   0.3775   0.04193249
> t2*     0.08   0.3750   0.04654747
> t3*     0.04   0.4200   0.05354126
> t4*     0.01  -0.0025   0.00500000
> t5*    -0.03   0.0425   0.01500000
> t6*    -0.04   0.0450   0.01290994
>
> and the resulting "t0" contains the expected estimates of the statistics,
>
> > multiBoot$t0
> [1]  0.07  0.08  0.04  0.01 -0.03 -0.04
>
> however "t", which is supposed to contain bootstrap replicates of the
> statistic, doesn't. It looks like this:
>
> > multiBoot$t
>      [,1] [,2] [,3] [,4] [,5]  [,6]
> [1,] 0.46 0.47 0.46 0.01 0.00 -0.01
> [2,] 0.39 0.39 0.39 0.00 0.00  0.00
> [3,] 0.45 0.46 0.47 0.01 0.02  0.01
> [4,] 0.49 0.50 0.52 0.01 0.03  0.02
>
> It is not clear where these columns come from --- they clearly do not
> resemble the estimates in "t0".
>
> If I define a separate statistic function for each desired estimate, the
> resulting "t" and "t0" are as expected, however it is important in this case
> that the separate estimates derive from the same bootstrap replicates.
>
> Any helpful suggestions? Or have I come upon a bug in the implementation?
>
> Note: the documentation provides the following definitions for these
> returned variables:
>
> t0   The observed value of statistic applied to data.
> t    A matrix with R rows each of which is a bootstrap replicate of statistic.

--
Andrew Robinson
Program Manager, ACERA
Department of Mathematics and Statistics    Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia (prefer email)
http://www.ms.unimelb.edu.au/~andrewpr      Fax: +61-3-8344-4599
http://www.acera.unimelb.edu.au/

Forest Analytics with R (Springer, 2011)
http://www.ms.unimelb.edu.au/FAwR/
Introduction to Scientific Programming and Simulation using R (CRC, 2009):
http://www.ms.unimelb.edu.au/spuRs/
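The relationship described in the reply can be checked with base R alone, using only the numbers posted in this thread. This is a small sketch rather than anything from the original code: the names t_rep, reported_bias, and reported_se are illustrative, and the printed values are assumed to be exact to the digits shown.

```r
# Replicates and observed values copied from the multiBoot$t and
# multiBoot$t0 output quoted in this thread.
t_rep <- matrix(c(0.46, 0.47, 0.46, 0.01, 0.00, -0.01,
                  0.39, 0.39, 0.39, 0.00, 0.00,  0.00,
                  0.45, 0.46, 0.47, 0.01, 0.02,  0.01,
                  0.49, 0.50, 0.52, 0.01, 0.03,  0.02),
                nrow = 4, byrow = TRUE)
t0 <- c(0.07, 0.08, 0.04, 0.01, -0.03, -0.04)

# Bias column: mean of the replicates minus the observed value.
bias <- colMeans(t_rep) - t0

# Std. error column: sample standard deviation of each column of t.
std_err <- apply(t_rep, 2, sd)

# Values copied from the "Bootstrap Statistics" table in the post.
reported_bias <- c(0.3775, 0.3750, 0.4200, -0.0025, 0.0425, 0.0450)
reported_se   <- c(0.04193249, 0.04654747, 0.05354126,
                   0.00500000, 0.01500000, 0.01290994)

print(all.equal(bias, reported_bias))
print(all.equal(std_err, reported_se, tolerance = 1e-6))
```

Both comparisons agree with the posted table, so the summary was computed from exactly this t. What the check cannot explain is why the resampled error rates (around 0.45) sit so far from the observed ones (around 0.07); that depends on the elided body of multiEst.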