I have distilled my bootstrap problem down to this bit of code, which calculates an estimate of the 95th percentile of 7500 random numbers drawn from a standard normal distribution:

    library(boot)
    per95 <- function(annual.data, b.index) {
      sample.data <- annual.data[b.index]
      return(quantile(sample.data, probs=c(0.95)))
    }
    m <- 10000
    x <- rnorm(7500,0,1)
    B <- boot(data=x,statistic=per95,R=m)

    Error: cannot allocate vector of size 286.1 Mb

This result was observed with R 2.7.1 and 2.7.1patched when run on a Windows XP computer with 4Gb of memory.

This does not seem to be an excessively large and complicated calculation, so is this an intentional limitation of the boot function, a result of bad choices on my part, or a bug?

Tom
Prof Brian Ripley
2008-Aug-02 12:04 UTC
[R] Memory Problems with a Simple Bootstrap - Part II
On Sat, 2 Aug 2008, Tom La Bone wrote:

> I have distilled my bootstrap problem down to this bit of code, which
> calculates an estimate of the 95th percentile of 7500 random numbers drawn
> from a standard normal distribution:
>
> library(boot)
> per95 <- function( annual.data, b.index) {
>   sample.data <- annual.data[b.index]
>   return(quantile(sample.data,probs=c(0.95))) }
> m <- 10000
> x <- rnorm(7500,0,1)
> B <- boot(data=x,statistic=per95,R=m)
>
> Error: cannot allocate vector of size 286.1 Mb
>
> This result was observed with R 2.7.1 and 2.7.1patched when run on a
> Windows XP computer with 4Gb of memory.
>
> This does not seem to be an excessively large and complicated calculation,
> so is this an intentional limitation of the boot function, a result of bad
> choices on my part, or a bug?

Use of a 32-bit OS was a bad choice on your part. On 64-bit Linux it runs fine in

    > gc()
               used (Mb) gc trigger   (Mb)  max used   (Mb)
    Ncells   146670  7.9     350000   18.7    350000   18.7
    Vcells  3189171 24.4  168442002 1285.2 193746905 1478.2

That's too much usage for a 2GB address space.

boot() sets up an index array, in your case of size 7500x10000 or 600Mb. That dominates a 2Gb address space.

What you could do is

    B <- replicate(10, boot(data=x,statistic=per95,R=1000), FALSE)
    Ball <- B[[1]]
    Ball$t <- do.call("rbind", lapply(B, "[[", "t"))

that is, combine 10 independent runs (and that runs in ca 200Mb).

BTW to Jim Holtman: adding a gc() call is not very helpful. R will run gc to get memory if it is running out, and whereas the pattern of gc calls can affect the fragmentation, it is pretty much random whether adding gc calls helps or hinders.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
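
For anyone reading this thread later, here is a rough sketch of the arithmetic behind the index array and of the replicate() approach from the reply, wrapped in a function. The helper names index_mb and boot_in_batches are made up for illustration and are not part of the boot package. The byte sizes (4 for integers, 8 for doubles) are assumptions about how the array might be stored, not a statement about boot's internals. Note that only the t component is combined, as in the reply; the other components of the result still describe the first run only.

```r
library(boot)

# Statistic from the original post: 95th percentile of the resampled data.
per95 <- function(annual.data, b.index) {
  quantile(annual.data[b.index], probs = 0.95)
}

# Hypothetical helper: size in Mb (2^20 bytes) of an n x R index array,
# assuming each element takes `bytes` bytes.
index_mb <- function(n, R, bytes) n * R * bytes / 2^20

index_mb(7500, 10000, 4)  # about 286.1, the size quoted in the error message
index_mb(7500, 10000, 8)  # about 572.2, roughly 600 Mb if held as doubles
index_mb(7500, 1000, 4)   # about 28.6 for one batch of 1000 replicates

# Hypothetical wrapper around the replicate() approach from the reply:
# run `batches` independent bootstraps of `R.each` replicates each,
# then stack the replicate values into the first result.
boot_in_batches <- function(data, statistic, batches, R.each) {
  runs <- replicate(batches,
                    boot(data = data, statistic = statistic, R = R.each),
                    simplify = FALSE)
  combined <- runs[[1]]
  combined$t <- do.call("rbind", lapply(runs, "[[", "t"))
  combined
}

set.seed(1)
x <- rnorm(7500, 0, 1)
Ball <- boot_in_batches(x, per95, batches = 10, R.each = 1000)
dim(Ball$t)               # one row per replicate, one column per statistic
```

With batches = 10 and R.each = 1000 this gives the same 10000 replicates in total, but each call to boot() only needs an index array one tenth of the size.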