Henrik Bengtsson
2006-May-25 07:19 UTC
[Rd] save() saves extra stuff if object is not evaluated
Hi,

it looks like save() is saving all contents of the calling environments if the object to be saved is *not* evaluated, although it is not that simple either. After many hours of troubleshooting, I'm still confused. Here is a reproducible example (also attached) with output. I'll let the code and the output speak for themselves:

peek <- function(file, from=1, to=500) {
  cat("--------------------------------------\n")
  cat(sprintf("%s: %d bytes\n", file, file.info(file)$size))
  bfr <- suppressWarnings(readBin(file, what="character", n=to))
  bfr <- gsub("(\001|\002|\003|\004|\005|\016|\020|\036|\a|\n|\t)", "", bfr);
  bfr <- bfr[nchar(bfr) > 0];
  cat(bfr, sep="", "\n");
}

saveCache <- function(file, y, sources=NULL, eval=FALSE) {
  if (eval)
    dummy <- is.null(sources)
  base::save(file=file, sources, compress=FALSE)
}

aVariableNotSaved <- double(1e6)

main <- function() {
  # This 'big' variable is saved in case 1 below!
  big <- rep(letters, length.out=1e5)
  identifier <- "This string will be saved too!"

  y <- 1

  file <- "a.RData"
  saveCache(y, file=file)
  peek(file)

  file <- "a-eval.RData"
  saveCache(y, file=file, eval=TRUE)
  peek(file)

  file <- "b-noy.RData"
  saveCache(file=file)
  peek(file)

  file <- "b-noy-eval.RData"
  saveCache(file=file, eval=TRUE)
  peek(file)
}


# 1. Call saveCache() outside main()
eval(body(main))
# --------------------------------------
# a.RData: 238 bytes
# RDX2Xsources?filea.RData y?n $ n?$eval???n?
# --------------------------------------
# a-eval.RData: 58 bytes
# RDX2Xsources??
# --------------------------------------
# b-noy.RData: 230 bytes
# RDX2Xsources?file?b-noy.RData ?yv$ n?$eval???n?
# --------------------------------------
# b-noy-eval.RData: 58 bytes
# RDX2Xsources??

# 2. Call saveCache() from within main()
main()
# --------------------------------------
# a.RData: 900412 bytes
# RDX2Xsources?filea.RData y? a.RData ?=identifierThis
# string will be saved too!big??abcdefghijklmnopqrstuv
# wxyzabcdefghijklmnopqrstuvwxyzabcdefg
# --------------------------------------
# a-eval.RData: 58 bytes
# RDX2Xsources??
# --------------------------------------
# b-noy.RData: 230 bytes
# RDX2Xsources?file?b-noy.RData ?yv$ n?$eval???n?
# --------------------------------------
# b-noy-eval.RData: 58 bytes
# RDX2Xsources??

What is going on?

I get this on both R v2.3.0 patched (2006-04-28 r37936) and R v2.3.1 beta (2006-05-23 r38179) on my WinXP (with Rterm --vanilla).
Luke Tierney
2006-May-25 09:56 UTC
[Rd] save() saves extra stuff if object is not evaluated
On Thu, 25 May 2006, Henrik Bengtsson wrote:

> Hi,
>
> it looks like save() is saving all contents of the calling
> environments if the object to be saved is *not* evaluated, although it
> is not that simple either.

No, it's exactly that simple. Serialization follows and writes out all reachable environments. Unevaluated promises contain the environments in which their evaluations are to occur; evaluated ones have this field set to R_NilValue to eliminate this no longer needed reference.

There are two environments involved: the calling environment in which saveCache is called and the callee environment of the call to saveCache where the body of saveCache is evaluated. Because of lexical scope the enclosing environment of the callee environment is the closure environment of saveCache, which is .GlobalEnv.

The call to saveCache creates a promise for evaluating the default value for 'sources' _in the callee environment_. In the case with y the callee environment includes a value of y which is a promise referencing the calling environment (either .GlobalEnv or the environment of the call to main). In the calls without y the value of y in the callee environment is the missing value indicator, not a promise. So only with y and no eval is there a reference to the calling environment that serialization then has to write out.

Best,

luke

-- 
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
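
The reachability argument in this reply can be checked directly with serialize(), which writes objects in the same serialization format that save() uses. The sketch below is not code from the thread. The environments 'caller' and 'callee' are hypothetical stand-ins for the frames of main() and saveCache(), and delayedAssign() is used to create the unevaluated promise explicitly, on the assumption that a promise built this way is serialized like an argument promise.

```r
# Hypothetical stand-in for the frame of main(): holds the large object.
caller <- new.env(parent = emptyenv())
assign("big", rep(letters, length.out = 1e5), envir = caller)
assign("y", 1, envir = caller)

# Hypothetical stand-in for the frame of saveCache().
callee <- new.env(parent = emptyenv())

# Bind 'y' in callee to an unevaluated promise for the expression `y`,
# to be evaluated in caller. The promise keeps a reference to caller.
delayedAssign("y", y, eval.env = caller, assign.env = callee)

# Serializing callee follows the promise to caller, so 'big' is written too.
size_unforced <- length(serialize(callee, NULL))

# get() forces the promise; once evaluated, it no longer references caller.
value <- get("y", envir = callee)

# Serializing callee again should now write only the promise's value and code.
size_forced <- length(serialize(callee, NULL))

cat(sprintf("unforced: %d bytes, forced: %d bytes\n",
            size_unforced, size_forced))
```

In the thread's example the chain is one step longer: the unevaluated promise for 'sources' references the callee environment, which in turn holds the promise for y. Forcing 'sources' before the call to save(), as the eval=TRUE branch does via is.null(sources), removes the first link and so keeps the caller's variables out of the file.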