Thomas Alexander Gerds
2013-Apr-18 05:09 UTC
[Rd] how to control the environment of a formula
Dear List

I have experienced that objects generated with one of my packages used
a lot of space when saved on disc (object.size did not show this!).

some debugging revealed that formula and call objects carried the full
environment of subroutines along, including even stuff not needed by the
formula or call. here is a sketch of the problem

,----
| test <- function(x){
|   x <- rnorm(1000000)
|   out <- list()
|   out$f <- a~b
|   out
| }
| v <- test(1)
| save(v,file="~/tmp/v.rda")
| system("ls -lah ~/tmp/v.rda")
|
| -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
`----

I tried to replace line 3 by

,----
| as.formula(a~b,env=emptyenv())
| or
| as.formula(a~b,env=NULL)
`----

without the desired effect. Instead adding either

,----
| environment(out$f) <- emptyenv()
| or
| environment(out$f) <- NULL
`----

has the desired effect (i.e. the saved object size is
shrunken). unfortunately there is a new problem:

,----
| test <- function(x){
|   x <- rnorm(1000000)
|   out <- list()
|   out$f <- a~b
|   environment(out$f) <- emptyenv()
|   out
| }
| d <- data.frame(a=1,b=1)
| v <- test(1)
| model.frame(v$f,data=d)
|
| Error in eval(expr, envir, enclos) : could not find function "list"
`----

Same with NULL in place of emptyenv()

Finally using .GlobalEnv in place of emptyenv() seems to remove both problems.

My questions:

1) why does the argument env of as.formula have no effect?

2) is there a better way to tell formula not to copy unrelated stuff
   into the associated environment?

3) why does object.size not show the size of the environments that
   formulas can carry along?

Regards
Thomas

--
Thomas A. Gerds -- Assoc. Prof. Department of Biostatistics
University of Copenhagen, Øster Farimagsgade 5, 1014 Copenhagen, Denmark
Office: CSS-15.2.07 (Gamle Kommunehospital)
tel: 35327914 (sec: 35327901)
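The effect described in the message above can be reproduced without writing to disk. The following is a minimal sketch, not the poster's code: make_result() is a hypothetical stand-in for test(), and the length of serialize(v, NULL) is used as an assumed proxy for the size of the file that save() would write (before compression).

```r
# Sketch: a formula created inside a function keeps that function's
# evaluation frame alive, and serialization writes the frame out as well.
make_result <- function(n) {     # hypothetical stand-in for test()
  x <- rnorm(n)                  # large local object, unrelated to the formula
  out <- list()
  out$f <- a ~ b                 # the formula records the current frame
  out
}

v <- make_result(1e6)

# The large vector is still reachable through the formula's environment.
env_names <- ls(environment(v$f))
print(env_names)

# object.size() does not follow environments; serialize() does.
reported   <- as.numeric(object.size(v))
serialized <- length(serialize(v, connection = NULL))
cat("object.size:", reported, "bytes\n")
cat("serialized: ", serialized, "bytes\n")
```

Comparing the two numbers shows the discrepancy the message describes: the reported size stays small while the serialized size is dominated by the 1e6 doubles in x.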
On 13-04-18 1:09 AM, Thomas Alexander Gerds wrote:
> Dear List
>
> I have experienced that objects generated with one of my packages used
> a lot of space when saved on disc (object.size did not show this!).
>
> some debugging revealed that formula and call objects carried the full
> environment of subroutines along, including even stuff not needed by the
> formula or call. here is a sketch of the problem
>
> ,----
> | test <- function(x){
> |   x <- rnorm(1000000)
> |   out <- list()
> |   out$f <- a~b
> |   out
> | }
> | v <- test(1)
> | save(v,file="~/tmp/v.rda")
> | system("ls -lah ~/tmp/v.rda")
> |
> | -rw-rw-r-- 1 tag tag 7,4M Apr 18 06:41 /home/tag/tmp/v.rda
> `----
>
> I tried to replace line 3 by
>
> ,----
> | as.formula(a~b,env=emptyenv())
> | or
> | as.formula(a~b,env=NULL)
> `----
>
> without the desired effect. Instead adding either
>
> ,----
> | environment(out$f) <- emptyenv()
> | or
> | environment(out$f) <- NULL
> `----
>
> has the desired effect (i.e. the saved object size is
> shrunken). unfortunately there is a new problem:
>
> ,----
> | test <- function(x){
> |   x <- rnorm(1000000)
> |   out <- list()
> |   out$f <- a~b
> |   environment(out$f) <- emptyenv()
> |   out
> | }
> | d <- data.frame(a=1,b=1)
> | v <- test(1)
> | model.frame(v$f,data=d)
> |
> | Error in eval(expr, envir, enclos) : could not find function "list"
> `----
>
> Same with NULL in place of emptyenv()
>
> Finally using .GlobalEnv in place of emptyenv() seems to remove both problems.

But it will cause other, less obvious problems. In a formula, the
symbols mean something. By setting the environment to .GlobalEnv you're
changing the meaning. You'll get nonsense in certain cases when
functions look up the meaning of those symbols and find the wrong thing.
(I don't have an example at hand, but I imagine it would be easy to put
one together with update().)

> My questions:
>
> 1) why does the argument env of as.formula have no effect?

Because the first argument already had an associated environment. You
passed a ~ b, which is evaluated to a formula; calling as.formula on a
formula does nothing. The env argument is only used when a new formula
needs to be constructed. (You can see this in the source code;
as.formula is a very simple function.)

> 2) is there a better way to tell formula not to copy unrelated stuff
> into the associated environment?

Yes, delete it. For example, you could write your function as

    test <- function(x){
      x <- rnorm(1000000)
      out <- list()
      out$f <- a~b
      rm(x)
      out
    }

> 3) why does object.size not show the size of the environments that
> formulas can carry along?

Because many objects can share the same environment. See ?object.size
for more details.

Duncan Murdoch
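The three points in the reply above can be checked with a short sketch. Everything named here (inside(), slim(), make_formula(), wt_demo) is hypothetical and chosen only for illustration, and serialize() is again used as an assumed proxy for what save() writes. The last part assigns into the global environment on purpose, to show a symbol silently resolving to the wrong object once the formula's environment has been replaced.

```r
# 1) as.formula() returns an existing formula unchanged, so env= is ignored;
#    env= is only used when the formula is built from something else.
inside <- function() {
  here <- environment()
  f1 <- as.formula(a ~ b, env = emptyenv())     # already a formula
  f2 <- as.formula("a ~ b", env = emptyenv())   # built from a string
  c(existing_keeps_frame = identical(environment(f1), here),
    string_uses_env      = identical(environment(f2), emptyenv()))
}
res <- inside()
print(res)

# 2) Removing the large local before returning keeps the frame small,
#    and the formula still evaluates normally.
slim <- function(n) {
  x <- rnorm(n)
  out <- list()
  out$f <- a ~ b
  rm(x)                          # drop what the formula does not need
  out
}
v <- slim(1e6)
size_slim <- length(serialize(v, connection = NULL))
mf <- model.frame(v$f, data = data.frame(a = 1, b = 1))

# 3) Replacing the environment can change what a symbol means.
make_formula <- function() {
  wt_demo <- c(10, 20, 30)       # local variable the formula refers to
  y ~ x + wt_demo
}
d <- data.frame(y = 1:3, x = 4:6)
f <- make_formula()
ok <- model.frame(f, data = d)$wt_demo          # found in the formula's frame

assign("wt_demo", c(-1, -2, -3), envir = globalenv())  # unrelated global object
environment(f) <- globalenv()
moved <- model.frame(f, data = d)$wt_demo       # now picks up the global one
rm("wt_demo", envir = globalenv())              # clean up the demo object

print(ok)
print(moved)
```

In part 3 no error is raised: the model frame is simply built from a different wt_demo, which is the kind of "wrong thing" the reply warns about.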
Therneau, Terry M., Ph.D.
2013-Apr-19 12:41 UTC
[Rd] how to control the environment of a formula
I went through the same problem and discovery process 2 years ago with the survival package. With pspline() terms the return object from coxph includes a simple 6 line function for enhanced printout, which by default carried along another 30 irrelevant things, some of which were huge.

I personally think that setting

    environment(f) <- .GlobalEnv

is the clearest and simplest solution. Note that R does not save the environment of functions defined at the top level; the prior line says to treat your function as "one of those". It works very well as long as your function is an actual function, i.e., it depends only on its input arguments.

\begin{opinion}
S started out as a pure functional language. That is, a function depends ONLY on its arguments. Many of the strengths of S/R flow directly from the simplicity and rigor that this gives. There is an adage in programming, going back to at least the earliest Fortran compilers, that all successful languages have a way to break their own rules; and S indeed had some hidden workarounds. Formalizing these non-functional back doors as R has done with environments is a good thing. However, the back doors should be used only with extreme reluctance. I cringe at each new "how to be sneaky" discussion on the mailing lists. The "solution" is rarely worth the long term price.
\end{opinion}

Terry Therneau
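A minimal sketch of the approach described above, applied to a function rather than a formula. The names make_printer(), make_scaler() and demo_factor are hypothetical and are not taken from the survival package; serialize() is used as an assumed proxy for the saved size. The second half illustrates the stated condition that the function must depend only on its input arguments.

```r
# A small function created inside another function drags the whole frame along.
make_printer <- function(n) {
  big <- rnorm(n)                                    # large, irrelevant local
  function(x) paste("value:", format(x))             # uses only its argument
}

p <- make_printer(1e6)
size_before <- length(serialize(p, connection = NULL))

environment(p) <- globalenv()                        # treat p as a top-level function
size_after <- length(serialize(p, connection = NULL))

cat("before:", size_before, "bytes\n")
cat("after: ", size_after, "bytes\n")
print(p(3.14))                                       # still works: no free variables

# The same step breaks a function that relies on its enclosing frame.
make_scaler <- function(demo_factor) {
  force(demo_factor)
  function(x) demo_factor * x                        # demo_factor is a free variable
}

s <- make_scaler(2)
print(s(5))                                          # found in the enclosing frame

environment(s) <- globalenv()
broken <- tryCatch(s(5), error = function(e) conditionMessage(e))
print(broken)                                        # lookup fails unless a global demo_factor exists
```

If a global object named demo_factor happened to exist, the last call would not fail but would quietly use that object instead, which is the harder case to notice.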