Marius Hofert
2016-Aug-29 20:52 UTC
[R] How to test existence of an environment and how to remove it (from within functions)?
Dear Duncan,

Thanks a lot for your help.

I tried to adapt your example to my MWE, but the subsequent calls of
main() are 'too fast' now: new calls of main() should also 'reset' the
environment (as a different x is generated then), that's why I tried
to remove the environment .my_environ from within main():

## Auxiliary function with caching
aux <- local({
    .my_environ <- new.env(hash = FALSE, parent = emptyenv()) # define the environment
    function(x) {
        ## Setting up the environment and caching
        if(exists("cached.obj", envir = .my_environ)) { # look-up (in case the object already exists)
            x.cols <- get("cached.obj", .my_environ)
        } else { # time-consuming part (+ cache)
            x.cols <- split(x, col(x))
            Sys.sleep(1)
            assign("cached.obj", x.cols, envir = .my_environ)
        }
        ## Do something with the result from above (here: pick out two randomly
        ## chosen columns)
        x.cols[sample(1:1000, size = 2)]
    }
})

## Main function
main <- function() {
    x <- matrix(rnorm(100*1000), ncol = 1000)
    res <- replicate(5, aux(x))
    rm(.my_environ) # TODO: Trying to remove the environment
    res
}

## Testing
set.seed(271)
system.time(main()) # => ~ 1s since the cached object is found
system.time(main()) # => ~ 0s (instead of ~ 1s)
system.time(main()) # => ~ 0s (instead of ~ 1s)

Do you know a solution for this?

Background information:
This is indeed a problem from a package which draws many (sub)plots
within a single plot. Each single (sub)plot needs to access the data
for plotting but does not know about the other (sub)plots... Thought
this might be interesting in general for caching results.

Thanks & cheers,
Marius

On Mon, Aug 29, 2016 at 7:59 PM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> On 29/08/2016 1:36 PM, Marius Hofert wrote:
>> Hi,
>>
>> I have a function main() which calls another function aux() many times. aux()
>> mostly does the same operations based on an object and thus I would like it to
>> compute and store this object for each call from main() only once.
>>
>> Below are two versions of a MWE. The first one computes the right result (but is
>> merely there for showing what I would like to have; well, apart from the
>> environment .my_environ still floating around after main() is called).
>> It works with an environment .my_environ in which the computed object is
>> stored. The second MWE tries to set up the environment inside aux(), but
>> neither the check of existence in aux() nor the removal of the whole
>> environment in main() work (see 'TODO' below). How can this be achieved?
>>
>
> If you create aux in a local() call, it can have persistent storage,
> because local() creates an environment to hold it. For example,
>
> aux <- local({
>     persistent <- NULL
>     function(x) {
>         if (!is.null(persistent))
>             message("Previous arg was ", persistent)
>         persistent <<- x
>     }
> })
>
> Note that the assignment uses <<- to work in the local-created
> environment rather than purely locally within the evaluation frame of
> the call. You need to create the variable "persistent" there, or the
> assignment would go to the global environment, which is bad.
>
> This gives
>
> > aux(1)
> > aux(2)
> Previous arg was 1
> > aux(3)
> Previous arg was 2
>
> Duncan Murdoch
>
>> Cheers,
>> Marius
>>
>>
>> ### Version 1: Setting up the environment in .GlobalEnv ########################
>>
>> .my_environ <- new.env(hash = FALSE, parent = emptyenv()) # define the environment
>>
>> ## Auxiliary function with caching
>> aux <- function(x) {
>>     ## Setting up the environment and caching
>>     if(exists("cached.obj", envir = .my_environ)) { # look-up (in case the object already exists)
>>         x.cols <- get("cached.obj", .my_environ)
>>     } else { # time-consuming part (+ cache)
>>         x.cols <- split(x, col(x))
>>         Sys.sleep(1)
>>         assign("cached.obj", x.cols, envir = .my_environ)
>>     }
>>     ## Do something with the result from above (here: pick out two randomly
>>     ## chosen columns)
>>     x.cols[sample(1:1000, size = 2)]
>> }
>>
>> ## Main function
>> main <- function() {
>>     x <- matrix(rnorm(100*1000), ncol = 1000)
>>     res <- replicate(5, aux(x))
>>     rm(cached.obj, envir = .my_environ) # only removing the *object* (but not the environment)
>>     res
>> }
>>
>> ## Testing
>> set.seed(271)
>> system.time(main()) # => ~ 1s since the cached object is found
>>
>>
>> ### Version 2: Trying to set up the environment inside aux() ###################
>>
>> ## Auxiliary function with caching
>> aux <- function(x) {
>>     ## Setting up the environment and caching
>>     if(!exists(".my_environ", mode = "environmnent")) # TODO: How to check the existence of the environment? This is always TRUE...
>>         .my_environ <- new.env(hash = FALSE, parent = emptyenv()) # define the environment
>>     if(exists("cached.obj", envir = .my_environ)) { # look-up (in case the object already exists)
>>         x.cols <- get("cached.obj", .my_environ)
>>     } else { # time-consuming part (+ cache)
>>         x.cols <- split(x, col(x))
>>         Sys.sleep(1)
>>         assign("cached.obj", x.cols, envir = .my_environ)
>>     }
>>     ## Do something with the result from above (here: pick out two randomly
>>     ## chosen columns)
>>     x.cols[sample(1:1000, size = 2)]
>> }
>>
>> ## Main function
>> main <- function() {
>>     x <- matrix(rnorm(100*1000), ncol = 1000)
>>     res <- replicate(5, aux(x))
>>     rm(.my_environ) # TODO: How to properly remove the environment?
>>     res
>> }
>>
>> ## Testing
>> set.seed(271)
>> system.time(main()) # => ~ 5s since (the cached object in) environment .my_environ is not found
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
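One possible way to get the per-call 'reset' asked about in this message is sketched below. It is an illustration only, not a solution posted in the thread, and the names cached_aux, reset, count and n.computed are hypothetical. The idea rests on two facts: the cache environment lives in the closure created by local(), and rm() by default looks only in the frame it is called from, so rm(.my_environ) inside main() cannot reach it. The sketch therefore exposes a small reset function that empties the environment instead of removing it. Sys.sleep() is left out so the example runs quickly.

```r
## Sketch only: 'cached_aux', 'reset', 'count' and 'n.computed' are hypothetical names
cached_aux <- local({
    .my_environ <- new.env(hash = FALSE, parent = emptyenv()) # cache, private to this closure
    n.computed <- 0 # counts how often the slow part ran (for illustration only)
    aux <- function(x) {
        if(exists("cached.obj", envir = .my_environ, inherits = FALSE)) { # look-up
            x.cols <- get("cached.obj", envir = .my_environ)
        } else { # stands in for the time-consuming part (+ cache)
            x.cols <- split(x, col(x))
            n.computed <<- n.computed + 1
            assign("cached.obj", x.cols, envir = .my_environ)
        }
        x.cols[sample(seq_along(x.cols), size = 2)] # pick two columns at random
    }
    ## Empty the cache; the environment itself stays in place
    reset <- function() rm(list = ls(.my_environ, all.names = TRUE), envir = .my_environ)
    count <- function() n.computed
    list(aux = aux, reset = reset, count = count)
})

main <- function() {
    on.exit(cached_aux$reset()) # clear the cache when main() finishes, even on error
    x <- matrix(rnorm(100 * 1000), ncol = 1000)
    replicate(5, cached_aux$aux(x))
}
```

With this layout, each call of main() should run the slow part once and then reuse the cached object for the remaining four calls of aux(). Whether emptying the cache in on.exit() is appropriate depends on how main() is used in the actual package.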
Duncan Murdoch
2016-Aug-29 22:34 UTC
[R] How to test existence of an environment and how to remove it (from within functions)?
On 29/08/2016 4:52 PM, Marius Hofert wrote:
> Dear Duncan,
>
> Thanks a lot for your help.
>
> I tried to adapt your example to my MWE, but the subsequent calls of
> main() are 'too fast' now: new calls of main() should also 'reset' the
> environment (as a different x is generated then), that's why I tried
> to remove the environment .my_environ from within main():
>
> ## Auxiliary function with caching
> aux <- local({
>     .my_environ <- new.env(hash = FALSE, parent = emptyenv()) # define the environment
>     function(x) {
>         ## Setting up the environment and caching
>         if(exists("cached.obj", envir = .my_environ)) { # look-up (in case the object already exists)

How do you know that's the "right" value? Just because something is
cached doesn't mean it's the value corresponding to "x".

You can use some sort of hash to cache values corresponding to
particular x values, e.g.

    hash <- digest::digest(x)
    if (exists(hash, envir = .my_environ))
        x.cols <- get(hash, .my_environ)

The memoise package does all of this for you. You provide the slow
function aux, then call

    aux <- memoise(aux)

and suddenly it will magically remember previously calculated values.
There are ways to get old values to time out automatically, etc.

Duncan Murdoch

>             x.cols <- get("cached.obj", .my_environ)
>         } else { # time-consuming part (+ cache)
>             x.cols <- split(x, col(x))
>             Sys.sleep(1)
>             assign("cached.obj", x.cols, envir = .my_environ)
>         }
>         ## Do something with the result from above (here: pick out two randomly
>         ## chosen columns)
>         x.cols[sample(1:1000, size = 2)]
>     }
> })
>
> ## Main function
> main <- function() {
>     x <- matrix(rnorm(100*1000), ncol = 1000)
>     res <- replicate(5, aux(x))
>     rm(.my_environ) # TODO: Trying to remove the environment
>     res
> }
>
> ## Testing
> set.seed(271)
> system.time(main()) # => ~ 1s since the cached object is found
> system.time(main()) # => ~ 0s (instead of ~ 1s)
> system.time(main()) # => ~ 0s (instead of ~ 1s)
>
> Do you know a solution for this?
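The hash-based caching suggested in the reply above can be sketched as a complete function. This is an illustration under assumptions, not code from the thread: it assumes the digest package is installed, and slow_split is a hypothetical name. Each distinct x gets its own cache entry, keyed by the hash of its contents, so a new x generated by a later call of main() is recomputed rather than served from a stale entry.

```r
## Sketch only; assumes the 'digest' package is installed. 'slow_split' is a hypothetical name.
slow_split <- local({
    .my_environ <- new.env(parent = emptyenv()) # one cache entry per distinct x
    function(x) {
        key <- digest::digest(x) # hash of the argument's contents, used as the cache key
        if(exists(key, envir = .my_environ, inherits = FALSE)) {
            get(key, envir = .my_environ) # look-up
        } else {
            x.cols <- split(x, col(x)) # stands in for the time-consuming part
            assign(key, x.cols, envir = .my_environ) # cache under the hash
            x.cols
        }
    }
})

## aux() then only does the cheap part on top of the cached result
aux <- function(x) {
    x.cols <- slow_split(x)
    x.cols[sample(seq_along(x.cols), size = 2)]
}
```

The cache in this sketch grows by one entry for every new x and is never emptied. The memoise package mentioned in the reply wraps the same idea: memoise::memoise(f) returns a caching version of f, and memoise::forget() on that version clears its cache.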