Dear all -
I need to have a function maintain a persistent lookup table of results for an
expensive calculation, a named vector or hash. I know that I can just keep the
table in the global environment. One problem with this approach is that the
function should be able to delete/recalculate the table and I don't like
side-effects in the global environment. This table really should be private.
What I don't know is:
-A- how can I keep the table in an environment that is private to the function
but persistent for the session?
-B- how can I store and reload such a table?
-C- most importantly: is that the right strategy to initialize and maintain
state in a function in the first place?
For illustration ...
-----------------------------------
myDist <- function(a, b) {
    # retrieve or calculate distances
    if (!exists("Vals")) {
        Vals <<- numeric()  # the lookup table for distance values
                            # here, created in the global env.
    }
    key <- sprintf("X%d.%d", a, b)
    thisDist <- Vals[key]
    if (is.na(thisDist)) {  # Hasn't been calculated yet ...
        cat("Calculating ... ")
        thisDist <- sqrt(a^2 + b^2)  # calculate with some expensive function ...
        Vals[key] <<- thisDist       # store in global table
    }
    return(thisDist)
}

# run this
set.seed(112358)
for (i in 1:10) {
    x <- sample(1:3, 2)
    print(sprintf("d(%d, %d) = %f", x[1], x[2], myDist(x[1], x[2])))
}
Thanks!
Boris
Use an environment to hold your table. See ?new.env or ?local (I leave it to you to work out the details).

Cheers,
Bert

Bert Gunter

On Sat, Mar 19, 2016 at 9:45 AM, Boris Steipe <boris.steipe at utoronto.ca> wrote:
> I need to have a function maintain a persistent lookup table of results for an expensive calculation, a named vector or hash. [...]
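Bert leaves the details open. One way they might be worked out is sketched below, assuming the table is an environment used as a hash. The names distTable and cachedDist are made up for this illustration, and sqrt(a^2 + b^2) only stands in for the expensive calculation.

```r
# Sketch only: an environment used as a hash table.
# 'distTable' and 'cachedDist' are hypothetical names.
distTable <- new.env(hash = TRUE, parent = emptyenv())

cachedDist <- function(a, b, table = distTable) {
    key <- sprintf("X%d.%d", a, b)
    if (!exists(key, envir = table, inherits = FALSE)) {
        # stand-in for the expensive calculation
        assign(key, sqrt(a^2 + b^2), envir = table)
    }
    get(key, envir = table, inherits = FALSE)
}

cachedDist(3, 4)                              # computed and stored
cachedDist(3, 4)                              # looked up
ls(distTable)                                 # keys stored so far
rm(list = ls(distTable), envir = distTable)   # empty the table
```

Only the name distTable is visible globally here; the entries live inside the environment, so emptying or rebuilding the table does not touch any other global object.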
Package memoise might help you.
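The memoise package wraps a function so that repeated calls with the same arguments are answered from a cache. The sketch below illustrates that idea in base R only; it is not the package's code, and memo, slowDist and fastDist are hypothetical names. It assumes scalar arguments, so that pasting them together gives a usable key.

```r
# Sketch of the memoisation idea in base R; this is NOT the memoise
# package's implementation. All names here are hypothetical.
memo <- function(f) {
    cache <- list()                     # private to the returned function
    function(...) {
        key <- paste(..., sep = ".")    # key built from the (scalar) arguments
        if (is.null(cache[[key]])) {
            cache[[key]] <<- f(...)     # compute once, remember
        }
        cache[[key]]
    }
}

slowDist <- function(a, b) {
    cat("Calculating ... ")
    sqrt(a^2 + b^2)                     # stand-in for the expensive step
}

fastDist <- memo(slowDist)
fastDist(3, 4)    # prints "Calculating ... " on the first call
fastDist(3, 4)    # the second call is answered from the cache
```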
Use local() to create a persistent environment for the function. For example:

f <- local({
    x <- NULL
    function(y) {
        cat("last x was ", x, "\n")
        x <<- y
    }
})

Then:

> f(3)
last x was
> f(4)
last x was 3
> f(12)
last x was 4

Duncan Murdoch
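Applied to the original question, the same pattern might look like the sketch below. It assumes the lookup table stays a named numeric vector as in the original myDist(), and sqrt(a^2 + b^2) again stands in for the expensive calculation.

```r
# Sketch: the original myDist() with its table moved into local().
myDist <- local({
    Vals <- numeric()                  # private lookup table
    function(a, b) {
        key <- sprintf("X%d.%d", a, b)
        thisDist <- Vals[key]
        if (is.na(thisDist)) {         # not calculated yet
            thisDist <- sqrt(a^2 + b^2)
            Vals[key] <<- thisDist     # updates the private Vals
        }
        unname(thisDist)
    }
})

myDist(3, 4)
exists("Vals")    # FALSE unless a Vals is defined elsewhere on the search path
get("Vals", envir = environment(myDist))     # the private table
```

Because Vals is found in the enclosing environment created by local(), <<- updates it there and never reaches the global environment.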
>>>>> Duncan Murdoch <murdoch.duncan at gmail.com>
>>>>>     on Sat, 19 Mar 2016 17:57:56 -0400 writes:

    > Use local() to create a persistent environment for the function.
    > [...]

Yes, indeed. Or use another function (rather than local()) which returns a function: approxfun(), splinefun() and ecdf() are "base R" functions which return functions "with a non-trivial environment", as I like to say.

Note that this is *the* proper R way of solving your problem. The fact that this works as it does is called "lexical scoping", and it is also the reason why (regular, i.e., non-primitive) functions in R are called closures. When R was created more than 20 years ago, this was the distinguishing language feature of R (in comparison to S / S-plus).

Enjoy!
Martin
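The point about functions "with a non-trivial environment" can be checked interactively. The short sketch below uses ecdf() with made-up data; the exact set of variables listed by ls() may differ between R versions, so no output is shown for it.

```r
# ecdf() returns a function that carries its data in its environment.
Fn <- ecdf(c(1, 2, 3))
Fn(2)                    # proportion of observations <= 2
typeof(Fn)               # "closure"
ls(environment(Fn))      # variables the returned function can see
typeof(sum)              # "builtin": a primitive, not a closure
```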
Boris,

You may want to look into the R6 package. This package has tools that help create objects (environments) with methods that can use and change the object. You can have your persistent table stored as part of your object and then create methods that will use and modify the table within the object.

--
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com
Use a local environment as a place to store state. Update with <<-
and resolve symbol references through lexical scope. E.g.,
persist <- local({
    last <- NULL            # initialize
    function(value) {
        if (!missing(value))
            last <<- value  # update with <<-
        last                # use
    }
})
and in action
> persist("foo")
[1] "foo"
> persist()
[1] "foo"
> persist("bar")
[1] "bar"
> persist()
[1] "bar"
A variant is to use a 'factory' function
factory <- function(init) {
    stopifnot(!missing(init))
    last <- init
    function(value) {
        if (!missing(value))
            last <<- value
        last
    }
}
and
> p1 = factory("foo")
> p2 = factory("bar")
> c(p1(), p2())
[1] "foo" "bar"
> c(p1(), p2("foo"))
[1] "foo" "foo"
> c(p1(), p2())
[1] "foo" "foo"
The 'bank account' exercise in section 10.7 of
RShowDoc("R-intro")
illustrates this.
Martin
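For readers without the manual at hand, here is a condensed sketch in the spirit of that 'bank account' example. It is not a verbatim copy of the manual's code; the messages and return values are simplified assumptions.

```r
# Condensed sketch in the spirit of the R-intro 'bank account' example;
# not a verbatim copy.
open.account <- function(total) {
    list(
        deposit = function(amount) {
            if (amount <= 0) stop("Deposits must be positive!")
            total <<- total + amount       # updates this account's total
            invisible(total)
        },
        withdraw = function(amount) {
            if (amount > total) stop("You don't have that much money!")
            total <<- total - amount
            invisible(total)
        },
        balance = function() total
    )
}

ross   <- open.account(100)
robert <- open.account(200)
ross$withdraw(30)
robert$deposit(50)
c(ross$balance(), robert$balance())
```

Each call to open.account() creates a fresh environment, so the three functions in one account share a single total that no other account can see.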
All -
Thanks, this has been a real eye-opener.
Here's my variation based on what I've learned so far. It's based on
Bert's earlier function-returning-a-closure example. I hope I got the
terminology right.
# =======================================================
makeCache <- function() {      # returns a "closure", i.e. a function
                               # plus its private, lexically scoped
                               # environment
    myCache <- numeric()       # a variable that we want to persist;
                               # makeCache() creates the environment
                               # that holds myCache and the function
                               # useCache() that uses myCache
    useCache <- function(x) {
        myCache <<- c(myCache, x)  # appends a value to myCache
                                   # <<- does _not_ assign to the
                                   # global environment, but searches
                                   # through the parent environments
                                   # and assigns to the global environment
                                   # only if no match was found along
                                   # the way.
        print(myCache)
    }
    return(useCache)           # return the function plus its environment
}
# ======= creating instances of the closure and using them
cacheThis <- makeCache() # cacheThis is the closure that was created
# by makeCache
cacheThis(17) # 17
cacheThis(13) # 17 13
cacheThis(11) # 17 13 11
cacheThat <- makeCache() # create another closure
cacheThat(1) # 1
cacheThat(2) # 1 2
cacheThat(3) # 1 2 3
cacheThat(5) # 1 2 3 5
# ======= accessing the private variables
# The caches for cacheThis() and cacheThat() are not visible
# from the (default) global environment:
ls() # [1] "cacheThat" "cacheThis" "makeCache"
# To access them from the global environment, use
# ls(), exists(), get() and assign(), with their environment
# argument:
ls.str(envir = environment(cacheThis))
ls.str(envir = environment(cacheThat))
exists("myCache", envir = environment(cacheThat))
exists("noSuchThing", envir = environment(cacheThat))
# The following won't work - save() needs a name as symbol or string
# (left commented out so that the script runs through):
# save(get("myCache", envir = environment(cacheThis)),
#      file="myCache.Rdata")
# do this instead:
tmp <- get("myCache", envir = environment(cacheThis))
save(tmp, file="myCache.Rdata")
rm(tmp)
# add a number we don't want...
cacheThis(6) # 17 13 11 6
# restore cache from saved version
load("myCache.Rdata") # this recreates "tmp"
assign("myCache", tmp, envir = environment(cacheThis))
# cache another prime ...
cacheThis(7) # 17 13 11 7
# etc.
# =======================================================
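The temporary variable can be avoided with saveRDS() and readRDS(), which store and return a single object without its name. The sketch below assumes a trimmed copy of the generator above, here called makeQuietCache (a hypothetical name), which returns the cache invisibly instead of printing it, and it writes to a temporary file.

```r
# Sketch: saving and restoring a closure's private variable with
# saveRDS()/readRDS(). 'makeQuietCache' and 'cacheFile' are
# hypothetical names used only for this illustration.
makeQuietCache <- function() {
    myCache <- numeric()
    function(x) {
        myCache <<- c(myCache, x)
        invisible(myCache)
    }
}

cacheThis <- makeQuietCache()
cacheThis(17)
cacheThis(13)

cacheFile <- tempfile(fileext = ".rds")
env <- environment(cacheThis)          # the closure's private environment
saveRDS(env$myCache, cacheFile)        # no temporary name needed

cacheThis(6)                           # add a number we don't want
env$myCache <- readRDS(cacheFile)      # restore the saved state
cacheThis(11)                          # the cache now holds 17 13 11
```

Environments have reference semantics, so assigning to env$myCache changes the very table that cacheThis() uses.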
I don't yet understand the pros and cons of using local() instead of a
generating function. From my current understanding, local() should end up doing
the same thing - I think that's why Martin calls one a "variant"
of the other. But I'll play some more with this later today. Is there a
Preferred Way?
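One practical difference can be shown with a small sketch: local() evaluates its block once and so yields a single closure, while a generating function yields a new, independent closure on every call. The names counter and makeCounter are hypothetical.

```r
# Sketch: local() yields one closure; a generating function yields
# as many independent closures as it is called.
counter <- local({
    n <- 0
    function() {
        n <<- n + 1
        n
    }
})

makeCounter <- function() {
    n <- 0
    function() {
        n <<- n + 1
        n
    }
}

c1 <- makeCounter()
c2 <- makeCounter()
counter(); counter()      # the single instance has counted to 2
c1(); c1(); c1()          # c1 has counted to 3 ...
c2()                      # ... while c2 starts from its own 0
```

Under that reading, local() fits a single private table, and a generating function fits the case where several separate tables are wanted.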
memoise has some nice ideas - such as creating a hash from the arguments passed
into a function to see if the cached results need to be recomputed. In my use
case, I'd like to have more explicit access to the cached results to be able
to store, reload and otherwise manipulate them.
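Such explicit access could be sketched as a generating function that returns several functions sharing one private table. Every name below (makeDistCache, lookup, table, reset, store, reload) is hypothetical, and sqrt(a^2 + b^2) stands in for the expensive calculation.

```r
# Sketch: a distance cache with explicit access to its table.
# All names are hypothetical.
makeDistCache <- function() {
    Vals <- numeric()                          # the private table
    list(
        lookup = function(a, b) {
            key <- sprintf("X%d.%d", a, b)
            if (is.na(Vals[key])) {
                Vals[key] <<- sqrt(a^2 + b^2)  # stand-in for the expensive step
            }
            unname(Vals[key])
        },
        table  = function() Vals,
        reset  = function() { Vals <<- numeric(); invisible(NULL) },
        store  = function(file) saveRDS(Vals, file),
        reload = function(file) { Vals <<- readRDS(file); invisible(NULL) }
    )
}

d <- makeDistCache()
d$lookup(3, 4)
cacheFile <- tempfile(fileext = ".rds")
d$store(cacheFile)
d$reset()
length(d$table())     # 0 after the reset
d$reload(cacheFile)
d$table()             # the saved entry is back
```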
I haven't looked at R6 yet.
Cheers,
Boris
On Mar 23, 2016, at 5:58 PM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
> Use a local environment as a place to store state. [...]