Giles Percy
2013-May-18 21:50 UTC
[Rd] Copy on assignment to large field of reference class
Dear all,

I am trying to find the best way to handle large fields in reference classes.

As the code below shows, assignment via <<- causes many copies to be made if the subsetting is extensive (in modb1). This can cause R to run out of memory. Creating a local copy and using the optimisation in <- is the best solution I have found so far (in modb2), but it is not really much better than ordinary functions using call by value and then reassigning.

Is there a reason why optimisation does not occur for <<- ? Or is there a better solution for reference classes?

Regards
Giles

    A <- setRefClass("A", fields=list(b="vector"))

    A$methods(
      initialize=function() {
        b <<- 1:10000
      },
      modb1=function() {
        # simple subsetting for illustration
        for(i in 2:length(b)) b[i] <<- b[i-1] + 1
      },
      modb2=function() {
        bb <- b
        for(i in 2:length(b)) bb[i] <- bb[i-1] + 1
        b <<- bb
      }
    )

    a <- new("A")
    tracemem(a$b)

    a$modb1()

    a$modb2()
John Chambers
2013-May-19 21:59 UTC
[Rd] Copy on assignment to large field of reference class
This is a useful observation. To talk about it, though, we need to re-express it in terms that make sense for R; there are too many misconceptions otherwise.

The basic observation is this: when simple subset or element replacement is done in a loop, normally the object is only copied on the first time through the loop. This is true whether using local assignment, <-, or global assignment, <<-. However, if global assignment is done in a method to replace in a field, the object is copied every time. For long loops this makes for substantial overhead. Very relevant observation.

What's going on?

The non-copying depends on the fact that `[<-` is a primitive function. When a field is declared with a class ("vector" in the example), its assignment is done by an R function that checks the validity (via what's called an "active binding" in R). That causes the extra copy on each assignment. (To be honest, I don't totally understand why, but I have no intention of messing with the active binding code.)

What to do about it?

There are two solutions: either take the attitude that field assignment is basically inefficient and don't do it in a loop, as in method modb2, or don't declare a class for the field, in which case no active binding is used. Check this out by changing the class definition to setRefClass("A", fields="b").

I prefer the first solution since it retains the validity check on the field.

John

PS: A few comments.

- It makes no sense to expect _greater_ efficiency than for a simple assignment. The object in a$b is NOT a reference object, so its manipulation obeys R's normal rules.
- All this only applies to replacement functions that are primitives. Otherwise you're stuck with copies each time.
- Please don't use the term "call by value" for R; that's not how R's evaluation works, and it has nothing to do with when duplication takes place. That topic is not for the faint of heart, but basically when R knows that there is only one reference to an object, it doesn't copy. But in practice this is mainly when a primitive replacement function is used.
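For anyone who wants to compare the two options from the reply side by side, the following is a minimal sketch, not a definitive benchmark. The class names TypedA and UntypedA and the helper run_traced are hypothetical names introduced here for illustration; the methods are the ones from the original post. The number of tracemem lines printed depends on the R version and on whether R was built with memory profiling, so no expected output is shown.

```r
library(methods)

# Hypothetical helper: builds a class with the two methods from the
# original post. `fields` is either list(b = "vector") (declared class)
# or "b" (no declared class), as suggested in the reply.
make_class <- function(name, fields) {
  setRefClass(name, fields = fields, methods = list(
    modb1 = function() {
      # replaces elements of the field directly, one <<- per iteration
      for (i in 2:length(b)) b[i] <<- b[i - 1] + 1
      invisible(.self)
    },
    modb2 = function() {
      # works on a local copy and assigns the field once at the end
      bb <- b
      for (i in 2:length(bb)) bb[i] <- bb[i - 1] + 1
      b <<- bb
      invisible(.self)
    }
  ))
}

TypedA   <- make_class("TypedA",   list(b = "vector"))
UntypedA <- make_class("UntypedA", "b")

# Hypothetical helper: traces copies of the field while modb1 runs.
# tracemem is only available if R was built with memory profiling.
run_traced <- function(obj) {
  can_trace <- capabilities("profmem")
  if (can_trace) tracemem(obj$b)
  obj$modb1()
  if (can_trace) untracemem(obj$b)
  invisible(obj)
}

start   <- c(1, rep(0, 19))      # small vector to keep the trace short
typed   <- TypedA$new(b = start)
untyped <- UntypedA$new(b = start)

run_traced(typed)     # field with a declared class
run_traced(untyped)   # field without a declared class
typed$modb2()         # local-copy approach, keeps the class check
```

Comparing the tracemem lines printed by the two run_traced calls shows whether the declared field class makes a difference on a given R installation. The modb2 approach gives the same result in either class and keeps the validity check on the field.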