Giles Percy
2013-May-18  21:50 UTC
[Rd] Copy on assignment to large field of reference class
Dear all
I am trying to find the best way to handle large fields in reference
classes.
As the code below shows, assignment via <<- causes many copies to be made if
the subsetting is extensive (in modb1). This can cause R to run out of
memory. Creating a local copy and using the optimisation in <- is the best
solution I have found so far (in modb2), but it is not really much better
than ordinary functions using call by value and then reassigning.
Is there a reason why the optimisation does not occur for <<- ? Or is there a
better solution for reference classes?
Regards
Giles
A <- setRefClass("A", fields=list(b="vector"))
A$methods(
  initialize=function() {
    b <<- 1:10000
  },
  modb1=function() {
    # simple subsetting for illustration
    for(i in 2:length(b)) b[i] <<- b[i-1] + 1
  },
  modb2=function() {
    bb <- b
    for(i in 2:length(b)) bb[i] <- bb[i-1] + 1
    b <<- bb
  }
)
a <- new("A")
tracemem(a$b)
a$modb1()
a$modb2()
John Chambers
2013-May-19  21:59 UTC
[Rd] Copy on assignment to large field of reference class
This is a useful observation.  To talk about it, though, we need to re-express
it in terms that make sense for R; there are too many misconceptions otherwise.
The basic observation is this:  When simple subset or element replacement is
done in a loop, normally the object is only copied on the first time through the
loop.  This is true whether using local assignment, <-, or global assignment,
<<-.
However, if global assignment is done in a method to replace in a field, the
object is copied every time.  For long loops this makes for substantial
overhead.  Very relevant observation.
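For comparison, here is a minimal sketch of the baseline case outside any reference class: the same loop written once with local assignment and once with global assignment from inside a function. The names x, y, n and fill_global are made up for illustration. tracemem() is only available when R was built with memory profiling, so the calls are guarded, and the number of copies it reports may differ between R versions.

```r
n <- 10000

# Local replacement in a loop.
x <- 1:n
if (capabilities("profmem")) invisible(tracemem(x))
for (i in 2:n) x[i] <- x[i - 1] + 1
if (capabilities("profmem")) untracemem(x)

# The same loop using global assignment from inside a function.
y <- 1:n
if (capabilities("profmem")) invisible(tracemem(y))
fill_global <- function() {
  for (i in 2:length(y)) y[i] <<- y[i - 1] + 1
  invisible(NULL)
}
fill_global()
if (capabilities("profmem")) untracemem(y)
```

Both loops produce the same vector; any tracemem output shows how often each object was duplicated along the way.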
What's going on?
The non-copying depends on the fact that `[<-` is a primitive function.
When a field is declared with a class ("vector" in the example), its
assignment is done by an R function that checks the validity (via what's
called an "active binding" in R).  That causes the extra copy on each
assignment.  (To be honest, I don't totally understand why, but I have no
intention of messing with the active binding code.)
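To make the term concrete, here is a small sketch of an active binding that validates on every assignment. This is only an illustration built on base R's makeActiveBinding(); it is not the code the methods package uses for fields, and make_checked_env, store, n_set and set_count are hypothetical names.

```r
# Hypothetical helper: an environment whose variable `b` is an active
# binding, so every read and write of `b` runs an R function.
make_checked_env <- function() {
  env <- new.env()
  store <- NULL   # the real storage, hidden in this closure
  n_set <- 0L     # how many times the setter has run
  makeActiveBinding("b", function(value) {
    if (missing(value)) return(store)   # read access
    stopifnot(is.vector(value))         # validity check on each write
    n_set <<- n_set + 1L
    store <<- value
    invisible(store)
  }, env)
  env$set_count <- function() n_set
  env
}

e <- make_checked_env()
e$b <- 1:5
# Each element replacement reads the whole vector through the binding,
# modifies it, and writes the whole vector back through the setter.
for (i in 2:5) e$b[i] <- e$b[i - 1] + 1L
e$set_count()
```

Because the replacement goes through an ordinary R function rather than straight to the primitive `[<-`, R cannot treat it as a simple in-place update.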
What to do about it?
There are two solutions.  Either accept that field assignment is
inherently inefficient and avoid doing it in a loop, as in method modb2,
or don't declare a class for the field, in which case no active binding is
used.  Check this out by changing the class definition to
setRefClass("A", fields="b").
I prefer the first solution since it retains the validity check on the field.
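A rough way to try the second option next to the original declaration is sketched below. The helper make_gen and the class names TypedB and UntypedB are invented for this comparison. Timings and copy counts depend on the R version, so none are quoted here.

```r
library(methods)

# Hypothetical helper: same methods as the original example, differing
# only in how field `b` is declared.
make_gen <- function(class_name, fields) {
  setRefClass(class_name, fields = fields, methods = list(
    initialize = function(...) {
      b <<- 1:10000
      callSuper(...)
    },
    modb1 = function() {
      for (i in 2:length(b)) b[i] <<- b[i - 1] + 1
      invisible(.self)
    }
  ))
}

Typed   <- make_gen("TypedB",   list(b = "vector"))  # declared class
Untyped <- make_gen("UntypedB", "b")                 # no declared class

typed   <- Typed$new()
untyped <- Untyped$new()

# Compare elapsed time of the field-assignment loop in each case.
t_typed   <- system.time(typed$modb1())
t_untyped <- system.time(untyped$modb1())
print(rbind(typed = t_typed, untyped = t_untyped)[, "elapsed"])
```

Both objects end up with the same contents in b; only the cost of getting there may differ.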
John
PS: A few comments.
 - it makes no sense to expect _greater_ efficiency than for a simple
assignment.  The object in a$b is NOT a reference object so its manipulation
obeys R's normal rules.
 - all this only applies to replacement functions that are primitives. 
Otherwise you're stuck with copies each time.
 - Please don't use the term "call by value" for R; that's not
how R's evaluation works and has nothing to do with when duplication takes
place.  That topic is not for the faint of heart, but basically when R knows
that there is only one reference to an object, it doesn't copy.  But in
practice this is mainly when a primitive replacement function is used.
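The "only one reference" point can be seen with plain vectors. A minimal sketch, with x and y as throwaway names:

```r
x <- c(1, 2, 3)
y <- x        # x and y now refer to the same vector
y[1] <- 10    # two references exist, so y is duplicated before the change
x[2] <- 20    # x is modified independently of y

print(x)
print(y)
```

The change to y never shows up in x, which is why the contents of a$b follow the same rules as any other R vector.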