John Chambers
2013-Jan-05 18:55 UTC
[Rd] Small changes to big objects (2): Local Reference Classes
Back to the scenario in my email of Jan. 3: We have objects with some large (or very large) components and some other components as well. We need to modify the smaller stuff but are not changing the big data. How can we avoid copying the big data? (A use case might be some modeling of large data where we want to save various versions, all including the same original data but differing in some stored parameters, estimates, etc.)

A new kind of class, "local reference classes", has been added to r-devel (rev. 61562). The idea is that using these classes to represent data can avoid copying that's not needed, while retaining the standard R functional semantics, or close to that. For a quick look, see ?LocalReferenceClasses.

Here is the idea. We imagine that our object has components/slots/attributes/fields "BigData", say, and "twiddle". With normal R evaluation, replacing "twiddle" in the object will cause internal duplication of the whole thing, in the very likely case that we pass some object, myX say, as argument x to a function. As soon as the evaluator sees a replacement function, "@<-", "$<-" or "attr<-", for an ordinary object, the EnsureLocal routine calls duplicate() if the object has more than one reference, as it will in this scenario. And BigData gets copied.

I think it's important to understand that this follows from the "replacement function" concept in S and R: A replacement function takes an object from the frame, does whatever it does, and returns a replacement for this object. The evaluator doesn't know what the replacement function does, so the EnsureLocal strategy is inevitable.

There is one trapdoor, however. duplicate() does essentially nothing for data types that are references, most importantly for environments. That's the basis for reference classes. But a reference class is not exactly what we want here. Our different models share the BigData but should not share the same other fields.
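
To make the "trapdoor" concrete, here is a small sketch of why an ordinary reference class shares too much for this use case. The class name "Model" and the helper functions setTwiddle() and bump() are made up for illustration; only new.env() and setRefClass() are standard R.

```r
library(methods)

# An environment is a reference: passing it to a function does not copy it,
# so an assignment made inside the function is visible to the caller.
e <- new.env()
e$twiddle <- 1
setTwiddle <- function(env) {      # hypothetical helper
    env$twiddle <- 2
    invisible(NULL)
}
setTwiddle(e)                      # e$twiddle has now changed in the caller

# A standard reference class is built on an environment and behaves the same way.
Model <- setRefClass("Model",      # hypothetical class for illustration
    fields = list(BigData = "numeric", twiddle = "numeric"))

m1 <- Model$new(BigData = rnorm(1000), twiddle = 1)

bump <- function(x) {              # hypothetical helper
    x$twiddle <- x$twiddle + 1     # modifies the shared object, no local copy
    x
}
m2 <- bump(m1)                     # m1 and m2 refer to the same fields
```

After the call, m1$twiddle has changed along with m2$twiddle: BigData was not copied, but neither was anything else, which is exactly the sharing we do not want between models.
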
If I twiddle parameters in one model, it better not change another model. So it's R's standard "functional" semantics we want.

In fact, R is not strictly a functional language. Rather it has the idea of "local references": ordinary assignments change the references in the local frame but have no external effect. Local reference classes implement essentially this using reference class fields. Specifically, calling a method $ensureLocal() on an object, directly or via replacing a field, causes a *shallow* copy of the object to be created and remembered locally. Subsequent replacements have no effect on the object passed in to the function.

The implementation is fairly simple, but the programmer does have to be aware of what's happening, to some extent. Please look it over and play with it if it seems interesting.

John
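
A minimal sketch of the BigData/twiddle scenario using a local reference class, along the lines of the example in ?LocalReferenceClasses. It assumes the class is declared with contains = "localRefClass" as described on that help page; the class name "LocalModel" and the function twiddler() are illustrative only.

```r
library(methods)

# Assumption: local reference classes are obtained by inheriting from
# "localRefClass", as documented in ?LocalReferenceClasses.
LocalModel <- setRefClass("LocalModel", contains = "localRefClass",
    fields = list(BigData = "numeric", twiddle = "numeric"))

tw <- c(1, 2, 3)
x1 <- LocalModel$new(BigData = rnorm(1000), twiddle = tw)  # not really big

twiddler <- function(x, n) {       # hypothetical function for illustration
    x$ensureLocal()                # shallow copy, remembered in this frame
    for (i in seq_len(n))
        x$twiddle <- x$twiddle + 1 # replaces the field in the local copy only
    x
}

x2 <- twiddler(x1, 10)

# The caller's object keeps its parameters; both objects hold the same data.
stopifnot(identical(x1$twiddle, tw),
          identical(x1$BigData, x2$BigData))
```

The explicit x$ensureLocal() call is belt and braces here, since replacing a field should trigger it anyway. It matters more inside methods that assign fields directly rather than through a replacement call.
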