Hi,

I'm looking for some guidance on whether to use S4 or Reference Classes for an analysis application I'm developing. I'm a C++/Python developer, and like to 'think' in OOD.

I started my app with S4, thinking that was the best set of OO features in R. However, it appears that one needs Reference Classes to allow object methods to assign values (other than the .Object in the initialize method) to slots of the object. This is typically what I prefer: creating an object, then operating on the object (reference) calling object methods to access/modify slots.

So I'm wondering what (dis)advantages there are in developing with S4 vs Reference Classes. Things of interest:

  Performance (i.e. memory management)
  Integration compatibility with R packages
  ??? other issues

Thanks!
Hi,

On Tue, Sep 13, 2011 at 1:54 PM, Joseph Park <jpark.us at att.net> wrote:
> Hi, I'm looking for some guidance on whether to use
> S4 or Reference Classes for an analysis application
> I'm developing.
> I'm a C++/Python developer, and like to 'think' in OOD.
> I started my app with S4, thinking that was the best
> set of OO features in R. However, it appears that one
> needs Reference Classes to allow object methods to assign
> values (other than the .Object in the initialize method)
> to slots of the object.
> This is typically what I prefer: creating an object, then
> operating on the object (reference) calling object methods
> to access/modify slots.
> So I'm wondering what (dis)advantages there are in
> developing with S4 vs Reference Classes.
> Things of interest:
> Performance (i.e. memory management)
> Integration compatibility with R packages
> ??? other issues

I actually don't have much experience with Reference Classes, and (most) all of my R OO(P|D) is with S4 (since I'm generally playing w/ bioconductor stuff, which has an S4 mandate).

I'm not sure exactly what you are after, but the way I design many of my classes to enable them to have *some* pass by reference semantics is to add a slot of type `environment` to the class def, like so:

  setClass("Something",
    representation=representation(x='numeric', cache='environment'),
    prototype=prototype(x=numeric(), cache=new.env()))

Anything that gets put in `cache` is "passed by ref" so to speak. Consider this:

  R> s1 <- new("Something", x=10)
  R> s1@cache$by.reference <- 'there can be only 1'
  R> s2 <- s1
  R> s2@x
  [1] 10
  R> s2@x <- 12
  R> s2@x
  [1] 12
  R> s1@x
  [1] 10
  R> s1@cache$by.reference
  [1] "there can be only 1"
  R> s2@cache$by.reference <- 'and then there were 2'
  R> s2@cache$by.reference
  [1] "and then there were 2"
  R> s1@cache$by.reference
  [1] "and then there were 2"

Proceed with caution ...
HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
On Tue, Sep 13, 2011 at 12:54 PM, Joseph Park <jpark.us at att.net> wrote:
> Hi, I'm looking for some guidance on whether to use
> S4 or Reference Classes for an analysis application
> I'm developing.
> I'm a C++/Python developer, and like to 'think' in OOD.
> I started my app with S4, thinking that was the best
> set of OO features in R. However, it appears that one
> needs Reference Classes to allow object methods to assign
> values (other than the .Object in the initialize method)
> to slots of the object.
> This is typically what I prefer: creating an object, then
> operating on the object (reference) calling object methods
> to access/modify slots.
> So I'm wondering what (dis)advantages there are in
> developing with S4 vs Reference Classes.
> Things of interest:
> Performance (i.e. memory management)
> Integration compatibility with R packages
> ??? other issues

From a C++/Python background you will probably feel more comfortable with reference classes. They are newer than S4 classes and much newer than S3 "classes" (which aren't really classes) and methods. Because reference classes are newer, the support for them has not been as fully developed and you may encounter warts from time to time.

I use both reference classes and S4 classes. Often I have objects that represent model/data combinations for which the parameter estimates are to be determined by optimizing a criterion. In those cases it makes sense to me to use reference classes because the state of the object can be changed by a method. I want to update the parameters in the object and evaluate the estimation criterion without needing to copy the entire object. If you try to perform some kind of update operation on an S4 object and not cheat in some way (i.e. adhere to strict functional programming semantics) you need to create a new instance of the object each time you update it.
When the object is potentially very large you find yourself worrying about memory usage if you take that route. I found that my code started to look pretty ugly because conceptually I was updating in place but the code needed to be written as replacements.

Having said all that, you should realize that the style of programming favored in R, and particularly in R packages, is to regard a method as determined jointly by the generic function and the class(es) of the argument(s). This is different from most other object-oriented languages, in which the class is paramount and a method is just a member of a class that happens to be code, not data.

You can get a lot of mileage out of the idiom of defining methods for common generics (print, plot, summary, ...) for particular S3 or S4 classes. The structure of R packages favors S3 generics, but you can define a method for an S3 generic applied to an object from an S4 class. The only restriction is that S3 generics can only dispatch on the first argument, but that is also what happens in a language where the methods are part of the class definitions. When you need multiple dispatch, S4 generics and methods are worth the pain.

So my current approach is to use S4 classes for objects that are in some way static but to use reference classes for objects that will need to be updated when performing some kind of estimation (or other operations such as Markov chain Monte Carlo).
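For readers following along, the "update in place" idea described above might be sketched roughly as follows. This is only an illustration, not code from anyone's package: the class name `Model`, its fields `y` and `theta`, and the methods `criterion` and `setTheta` are all made up for the example.

```r
library(methods)

# Hypothetical model/data object; all names here are illustrative assumptions.
Model <- setRefClass("Model",
  fields = list(y = "numeric", theta = "numeric"),
  methods = list(
    # Estimation criterion: residual sum of squares at the current theta.
    criterion = function() {
      sum((y - theta)^2)
    },
    # Update the parameter in place; `<<-` assigns to the field itself,
    # so the object is modified rather than copied.
    setTheta = function(value) {
      theta <<- value
      invisible(.self)
    }
  )
)

m  <- Model$new(y = c(1, 2, 3), theta = 0)
m2 <- m              # m2 refers to the same object, it is not a copy
m$setTheta(2)        # the change is visible through m2 as well
rss <- m$criterion() # (1-2)^2 + (2-2)^2 + (3-2)^2

m3 <- m$copy()       # an independent copy has to be requested explicitly
m3$setTheta(0)       # leaves m (and m2) untouched
```

Note that plain assignment (`m2 <- m`) does not copy a reference object; `$copy()` is needed when an independent object is wanted, which is the reverse of what S4 users are used to.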
On 09/13/2011 10:54 AM, Joseph Park wrote:
> Hi, I'm looking for some guidance on whether to use
> S4 or Reference Classes for an analysis application
> I'm developing.
> I'm a C++/Python developer, and like to 'think' in OOD.
> I started my app with S4, thinking that was the best
> set of OO features in R. However, it appears that one
> needs Reference Classes to allow object methods to assign
> values (other than the .Object in the initialize method)
> to slots of the object.

With

  setClass("A", representation=representation(slt="numeric"))

a slot can be updated with @<- and an object updated with a replacement method

  setGeneric("slt<-", function(x, ..., value) standardGeneric("slt<-"))
  setReplaceMethod("slt", c("A", "numeric"), function(x, ..., value) {
      x@slt <- value
      x
  })

so

  > a = new("A", slt=1)
  > slt(a) = 2
  > a
  An object of class "A"
  Slot "slt":
  [1] 2

The default initialize method also works as a copy constructor with validity check, e.g., allowing multiple slot updates

  setReplaceMethod("slt", c("A", "ANY"), function(x, ..., value) {
      initialize(x, slt=as.numeric(value))
  })

  > slt(a) = "1"

> This is typically what I prefer: creating an object, then
> operating on the object (reference) calling object methods
> to access/modify slots.
> So I'm wondering what (dis)advantages there are in
> developing with S4 vs Reference Classes.

R's copy-on-change semantics leads me to expect that

  b = a
  slt(a) = 2

leaves b unchanged, which S4 does (necessarily copying and thus with a time and memory performance cost). A reference class might be appropriate when the entity referred to exists in a single copy, as e.g., an on-disk data base, or an external pointer to a C++ class.

Martin

> Things of interest:
> Performance (i.e. memory management)
> Integration compatibility with R packages
> ??? other issues
> Thanks!
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
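The copy semantics and the "initialize as copy constructor with validity check" point from the message above can be tried out with something like the following. It is a sketch only: the validity function and the `slt` accessor generic are additions for illustration and were not part of the original message.

```r
library(methods)

# Class "A" as in the message, plus an assumed validity rule (illustrative).
setClass("A", representation = representation(slt = "numeric"),
         validity = function(object) {
           if (length(object@slt) != 1L) "slt must have length 1" else TRUE
         })

# Accessor generic added here so the slot can be read without `@`.
setGeneric("slt", function(x, ...) standardGeneric("slt"))
setMethod("slt", "A", function(x, ...) x@slt)

# Replacement method that goes through initialize(), so validity is checked.
setGeneric("slt<-", function(x, ..., value) standardGeneric("slt<-"))
setReplaceMethod("slt", c("A", "ANY"), function(x, ..., value) {
  initialize(x, slt = as.numeric(value))
})

a <- new("A", slt = 1)
b <- a            # b is a value; a later change to a does not affect it
slt(a) <- "2"     # coerced to numeric, validated, and assigned to a only

# An invalid update is rejected by the validity check and a stays as it was.
rejected <- tryCatch({ slt(a) <- c(1, 2); FALSE }, error = function(e) TRUE)
```

After this, `slt(a)` is 2 while `slt(b)` is still 1, which is the behaviour R users generally expect from ordinary objects.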
Hi Joseph (and Martin),

Don't mean to beat a dead horse, but I wanted to add one last comment to this thread in case someone (or you) stumbles upon this via google/gmane and gives it a shot.

I neglected to mention a very important step that you'd have to do in order to avoid shooting yourself in the foot. Martin, off list, thankfully pointed out to me that you still need to define an "initialize" method for your class so that the @cache slot of every new object gets *its own* environment. If you don't, they all share the same environment when you create new objects through a call to `new("Element")`, because the `new.env()` in the prototype is evaluated only once, when the class is defined.

Here's what happens and how to fix ... it's intentionally a bit verbose for pedagogical purposes, so please bear with me:

  R> setClass("Element",
       representation=representation(x='numeric', cache='environment'),
       prototype=prototype(x=numeric(), cache=new.env()))
  R> a <- new("Element")
  R> b <- new("Element")

If we look at the cache object in both `a` and `b`, you'll see that they actually are *the same* environment:

  R> a@cache
  <environment: 0x100a23788>
  R> b@cache
  <environment: 0x100a23788>

See -- those two environments share the same address. So, if you do:

  R> a@cache$some.var <- 42
  R> a@cache$some.var
  [1] 42
  R> b@cache$some.var
  [1] 42

Yikes!

If you explicitly set the cache slot to a `new.env()` you can avoid this:

  R> a <- new("Element", cache=new.env())
  R> b <- new("Element", cache=new.env())
  R> a@cache
  <environment: 0x10214d5b8>
  R> b@cache
  <environment: 0x100eff908>

You see the two environments are different, so setting a var into one @cache won't affect the other:

  R> a@cache$some.var <- 42
  R> b@cache$some.var
  NULL

So that's what you want, but who wants to keep typing new("Element", cache=new.env())? Not me, so that's what initialize methods are for.
This is what the ones I have in my libs look like:

  setMethod("initialize", "Element",
  function(.Object, ..., x=numeric(), cache=new.env()) {
    callNextMethod(.Object, x=x, cache=cache, ...)
  })

Now, with those loaded up:

  R> aa <- new("Element")
  R> bb <- new("Element")
  R> aa@cache
  <environment: 0x10312e3f8>
  R> bb@cache
  <environment: 0x103251ae0>

Problem solved.

Martin suggested a slightly different version of "initialize", like so:

  setMethod(initialize, "Element", function(.Object, ...) {
    callNextMethod(.Object, ..., cache=new.env(parent=emptyenv()))
  })

Where he mentions "... with parent=emptyenv() to avoid searching outside the cache during symbol look-up".

I actually never used that, and don't think I ran into problems (I always set `inherits=FALSE` if I'm `get`-ing something out of an environment), but I'd go with his advice over mine any day.

So ... (i) thanks to Martin for pointing that out; and (ii) thanks for bearing with me here, I'll stop now :-)

-steve

On Wed, Sep 14, 2011 at 4:24 PM, Joseph Park <jpark.us at att.net> wrote:
> Thanks Steve.
>
> I'll take a closer look at this.
>
> all the best...
>
>
> On 9/14/2011 4:18 PM, Steve Lianoglou wrote:
>
> Hi,
>
> Just wanted to say that embedding a slot in your class that's an
> environment (as I showed earlier) will still solve your problem w/o you
> having to switch to Ref classes (since you've already done lots of
> work for your app in S4).
>
> Let's assume you have a slot `cache` that is an environment, using
> your latest examples, let's say it's like this:
>
> setClass("Element",
>   representation=representation(x='numeric', cache='environment'),
>   prototype=prototype(x=numeric(), cache=new.env()))
>
> Let's say "gradient" is something you want to be accessed by reference,
> you can have something like this (setGenerics left out for lack of
> time):
>
> setMethod("gradient", "Element", function(x, ...) {
>   if (!'gradient' %in% ls(x@cache)) {
>     x@cache$gradient <- calc.gradient.from.element(x)
>   }
>   x@cache$gradient
> })
>
> Then a call to `gradient(my.obj)` will return the gradient if it
> is already calculated, or it will calc it on the fly and set it into your
> object (w/o copying your object) and return it when it's done.
>
> which is my issue. Without the reference-based approach an object
> in a slot which is then included in another object slot is a copy.
> An update to the original object slot then requires 'extra' code
> to update/synchronize the copy.
>
> Again, this "semi-s4-semi-ref-class" approach would run around this
> issue .. but life might get confusing to you (or your users) depending
> on what one expects as "normal" behavioR.
>
> Just wanted to try to clear up my original intention (if it wasn't
> clear before).
>
> -steve
>

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
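Putting the pieces of this thread together, the cached "gradient" idea might look roughly like the sketch below, with the generic that was left out filled in. Several things here are assumptions for illustration: `calc.gradient.from.element` is a made-up stand-in for the real (expensive) computation, and the counter `n.calls` exists only to show that the computation runs once.

```r
library(methods)

setClass("Element",
         representation = representation(x = "numeric", cache = "environment"))

# Give every new object its own cache (the variant with parent=emptyenv()).
setMethod("initialize", "Element", function(.Object, ...) {
  callNextMethod(.Object, ..., cache = new.env(parent = emptyenv()))
})

# Hypothetical stand-in for an expensive computation; counts its own calls.
n.calls <- 0
calc.gradient.from.element <- function(e) {
  n.calls <<- n.calls + 1
  2 * e@x
}

setGeneric("gradient", function(x, ...) standardGeneric("gradient"))
setMethod("gradient", "Element", function(x, ...) {
  # Compute once, then store the result in the object's environment.
  if (!exists("gradient", envir = x@cache, inherits = FALSE)) {
    assign("gradient", calc.gradient.from.element(x), envir = x@cache)
  }
  get("gradient", envir = x@cache, inherits = FALSE)
})

e  <- new("Element", x = c(1, 2, 3))
g1 <- gradient(e)   # computed and stored in e@cache
g2 <- gradient(e)   # served from the cache, no recomputation

# Caution: a modified copy still shares the cache with the original,
# so it sees the gradient that was computed from the old x.
e2 <- e
e2@x <- c(10, 20, 30)
stale <- gradient(e2)
```

The last three lines show why "proceed with caution" applies: nothing invalidates the cache when `x` changes, and copies of an object share one cache, so any code that updates slots would need to clear or replace the environment itself.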