thr3ads.net - R help - [R] S4 vs Reference Classes [Sep 2011]


Joseph Park

2011-Sep-13 17:54 UTC

[R] S4 vs Reference Classes

Hi, I'm looking for some guidance on whether to use
   S4 or Reference Classes for an analysis application
   I'm developing.
   I'm a C++/Python developer, and like to 'think' in OOD.
   I started my app with S4, thinking that was the best
   set of OO features in R. However, it appears that one
   needs Reference Classes to allow object methods to assign
   values (other than the .Object in the initialize method)
   to slots of the object.
   This is typically what I prefer: creating an object, then
   operating on the object (reference) calling object methods
   to access/modify slots.
   So I'm wondering what (dis)advantages there are in
   developing with S4 vs Reference Classes.
   Things of interest:
   Performance (i.e. memory management)
   Integration compatibility with R packages
   ??? other issues
   Thanks!

Steve Lianoglou

2011-Sep-13 21:11 UTC

Hi,

On Tue, Sep 13, 2011 at 1:54 PM, Joseph Park <jpark.us at att.net> wrote:
> Hi, I'm looking for some guidance on whether to use
> S4 or Reference Classes for an analysis application
> I'm developing.
> I'm a C++/Python developer, and like to 'think' in OOD.
> I started my app with S4, thinking that was the best
> set of OO features in R. However, it appears that one
> needs Reference Classes to allow object methods to assign
> values (other than the .Object in the initialize method)
> to slots of the object.
> This is typically what I prefer: creating an object, then
> operating on the object (reference) calling object methods
> to access/modify slots.
> So I'm wondering what (dis)advantages there are in
> developing with S4 vs Reference Classes.
> Things of interest:
> Performance (i.e. memory management)
> Integration compatibility with R packages
> ??? other issues

I actually don't have much experience with Reference Classes, and
(almost) all of my R OO(P|D) is done with S4 (since I'm generally playing
w/ Bioconductor stuff, which has an S4 mandate).

I'm not sure exactly what you are after, but the way I design many of
my classes to enable them to have *some* pass by reference semantics
is to add a slot of type `environment` to the class def, like so:

setClass("Something",
  representation=representation(x='numeric', cache='environment'),
  prototype=prototype(x=numeric(), cache=new.env()))

Anything that gets put in `cache` is "passed by ref" so to speak.
Consider this:

R> s1 <- new("Something", x=10)
R> s1@cache$by.reference <- 'there can be only 1'

R> s2 <- s1
R> s2@x
[1] 10

R> s2@x <- 12
R> s2@x
[1] 12

R> s1@x
[1] 10

R> s1@cache$by.reference
[1] "there can be only 1"

R> s2@cache$by.reference <- 'and then there were 2'
R> s2@cache$by.reference
[1] "and then there were 2"

R> s1@cache$by.reference
[1] "and then there were 2"


Proceed with caution ...

HTH,

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

Douglas Bates

2011-Sep-13 21:24 UTC

On Tue, Sep 13, 2011 at 12:54 PM, Joseph Park <jpark.us at att.net> wrote:
> Hi, I'm looking for some guidance on whether to use
> S4 or Reference Classes for an analysis application
> I'm developing.
> I'm a C++/Python developer, and like to 'think' in OOD.
> I started my app with S4, thinking that was the best
> set of OO features in R. However, it appears that one
> needs Reference Classes to allow object methods to assign
> values (other than the .Object in the initialize method)
> to slots of the object.
> This is typically what I prefer: creating an object, then
> operating on the object (reference) calling object methods
> to access/modify slots.
> So I'm wondering what (dis)advantages there are in
> developing with S4 vs Reference Classes.
> Things of interest:
> Performance (i.e. memory management)
> Integration compatibility with R packages
> ??? other issues

From a C++/Python background you will probably feel more comfortable
with reference classes.  They are newer than S4 classes and much newer
than S3 "classes" (which aren't really classes) and methods.  Because
reference classes are newer the support for them has not been as fully
developed and you may encounter warts from time to time.

I use both reference classes and S4 classes.  Often I have objects
that represent model/data combinations for which the parameter
estimates are to be determined by optimizing a criterion.  In those
cases it makes sense to me to use reference classes because the state
of the object can be changed by a method.  I want to update the
parameters in the object and evaluate the estimation criterion without
needing to copy the entire object.  If you try to perform some kind of
update operation on an S4 object and not cheat in some way (i.e.
adhere to strict functional programming semantics) you need to create
a new instance of the object each time you update it.  When the object
is potentially very large you find yourself worrying about memory
usage if you take that route.  I found that my code started to look
pretty ugly because conceptually I was updating in place but the code
needs to be written as replacements.

Having said all that, you should realize that the style of programming
favored in R, and particularly in R packages, is to regard a method as
determined jointly by the generic function and the class(es) of the
argument(s).  This is different from most other object-oriented
languages in which the class is paramount and a method is just a
member of a class that happens to be code, not data.  You can get a
lot of mileage out of the idiom of defining methods for common
generics (print, plot, summary, ...) for particular S3 or S4 classes.
The structure of R packages favors S3 generics but you can define a
method for an S3 generic applied to an object from an S4 class.  The
only restriction is that S3 generics can only dispatch on the first
argument but that is what happens in a language where the methods are
part of the class definitions.  When you need multiple dispatch S4
generics and methods are worth the pain.
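
To make the dispatch point concrete, here is a minimal sketch. The class
"Fit" and the generic "combine" are made-up names for illustration only.
The first method attaches to the existing summary() function via
setMethod(), which is one way (not the only one) to give an S4 class a
method for a common generic; the second pair of methods dispatches on
the classes of both arguments.

```r
library(methods)

# Hypothetical S4 class, used only for this illustration.
setClass("Fit", representation(coef = "numeric"))

# A method for the existing summary() function. setMethod() derives an
# S4 generic from it; dispatch here depends on the first argument only.
setMethod("summary", "Fit", function(object, ...) {
  sprintf("Fit with %d coefficient(s)", length(object@coef))
})

# A new S4 generic (hypothetical name) that dispatches on two arguments.
setGeneric("combine", function(x, y) standardGeneric("combine"))

# Method chosen when both arguments are Fit objects.
setMethod("combine", signature("Fit", "Fit"), function(x, y) {
  new("Fit", coef = c(x@coef, y@coef))
})

# Method chosen when the second argument is a plain numeric vector.
setMethod("combine", signature("Fit", "numeric"), function(x, y) {
  new("Fit", coef = c(x@coef, y))
})

f <- new("Fit", coef = c(1, 2))
summary(f)
length(combine(f, f)@coef)   # 4
length(combine(f, 3)@coef)   # 3
```

In a class-centred language the second case would need a type test
inside a single method; with S4 the generic selects the method from the
pair of classes.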

So my current approach is to use S4 classes for objects that are in
some way static but to use reference classes for objects that will
need to be updated when performing some kind of estimation (or
similar operations, such as Markov chain Monte Carlo).
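
As a rough sketch of the in-place update described above (the class
name "Model", its fields, and its methods are assumptions made up for
this example, not code from any package), a reference class lets a
method change the parameter and re-evaluate the criterion without the
object being copied:

```r
library(methods)

# Hypothetical model/data object; all names are illustrative.
Model <- setRefClass("Model",
  fields = list(theta = "numeric", y = "numeric"),
  methods = list(
    setTheta = function(value) {
      # `<<-` assigns to the field of this object, modifying it in place.
      theta <<- value
      invisible(.self)
    },
    criterion = function() {
      # Sum of squared deviations of the data from the current parameter.
      sum((y - theta)^2)
    }
  )
)

m <- Model$new(theta = 0, y = c(1, 2, 3))
m$criterion()   # 14
m$setTheta(2)   # updates m itself; nothing is reassigned
m$criterion()   # 2
```

With an S4 class the update step would have to be written as
`m <- setTheta(m, 2)` or as a replacement method, which is the
"written as replacements" style mentioned earlier.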

Martin Morgan

2011-Sep-14 04:17 UTC

On 09/13/2011 10:54 AM, Joseph Park wrote:
>     Hi, I'm looking for some guidance on whether to use
>     S4 or Reference Classes for an analysis application
>     I'm developing.
>     I'm a C++/Python developer, and like to 'think' in OOD.
>     I started my app with S4, thinking that was the best
>     set of OO features in R. However, it appears that one
>     needs Reference Classes to allow object methods to assign
>     values (other than the .Object in the initialize method)
>     to slots of the object.
With

   setClass("A", representation=representation(slt="numeric"))

a slot can be updated with @<- and an object updated with a replacement 
method

   setGeneric("slt<-", function(x, ..., value) standardGeneric("slt<-"))

   setReplaceMethod("slt", c("A", "numeric"), function(x, ..., value) {
       x@slt <- value
       x
   })

so

 > a = new("A", slt=1)
 > slt(a) = 2
 > a
An object of class "A"
Slot "slt":
[1] 2

The default initialize method also works as a copy constructor with 
validity check, e.g., allowing multiple slot updates

   setReplaceMethod("slt", c("A", "ANY"), function(x, ..., value) {
       initialize(x, slt=as.numeric(value))
   })

 > slt(a) = "1"

>     This is typically what I prefer: creating an object, then
>     operating on the object (reference) calling object methods
>     to access/modify slots.
>     So I'm wondering what (dis)advantages there are in
>     developing with S4 vs Reference Classes.
R's copy-on-change semantics leads me to expect that

b = a
slt(a) = 2

leaves b unchanged, which S4 does (necessarily copying and thus with a 
time and memory performance cost). A reference class might be 
appropriate when the entity referred to exists in a single copy, as 
e.g., an on-disk data base, or an external pointer to a C++ class.
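
The contrast can be sketched side by side. The reference class "RA"
below is a made-up counterpart to class "A", introduced only for this
comparison; `$copy()` is the standard way to request an independent
copy of a reference object.

```r
library(methods)

# S4: assignment gives value semantics, so `b` is unaffected by changes to `a`.
setClass("A", representation(slt = "numeric"))
a <- new("A", slt = 1)
b <- a
a@slt <- 2
b@slt      # 1

# Reference class (hypothetical name "RA"): assignment shares one object.
RA <- setRefClass("RA", fields = list(slt = "numeric"))
ra <- RA$new(slt = 1)
rb <- ra            # second name for the same object
rc <- ra$copy()     # explicit, independent copy
ra$slt <- 2
rb$slt     # 2
rc$slt     # 1
```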

Martin
>     Things of interest:
>     Performance (i.e. memory management)
>     Integration compatibility with R packages
>     ??? other issues
>     Thanks!

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793

Steve Lianoglou

2011-Sep-15 13:57 UTC

Hi Joseph (and Martin),

Don't mean to beat a dead horse, but I wanted to add one last comment
to this thread in case someone stumbles upon this via google/gmane (or
you) and gives it a shot.

I neglected to mention a very important step that you'd have to take
in order to avoid shooting yourself in the foot.

Martin, off list, thankfully pointed out to me that you still need to
define an "initialize" method for your class so that each @cache slot
for every new object defined gets *its own* environment. If you don't,
they all share the same environment when you create new objects
through a call to `new("Element")`.

Here's what happens and how to fix ... it's intentionally a bit
verbose for pedagogical purposes, so please bear with me:

R> setClass("Element",
     representation=representation(x='numeric', cache='environment'),
     prototype=prototype(x=numeric(), cache=new.env()))

R> a <- new("Element")
R> b <- new("Element")

If we look at the cache object in both `a` and `b`, you'll see that
they actually are *the same* environment:

R> a@cache
<environment: 0x100a23788>

R> b@cache
<environment: 0x100a23788>

See -- those two environments share the same address. So, if you do:

R> a@cache$some.var <- 42
R> a@cache$some.var
[1] 42

R> b@cache$some.var
[1] 42

Yikes!

If you explicitly set the cache slot to a `new.env()` you can avoid this:

R> a <- new("Element", cache=new.env())
R> b <- new("Element", cache=new.env())
R> a@cache
<environment: 0x10214d5b8>
R> b@cache
<environment: 0x100eff908>

You see the two environments are different, so setting a var into one
@cache won't affect the other:

R> a@cache$some.var <- 42
R> b@cache$some.var
NULL

So that's what you want, but who wants to keep typing
new("Element", cache=new.env())? Not me, so that's what initialize
methods are for. This is what the ones in my libs look like:

setMethod("initialize", "Element",
  function(.Object, ..., x=numeric(), cache=new.env()) {
    callNextMethod(.Object, x=x, cache=cache, ...)
})

Now, with those loaded up:

R> aa <- new("Element")
R> bb <- new("Element")
R> aa@cache
<environment: 0x10312e3f8>

R> bb@cache
<environment: 0x103251ae0>

Problem solved.

Martin suggested a slightly different version of "initialize", like
so:

setMethod(initialize, "Element", function(.Object, ...) {
   callNextMethod(.Object, ..., cache=new.env(parent=emptyenv()))
})

Where he mentions "... with parent=emptyenv() to avoid searching
outside the cache during symbol look-up".

I actually never used that, and don't think I ran into problems (I
always set `inherits=FALSE` if I'm `get`-ing something out of an
environment), but I'd go with his advice over mine any day.
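
For anyone wondering what the parent environment changes in practice,
here is a small sketch (the variable names `e1` and `e2` are just for
illustration). By default `exists()` and `get()` continue into
enclosing environments, so a cache with the default parent can appear
to "contain" things like base functions; with `parent=emptyenv()` the
search has nowhere else to go.

```r
# Default parent: the environment from which new.env() is called.
e1 <- new.env()
# Empty parent: symbol look-up cannot continue past the cache itself.
e2 <- new.env(parent = emptyenv())

exists("mean", envir = e1)                     # TRUE, found in an enclosing environment
exists("mean", envir = e1, inherits = FALSE)   # FALSE, only e1 itself is searched
exists("mean", envir = e2)                     # FALSE, there is no enclosure to search
is.null(e1$mean)                               # TRUE, `$` on an environment never inherits
```

So either habit works: pass `inherits=FALSE` on every look-up, or build
the cache with an empty parent once and stop worrying about it.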

So ...

(i) thanks to Martin for pointing that out; and
(ii) thanks for bearing with me here,

I'll stop now :-)

-steve

On Wed, Sep 14, 2011 at 4:24 PM, Joseph Park <jpark.us at att.net> wrote:
> Thanks Steve.
>
> I'll take a closer look at this.
>
> all the best...
>
>
> On 9/14/2011 4:18 PM, Steve Lianoglou wrote:
>
> Hi,
>
> Just wanted to say that embedding a slot in your class that's an
> environment (as I showed earlier) will still solve your problem w/o you
> having to switch to Ref classes (since you've already done lots of
> work for your app in S4).
>
> Let's assume you have a slot `cache` that is an environment, using
> your latest examples, let's say it's like this:
>
> setClass("Element",
>   representation=representation(x='numeric', cache='environment'),
>   prototype=prototype(x=numeric(), cache=new.env()))
>
> Let's say "gradient" is something you want to be accessed by reference,
> you can have something like this (setGenerics left out for lack of
> time):
>
> setMethod("gradient", "Element", function(x, ...) {
>   if (!'gradient' %in% ls(x@cache)) {
>     x@cache$gradient <- calc.gradient.from.element(x)
>   }
>   x@cache$gradient
> })
>
> Then a call to `gradient(my.obj)` will return the gradient if it is
> already calculated, or it will calc it on the fly and set it into your
> object (w/o copying your object) and return it when it's done.
>
> which is my issue. Without the reference-based approach an object
> in a slot which is then included in another object slot is a copy.
> An update to the original object slot then requires 'extra' code
> to update/synchronize the copy.
>
> Again, this "semi-s4-semi-ref-class" approach would run around this
> issue .. but life might get confusing to you (or your users) depending
> on what one expects as "normal" behavioR.
>
> Just wanted to try to clear up my original intention (if it wasn't
> clear before).
>
> -steve
>
>
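
For completeness, the cached "gradient" idea quoted above can be put
together into something that runs on its own. This is only a sketch
under a few assumptions: the generic is defined here because the quoted
message left it out, `diff()` stands in for the undefined
`calc.gradient.from.element()`, and the "initialize" method follows the
per-object-cache version discussed earlier in this message.

```r
library(methods)

setClass("Element", representation(x = "numeric", cache = "environment"))

# Give every new object its own cache environment.
setMethod("initialize", "Element", function(.Object, ...) {
  callNextMethod(.Object, ..., cache = new.env(parent = emptyenv()))
})

# Generic omitted in the quoted message; defined here so the example runs.
setGeneric("gradient", function(x, ...) standardGeneric("gradient"))

setMethod("gradient", "Element", function(x, ...) {
  if (!exists("gradient", envir = x@cache, inherits = FALSE)) {
    # Placeholder computation standing in for calc.gradient.from.element().
    x@cache$gradient <- diff(x@x)
  }
  x@cache$gradient
})

el <- new("Element", x = c(1, 4, 9))
exists("gradient", envir = el@cache, inherits = FALSE)   # FALSE, nothing cached yet
gradient(el)                                             # 3 5
exists("gradient", envir = el@cache, inherits = FALSE)   # TRUE, stored without reassigning el
```

Note that the cached value is not invalidated if `el@x` is later
replaced, so a real class would also need to clear the cache in its
replacement methods.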

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
