thr3ads.net - R devel - [Rd] Best style to organize code, namespaces [Feb 2010]

Ben

2010-Feb-23 02:49 UTC

[Rd] Best style to organize code, namespaces

Hi all,

I'm hoping someone could tell me what best practices are as far as
keeping programs organized in R.  In most languages, I like to keep
things organized by writing small functions.  So, suppose I want to
write a function that would require helper functions or would just be
too big to write in one piece.  Below are three ways to do this:


################### Style 1 (C-style) ###############
Foo <- function(x) {
  ....
}
Foo.subf <- function(x, blah) {
  ....
}
Foo.subg <- function(x, bar) {
  ....
}

################### Style 2 (Lispish?) ##############
Foo <- function(x) {
  Subf <- function(blah) {
    ....
  }
  Subg <- function(bar) {
    ....
  }
  ....
}

################### Style 3 (Object-Oriented) #######
Foo <- function(x) {
  Subf <- function(blah) {
    ....
  }
  Subg <- function(bar) {
    ....
  }
  Main <- function() {
    ....
  }
  return(list(Subf=Subf, Subg=Subg, Main=Main))
}
################### End examples ####################

Which of these ways is best?  Style 2 seems at first to be the most
natural in R, but I found there are some major drawbacks.  First, it
is hard to debug.  For instance, if I want to debug Subf, I need to
first "debug(Foo)" and then while Foo is debugging, type
"debug(Subf)".  Another big limitation is that I can't write
test-cases (e.g. using RUnit) for Subf and Subg because they aren't
visible in any way at the global level.

For these reasons, style 1 seems to be better than style 2, if less
elegant.  However, style 1 can get awkward because any parameters
passed to the main function are not visible to the others.  In the
above case, the value of "x" must be passed to Foo.subf and Foo.subg
explicitly.  Also there is no enforcement of code isolation
(i.e. anyone can call Foo.subf).

Style 3 is more explicitly object oriented.  It has the advantage of
style 2 in that you don't need to pass x around, and the advantage of
style 1 in that you can still write tests and easily debug the
subfunctions.  However to actually call the main function you have to
type "Foo(x)$Main()" instead of "Foo(x)", or else write a wrapper
function for this.  Either way there is more typing.
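
A minimal sketch of the wrapper approach described above. The name 'MakeFoo' and the helper bodies are hypothetical, invented only for illustration; the original post elides them.

```r
# Hypothetical Style 3 constructor: the helpers share x through the closure.
MakeFoo <- function(x) {
  Subf <- function(blah) x + blah   # placeholder body (assumption)
  Subg <- function(bar) x * bar     # placeholder body (assumption)
  Main <- function() Subf(1) + Subg(2)
  list(Subf = Subf, Subg = Subg, Main = Main)
}

# Thin wrapper so callers can keep typing Foo(x).
Foo <- function(x) MakeFoo(x)$Main()

foo_result  <- Foo(10)              # runs Main() with x = 10
subf_result <- MakeFoo(10)$Subf(1)  # helpers stay reachable for unit tests
```

The cost is one extra top-level name, but the helpers can then be tested or passed to debug() without stepping through the main function first.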

So anyway, what is the best way to handle this?  R does not seem to
have a good way of managing namespaces or avoiding collisions, like a
module system or explicit object-orientation.  How should we get
around this limitation?  I've looked at sample R code in the
distribution and elsewhere, but so far it's been pretty
disappointing---most people seem to write very long, hard to
understand functions.

Thanks for any advice!

-- 
Ben

Duncan Murdoch

2010-Feb-23 03:05 UTC


On 22/02/2010 9:49 PM, Ben wrote:
> Hi all,
> 
> I'm hoping someone could tell me what best practices are as far as
> keeping programs organized in R.  In most languages, I like to keep
> things organized by writing small functions.  So, suppose I want to
> write a function that would require helper functions or would just be
> too big to write in one piece.  Below are three ways to do this:
> 
> 
> ################### Style 1 (C-style) ###############
> Foo <- function(x) {
>   ....
> }
> Foo.subf <- function(x, blah) {
>   ....
> }
> Foo.subg <- function(x, bar) {
>   ....
> }
> 
> ################### Style 2 (Lispish?) ##############
> Foo <- function(x) {
>   Subf <- function(blah) {
>     ....
>   }
>   Subg <- function(bar) {
>     ....
>   }
>   ....
> }
> 
> ################### Object-Oriented #################
> Foo <- function(x) {
>   Subf <- function(blah) {
>     ....
>   }
>   Subg <- function(bar) {
>     ....
>   }
>   Main <- function() {
>     ....
>   }
>   return(list(Subf=Subf, Subg=Subg, Main=Main))
> }
> ################### End examples ####################
> 
> Which of these ways is best?  Style 2 seems at first to be the most
> natural in R, but I found there are some major drawbacks.  First, it
> is hard to debug.  For instance, if I want to debug Subf, I need to
> first "debug(Foo)" and then while Foo is debugging, type
> "debug(Subf)".  
You can use setBreakpoint to set a breakpoint in the nested functions, 
and it will exist in all invocations of Foo (which each create new 
instances of the nested functions).  debug() is not the only debugging tool.

> Another big limitation is that I can't write
> test-cases (e.g. using RUnit) for Subf and Subg because they aren't
> visible in any way at the global level.
> 
> For these reasons, style 1 seems to be better than style 2, if less
> elegant.  However, style 1 can get awkward because any parameters
> passed to the main function are not visible to the others.  In the
> above case, the value of "x" must be passed to Foo.subf and Foo.subg
> explicitly.  Also there is no enforcement of code isolation
> (i.e. anyone can call Foo.subf).
> 
> Style 3 is more explicitly object oriented.  It has the advantage of
> style 2 in that you don't need to pass x around, and the advantage of
> style 1 in that you can still write tests and easily debug the
> subfunctions.  However to actually call the main function you have to
> type "Foo(x)$Main()" instead of "Foo(x)", or else write a wrapper
> function for this.  Either way there is more typing.
> 
> So anyway, what is the best way to handle this?  R does not seem to
> have a good way of managing namespaces or avoiding collisions, like a
> module system or explicit object-orientation. 
Packages are self-contained modules.  You don't get collisions between 
names of locals between packages, and if they export the same name, 
other packages can explicitly select which export to use.
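
As a small illustration of selecting an export explicitly, the sketch below masks 'sd' with a user-level definition and then reaches the 'stats' version through the '::' operator. The masking function is an assumption made up for the example.

```r
x <- c(1, 2, 3)

# A user-level definition that masks stats::sd on the search path
# (hypothetical, for illustration only).
sd <- function(v) "masked"

masked_result   <- sd(x)          # finds the definition above
explicit_result <- stats::sd(x)   # always the function exported by 'stats'
```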

> How should we get
> around this limitation?  I've looked at sample R code in the
> distribution and elsewhere, but so far it's been pretty
> disappointing---most people seem to write very long, hard to
> understand functions.
I would normally use a mixture of styles 1 and 2.  Use style 2 for 
functions that really do need access to Foo locals, and use style 1 for 
self-contained functions.
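
One way such a mixture might look is sketched below. The function names 'Rescale' and 'Clip' and their bodies are assumptions chosen for illustration, not code from this thread.

```r
# Style 1: self-contained, so it lives at top level and is easy to test.
Rescale <- function(v, lo, hi) (v - lo) / (hi - lo)

Foo <- function(x) {
  lo <- min(x)
  hi <- max(x)
  # Style 2: nested, because it reads Foo's locals lo and hi.
  Clip <- function(v) pmin(pmax(v, lo), hi)
  # Append one out-of-range value to show the clipping, then rescale.
  Rescale(Clip(c(x, hi + 1)), lo, hi)
}

result <- Foo(c(0, 5, 10))
```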

Duncan Murdoch

Gabor Grothendieck

2010-Feb-23 03:15 UTC


As you mention, ease of debugging basically precludes subfunctions, so
style 1 is left.

Functions can be nested in environments rather than in other functions
and this will allow debugging to still occur.
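
A rough base-R sketch of that idea, without any add-on package. The environment name 'foo_env' and the function bodies are assumptions for illustration.

```r
# An environment that plays the role of Foo's private scope.
foo_env <- new.env()

# Define the shared variable and the functions inside that environment.
local({
  x <- 10
  Subf <- function(blah) x + blah   # placeholder body (assumption)
  Main <- function() Subf(1)
}, envir = foo_env)

main_result <- foo_env$Main()   # Main finds Subf and x in foo_env
subf_result <- foo_env$Subf(5)  # Subf is reachable directly, e.g. for tests

# The nested function can be flagged for debugging without entering Main.
debug(foo_env$Subf)
flagged <- isdebugged(foo_env$Subf)
undebug(foo_env$Subf)
```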

The proto package makes it particularly convenient to nest
functions in environments, giving an analog to #3 while still allowing
debugging.  See http://r-proto.googlecode.com

> library(proto)
> # p is proto object with variable a and method f
> p <- proto(a = 1, f = function(., x = 1) .$a <- .$a + 1)
> with(p, debug(f))
> p$f()
debugging in: get("f", env = p, inherits = TRUE)(p, ...)
debug: .$a <- .$a + 1
Browse[2]>
exiting from: get("f", env = p, inherits = TRUE)(p, ...)
[1] 2
> p$a
[1] 2



Mark.Bravington at csiro.au

2010-Feb-23 04:27 UTC


Ben--

FWIW my general take on this is:

 - Namespaces solve the collision issue.

 - Style 2 tends to make for unreadably long code inside Foo, unless the
subfunctions are really short.

 - Style 3 is too hard to work with

 - So I usually use a variant on style 1:

################### Style 4 (mlocal-style) ############### 
Foo <-  function(x) { ....
   initialize.Foo()
}

initialize.Foo <- function( nlocal=sys.parent()) mlocal({
  ....
})

The 'mlocal' call means that code in the body of 'initialize.Foo'
executes directly in the environment of 'Foo', or wherever it's called
from. It doesn't get its own private environment, and automatically
reads/writes/creates variables in 'Foo'. However, you can still pass
parameters that are private to 'initialize.Foo', though you may not need
any. The 'debug' package will handle 'mlocal' functions without any
trouble. One downside might be that you can't (or shouldn't) call
'initialize.Foo' directly. Another is if your sub-function creates a lot
of junk variables that you really don't want in 'Foo'. Obviously that's
exactly what you want from an initialization function, but not
necessarily in general.
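
For readers without 'mlocal' to hand, the effect can be roughly approximated in base R by evaluating a block in the caller's frame. This is a sketch of a swapped-in technique, not 'mlocal' itself, and the variables 'n' and 'total' are made up for illustration.

```r
# Base-R approximation (not mlocal): run a block in the caller's frame.
initialize.Foo <- function() {
  caller <- parent.frame()   # the frame of whoever called initialize.Foo
  eval(quote({
    n <- length(x)           # reads x from the caller
    total <- sum(x)          # creates 'total' in the caller
  }), caller)
  invisible(NULL)
}

Foo <- function(x) {
  initialize.Foo()           # n and total now exist in Foo's frame
  total / n
}

foo_result <- Foo(c(2, 4, 6))
```

As with 'mlocal', calling 'initialize.Foo' directly from the top level would write 'n' and 'total' into the calling environment, so it is only meant to be called from inside 'Foo'.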

 - Sometimes (style 5) I define the subfunctions externally to 'Foo' but
not as 'mlocal's, and then inside 'Foo' I do

subf <- subf
environment( subf) <- environment()

just as if I'd inserted the definition of 'subf' into 'Foo'.
This is like style 2, but keeps the 'Foo' code short, and lets me set up
debugging externally.
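
A small runnable version of this pattern, assuming a made-up body for 'subf' that refers to the free variable 'x':

```r
# Defined outside Foo; 'x' is a free variable here (placeholder body).
subf <- function(blah) x + blah

Foo <- function(x) {
  subf <- subf                        # local copy of the external definition
  environment(subf) <- environment()  # the copy now sees Foo's locals
  subf(1)
}

foo_result <- Foo(10)
```

Only the local copy is re-scoped; the external 'subf' keeps its original environment.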

 - If you use style 2, you can still automatically set up the 'debug'
package's debugging on 'Subf' by:

mtrace( Foo)
bp( fname='Foo', 1, FALSE) # don't stop at line 1
# set the breakpoint in 'Subf', and then carry on in 'Foo' without stopping
bp( fname='Foo', 2, { mtrace( Subf); FALSE})

You won't have to intervene manually when 'Foo' runs. However, this
may slow down 'Foo' itself, and does require you to know a line number
after the definition of 'Subf'.

No doubt there are many other approaches...

Mark

-- 
Mark Bravington
CSIRO Mathematical & Information Sciences
Marine Laboratory
Castray Esplanade
Hobart 7001
TAS

ph (+61) 3 6232 5118
fax (+61) 3 6232 5012
mob (+61) 438 315 623

