> From: Barry Rowlingson
>
> Jan T. Kim wrote:
>
> > Generally, I fully agree -- modular coding is good, not only in R.
> > However, with regard to execution time, modularisation that involves
> > passing of large amounts of data (100 x 1000 data frames etc.) can
> > cause problems.
>
> I've just tried a few simple examples of throwing biggish (3000x3000)
> matrices around and haven't encountered any pathological behaviour yet.
> I tried modifying the matrices within the functions, and tried looping
> a few thousand times to estimate the matrix-passing overhead; in most
> cases the modular version ran pretty much as fast as - or occasionally
> faster than - the inline version. There was some variability in CPU
> time taken, probably due to garbage collection.
>
> Does anyone have a simple example where passing large data sets causes
> a huge increase in CPU time? I think R is pretty smart with its
> parameter passing these days - anyone who thinks it's still like Splus
> version 2.3 should update their brains to the 21st Century.
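For concreteness, here is a rough sketch of the kind of timing test Barry
describes (the scaling operation and the loop count are arbitrary choices,
not taken from his actual tests):

m <- matrix(rnorm(3000 * 3000), nrow = 3000)

scale.mat <- function(x, k) {
    x * k    # returns a new scaled matrix; the argument itself is untouched
}

## modular version: the big matrix is passed as a function argument
system.time(for (i in 1:20) r1 <- scale.mat(m, 2))

## inline version: the same work without a function call
system.time(for (i in 1:20) r2 <- m * 2)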
I think one example of large data passing getting expensive is using the
formula interface to fit models on large data sets, especially those with
a lot of variables. Some model fitting functions have a default interface
f(x, y, ...) along with a formula method f(formula, data, ...). If x has
lots of variables (say over 1000), the formula interface can take several
times longer than calling the "raw" interface directly, since most of the
extra time goes into building the terms object and the model matrix rather
than into the fit itself.
Andy
> Baz
>