Thorn Thaler
2009-Jul-06 08:26 UTC
[Rd] Speed up code, profiling, optimization, lapply vs. loops
Hi everybody,

currently I'm writing a package that, for a given family of variance functions depending on a parameter theta, say, computes the extended quasi-likelihood (EQL) function for different values of theta.

The computation involves a couple of calls to the 'glm' routine. What I'm doing now is to call 'lapply' on a list of theta values with a function that constructs a family object for the particular choice of theta, fits the glm and uses the results to compute the EQL. Not surprisingly, the function is not very fast: depending on the size of the parameter space under consideration it takes a couple of minutes to finish. Testing ~1000 parameters takes about 5 minutes on my machine.

I know that loops in R are slow more often than not, so I thought using 'lapply' would be a better way. But then again, it is just another form of a loop, and it involves some overhead for the function calls, hence I'm not sure whether 'lapply' is really the better choice. What I would like to do is figure out where the bottleneck lies. Vectorization would help, but since I don't think there is a vectorized 'glm' function able to handle a vector of family objects, I'm not aware of any option aside from using a loop.

So my questions:
- how can I figure out where the bottleneck lies?
- is 'lapply' always superior to a loop in terms of execution time?
- are there any 'evil' commands that should be avoided in a loop because they slow down the computation?
- are there any good books or tutorials about how to profile R code efficiently?

Thanks in advance for your help,

Thorn
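For concreteness, a minimal sketch of the structure described above; make_family() and eql() are hypothetical stand-ins for the package's actual variance-family constructor and EQL computation, and the toy Poisson data is only there to make the snippet runnable:

```r
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- rpois(100, exp(1 + 0.5 * d$x))

# Hypothetical stand-ins for the theta-dependent family constructor
# and the EQL computation mentioned above.
make_family <- function(theta) poisson()
eql <- function(fit, theta) -fit$deviance / 2

thetas <- seq(0.1, 2, length.out = 10)

# One glm fit per theta value -- this loop is the suspected hot spot.
eql_values <- sapply(thetas, function(theta) {
  fit <- glm(y ~ x, data = d, family = make_family(theta))
  eql(fit, theta)
})
```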
Søren Højsgaard
2009-Jul-06 08:37 UTC
[Rd] Speed up code, profiling, optimization, lapply vs. loops
If you do

Rprof()
<your code>
Rprof(NULL)
summaryRprof()

then you'll get a listing of how much time is spent in each function of <your code>.

Regards
Søren
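A self-contained sketch of that Rprof() workflow; the repeated glm fit on toy data is just a stand-in for the actual EQL computation:

```r
set.seed(1)
x <- rnorm(500)
y <- rpois(500, exp(0.5 + 0.3 * x))

out <- tempfile()
Rprof(out)            # start collecting profiling samples into 'out'
fits <- lapply(seq_len(50), function(i) glm(y ~ x, family = poisson()))
Rprof(NULL)           # stop profiling

# Functions ranked by time spent in the function itself;
# $by.total ranks by time including callees.
head(summaryRprof(out)$by.self)
```

Note that Rprof() is sample-based, so very short runs may record few samples; wrap enough repetitions of the real computation to get a stable picture.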
Roger Peng
2009-Jul-06 20:42 UTC
[Rd] Speed up code, profiling, optimization, lapply vs. loops
My advice would be to use the profiler 'Rprof()' --- you may find that the loop is not really the problem. In my experience, there's relatively little difference between 'lapply' and a 'for' loop, although 'lapply' can be faster at times.

-roger

--
Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/
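That claim is easy to check directly. A rough timing sketch (absolute numbers will vary by machine; the point is that the glm call dominates both variants, so the looping construct barely matters):

```r
set.seed(1)
x <- rnorm(200)
y <- rnorm(200)
thetas <- seq(0.1, 2, length.out = 50)

# 'for' loop, preallocating the result list (growing a list
# inside the loop is the classic avoidable slowdown).
t_for <- system.time({
  res1 <- vector("list", length(thetas))
  for (i in seq_along(thetas)) res1[[i]] <- glm(y ~ x)
})

# The same work via lapply.
t_lapply <- system.time({
  res2 <- lapply(thetas, function(th) glm(y ~ x))
})

rbind(for_loop = t_for, lapply = t_lapply)[, "elapsed"]
```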
Kasper Daniel Hansen
2009-Jul-07 04:53 UTC
[Rd] Speed up code, profiling, optimization, lapply vs. loops
Aside from the advice from other people, you seem to be doing many glm calls. A big part of a call to a model-fitting function involves setting up the design matrix, checking for missing values, etc. If I understand your description correctly, you may only need to do this once. This will require some poking around in glm, but it might save you a lot of time.

Kasper
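One way to act on that suggestion is to build the model matrix once, outside the loop, and call the workhorse glm.fit() directly. A sketch under the same assumptions as before (make_family() is a hypothetical stand-in for the theta-dependent family constructor from the original post):

```r
set.seed(1)
d <- data.frame(x = rnorm(300))
d$y <- rpois(300, exp(1 + 0.5 * d$x))

# Set up the design matrix and response once, outside the loop.
X <- model.matrix(~ x, data = d)
y <- d$y

# Hypothetical stand-in for the variance-family constructor; a real
# EQL implementation would vary the variance function with theta.
make_family <- function(theta) poisson()

thetas <- seq(0.5, 1.5, length.out = 20)
fits <- lapply(thetas, function(theta) {
  glm.fit(X, y, family = make_family(theta))
})

# Each element is a bare fit (coefficients, deviance, ...) without
# the formula/model-frame overhead of a full glm() call.
fits[[1]]$coefficients
```

The trade-off is that glm.fit() returns a plain list rather than a "glm" object, so methods like summary() and predict() are not directly available.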