Douglas Bates
2007-Feb-08 23:00 UTC
[R] Timings of function execution in R [was Re: R in Industry]
On 2/8/07, Albrecht, Dr. Stefan (AZ Private Equity Partner)
<stefan.albrecht at apep.com> wrote:
> Dear all,
>
> Thanks a lot for your comments.
>
> I very well agree with you that writing efficient code is about optimisation. The most important rules I know would be:
> - vectorization
> - pre-definition of vectors, etc.
> - use matrix instead of data.frame
> - do not use named objects
> - use pure matrix instead of involved S4 (perhaps also S3) objects (can have enormous effects)
> - use function instead of expression
> - use compiled code
> - I guess indexing with numbers (better variables) is also much faster than with text (names) (see also above)
> - I even made, for example, my own min, max, since they are slow, e.g.,
>
> greaterOf <- function(x, y){
> # Returns for each element of x and y (numeric)
> # x or y may be a multiple of the other
> z <- x > y
> z*x + (!z)*y

That's an interesting function.  I initially was tempted to respond
that you have managed to reinvent a specialized form of the ifelse
function but then I decided to do the timings just to check (always a
good idea).  The enclosed timings show that your function is indeed
faster than a call to ifelse.  A couple of comments:

- I needed to make the number of components in the vectors x and y
quite large before I could get reliable timings on the system I am
using.

- The recommended way of doing timings is with system.time function,
which makes an effort to minimize the effects of garbage collection on
the timings.

- Even when using system.time there is often a big difference in
timing between the first execution of a function call that generates a
large object and subsequent executions of the same function call.

[additional parts of the original message not relevant to this
discussion have been removed]

-------------- next part --------------
> x <- rnorm(1000000)
> y <- rnorm(1000000)
> system.time(r1 <- greaterOf(x, y))
   user  system elapsed
  0.255   0.023   0.278
> system.time(r1 <- greaterOf(x, y))
   user  system elapsed
  0.054   0.029   0.084
> system.time(r1 <- greaterOf(x, y))
   user  system elapsed
  0.057   0.028   0.086
> system.time(r1 <- greaterOf(x, y))
   user  system elapsed
  0.083   0.040   0.124
> system.time(r1 <- greaterOf(x, y))
   user  system elapsed
  0.099   0.026   0.124
> system.time(r2 <- ifelse(x > y, x, y))
   user  system elapsed
  0.805   0.109   0.913
> system.time(r2 <- ifelse(x > y, x, y))
   user  system elapsed
  0.723   0.113   0.835
> system.time(r2 <- ifelse(x > y, x, y))
   user  system elapsed
  0.641   0.116   0.757
> system.time(r2 <- ifelse(x > y, x, y))
   user  system elapsed
  0.647   0.111   0.757
> all.equal(r1,r2)
[1] TRUE
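A minimal, reproducible sketch of the comparison above. The helper time_median and the exact vector sizes are illustrative, not from the original posts; the helper simply repeats system.time and reports the median elapsed time so that the slower first call does not dominate.

greaterOf <- function(x, y) {
    # elementwise maximum via arithmetic on the 0/1 comparison vector
    z <- x > y
    z * x + (!z) * y
}

set.seed(1)
x <- rnorm(1e6)
y <- rnorm(1e6)

# Illustrative helper (not from the thread): time a zero-argument function
# several times and report the median elapsed seconds.
time_median <- function(thunk, reps = 5) {
    elapsed <- numeric(reps)
    for (i in seq_len(reps)) elapsed[i] <- system.time(thunk())[["elapsed"]]
    median(elapsed)
}

time_median(function() greaterOf(x, y))       # arithmetic shortcut
time_median(function() ifelse(x > y, x, y))   # general-purpose ifelse

all.equal(greaterOf(x, y), ifelse(x > y, x, y))   # same values returned

The absolute numbers will differ from machine to machine, but the ordering (greaterOf faster than ifelse) should be reproducible.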
Gabor Grothendieck
2007-Feb-08 23:29 UTC
[R] Timings of function execution in R [was Re: R in Industry]
This may not be exactly the same to the last decimal but is nearly twice as fast again:

> set.seed(1)
> n <- 1000000
> x <- rnorm(n)
> y <- rnorm(n)
> system.time({z <- x > y; z*x+(!z)*y})
   user  system elapsed
   0.64    0.08    0.72
> system.time({z <- x > y; z * (x-y) + y})
   user  system elapsed
   0.35    0.04    0.39

On 2/8/07, Douglas Bates <bates at stat.wisc.edu> wrote:
> On 2/8/07, Albrecht, Dr. Stefan (AZ Private Equity Partner)
> <stefan.albrecht at apep.com> wrote:
> > Dear all,
> >
> > Thanks a lot for your comments.
> >
> > I very well agree with you that writing efficient code is about optimisation. The most important rules I know would be:
> > - vectorization
> > - pre-definition of vectors, etc.
> > - use matrix instead of data.frame
> > - do not use named objects
> > - use pure matrix instead of involved S4 (perhaps also S3) objects (can have enormous effects)
> > - use function instead of expression
> > - use compiled code
> > - I guess indexing with numbers (better variables) is also much faster than with text (names) (see also above)
> > - I even made, for example, my own min, max, since they are slow, e.g.,
> >
> > greaterOf <- function(x, y){
> > # Returns for each element of x and y (numeric)
> > # x or y may be a multiple of the other
> > z <- x > y
> > z*x + (!z)*y
>
> That's an interesting function.  I initially was tempted to respond
> that you have managed to reinvent a specialized form of the ifelse
> function but then I decided to do the timings just to check (always a
> good idea).  The enclosed timings show that your function is indeed
> faster than a call to ifelse.  A couple of comments:
>
> - I needed to make the number of components in the vectors x and y
> quite large before I could get reliable timings on the system I am
> using.
>
> - The recommended way of doing timings is with system.time function,
> which makes an effort to minimize the effects of garbage collection on
> the timings.
>
> - Even when using system.time there is often a big difference in
> timing between the first execution of a function call that generates a
> large object and subsequent executions of the same function call.
>
> [additional parts of the original message not relevant to this
> discussion have been removed]
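The second form presumably wins because z * (x - y) + y makes three elementwise passes over the data (one subtraction, one multiplication, one addition), while z*x + (!z)*y makes four and allocates an extra logical vector for !z. For a 0/1 vector z the two expressions are algebraically identical (z*x + (1-z)*y == z*(x-y) + y), but, as noted above, floating point can differ in the last bits. A quick check (vector names here are illustrative):

set.seed(1)
x <- rnorm(1e6)
y <- rnorm(1e6)
z <- x > y

a <- z * x + (!z) * y    # original form
b <- z * (x - y) + y     # rearranged form suggested above

identical(a, b)     # may be FALSE: (x - y) + y need not reproduce x exactly
all.equal(a, b)     # TRUE within numerical tolerance
max(abs(a - b))     # any discrepancy is at the level of rounding error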
Ravi Varadhan
2007-Feb-08 23:41 UTC
[R] Timings of function execution in R [was Re: R in Industry]
Hi,
"greaterOf" is indeed an interesting function. It is much faster than
the
equivalent R function, "pmax", because pmax does a lot of checking for
missing data and for recycling. Tom Lumley suggested a simple function to
replace pmax, without these checks, that is analogous to greaterOf, which I
call fast.pmax.
fast.pmax <- function(x, y) { i <- x < y; x[i] <- y[i]; x }
Interestingly, greaterOf is even faster than fast.pmax, although you have to
be dealing with very large vectors (O(10^6)) to see any real difference.
> n <- 2000000
>
> x1 <- runif(n)
> x2 <- rnorm(n)
> system.time( ans1 <- greaterOf(x1,x2) )
[1] 0.17 0.06 0.23 NA NA
> system.time( ans2 <- pmax(x1,x2) )
[1] 0.72 0.19 0.94 NA NA
> system.time( ans3 <- fast.pmax(x1,x2) )
[1] 0.29 0.05 0.35 NA NA
>
> all.equal(ans1,ans2,ans3)
[1] TRUE
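One caveat worth adding here (not a point made in the thread): all.equal() compares only its first two arguments, and a third positional argument is matched to its tolerance parameter, so ans3 is never actually checked by all.equal(ans1, ans2, ans3). A pairwise check is safer, and a one-line example shows the NA handling that pmax provides and the shortcuts give up:

# Compare the three results pairwise; all.equal() takes only two objects.
isTRUE(all.equal(ans1, ans2)) && isTRUE(all.equal(ans1, ans3))

# The checking that makes pmax slower is what handles missing values:
pmax(c(1, NA, 3), c(2, 2, 2))         # [1]  2 NA  3
greaterOf(c(1, NA, 3), c(2, 2, 2))    # [1]  2 NA  3 (NA propagates through the arithmetic)
# fast.pmax(c(1, NA, 3), c(2, 2, 2)) stops with an error:
# the NA in the logical index is not allowed in a subscripted assignment.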
Ravi.
-----------------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
-----------------------------------------------------------------------------------
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Douglas Bates
Sent: Thursday, February 08, 2007 6:00 PM
To: R-Help
Subject: [R] Timings of function execution in R [was Re: R in Industry]
On 2/8/07, Albrecht, Dr. Stefan (AZ Private Equity Partner)
<stefan.albrecht at apep.com> wrote:
> Dear all,
>
> Thanks a lot for your comments.
>
> I very well agree with you that writing efficient code is about
> optimisation. The most important rules I know would be:
> - vectorization
> - pre-definition of vectors, etc.
> - use matrix instead of data.frame
> - do not use named objects
> - use pure matrix instead of involved S4 (perhaps also S3) objects (can
> have enormous effects)
> - use function instead of expression
> - use compiled code
> - I guess indexing with numbers (better variables) is also much faster
> than with text (names) (see also above)
> - I even made, for example, my own min, max, since they are slow, e.g.,
>
> greaterOf <- function(x, y){
> # Returns for each element of x and y (numeric)
> # x or y may be a multiple of the other
> z <- x > y
> z*x + (!z)*y
That's an interesting function. I initially was tempted to respond
that you have managed to reinvent a specialized form of the ifelse
function but then I decided to do the timings just to check (always a
good idea). The enclosed timings show that your function is indeed
faster than a call to ifelse. A couple of comments:
- I needed to make the number of components in the vectors x and y
quite large before I could get reliable timings on the system I am
using.
- The recommended way of doing timings is with system.time function,
which makes an effort to minimize the effects of garbage collection on
the timings.
- Even when using system.time there is often a big difference in
timing between the first execution of a function call that generates a
large object and subsequent executions of the same function call.
[additional parts of the original message not relevant to this
discussion have been removed]
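Pulling the thread together, here is a sketch that times all of the variants discussed above side by side. The names greaterOf2 and candidates are illustrative, not from the posts; absolute timings depend on the machine and R version, but the ordering should match the results reported above.

set.seed(1)
n <- 2e6
x <- rnorm(n)
y <- rnorm(n)

greaterOf  <- function(x, y) { z <- x > y; z * x + (!z) * y }   # Stefan Albrecht
greaterOf2 <- function(x, y) { z <- x > y; z * (x - y) + y }    # Gabor Grothendieck
fast.pmax  <- function(x, y) { i <- x < y; x[i] <- y[i]; x }    # Tom Lumley, via Ravi Varadhan

candidates <- list(
    ifelse     = function() ifelse(x > y, x, y),
    greaterOf  = function() greaterOf(x, y),
    greaterOf2 = function() greaterOf2(x, y),
    fast.pmax  = function() fast.pmax(x, y),
    pmax       = function() pmax(x, y)
)

# One warm-up call per candidate, then report elapsed seconds for a timed call.
sapply(candidates, function(f) { f(); system.time(f())[["elapsed"]] })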