Douglas Bates
2007-Feb-08 23:00 UTC
[R] Timings of function execution in R [was Re: R in Industry]
On 2/8/07, Albrecht, Dr. Stefan (AZ Private Equity Partner) <stefan.albrecht at apep.com> wrote:
> Dear all,
>
> Thanks a lot for your comments.
>
> I very well agree with you that writing efficient code is about optimisation. The most important rules I know would be:
> - vectorization
> - pre-definition of vectors, etc.
> - use matrix instead of data.frame
> - do not use named objects
> - use pure matrix instead of involved S4 (perhaps also S3) objects (can have enormous effects)
> - use function instead of expression
> - use compiled code
> - I guess indexing with numbers (better variables) is also much faster than with text (names) (see also above)
> - I even made, for example, my own min, max, since they are slow, e.g.,
>
> greaterOf <- function(x, y){
>   # Returns for each element of x and y (numeric)
>   # x or y may be a multiple of the other
>   z <- x > y
>   z*x + (!z)*y
> }

That's an interesting function. I initially was tempted to respond that you have managed to reinvent a specialized form of the ifelse function but then I decided to do the timings just to check (always a good idea). The enclosed timings show that your function is indeed faster than a call to ifelse.

A couple of comments:

- I needed to make the number of components in the vectors x and y quite large before I could get reliable timings on the system I am using.

- The recommended way of doing timings is with the system.time function, which makes an effort to minimize the effects of garbage collection on the timings.

- Even when using system.time there is often a big difference in timing between the first execution of a function call that generates a large object and subsequent executions of the same function call.
[additional parts of the original message not relevant to this discussion have been removed]

-------------- next part --------------
> x <- rnorm(1000000)
> y <- rnorm(1000000)
> system.time(r1 <- greaterOf(x, y))
   user  system elapsed
  0.255   0.023   0.278
> system.time(r1 <- greaterOf(x, y))
   user  system elapsed
  0.054   0.029   0.084
> system.time(r1 <- greaterOf(x, y))
   user  system elapsed
  0.057   0.028   0.086
> system.time(r1 <- greaterOf(x, y))
   user  system elapsed
  0.083   0.040   0.124
> system.time(r1 <- greaterOf(x, y))
   user  system elapsed
  0.099   0.026   0.124
> system.time(r2 <- ifelse(x > y, x, y))
   user  system elapsed
  0.805   0.109   0.913
> system.time(r2 <- ifelse(x > y, x, y))
   user  system elapsed
  0.723   0.113   0.835
> system.time(r2 <- ifelse(x > y, x, y))
   user  system elapsed
  0.641   0.116   0.757
> system.time(r2 <- ifelse(x > y, x, y))
   user  system elapsed
  0.647   0.111   0.757
> all.equal(r1,r2)
[1] TRUE
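The timing procedure described above can be sketched as a short script. This is only a minimal sketch, not code from the original post: the closing brace of greaterOf, the helper name time_repeated and the choice of five repetitions are assumptions added for illustration. It repeats each call several times and summarises the later runs, since the first run is often slower.

```r
# greaterOf as quoted in the thread; the closing brace is assumed,
# because the quoted message was cut off before it.
greaterOf <- function(x, y) {
  z <- x > y
  z * x + (!z) * y
}

# Hypothetical helper (not from the thread): call f() several times
# and collect the elapsed time of each call.
time_repeated <- function(f, reps = 5) {
  vapply(seq_len(reps),
         function(i) system.time(f())[["elapsed"]],
         numeric(1))
}

set.seed(1)
x <- rnorm(1000000)
y <- rnorm(1000000)

t_greater <- time_repeated(function() greaterOf(x, y))
t_ifelse  <- time_repeated(function() ifelse(x > y, x, y))

# Drop the first run of each series before summarising.
print(median(t_greater[-1]))
print(median(t_ifelse[-1]))

# Both approaches should agree on finite, non-missing input.
r1 <- greaterOf(x, y)
r2 <- ifelse(x > y, x, y)
print(all.equal(r1, r2))
```

Absolute timings depend on the machine and the R version, so only the ratio between the two series is worth comparing.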
Gabor Grothendieck
2007-Feb-08 23:29 UTC
[R] Timings of function execution in R [was Re: R in Industry]
This may not be exactly the same to the last decimal but is nearly twice as fast again:

> set.seed(1)
> n <- 1000000
> x <- rnorm(n)
> y <- rnorm(n)
> system.time({z <- x > y; z*x+(!z)*y})
   user  system elapsed
   0.64    0.08    0.72
> system.time({z <- x > y; z * (x-y) + y})
   user  system elapsed
   0.35    0.04    0.39
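The rewrite above rests on a simple identity: when the logical vector z is treated as 0/1, !z equals 1 - z, so z*x + (!z)*y = z*(x - y) + y in exact arithmetic. The sketch below checks the agreement numerically and shows one input where floating-point rounding makes the two forms differ. The function names greater_two_products and greater_one_product are hypothetical labels chosen for this illustration; they do not appear in the thread.

```r
# Hypothetical names for the two formulations timed in the thread.
greater_two_products <- function(x, y) { z <- x > y; z * x + (!z) * y }
greater_one_product  <- function(x, y) { z <- x > y; z * (x - y) + y }

set.seed(1)
n <- 1000000
x <- rnorm(n)
y <- rnorm(n)

a <- greater_two_products(x, y)
b <- greater_one_product(x, y)

# Agreement within the default tolerance of all.equal().
print(all.equal(a, b))
# Largest absolute rounding difference between the two results.
print(max(abs(a - b)))

# A case where rounding is visible: (x - y) + y need not return x.
small <- greater_two_products(1e-17, -1)  # 1e-17
lossy <- greater_one_product(1e-17, -1)   # 0, because 1e-17 + 1 rounds to 1
print(c(small, lossy))
```

The second form saves one multiplication and the negated copy of z, which is a plausible reason for the speed-up, at the cost of the rounding behaviour shown in the last lines.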
Ravi Varadhan
2007-Feb-08 23:41 UTC
[R] Timings of function execution in R [was Re: R in Industry]
Hi,

"greaterOf" is indeed an interesting function. It is much faster than the equivalent R function, "pmax", because pmax does a lot of checking for missing data and for recycling. Tom Lumley suggested a simple function to replace pmax, without these checks, that is analogous to greaterOf, which I call fast.pmax.

fast.pmax <- function(x,y) {i<- x<y; x[i]<-y[i]; x}

Interestingly, greaterOf is even faster than fast.pmax, although you have to be dealing with very large vectors (O(10^6)) to see any real difference.

> n <- 2000000
>
> x1 <- runif(n)
> x2 <- rnorm(n)
> system.time( ans1 <- greaterOf(x1,x2) )
[1] 0.17 0.06 0.23 NA NA
> system.time( ans2 <- pmax(x1,x2) )
[1] 0.72 0.19 0.94 NA NA
> system.time( ans3 <- fast.pmax(x1,x2) )
[1] 0.29 0.05 0.35 NA NA
>
> all.equal(ans1,ans2,ans3)
[1] TRUE

Ravi.

-----------------------------------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
-----------------------------------------------------------------------------------
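The comparison of the three functions can be sketched as follows. This is a hedged illustration rather than a rerun of the session above: the closing brace of greaterOf and the use of set.seed are assumptions. Since all.equal() compares a target with one current object, and a third positional argument is matched to another parameter rather than compared, the sketch checks the results pairwise. It also shows one special value where dropping pmax's checks changes the answer.

```r
# Definitions as given in the thread (closing brace of greaterOf assumed).
greaterOf <- function(x, y) { z <- x > y; z * x + (!z) * y }
fast.pmax <- function(x, y) { i <- x < y; x[i] <- y[i]; x }

set.seed(1)
n <- 2000000
x1 <- runif(n)
x2 <- rnorm(n)

ans1 <- greaterOf(x1, x2)
ans2 <- pmax(x1, x2)
ans3 <- fast.pmax(x1, x2)

# Compare the results two at a time.
print(all.equal(ans1, ans2))
print(all.equal(ans2, ans3))

# The arithmetic shortcut is not a drop-in replacement for pmax
# when infinite values are present.
print(pmax(-Inf, 1))       # 1
print(greaterOf(-Inf, 1))  # NaN, because 0 * -Inf is NaN
```

On finite, non-missing numeric input the three functions agree, which is the case the timings in this thread cover.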