thr3ads.net - R help - [R] Quirks with system.time and simulations [Jun 2004]

Patrick Connolly

2004-Jun-14 00:33 UTC

[R] Quirks with system.time and simulations

I tried the code that Richard O'Keefe posted last week, to wit:

library(chron)
    ymd.to.POSIXlt <-
        function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
    n <- 100000
    y <- sample(1970:2004, n, replace=TRUE)
    m <- sample(1:12,      n, replace=TRUE)
    d <- sample(1:28,      n, replace=TRUE)
    system.time(ymd.to.POSIXlt(y, m, d))
    [1]  8.78  0.10 31.76  0.00  0.00
    system.time(as.POSIXlt(paste(y,m,d, sep="-")))
    [1] 14.64  0.13 53.30  0.00  0.00


On a somewhat newer machine, I got

$ R --vanilla

R : Copyright 2004, The R Foundation for Statistical Computing
Version 1.9.0  (2004-04-12), ISBN 3-900051-00-3

[...]

> library(chron)
>     ymd.to.POSIXlt <-
+         function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
>     n <- 100000
>     y <- sample(1970:2004, n, replace=TRUE)
>     m <- sample(1:12,      n, replace=TRUE)
>     d <- sample(1:28,      n, replace=TRUE)
>
> system.time(ymd.to.POSIXlt(y, m, d))
[1] 1.67 0.24 2.01 0.00 0.00
> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
[1] 3.06 0.02 3.08 0.00 0.00
>

But then I tried a few more times...

> system.time(ymd.to.POSIXlt(y, m, d))
[1] 1.09 0.04 1.13 0.00 0.00
> system.time(ymd.to.POSIXlt(y, m, d))
[1] 1.11 0.09 1.20 0.00 0.00
>

The second time is a lot faster, but subsequent ones don't "improve
further".

But with the "standard" function,

> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
[1] 2.64 0.02 2.66 0.00 0.00
> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
[1] 2.82 0.03 2.85 0.00 0.00
>

... it does improve slightly but rather a lot less.


THEN

If I compare the two methods in the reverse order,


$ R --vanilla

R : Copyright 2004, The R Foundation for Statistical Computing
Version 1.9.0  (2004-04-12), ISBN 3-900051-00-3

[....]

> library(chron)
>     ymd.to.POSIXlt <-
+         function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
>     n <- 100000
>     y <- sample(1970:2004, n, replace=TRUE)
>     m <- sample(1:12,      n, replace=TRUE)
>     d <- sample(1:28,      n, replace=TRUE)
> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
[1] 3.66 0.02 3.76 0.00 0.00
> system.time(ymd.to.POSIXlt(y, m, d))
[1] 1.65 0.05 1.70 0.00 0.00
>
>
> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
[1] 2.59 0.02 2.61 0.00 0.00
> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
[1] 2.73 0.00 2.74 0.00 0.00
>
> system.time(ymd.to.POSIXlt(y, m, d))
[1] 1.29 0.01 1.30 0.00 0.00
> system.time(ymd.to.POSIXlt(y, m, d))
[1] 0.94 0.00 0.94 0.00 0.00
> system.time(ymd.to.POSIXlt(y, m, d))
[1] 1.06 0.01 1.07 0.00 0.00
>

It seems as though the first simulation makes it "easier" for
subsequent simulations of the same type AND also for simulations of a
somewhat different type.  The degree to which it "helps" varies
according to just what is being run (no surprise there).  What I can't
figure out is what is happening that makes it quicker for second and
subsequent runs.

I even tried doing a gc() and setting seeds before each run to make a
more direct comparison, but it made no difference other than being
slightly less variable.  I have seen a similar phenomenon in other
types of simulations.

In the case of this code, it makes no difference whether n is 100 or
10000000.  Would that be attributable to lazy evaluation?

> version
         _
platform i686-pc-linux-gnu
arch     i686             
os       linux-gnu        
system   i686, linux-gnu  
status                    
major    1                
minor    9.0              
year     2004             
month    04               
day      12               
language R         


It's not exactly a problem, but it could have a bearing on comparing
processing times, which is something that happens from time to time.
In the comparison that gave rise to the code above, the order would
have made a substantial difference to the perceived effectiveness of
Richard's code.


-- 
Patrick Connolly
HortResearch
Mt Albert
Auckland
New Zealand 
Ph: +64-9 815 4200 x 7188
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~
I have the world's largest collection of seashells. I keep it on all
the beaches of the world ... Perhaps you've seen it.  ---Steven Wright
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~

Roger D. Peng

2004-Jun-14 00:50 UTC

I think the first time is potentially much slower because of a 
garbage collection.  R-devel has a flag `gcFirst' for 
system.time() which (I think) forces a garbage collection before 
timing.

-roger

-- 
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/
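
The gcFirst idea mentioned above can be tried directly in current releases of R, where system.time() takes a gcFirst argument (TRUE by default). The following is only a sketch: it uses the "standard" conversion from the thread, a smaller n than the original post, and a fixed seed, all of which are assumptions made here for illustration.

```r
# Sketch: time the same expression with and without a forced
# garbage collection before the clock starts.
set.seed(1)                      # assumed seed, for repeatability only
n <- 10000                       # smaller than the thread's 100000
y <- sample(1970:2004, n, replace = TRUE)
m <- sample(1:12,      n, replace = TRUE)
d <- sample(1:28,      n, replace = TRUE)

# gcFirst = TRUE: gc() is run before timing begins
t_gc   <- system.time(as.POSIXlt(paste(y, m, d, sep = "-")), gcFirst = TRUE)
# gcFirst = FALSE: no collection is forced before timing begins
t_nogc <- system.time(as.POSIXlt(paste(y, m, d, sep = "-")), gcFirst = FALSE)

# Show user, system and elapsed seconds side by side
print(rbind(gcFirst = t_gc, no_gcFirst = t_nogc)[, 1:3])
```

A single pair of timings like this is noisy, so any difference between the two rows should be read as a hint rather than a measurement.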

Liaw, Andy

2004-Jun-14 01:24 UTC

I wonder if there's also an effect of the CPU cache...

Andy

Gabor Grothendieck

2004-Jun-14 02:41 UTC

I don't know the answer but I tried running each of the following a few
times:

gc(); system.time(for(i in 1:15)as.POSIXlt(paste(y,m,d, sep="-")))
gc(); system.time(for(i in 1:15)ymd.to.POSIXlt(y, m, d))

and noticed that the Vcells gc trigger and Mb used varied all over
the place.  Does that suggest anything?
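
One way to look at the variation described above more systematically is to record the Vcells "gc trigger" value that gc() reports after each run. This is a rough sketch, not a diagnosis: it uses only the base-R "standard" conversion (so chron is not needed), a smaller n, and a fixed seed, all chosen here for illustration.

```r
# Sketch: track the Vcells "gc trigger" reported by gc() across runs.
set.seed(1)                      # assumed seed, for repeatability only
n <- 10000
y <- sample(1970:2004, n, replace = TRUE)
m <- sample(1:12,      n, replace = TRUE)
d <- sample(1:28,      n, replace = TRUE)

triggers <- numeric(5)
for (i in seq_along(triggers)) {
  # Run the conversion, discarding the result
  invisible(as.POSIXlt(paste(y, m, d, sep = "-")))
  # gc() returns a matrix with rows "Ncells" and "Vcells"
  triggers[i] <- gc()["Vcells", "gc trigger"]
}
print(triggers)
```

If the trigger level moves between iterations, the collector is adjusting its thresholds as the session runs, which would fit with the first run behaving differently from later ones.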

Thomas Lumley

2004-Jun-14 14:08 UTC

On Mon, 14 Jun 2004, Patrick Connolly wrote:
>
> It seems as though the first simulation makes it "easier" for
> subsequent simulations of the same type AND also for simulations of a
> somewhat different type.  The degree to which it "helps" varies
> according to just what is being run (no surprise there).  What I can't
> figure out is what is happening that makes it quicker for second and
> subsequent runs.
>

Luke Tierney would be the person most likely to have a definitive answer,
but my guess is that it is because of the generational garbage collector.
When this was added the speed of R improved about 20%, and the main reason
is that most garbage collections involve only recently allocated memory.
One effect is that memory blocks tend to get reused for the same objects
in later iterations of the simulation, which is more efficient.  For the
second simulation the gains are smaller.

Possibly a more accurate benchmark would be something like

Rprof("timing.prof")
replicate(LOTS, {oneway(); otherway()})
Rprof(NULL)
summaryRprof("timing.prof")

interleaving the two methods.


	-thomas
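
The interleaving idea above can be sketched in base R as follows. Note the substitutions: this version uses system.time() on each call rather than Rprof(), and oneway() and otherway() are hypothetical stand-ins defined here (ISOdate() replaces the chron-based function so that no extra package is needed). The value of reps, the seed, and n are likewise assumptions for illustration.

```r
# Sketch: interleave two methods so warm-up effects are shared between them.
set.seed(1)                      # assumed seed, for repeatability only
n <- 10000
y <- sample(1970:2004, n, replace = TRUE)
m <- sample(1:12,      n, replace = TRUE)
d <- sample(1:28,      n, replace = TRUE)

# Hypothetical stand-ins for the two methods being compared
oneway   <- function() as.POSIXlt(ISOdate(y, m, d))
otherway <- function() as.POSIXlt(paste(y, m, d, sep = "-"))

reps <- 5
# Each replicate times both methods back to back; the result is a
# 2 x reps matrix of elapsed seconds with one row per method.
times <- replicate(reps, c(
  oneway   = system.time(oneway())[["elapsed"]],
  otherway = system.time(otherway())[["elapsed"]]
))
print(rowMeans(times))           # mean elapsed seconds per method
```

Comparing per-method means over interleaved replicates reduces the advantage that whichever method happens to run second would otherwise get.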
