I tried the code that Richard O'Keefe posted last week, to wit:

library(chron)
ymd.to.POSIXlt <-
function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
n <- 100000
y <- sample(1970:2004, n, replace=TRUE)
m <- sample(1:12, n, replace=TRUE)
d <- sample(1:28, n, replace=TRUE)
system.time(ymd.to.POSIXlt(y, m, d))
[1] 8.78 0.10 31.76 0.00 0.00
system.time(as.POSIXlt(paste(y,m,d, sep="-")))
[1] 14.64 0.13 53.30 0.00 0.00

On a somewhat newer machine, I got

$ R --vanilla

R : Copyright 2004, The R Foundation for Statistical Computing
Version 1.9.0 (2004-04-12), ISBN 3-900051-00-3

[...]

> library(chron)
> ymd.to.POSIXlt <-
+ function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
> n <- 100000
> y <- sample(1970:2004, n, replace=TRUE)
> m <- sample(1:12, n, replace=TRUE)
> d <- sample(1:28, n, replace=TRUE)
>
> system.time(ymd.to.POSIXlt(y, m, d))
[1] 1.67 0.24 2.01 0.00 0.00
> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
[1] 3.06 0.02 3.08 0.00 0.00

But then I tried a few more times...

> system.time(ymd.to.POSIXlt(y, m, d))
[1] 1.09 0.04 1.13 0.00 0.00
> system.time(ymd.to.POSIXlt(y, m, d))
[1] 1.11 0.09 1.20 0.00 0.00

The second time is a lot faster, but subsequent ones don't "improve further".

But with the "standard" function,

> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
[1] 2.64 0.02 2.66 0.00 0.00
> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
[1] 2.82 0.03 2.85 0.00 0.00

... it does improve slightly but rather a lot less.

THEN

If I compare the two methods in the reverse order,

$ R --vanilla

R : Copyright 2004, The R Foundation for Statistical Computing
Version 1.9.0 (2004-04-12), ISBN 3-900051-00-3

[....]

> library(chron)
> ymd.to.POSIXlt <-
+ function (y, m, d) as.POSIXlt(chron(julian(y=y, x=m, d=d)))
> n <- 100000
> y <- sample(1970:2004, n, replace=TRUE)
> m <- sample(1:12, n, replace=TRUE)
> d <- sample(1:28, n, replace=TRUE)
> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
[1] 3.66 0.02 3.76 0.00 0.00
> system.time(ymd.to.POSIXlt(y, m, d))
[1] 1.65 0.05 1.70 0.00 0.00
>
> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
[1] 2.59 0.02 2.61 0.00 0.00
> system.time(as.POSIXlt(paste(y,m,d, sep="-")))
[1] 2.73 0.00 2.74 0.00 0.00
>
> system.time(ymd.to.POSIXlt(y, m, d))
[1] 1.29 0.01 1.30 0.00 0.00
> system.time(ymd.to.POSIXlt(y, m, d))
[1] 0.94 0.00 0.94 0.00 0.00
> system.time(ymd.to.POSIXlt(y, m, d))
[1] 1.06 0.01 1.07 0.00 0.00

It seems as though the first simulation makes it "easier" for
subsequent simulations of the same type AND for simulations of a
somewhat different type also. The degree to which it "helps" varies
according to just what is being run (no surprise there). What I can't
figure out is what is happening that makes it quicker for second and
subsequent runs.

I even tried doing a gc() and setting seeds before each run to make a
more direct comparison, but it made no difference other than being
slightly less variable. I have seen a similar phenomenon in other
types of simulations.

In the case of this code, it makes no difference whether n is 100 or
10000000. Would that be attributable to lazy evaluation?

> version
         _
platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    1
minor    9.0
year     2004
month    04
day      12
language R

It's not exactly a problem, but it could have a bearing on comparing
processing times, which is something that happens from time to time.
In the comparison that gave rise to the code above, the order would
have made a substantial difference to the perceived effectiveness of
Richard's code.

--
Patrick Connolly
HortResearch
Mt Albert
Auckland
New Zealand
Ph: +64-9 815 4200 x 7188
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~
I have the world's largest collection of seashells. I keep it on all
the beaches of the world ... Perhaps you've seen it. ---Steven Wright
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~
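One way to make the warm-up effect described above visible is to time the same call several times in one session and keep every elapsed time, not only the first. The following is a minimal sketch in base R, not anything from the original post: `time_runs` is a hypothetical helper, n is reduced to keep the run short, and only the paste()-based conversion is timed because it needs no extra package.

```r
# Hypothetical helper (an assumption for illustration): time the same
# function repeatedly and return the elapsed time of every run, so that
# a slower first run shows up next to the later ones.
time_runs <- function(f, times = 5) {
  vapply(seq_len(times),
         function(i) system.time(f())[["elapsed"]],
         numeric(1))
}

set.seed(1)   # reproducible inputs
n <- 10000    # smaller than the 100000 used in the post, to keep this quick
y <- sample(1970:2004, n, replace = TRUE)
m <- sample(1:12, n, replace = TRUE)
d <- sample(1:28, n, replace = TRUE)

# Time the "standard" conversion five times in a row.
elapsed <- time_runs(function() as.POSIXlt(paste(y, m, d, sep = "-")))
print(elapsed)   # compare the first entry with the rest
```

Running this in a fresh `R --vanilla` session, and again with the methods in a different order, gives a direct view of how much of a difference is due to run order alone.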
I think the first time is potentially much slower because of a garbage collection. R-devel has a flag `gcFirst' for system.time() which (I think) forces a garbage collection before timing.

-roger

--
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/
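In R releases since this thread, system.time() accepts a gcFirst argument (TRUE by default) that runs a garbage collection immediately before timing starts. A minimal sketch of timing the same call both ways follows; the function `work` is an arbitrary stand-in workload chosen for illustration, not code from the thread.

```r
# Arbitrary stand-in workload (an assumption): builds date strings and
# converts them, allocating a fair amount of memory along the way.
work <- function() {
  x <- paste(sample(1970:2004, 50000, replace = TRUE), "01", "01", sep = "-")
  invisible(as.POSIXlt(x))
}

# gcFirst = TRUE: collect garbage before timing starts, so a pending
# collection left over from earlier work is not charged to this run.
with_gc <- system.time(work(), gcFirst = TRUE)

# gcFirst = FALSE: time the call as-is, including any collection that
# happens to be triggered while it runs.
without_gc <- system.time(work(), gcFirst = FALSE)

print(with_gc)
print(without_gc)
```

A single pair of timings like this is noisy; the difference is only meaningful when repeated several times.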
I wonder if there's also an effect of the CPU cache...

Andy
I don't know the answer but I tried running each of the following a few times:

gc(); system.time(for(i in 1:15) as.POSIXlt(paste(y,m,d, sep="-")))
gc(); system.time(for(i in 1:15) ymd.to.POSIXlt(y, m, d))

and noticed that the Vcells gc trigger and Mb used varied all over the place. Does that suggest anything?
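The Vcells "gc trigger" value mentioned above can be read from the matrix that gc() returns, which makes it possible to record it between runs instead of reading it off the console. A minimal sketch, assuming an arbitrary allocate-and-discard workload; `vcells_trigger` is a hypothetical helper name.

```r
# Hypothetical helper (an assumption): run a collection and return the
# current Vcells "gc trigger" threshold from gc()'s result matrix.
vcells_trigger <- function() gc()["Vcells", "gc trigger"]

triggers <- numeric(5)
for (i in seq_along(triggers)) {
  # Allocate and discard some numeric vectors, then record the trigger.
  tmp <- lapply(1:50, function(j) runif(20000))
  rm(tmp)
  triggers[i] <- vcells_trigger()
}

print(triggers)   # the threshold may move between iterations as R resizes the heap
```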
On Mon, 14 Jun 2004, Patrick Connolly wrote:

> It seems as though the first simulation makes it "easier" for
> subsequent simulations of the same type AND also for simulations of a
> somewhat different type also. The degree to which it "helps" varies
> according to just what is being run (no surprise there). What I can't
> figure out is what is happening that makes it quicker for second and
> subsequent runs.

Luke Tierney would be the person most likely to have a definitive answer, but my guess is that it is because of the generational garbage collector. When this was added the speed of R improved about 20%, and the main reason is that most garbage collections involve only recently allocated memory. One effect is that memory blocks tend to get reused for the same objects in later iterations of the simulation, which is more efficient. For the second simulation the gains are smaller.

Possibly a more accurate benchmark would be something like

Rprof("timing.prof")
replicate(LOTS, {oneway(); otherway()})
Rprof(NULL)
summaryRprof("timing.prof")

interleaving the two methods.

-thomas
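The interleaving idea can also be tried without the profiler. The sketch below swaps Rprof() for per-call system.time() measurements, which is a different technique from the one suggested above. The bodies of `oneway` and `otherway` are assumptions made so that the example runs in base R: the ISOdate() route stands in for the chron-based function, and `reps` plays the role of LOTS.

```r
set.seed(1)
n <- 5000   # kept small so the sketch runs quickly
y <- sample(1970:2004, n, replace = TRUE)
m <- sample(1:12, n, replace = TRUE)
d <- sample(1:28, n, replace = TRUE)

# Stand-ins for the two methods being compared (assumed bodies).
oneway   <- function() as.POSIXlt(paste(y, m, d, sep = "-"))
otherway <- function() as.POSIXlt(ISOdate(y, m, d))

reps <- 10
elapsed <- matrix(0, nrow = reps, ncol = 2,
                  dimnames = list(NULL, c("oneway", "otherway")))

# Alternate the two methods so neither one always runs first.
for (i in seq_len(reps)) {
  elapsed[i, "oneway"]   <- system.time(oneway())[["elapsed"]]
  elapsed[i, "otherway"] <- system.time(otherway())[["elapsed"]]
}

print(colSums(elapsed))   # total elapsed seconds per method
```

Because each row holds one pair of runs, the first row can be compared with the later ones to see how much of any difference comes from warm-up.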