James Marca
2011-Nov-04 16:34 UTC
[R] zoo performance regression noticed (1.6-5 is faster...)
Good morning,

I have discovered what I believe to be a performance regression between zoo 1.6-x and zoo 1.7-6 in the application of rollapply. On zoo 1.6-x, rollapply of my function over my data takes about 20 minutes. Using 1.7-6, the same code takes about 6 hours.

    R --version
    R version 2.13.1 (2011-07-08)
    Copyright (C) 2011 The R Foundation for Statistical Computing
    ISBN 3-900051-07-0
    Platform: x86_64-pc-linux-gnu (64-bit)

Two versions of zoo 1.6 run *fast*. On one machine I am running

    less /usr/lib64/R/library/zoo/DESCRIPTION
    Package: zoo
    Version: 1.6-3
    Date: 2010-04-23
    Title: Z's ordered observations
    ...
    Packaged: 2010-04-23 07:28:47 UTC; zeileis
    Repository: CRAN
    Date/Publication: 2010-04-23 07:43:54
    Built: R 2.10.1; ; 2010-04-25 06:41:34 UTC; unix

(Thankfully I forgot to run update.packages() on this machine!)

On the other

    Package: zoo
    Version: 1.6-5
    Date: 2011-04-08
    ...
    Packaged: 2011-04-08 17:13:47 UTC; zeileis
    Repository: CRAN
    Date/Publication: 2011-04-08 17:27:47
    Built: R 2.13.1; ; 2011-11-04 15:49:54 UTC; unix

I have stripped out zoo 1.7-6 from all my machines.

I tried to ensure all libraries were identical on the two machines (using lsof), and after finally downgrading zoo I got the second machine to be as fast as the first, so I am quite certain the difference in speed is down to the zoo version used.

My code runs a fairly simple function over a time series using the following call to process a year of 30s data (9 columns, about a million rows):

    vals <- rollapply(data=ts.data[,c(n.3.cols, o.3.cols,volocc.cols)]
                      ,width=40
                      ,FUN=rolling.function.fn(n.cols=n.3.cols,o.cols=o.3.cols,vo.cols=volocc.cols)
                      ,by.column=FALSE
                      ,align='right')

(The rolling.function.fn call returns a function that captures the arguments of the call above, a trick I learned from JavaScript.)

If this is a known situation with the new 1.7 generation of zoo, my apologies and I'll go away. If my code could be turned into a useful test, I'd be happy to help out as much as I'm able. Given the extreme runtime difference though, I thought I should offer my help in this case, since zoo is such a useful package in my work.

Regards,
James Marca
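The function-returning-a-function pattern mentioned above can be sketched as follows. This is only an illustration, not the poster's actual code: make.window.fn, the column names n1 and o1, and the two summary statistics are all hypothetical stand-ins, since rolling.function.fn itself was not posted. The sketch assumes that, with by.column=FALSE, each window arrives as a two-dimensional object that can be indexed by column name.

```r
library(zoo)

# Hypothetical stand-in for rolling.function.fn: a factory that captures
# the column groups and returns the function rollapply() calls per window.
make.window.fn <- function(n.cols, o.cols) {
  function(window) {
    # With by.column = FALSE the whole window (all columns) is passed in.
    w <- as.matrix(window)
    c(n.total = sum(w[, n.cols, drop = FALSE]),
      o.mean  = mean(w[, o.cols, drop = FALSE]))
  }
}

# Small deterministic series at 30-second spacing (illustrative data only).
index <- as.POSIXct("2011-01-01", tz = "UTC") + 30 * (0:9)
ts.data <- zoo(matrix(1:20, ncol = 2,
                      dimnames = list(NULL, c("n1", "o1"))),
               order.by = index)

vals <- rollapply(data = ts.data,
                  width = 4,
                  FUN = make.window.fn(n.cols = "n1", o.cols = "o1"),
                  by.column = FALSE,
                  align = "right")
print(vals)
```

The returned function keeps n.cols and o.cols in its enclosing environment, so rollapply only has to pass the window itself.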
Gabor Grothendieck
2011-Nov-04 16:56 UTC
[R] zoo performance regression noticed (1.6-5 is faster...)
On Fri, Nov 4, 2011 at 12:34 PM, James Marca <jmarca at translab.its.uci.edu> wrote:
> Good morning,
>
> I have discovered what I believe to be a performance regression
> between zoo 1.6-x and zoo 1.7-6 in the application of rollapply.
> On zoo 1.6-x, rollapply of my function over my data takes about 20
> minutes. Using 1.7-6, the same code takes about 6 hours.
>
>   R --version
>   R version 2.13.1 (2011-07-08)
>   Copyright (C) 2011 The R Foundation for Statistical Computing
>   ISBN 3-900051-07-0
>   Platform: x86_64-pc-linux-gnu (64-bit)
>
> Two versions of zoo 1.6 run *fast*. On one machine I am running
>
>   less /usr/lib64/R/library/zoo/DESCRIPTION
>   Package: zoo
>   Version: 1.6-3
>   Date: 2010-04-23
>   Title: Z's ordered observations
>   ...
>   Packaged: 2010-04-23 07:28:47 UTC; zeileis
>   Repository: CRAN
>   Date/Publication: 2010-04-23 07:43:54
>   Built: R 2.10.1; ; 2010-04-25 06:41:34 UTC; unix
>
> (Thankfully I forgot to run update.packages() on this machine!)
>
> On the other
>
>   Package: zoo
>   Version: 1.6-5
>   Date: 2011-04-08
>   ...
>   Packaged: 2011-04-08 17:13:47 UTC; zeileis
>   Repository: CRAN
>   Date/Publication: 2011-04-08 17:27:47
>   Built: R 2.13.1; ; 2011-11-04 15:49:54 UTC; unix
>
> I have stripped out zoo 1.7-6 from all my machines.
>
> I tried to ensure all libraries were identical on the two machines
> (using lsof), and after finally downgrading zoo I got the second
> machine to be as fast as the first, so I am quite certain the
> difference in speed is down to the zoo version used.
>
> My code runs a fairly simple function over a time series using the
> following call to process a year of 30s data (9 columns, about a
> million rows):
>
>   vals <- rollapply(data=ts.data[,c(n.3.cols, o.3.cols,volocc.cols)]
>                     ,width=40
>                     ,FUN=rolling.function.fn(n.cols=n.3.cols,o.cols=o.3.cols,vo.cols=volocc.cols)
>                     ,by.column=FALSE
>                     ,align='right')
>
> (The rolling.function.fn call returns a function that captures the
> arguments of the call above, a trick I learned from JavaScript.)
>
> If this is a known situation with the new 1.7 generation of zoo, my
> apologies and I'll go away. If my code could be turned into a useful
> test, I'd be happy to help out as much as I'm able. Given the extreme
> runtime difference though, I thought I should offer my help in this
> case, since zoo is such a useful package in my work.

This was a known problem and was fixed, but if it's still there then there must be some other condition under which it can occur as well. If you can provide a small, self-contained, reproducible example, it would help in tracking it down.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
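A small, self-contained timing script of the kind requested might look like the sketch below. It is only a starting point under stated assumptions: the data are synthetic, window.fn is a hypothetical placeholder (the real rolling function was not posted), and n.rows is kept small so the script finishes quickly. Whether this particular placeholder triggers the slowdown is not known. Running the same script under each installed zoo version and comparing the elapsed times would show whether it does.

```r
library(zoo)

# Report the versions in use so timings can be compared across installs.
cat("R:  ", R.version.string, "\n")
cat("zoo:", as.character(packageVersion("zoo")), "\n")

# Synthetic stand-in for the real data: 9 columns at 30-second spacing.
# n.rows is an assumption; raise it toward 1e6 to approach the
# workload described in the original post.
set.seed(42)
n.rows <- 2000
index <- as.POSIXct("2011-01-01", tz = "UTC") + 30 * seq_len(n.rows)
ts.data <- zoo(matrix(runif(n.rows * 9), ncol = 9), order.by = index)

# Hypothetical placeholder for the real window function.
window.fn <- function(w) colMeans(as.matrix(w))

# Time the same style of call as in the original post.
timing <- system.time(
  vals <- rollapply(data = ts.data,
                    width = 40,
                    FUN = window.fn,
                    by.column = FALSE,
                    align = "right")
)
print(timing)
```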