On Aug 23, 2010, at 1:19 PM, Radford Neal wrote:
> Looking for more ways to speed up R, I've found that large
> improvements are possible in the speed of "sum" and
"prod" for long
> real vectors.
>
The results are likely very compiler- and architecture specific. On my machine
[x86_64, OS X 10.6] your version is actually slower for narm (the more you
optimize the more assumptions you are making which may turn out to be false):
baseline:> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
user system elapsed
2.412 0.018 2.431 > system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
user system elapsed
2.510 0.001 2.509
RN version:> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
user system elapsed
1.691 0.004 1.694 > system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
user system elapsed
3.527 0.001 3.528
If you simply take out the narm check and don't mess with updated you get to
a more reasonable:> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
user system elapsed
1.688 0.003 1.691 > system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
user system elapsed
2.522 0.000 2.522
But just to bring things into perspective -- simply changing the compiler [here
just for that one file with still the same optimization level] will give
you:> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
user system elapsed
5.098 0.003 5.102 > system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
user system elapsed
5.213 0.000 5.214
... and using the original (unmodified) R code with a more optimized flags will
give you> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
user system elapsed
1.835 0.003 1.838 > system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
user system elapsed
2.473 0.001 2.474
... and the "optimal" version above (3rd) with the same optimization
settings:
> a <- seq(0,1,length=1000)
> system.time({for (i in 1:1000000) b <- sum(a)})
user system elapsed
1.670 0.003 1.673 > system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
user system elapsed
5.555 0.001 5.556
... so you just can't win ...
Cheers,
Simon
> Here is a little test with R version 2.11.1 on an Intel Linux system
>
>> a <- seq(0,1,length=1000)
>> system.time({for (i in 1:1000000) b <- sum(a)})
> user system elapsed
> 4.800 0.010 4.817
>> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
> user system elapsed
> 8.240 0.030 8.269
>
> and here is the same with "sum" and "prod" modified as
described below:
>
>> a <- seq(0,1,length=1000)
>> system.time({for (i in 1:1000000) b <- sum(a)})
> user system elapsed
> 1.81 0.00 1.81
>> system.time({for (i in 1:1000000) b <- sum(a,na.rm=TRUE)})
> user system elapsed
> 7.250 0.010 7.259
>
> That's an improvement by a factor of 2.65 for real vectors of length
> 1000 with na.rm=FALSE (the default), and an improvement of 12% when
> na.rm=TRUE. Of course, the improvement is smaller for very short
> vectors.
>
> The biggest reason for the improvement is that the current code (in
> 2.11.1 and in the development release of 2010-08-19) makes a costly
> call of ISNAN even when the option is na.rm=FALSE. The inner loop
> can also be sped up a bit in other respects.
>
> Here is the old procedure, in src/main/summary.c:
>
> static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
> {
> LDOUBLE s = 0.0;
> int i;
> Rboolean updated = FALSE;
>
> for (i = 0; i < n; i++) {
> if (!ISNAN(x[i]) || !narm) {
> if(!updated) updated = TRUE;
> s += x[i];
> }
> }
> *value = s;
>
> return(updated);
> }
>
> and here is my modified version:
>
> static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
> {
> LDOUBLE s = 0.0;
> int i;
> Rboolean updated = FALSE;
>
> if (narm) {
> for (i = 0; i < n; i++) {
> if (!ISNAN(x[i])) {
> s += x[i];
> updated = TRUE;
> break;
> }
> }
> for (i = i+1; i < n; i++) {
> if (!ISNAN(x[i]))
> s += x[i];
> }
> } else {
> for (i = 0; i < n; i++)
> s += x[i];
> if (n>0) updated = TRUE;
> }
>
> *value = s;
>
> return(updated);
> }
>
> An entirely analogous improvement can be made to the "prod"
function.
>
> Radford Neal
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>