The following is at least as much out of intellectual curiosity
as for practical reasons.  On reviewing some code written by
novices to R, I came across:

crossprod(x, y)[1,1]

I thought, "That isn't a very S way of saying that, I wonder
what the penalty is for using 'crossprod'."  To my surprise the
penalty was substantially negative.  Handily the client had S-PLUS
as well -- there the sign of the penalty was as I had expected, but
the order of magnitude was off.

Here are the timings of 1 million computations on vectors of
length 1000.  This is under Windows, R version 1.9.1 and S-PLUS
6.2 (on the same machine).

Command                       R      S-PLUS
sum(x * y)                  28.61      97.6
crossprod(x, y)[1,1]         6.77    2256.2

Another example is computing the sums of the columns of a
matrix.  For example:

set.seed(1)
jjm <- matrix(rnorm(600), 5)

Timings for this under Windows 2000 with R version 2.0.1 (on an
old chip running at about 0.7GHz) for 100,000 computations are:

apply(jjm, 2, sum)          536.59
colSums(jjm)                 18.26
rep(1,5) %*% jjm             15.41
crossprod(rep(1,5), jjm)     13.16

(These timings seem to be stable across R versions and on at least
one Linux platform.)

Andy Liaw showed another example of 'crossprod' being fast a couple
of days ago on R-help.

Questions for those with a more global picture of the code:

* Is the speed advantage of 'crossprod' inherent, or is it because
  more care has been taken with its implementation than with the
  other functions?

* Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
  going to the BLAS while 'sum' can't?

* Would it make sense to (essentially) use 'crossprod' in 'colSums'
  and its friends, at least for the special case of matrices?

Patrick Burns

Burns Statistics
patrick@burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
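For reference, the timings above can be reproduced with loops along
these lines.  This is only a sketch: the iteration counts are scaled
down so it finishes quickly, and absolute times will vary with the
machine and with which BLAS R is linked against.

    set.seed(1)
    x <- rnorm(1000)
    y <- rnorm(1000)

    ## Dot product: elementwise multiply-and-sum versus crossprod
    system.time(for (i in 1:10000) sum(x * y))
    system.time(for (i in 1:10000) crossprod(x, y)[1, 1])

    ## Column sums of a 5 x 120 matrix, four ways
    jjm <- matrix(rnorm(600), 5)
    system.time(for (i in 1:10000) apply(jjm, 2, sum))
    system.time(for (i in 1:10000) colSums(jjm))
    system.time(for (i in 1:10000) rep(1, 5) %*% jjm)
    system.time(for (i in 1:10000) crossprod(rep(1, 5), jjm))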
Patrick Burns wrote:

> [...]
>
> * Is 'crossprod' faster than 'sum(x * y)' because 'crossprod' is
>   going to the BLAS while 'sum' can't?

For a numeric matrix, crossprod ends up calling level 3 BLAS: dsyrk
for the single-argument case or dgemm for the two-argument case.
Especially in accelerated versions of the BLAS, like Atlas or Goto's
BLAS, those routines are hideously efficient, and that's where you
are seeing the big gain in speed.

By the way, you didn't mention whether you have an accelerated BLAS
installed with R.  Do you?
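To see concretely what this means, a small sketch (for numeric
matrices): crossprod(X) is algebraically t(X) %*% X, and
crossprod(X, Y) is t(X) %*% Y, but each goes to a single level 3
BLAS call (dsyrk and dgemm respectively) without ever materializing
the transposed matrix.

    X <- matrix(rnorm(1000 * 50), 1000, 50)
    Y <- matrix(rnorm(1000 * 50), 1000, 50)

    all.equal(crossprod(X),    t(X) %*% X)  # one-argument form -> dsyrk
    all.equal(crossprod(X, Y), t(X) %*% Y)  # two-argument form -> dgemm

    ## The explicit-transpose form pays for building t(X) as well as
    ## for a general matrix multiply:
    system.time(for (i in 1:200) t(X) %*% X)
    system.time(for (i in 1:200) crossprod(X))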
> From: Douglas Bates
>
> [...]
>
> For a numeric matrix, crossprod ends up calling level 3 BLAS: dsyrk
> for the single-argument case or dgemm for the two-argument case.
> Especially in accelerated versions of the BLAS, like Atlas or
> Goto's BLAS, those routines are hideously efficient, and that's
> where you are seeing the big gain in speed.
>
> By the way, you didn't mention whether you have an accelerated BLAS
> installed with R.  Do you?

For the case of crossprod(x, w) for computing the weighted mean of x
(the posting that Pat referred to), I tried the ATLAS-linked
Rblas.dll on CRAN, and it's actually slower than the stock Rblas.dll.
I believe Duncan M. had observed similar things for small-ish
problems.  (I used a Pentium M laptop, and tried both the P3 and P4
versions.)

So I think my main point is that even with a non-optimized BLAS,
crossprod can be way faster.

Andy
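The weighted-mean comparison Andy refers to was along these lines (an
assumed reconstruction, since the original R-help code is not quoted
in this thread): the inner product crossprod(x, w) replaces an
elementwise multiply followed by a sum with one BLAS call.

    ## Assumed reconstruction of the weighted-mean timing; not the
    ## original R-help code.
    set.seed(1)
    x <- rnorm(1e5)
    w <- runif(1e5)
    w <- w / sum(w)               # normalize the weights to sum to 1

    all.equal(sum(x * w), crossprod(x, w)[1, 1])

    system.time(for (i in 1:500) sum(x * w))
    system.time(for (i in 1:500) crossprod(x, w)[1, 1])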
A few weeks ago I noticed

> z <- matrix(rnorm(20000), 10000, 2)
> system.time(for (i in 1:1000) apply(z, 2, sum))
[1] 13.44  0.48 14.08  0.00  0.00
> system.time(for (i in 1:1000) rep(1, 10000) %*% z)
[1]  6.46  0.11  6.84  0.00  0.00

which seemed completely contrary to all my childhood teachings.  Now

> system.time(for (i in 1:1000) crossprod(rep(1, 10000), z))
[1] 1.90 0.12 2.24 0.00 0.00

makes sense, because it is supposed to be faster than %*%, but why is
apply so slow?  (And should I go back and change apply in my code
everywhere, or is this likely to reverse again?)

Paul Gilbert
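A rough sketch of where apply's time goes: apply is a general-purpose
interpreted loop, so on every call it checks dimensions, extracts
each slice, calls FUN from R, and reassembles the results, none of it
in compiled code.  A stripped-down pure-R version (col_sums_apply_style
is a hypothetical name, not an R function) already recovers much of
the difference, while colSums does the whole reduction in one pass of
compiled code.

    ## Hypothetical stripped-down version of apply(z, 2, sum): an
    ## interpreted loop over columns, without apply's general setup.
    col_sums_apply_style <- function(z) {
      out <- numeric(ncol(z))
      for (j in seq_len(ncol(z)))
        out[j] <- sum(z[, j])     # one R-level call per column
      out
    }

    z <- matrix(rnorm(20000), 10000, 2)
    all.equal(col_sums_apply_style(z), colSums(z))

    system.time(for (i in 1:1000) apply(z, 2, sum))
    system.time(for (i in 1:1000) col_sums_apply_style(z))
    system.time(for (i in 1:1000) colSums(z))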