thr3ads.net - R help - [R] crossprod vs %*% timing [Oct 2004]

If this information is useful, please help other people find it:
Share via:

Robin Hankin

2004-Oct-06 08:09 UTC

[R] crossprod vs %*% timing

Hi

the manpage says that crossprod(x,y) is formally equivalent to, but
faster than, the call 't(x) %*% y'.

I have a vector 'a' and a matrix 'A', and need to evaluate
't(a) %*% A
%*% a' many many times, and performance is becoming crucial.  With

f1 <- function(a,X){ ignore <- t(a) %*% X %*% a               }
f2 <- function(a,X){ ignore <- crossprod(t(crossprod(a,X)),a) }
f3 <- function(a,X){ ignore <- crossprod(a,X) %*% a           }

a <- rnorm(100)
X <- matrix(rnorm(10000),100,100)

print(system.time( for(i in 1:10000){ f1(a,X)}))
print(system.time( for(i in 1:10000){ f2(a,X)}))
print(system.time( for(i in 1:10000){ f3(a,X)}))


I get something like:

[1] 2.68 0.05 2.66 0.00 0.00
[1] 0.48 0.00 0.49 0.00 0.00
[1] 0.29 0.00 0.31 0.00 0.00

with quite low variability from run to run.  What surprises me is the
third figure: about 40% faster than the second one, the extra time
possibly related to the call to t() (and Rprof shows about 35% of
total time in t() for my application).

So it looks like f3() is the winner hands down, at least for this
task.  What is a good way of thinking about such issues?  Anyone got
any performance tips?

I quite often need things like 'a %*% X %*% t(Y) %*% Z %*% t(b)' which
would be something like
crossprod(t(crossprod(t(crossprod(t(crossprod(a,X)),t(Y))),Z)),t(b))
(I think).

(R-1.9.1, 2GHz G5 PowerPC, MacOSX10.3.5)

-- 
Robin Hankin
Uncertainty Analyst
Southampton Oceanography Centre
SO14 3ZH
tel +44(0)23-8059-7743
initialDOTsurname at soc.soton.ac.uk (edit in obvious way; spam precaution)

Dimitris Rizopoulos

2004-Oct-06 08:52 UTC

head link

[R] crossprod vs %*% timing

Hi Robin,

I some cases you could benefit from special features of the matrices 
at hand (I don't know if this is applicable in your case). For 
instance, in Bayesian computations I often face the quadratic form 
t(y)%*%solve(Sigma)%*%y, where Sigma is a covariance matrix. In this 
case you could gain by using the positive definiteness of Sigma i.e.,

library(MASS)
y <- rnorm(100)
Sigma <- var(mvrnorm(1000, rep(0,100), diag(100)))
###
system.time( for(i in 1:1000) c(t(y)%*%solve(Sigma)%*%y) )
system.time( for(i in 1:1000) c(crossprod(y, solve(Sigma))%*%y) )
system.time( for(i in 1:1000) c(t(y)%*%solve(Sigma, y)) )
system.time( for(i in 1:1000) quadform(y, Sigma) )

where

quadform <- function(y, Sigma){
    # Warning: The code does not check for symmetry of Sigma
    stopifnot(is.vector(y), length(y)==nrow(Sigma))
    chol.Sigma <- chol(Sigma)
    x <- forwardsolve(t(chol.Sigma), y)
    sum(x*x)
}

I hope this helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm



----- Original Message ----- 
From: "Robin Hankin" <rksh at soc.soton.ac.uk>
To: <R-help at stat.math.ethz.ch>
Sent: Wednesday, October 06, 2004 10:09 AM
Subject: [R] crossprod vs %*% timing

> Hi
>
> the manpage says that crossprod(x,y) is formally equivalent to, but
> faster than, the call 't(x) %*% y'.
>
> I have a vector 'a' and a matrix 'A', and need to evaluate
't(a) %*%
> A
> %*% a' many many times, and performance is becoming crucial.  With
>
> f1 <- function(a,X){ ignore <- t(a) %*% X %*% a               }
> f2 <- function(a,X){ ignore <- crossprod(t(crossprod(a,X)),a) }
> f3 <- function(a,X){ ignore <- crossprod(a,X) %*% a           }
>
> a <- rnorm(100)
> X <- matrix(rnorm(10000),100,100)
>
> print(system.time( for(i in 1:10000){ f1(a,X)}))
> print(system.time( for(i in 1:10000){ f2(a,X)}))
> print(system.time( for(i in 1:10000){ f3(a,X)}))
>
>
> I get something like:
>
> [1] 2.68 0.05 2.66 0.00 0.00
> [1] 0.48 0.00 0.49 0.00 0.00
> [1] 0.29 0.00 0.31 0.00 0.00
>
> with quite low variability from run to run.  What surprises me is 
> the
> third figure: about 40% faster than the second one, the extra time
> possibly related to the call to t() (and Rprof shows about 35% of
> total time in t() for my application).
>
> So it looks like f3() is the winner hands down, at least for this
> task.  What is a good way of thinking about such issues?  Anyone got
> any performance tips?
>
> I quite often need things like 'a %*% X %*% t(Y) %*% Z %*% t(b)' 
> which
> would be something like
> crossprod(t(crossprod(t(crossprod(t(crossprod(a,X)),t(Y))),Z)),t(b))
> (I think).
>
> (R-1.9.1, 2GHz G5 PowerPC, MacOSX10.3.5)
>
> -- 
> Robin Hankin
> Uncertainty Analyst
> Southampton Oceanography Centre
> SO14 3ZH
> tel +44(0)23-8059-7743
> initialDOTsurname at soc.soton.ac.uk (edit in obvious way; spam 
> precaution)
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html
>
>

Prof Brian Ripley

2004-Oct-06 09:01 UTC

head link

[R] crossprod vs %*% timing

t(a) %*% A %*% a is a quadratic form.  What varies `many many times'? If A
does not vary (often), you want to find B with B'B = A (e.g. via chol,
possibly after symmetrizing A) and the squared length of Ba.

Doing the calculations with compiled code calling LAPACK (and making use
of a decent BLAS) would save a lot of overhead.

On Wed, 6 Oct 2004, Robin Hankin wrote:
> Hi
> 
> the manpage says that crossprod(x,y) is formally equivalent to, but
> faster than, the call 't(x) %*% y'.
> 
> I have a vector 'a' and a matrix 'A', and need to evaluate
't(a) %*% A
> %*% a' many many times, and performance is becoming crucial.  With
> 
> f1 <- function(a,X){ ignore <- t(a) %*% X %*% a               }
> f2 <- function(a,X){ ignore <- crossprod(t(crossprod(a,X)),a) }
> f3 <- function(a,X){ ignore <- crossprod(a,X) %*% a           }
> 
> a <- rnorm(100)
> X <- matrix(rnorm(10000),100,100)
> 
> print(system.time( for(i in 1:10000){ f1(a,X)}))
> print(system.time( for(i in 1:10000){ f2(a,X)}))
> print(system.time( for(i in 1:10000){ f3(a,X)}))
> 
> 
> I get something like:
> 
> [1] 2.68 0.05 2.66 0.00 0.00
> [1] 0.48 0.00 0.49 0.00 0.00
> [1] 0.29 0.00 0.31 0.00 0.00
> 
> with quite low variability from run to run.  What surprises me is the
> third figure: about 40% faster than the second one, the extra time
> possibly related to the call to t() (and Rprof shows about 35% of
> total time in t() for my application).
> 
> So it looks like f3() is the winner hands down, at least for this
> task.  What is a good way of thinking about such issues?  Anyone got
> any performance tips?
> 
> I quite often need things like 'a %*% X %*% t(Y) %*% Z %*% t(b)'
which
> would be something like
> crossprod(t(crossprod(t(crossprod(t(crossprod(a,X)),t(Y))),Z)),t(b))
> (I think).
> 
> (R-1.9.1, 2GHz G5 PowerPC, MacOSX10.3.5)
> 
> 
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Peter Wolf

2004-Oct-06 09:37 UTC

head link

[R] crossprod vs %*% timing

You can study that the order of the operation has an effect on
the times of the computations.

<<*>>f1 <- function(a,X){ ignore <- t(a) %*% X %*% a          
}
f2 <- function(a,X){ ignore <- crossprod(t(crossprod(a,X)),a) }
f3 <- function(a,X){ ignore <- crossprod(a,X) %*% a           }
f4 <- function(a,X){ ignore <- (t(a) %*% X) %*% a               }
f5 <- function(a,X){ ignore <- t(a) %*% (X %*% a)               }
f6 <- function(a,X){ ignore <- crossprod(a,crossprod(X,a)) }

a <- rnorm(100); X <- matrix(rnorm(10000),100,100)

print(system.time( for(i in 1:10000){ a1<-f1(a,X)}))
print(system.time( for(i in 1:10000){ a2<-f2(a,X)}))
print(system.time( for(i in 1:10000){ a3<-f3(a,X)}))
print(system.time( for(i in 1:10000){ a4<-f4(a,X)}))
print(system.time( for(i in 1:10000){ a5<-f5(a,X)}))
print(system.time( for(i in 1:10000){ a6<-f6(a,X)}))
c(a1,a2,a3,a4,a5,a6)


@
output-start
[1] 4.06 0.04 4.11 0.00 0.00
[1] 1.48 0.00 1.53 0.00 0.00
[1] 1.17 0.00 1.22 0.00 0.00
[1] 4.10 0.01 4.39 0.00 0.00
[1] 2.58 0.01 3.24 0.00 0.00
[1] 1.10 0.00 1.29 0.00 0.00
Wed Oct  6 11:26:38 2004
[1] -79.34809 -79.34809 -79.34809 -79.34809 -79.34809 -79.34809
output-end

Peter Wolf



Robin Hankin wrote:
> Hi
>
> the manpage says that crossprod(x,y) is formally equivalent to, but
> faster than, the call 't(x) %*% y'.
>
> I have a vector 'a' and a matrix 'A', and need to evaluate
't(a) %*% A
> %*% a' many many times, and performance is becoming crucial.  With
>
> f1 <- function(a,X){ ignore <- t(a) %*% X %*% a               }
> f2 <- function(a,X){ ignore <- crossprod(t(crossprod(a,X)),a) }
> f3 <- function(a,X){ ignore <- crossprod(a,X) %*% a           }
>
> a <- rnorm(100)
> X <- matrix(rnorm(10000),100,100)
>
> print(system.time( for(i in 1:10000){ f1(a,X)}))
> print(system.time( for(i in 1:10000){ f2(a,X)}))
> print(system.time( for(i in 1:10000){ f3(a,X)}))
>
>
> I get something like:
>
> [1] 2.68 0.05 2.66 0.00 0.00
> [1] 0.48 0.00 0.49 0.00 0.00
> [1] 0.29 0.00 0.31 0.00 0.00
>
> with quite low variability from run to run.  What surprises me is the
> third figure: about 40% faster than the second one, the extra time
> possibly related to the call to t() (and Rprof shows about 35% of
> total time in t() for my application).
>
> So it looks like f3() is the winner hands down, at least for this
> task.  What is a good way of thinking about such issues?  Anyone got
> any performance tips?
>
> I quite often need things like 'a %*% X %*% t(Y) %*% Z %*% t(b)'
which
> would be something like
> crossprod(t(crossprod(t(crossprod(t(crossprod(a,X)),t(Y))),Z)),t(b))
> (I think).
>
> (R-1.9.1, 2GHz G5 PowerPC, MacOSX10.3.5)
>

Reasonably Related Threads

Search for more reasonably related threads

R help - Oct 2004 - crossprod vs %*% timing

[R] crossprod vs %*% timing

[R] crossprod vs %*% timing

[R] crossprod vs %*% timing

[R] crossprod vs %*% timing

Reasonably Related Threads