Can you check if the following gives you what you want?
tmp <- rbind( A, B )
dis <- dist( tmp )
nA <- nrow(A)
nB <- nrow(B)
dis[ 1:nA, nA + 1:nB ] ## output
If it works, this suggestion comes with the caveat that it might be
computationally inefficient compared with using for() loops for very
large values of (a,b) or highly discordant values of (a,b). However I am
hoping the gain from dist() being coded in C should offset it.
Try experimenting to find the optimal speed etc. Also have a look at
mapply() examples to see if they are useful.
Regards, Adai
Prasenjit Kapat wrote:> Hi,
>
> I have two matrices, A (axd) and B (bxd). I want to get another matrix C
(axb)
> such that, C[i,j] is the Euclidean distance between the ith row of A and
jth
> row of B. In general, I can say that C[i,j] = some.function (A[i,], B[j,]).
> What is the best method for doing so? (assume a < b)
>
> I have been doing some exploration myself: Consider the following function:
> get.f, in which, 'method=1' is the rudimentary double for loop;
'method=2'
> avoids one loop by constructing a bigger matrix, but doesn't use
> apply(); 'method=3' avoids both the loops by using apply() and
constructing
> bigger matrices; 'method=4' avoids constructing bigger matrices by
using
> apply() twice.
>
> get.f <- function (A, B, method=2) {
> if (method == 1){
> a <- nrow(A); b <- nrow(B);
> C <- matrix(NA, nrow=a, ncol=b);
> for (i in 1:a)
> for (j in 1:b)
> C[i,j] <- sum((A[i,]-B[j,])^2)
> } else if (method == 2 ) {
> a <- nrow(A); b <- nrow(B); d <- ncol(A);
> C <- matrix(NA, nrow=a, ncol=b);
> for (i in 1:a)
> C[i,] <- rowSums((matrix(A[i,], nrow=b, ncol=d, byrow=TRUE) - B) ^ 2)
> } else if (method == 3) {
> C <- t(apply(A, MARGIN=1, FUN="FUN1", BB=B)); # transpose
is needed
> } else if (method == 4) {
> C <- t(apply(A, MARGIN=1, FUN="FUN2", BB=B))
> }
> }
>
> FUN1 <- function(aa, BB)
> return(rowSums(
> (matrix(aa, nrow=nrow(BB), ncol=ncol(BB), byrow=TRUE) - BB)^2)
> )
>
> FUN2 <- function(aa, BB)
> return(apply(BB, MARGIN=1, FUN="FUN3", aa=aa))
>
> FUN3 <- function(bb,aa) return(sum((aa-bb)^2))
>
> ### With these methods and the following intitializations,
>
> a <- 100; b <- 1000; d <- 100; n.loop <- 20;
>
> A <- matrix(rnorm(a*d), ncol=d)
> B <- matrix(rnorm(b*d), ncol=d)
>
> all.times <- matrix(0,nrow=5,ncol=4)
> rownames(all.times) <- rownames(as.matrix(system.time(NULL)))
>
> for (i in 1:4)
> for (j in 1:n.loop)
> all.times[,i] <- all.times[,i] +
> as.matrix(system.time(C <- get.f(A=A, B=B, method=i)))
>
> all.times <- all.times / n.loop
> print(all.times)
>
> [,1] [,2] [,3] [,4]
> user.self 4.0554 1.50010 1.50130 4.51285
> sys.self 0.0370 0.02420 0.01800 0.04260
> elapsed 4.2705 1.58865 1.59475 6.07535
> user.child 0.0000 0.00000 0.00000 0.00000
> sys.child 0.0000 0.00000 0.00000 0.00000
>
> 'method=2' stands out be the best and 'method=1' (for
loops) beats 'method=4'
> (two apply()s)... Is that expected?
>
> Is it possible to improve over 'method=2'?
>
> Thanks
> PK
>
> PS: The mail text seems fine in my composer, I hope, it looks decent in
your
> reader.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>