thr3ads.net - R help - [R] use loop or use apply? [May 2007]

Prasenjit Kapat

2007-May-17 23:56 UTC

[R] use loop or use apply?

Hi,

I have two matrices, A (a x d) and B (b x d). I want to get another matrix C
(a x b) such that C[i,j] is the Euclidean distance between the ith row of A
and the jth row of B. In general, C[i,j] = some.function(A[i,], B[j,]).
What is the best method for doing so? (Assume a < b.)
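
For concreteness, here is a tiny worked example of the target matrix C, built with a plain double loop. The two small matrices and the helper name euclid are made up for illustration and do not come from the post.

```r
# Hypothetical toy data: a = 2 rows in A, b = 3 rows in B, d = 2 columns
A <- rbind(c(0, 0), c(3, 4))
B <- rbind(c(0, 0), c(3, 0), c(0, 4))

# Illustrative helper: Euclidean distance between two vectors
euclid <- function(x, y) sqrt(sum((x - y)^2))

# C[i, j] holds the distance between row i of A and row j of B
C <- matrix(NA_real_, nrow = nrow(A), ncol = nrow(B))
for (i in seq_len(nrow(A)))
  for (j in seq_len(nrow(B)))
    C[i, j] <- euclid(A[i, ], B[j, ])

print(C)
#      [,1] [,2] [,3]
# [1,]    0    3    4
# [2,]    5    4    3
```

Any other function of two rows could be swapped in for euclid without changing the loop.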

I have been doing some exploration myself. Consider the following function,
get.f, in which 'method=1' is the rudimentary double for loop; 'method=2'
avoids one loop by constructing a bigger matrix, but doesn't use apply();
'method=3' avoids both loops by using apply() and constructing bigger
matrices; 'method=4' avoids constructing bigger matrices by using apply()
twice.

get.f <- function (A, B, method=2) {
	# All four methods fill C[i,j] with sum((A[i,]-B[j,])^2), i.e. the
	# squared Euclidean distance; take sqrt(C) for the distance itself.
	if (method == 1) {
		a <- nrow(A); b <- nrow(B)
		C <- matrix(NA, nrow=a, ncol=b)
		for (i in 1:a)
			for (j in 1:b)
				C[i,j] <- sum((A[i,]-B[j,])^2)
	} else if (method == 2) {
		a <- nrow(A); b <- nrow(B); d <- ncol(A)
		C <- matrix(NA, nrow=a, ncol=b)
		for (i in 1:a)
			C[i,] <- rowSums((matrix(A[i,], nrow=b, ncol=d, byrow=TRUE) - B)^2)
	} else if (method == 3) {
		C <- t(apply(A, MARGIN=1, FUN="FUN1", BB=B))  # transpose is needed
	} else if (method == 4) {
		C <- t(apply(A, MARGIN=1, FUN="FUN2", BB=B))
	}
	C  # return the result explicitly; a for loop evaluates to NULL
}

FUN1 <- function(aa, BB)
  return(rowSums(
		(matrix(aa, nrow=nrow(BB), ncol=ncol(BB), byrow=TRUE) - BB)^2)
  )

FUN2 <- function(aa, BB)
	return(apply(BB, MARGIN=1, FUN="FUN3", aa=aa))

FUN3 <- function(bb,aa) return(sum((aa-bb)^2))

### With these methods and the following initializations,

a <- 100; b <- 1000; d <- 100; n.loop <- 20;

A <- matrix(rnorm(a*d), ncol=d)
B <- matrix(rnorm(b*d), ncol=d)

all.times <- matrix(0,nrow=5,ncol=4)
rownames(all.times) <- rownames(as.matrix(system.time(NULL)))

for (i in 1:4)  
	for (j in 1:n.loop)
		all.times[,i] <- all.times[,i] + 
				as.matrix(system.time(C <- get.f(A=A, B=B, method=i)))

all.times <- all.times / n.loop
print(all.times)

             [,1]    [,2]    [,3]    [,4]
user.self  4.0554 1.50010 1.50130 4.51285
sys.self   0.0370 0.02420 0.01800 0.04260
elapsed    4.2705 1.58865 1.59475 6.07535
user.child 0.0000 0.00000 0.00000 0.00000
sys.child  0.0000 0.00000 0.00000 0.00000

'method=2' stands out as the best, and 'method=1' (for loops) beats
'method=4' (two apply()s)... Is that expected?

Is it possible to improve over 'method=2'?

Thanks
PK

PS: The mail text seems fine in my composer; I hope it looks decent in your
reader.

Adaikalavan Ramasamy

2007-May-18 02:28 UTC

[R] use loop or use apply?

Can you check if the following gives you what you want?

    tmp <- rbind( A, B )
    dis <- as.matrix( dist( tmp ) )  ## dist() returns a "dist" object; convert before indexing
    nA  <- nrow(A)
    nB  <- nrow(B)
    dis[ 1:nA, nA + 1:nB ] ## output

If it works, this suggestion comes with the caveat that it might be 
computationally inefficient compared with using for() loops for very 
large values of (a,b) or highly discordant values of (a,b). However I am 
hoping the gain from dist() being coded in C should offset it.
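
To put a number on that caveat: with a = 100 and b = 1000, dist() on the stacked matrix computes 1100 * 1099 / 2 = 604450 pairwise distances, of which only a * b = 100000 are needed. One commonly used alternative, not proposed in this thread, computes only the a x b block through the identity |x - y|^2 = |x|^2 + |y|^2 - 2 x.y. The sketch below is a hedged illustration; the function name sq.dist.cross and the small test sizes are assumptions made for the example.

```r
# Squared Euclidean distances between rows of A and rows of B,
# using |x - y|^2 = |x|^2 + |y|^2 - 2 * sum(x * y).
# sq.dist.cross is a hypothetical name chosen for this sketch.
sq.dist.cross <- function(A, B) {
  C <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  pmax(C, 0)  # clamp tiny negative values caused by rounding
}

# Small made-up data to compare against the dist() approach
set.seed(1)
A <- matrix(rnorm(5 * 3), ncol = 3)
B <- matrix(rnorm(7 * 3), ncol = 3)

via.dist  <- as.matrix(dist(rbind(A, B)))[1:5, 5 + 1:7]
via.cross <- sqrt(sq.dist.cross(A, B))

# The two results should agree up to floating-point tolerance
print(all.equal(via.cross, via.dist, check.attributes = FALSE))
```

Whether this is actually faster than 'method=2' or dist() on a given machine is something to measure with system.time() rather than assume. The subtraction can also lose precision when two rows are nearly identical.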

Try experimenting to find the optimal speed etc. Also have a look at 
mapply() examples to see if they are useful.
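
As a minimal sketch of what the mapply() route could look like for a general some.function: the wrapper name pairwise.mapply and the test data are hypothetical, introduced only for this example.

```r
# Apply f to every (row of A, row of B) pair via mapply().
# pairwise.mapply is a hypothetical helper, not from the thread.
pairwise.mapply <- function(A, B, f) {
  # expand.grid varies i fastest, which matches R's column-major matrix fill
  idx <- expand.grid(i = seq_len(nrow(A)), j = seq_len(nrow(B)))
  vals <- mapply(function(i, j) f(A[i, ], B[j, ]), idx$i, idx$j)
  matrix(vals, nrow = nrow(A), ncol = nrow(B))
}

# Small made-up data
set.seed(1)
A <- matrix(rnorm(4 * 3), ncol = 3)
B <- matrix(rnorm(6 * 3), ncol = 3)

C <- pairwise.mapply(A, B, function(x, y) sqrt(sum((x - y)^2)))
dim(C)  # 4 6
```

This still calls f once per pair at the R level, so it is a convenience for arbitrary functions and probably not a speed-up over the double for loop.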

Regards, Adai



