Dear subscribers,

I have made a simulation using loops rather than apply, simply because loops seem more natural to me. However, the current simulation takes forever and I have decided - finally - to learn how to use apply, but - like many other people before me - I am having a hard time changing habits.

My current code for the loop is:

    distances <- matrix(NA, 1000, 5)
    distancer <- function(x, y){-(abs(x-y))}
    x <- as.matrix(rnorm(1000, 5, 1.67))
    y <- rnorm(5, 5, 1.67)

    for (v in 1:1000){
      distances[v,] <- distancer(x[v,], y)
    }

The goal is to calculate the distances between the preferences of each voter (x) and all parties (y). This gives a 1000 by 5 matrix (distances).

If I want to transform this to apply, what would be the best way to go? More specifically, I am not sure what to put into the X part of the apply function.

Sorry for asking a question that has already been much debated; I just don't seem to be able to apply the answers to my own case. Many thanks in advance.

Kind regards,

Gijs Schumacher
Hello,

A simple question, with reproducible example code.

The best way to go is to decide which dimension you want to apply the function over (here the 1st, i.e. rows) and to write the function so that it takes the passed row as its first argument. If it has other arguments, they go after. Since your function is already written like this, there's little left to be done.

    d2 <- apply(x, 1, distancer, y)  # note the other arg.
    dim(d2)
    all.equal(distances, t(d2))

Why the transpose? Because apply passes each _row_ to the function and stores each returned vector as a _column_ of the result, so d2 is 5 by 1000. Had the function been applied to columns, the return value would already have the right dims. As it is, just assign

    d2 <- t(d2)  # or d2 <- t(apply(...etc...))

Hope this helps,

Rui Barradas

On 15-06-2012 12:27, Schumacher, G. wrote:
> If I want to transform this to apply, what would be the best way to go? More specifically, I am not sure what to put into the X part of the apply function.
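To make the transpose concrete, here is a minimal sketch on a reduced version of the problem. It assumes 3 voters and 2 parties with made-up values (the thread uses 1000 and 5 random ones) and reuses the distancer function from the original post; the name d_raw is introduced only for illustration.

```r
distancer <- function(x, y) -abs(x - y)

x <- matrix(c(1, 4, 6), ncol = 1)   # 3 voters, one preference each (illustrative values)
y <- c(2, 5)                        # 2 parties (illustrative values)

# apply() over rows stores the result of each call as a column,
# so the raw result is laid out as parties x voters ...
d_raw <- apply(x, 1, distancer, y)
dim(d_raw)                          # 2 3

# ... and transposing gives the voters x parties layout of the loop.
d2 <- t(d_raw)
dim(d2)                             # 3 2

# The explicit loop, for comparison
distances <- matrix(NA_real_, nrow(x), length(y))
for (v in seq_len(nrow(x))) distances[v, ] <- distancer(x[v, ], y)
all.equal(distances, d2)            # TRUE
```

The same check should carry over to the full 1000 by 5 case, since only the dimensions change.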
On Fri, Jun 15, 2012 at 11:27:44AM +0000, Schumacher, G. wrote:
> If I want to transform this to apply, what would be the best way to go? More specifically, I am not sure what to put into the X part of the apply function.

Hi.

There are also other ways to eliminate loops than apply-type functions. Try, for example,

    distances <- matrix(NA, 1000, 5)
    distancer <- function(x, y){-(abs(x-y))}
    x <- as.matrix(rnorm(1000, 5, 1.67))
    y <- rnorm(5, 5, 1.67)

    for (v in 1:1000){
      distances[v,] <- distancer(x[v,], y)
    }

    dst1 <- outer(c(x), y, FUN=distancer)
    identical(distances, dst1)

    [1] TRUE

    xm <- matrix(x, nrow=nrow(x), ncol=length(y))
    ym <- matrix(y, nrow=nrow(x), ncol=length(y), byrow=TRUE)
    dst2 <- -abs(xm - ym)
    identical(distances, dst2)

    [1] TRUE

Apply uses a loop internally, so it need not significantly improve efficiency.

Petr Savicky.
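One way to check the efficiency remark on your own machine is to time the three variants side by side. The following is only a sketch: the wrapper names by_loop, by_apply and by_outer, the seed, and the repetition count are assumptions made for illustration and do not appear in the thread. Timings depend on hardware and R version, so none are shown.

```r
set.seed(1)                          # assumed seed, only for reproducibility
distancer <- function(x, y) -abs(x - y)
x <- as.matrix(rnorm(1000, 5, 1.67))
y <- rnorm(5, 5, 1.67)

# Hypothetical wrappers around the three approaches from the thread
by_loop <- function(x, y) {
  out <- matrix(NA_real_, nrow(x), length(y))
  for (v in seq_len(nrow(x))) out[v, ] <- distancer(x[v, ], y)
  out
}
by_apply <- function(x, y) t(apply(x, 1, distancer, y))
by_outer <- function(x, y) outer(c(x), y, FUN = distancer)

# All three should produce the same 1000 x 5 matrix
stopifnot(isTRUE(all.equal(by_loop(x, y), by_apply(x, y))),
          isTRUE(all.equal(by_loop(x, y), by_outer(x, y))))

# Repeat each call so the timings are large enough to compare
reps <- 100
timings <- rbind(
  loop  = system.time(for (i in seq_len(reps)) by_loop(x, y)),
  apply = system.time(for (i in seq_len(reps)) by_apply(x, y)),
  outer = system.time(for (i in seq_len(reps)) by_outer(x, y))
)
print(timings)
```

If the loop and apply rows come out close to each other while outer is clearly faster, that would be consistent with the point that apply mainly changes how the loop is written rather than removing it.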
On Fri, Jun 15, 2012 at 02:08:13PM +0200, Petr Savicky wrote:
[...]
> Apply uses a loop internally, so it need not significantly improve efficiency.

Hi.

Let me include an example of a situation where vectorized code is more efficient than code using apply(), although the apply() code is significantly simpler.

    n <- 10000
    x <- matrix(runif(3*n), nrow=n, ncol=3)

    t1 <- system.time( {
      y <- t(apply(x, 1, sort))
    } )

    # use a vectorized bubble sort, which is efficient on 3 elements
    t2 <- system.time( {
      x[, 1:2] <- cbind(pmin(x[, 1], x[, 2]), pmax(x[, 1], x[, 2]))
      x[, 2:3] <- cbind(pmin(x[, 2], x[, 3]), pmax(x[, 2], x[, 3]))
      x[, 1:2] <- cbind(pmin(x[, 1], x[, 2]), pmax(x[, 1], x[, 2]))
    } )

    print(identical(x, y))

    [1] TRUE

    print(rbind(t1, t2))

       user.self sys.self elapsed user.child sys.child
    t1     0.478        0   0.477          0         0
    t2     0.004        0   0.004          0         0

Hope this helps.

Petr Savicky.
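The three pmin()/pmax() lines above are each one compare-and-swap step applied to whole columns at once. As a rough sketch of that idea, the step can be pulled out into a helper; the name compare_exchange and the small input matrix are assumptions made for illustration, not part of the original message.

```r
# Hypothetical helper: for every row, put the smaller of columns i and j
# into column i and the larger into column j.
compare_exchange <- function(m, i, j) {
  lo <- pmin(m[, i], m[, j])
  hi <- pmax(m[, i], m[, j])
  m[, i] <- lo
  m[, j] <- hi
  m
}

m <- rbind(c(3, 1, 2),
           c(2, 3, 1),
           c(1, 2, 3))               # illustrative values

# Same order of steps as in the message: (1,2), (2,3), (1,2)
s <- compare_exchange(m, 1, 2)
s <- compare_exchange(s, 2, 3)
s <- compare_exchange(s, 1, 2)
print(s)                             # every row is now 1 2 3

# Agrees with the row-wise apply() version
all.equal(s, t(apply(m, 1, sort)))   # TRUE
```

Each step costs two vectorized calls regardless of the number of rows, whereas apply() calls sort() once per row, which is where the difference in the timings above would come from.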