robertfeldt
2012-Feb-24 16:59 UTC
[R] Speeding up "accumulation" code in large matrix calc?
Hi,

I have R code like so:

num.columns.back.since.last.occurence <- function(m, outcome) {
  nrows <- dim(m)[1];
  ncols <- dim(m)[2];
  res <- matrix(rep.int(0, nrows*ncols), nrow=nrows);
  for(row in 1:nrows) {
    for(col in 2:ncols) {
      res[row,col] <- if(m[row,col-1]==outcome) {0} else {1+res[row,col-1]}
    }
  }
  res;
}

but on the very large matrices I apply this to, the execution times are a problem. I would appreciate any help rewriting this with more "standard"/native R functions to speed things up.
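To make the intended behaviour concrete, a small worked example (the matrix values are chosen only for illustration): for each cell, the result is the number of columns since the previous column in that row last equalled outcome.

m <- rbind(c(1, 0, 0, 1, 0),
           c(0, 0, 1, 0, 0))
num.columns.back.since.last.occurence(m, outcome=1)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    0    0    1    2    0
# [2,]    0    1    2    0    1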
Petr Savicky
2012-Feb-24 18:50 UTC
[R] Speeding up "accumulation" code in large matrix calc?
On Fri, Feb 24, 2012 at 08:59:44AM -0800, robertfeldt wrote:
> but on the very large matrices I apply this to, the execution times
> are a problem. I would appreciate any help rewriting this with more
> "standard"/native R functions to speed things up.

Hi.

If the number of columns is large, so that the rows are long, then the following can be more efficient.

oneRow <- function(x, outcome) {
  n <- length(x)
  y <- c(0, cumsum(x[-n] == outcome))
  ave(x, y, FUN = function(z) seq.int(along=z) - 1)
}

# random matrix
A <- matrix((runif(49) < 0.2) + 0, nrow=7)

# the required transformation
B <- t(apply(A, 1, oneRow, outcome=1))

# verify
all(num.columns.back.since.last.occurence(A, 1) == B)

[1] TRUE

This solution performs a loop over rows (in apply), so if the number of rows is large and the number of columns is not, then a solution that uses a loop over columns may be better.

Hope this helps.

Petr Savicky.
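To see what the cumsum/ave combination is doing, a single-row illustration (the example vector is made up):

x <- c(0, 1, 0, 0, 1, 0)
n <- length(x)
# group label: how many times the outcome (here 1) occurred in earlier positions
y <- c(0, cumsum(x[-n] == 1))
y
# [1] 0 0 1 1 1 2
# within each group, count positions since the group started
ave(x, y, FUN = function(z) seq.int(along=z) - 1)
# [1] 0 1 0 1 2 0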
Petr Savicky
2012-Feb-24 19:02 UTC
[R] Speeding up "accumulation" code in large matrix calc?
On Fri, Feb 24, 2012 at 08:59:44AM -0800, robertfeldt wrote:
> but on the very large matrices I apply this to, the execution times
> are a problem. I would appreciate any help rewriting this with more
> "standard"/native R functions to speed things up.

Hi.

If the number of rows is large and the number of columns is not, then try the following.

# random matrix
A <- matrix((runif(49) < 0.2) + 0, nrow=7)
outcome <- 1

# transformation
B <- array(0, dim=dim(A))
curr <- B[, 1]
for (i in seq.int(from=2, length=ncol(A)-1)) {
  curr <- ifelse(A[, i-1] == outcome, 0, 1 + curr)
  B[, i] <- curr
}

# verify
all(num.columns.back.since.last.occurence(A, 1) == B)

[1] TRUE

Hope this helps.

Petr Savicky.
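For a rough feel for the difference, both versions can be timed on a larger matrix, e.g. 6000 x 400 with a low probability of the outcome (sizes chosen only for illustration; timings depend on the machine):

set.seed(1)
A <- matrix((runif(6000*400) < 0.04) + 0, nrow=6000)

# original double loop
system.time(num.columns.back.since.last.occurence(A, 1))

# column-loop version from above
system.time({
  B <- array(0, dim=dim(A))
  curr <- B[, 1]
  for (i in seq.int(from=2, length=ncol(A)-1)) {
    curr <- ifelse(A[, i-1] == 1, 0, 1 + curr)
    B[, i] <- curr
  }
})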
robertfeldt
2012-Feb-25 04:47 UTC
[R] Speeding up "accumulation" code in large matrix calc?
Wow! Thanks to both Petr and Berend for this extensive help. I learned a lot, not only about this specific case but about R in general, from studying your answers.

The compiled version of t4 (t4.c below) seems to give the most consistently quick results, and for my case (about 6000 rows and 500 columns, with a probability of 0.04 for the sought-for outcome) I see speedups of 30-40 times over my original. See below for details. Excellent help, thank you!

# t1-t4 as above in thread and then compiled to t1.c-t4.c ...

random_matrix <- function(nrows, ncols, probabilityOfOne) {
  matrix((runif(nrows*ncols) < probabilityOfOne) + 0, nrow=nrows);
}

library(rbenchmark)  # benchmark() comes from the rbenchmark package

compare.exec.times <- function(A) {
  benchmark(
    t1(A, outcome=1), t2(A, outcome=1), t3(A, outcome=1), t4(A, outcome=1),
    t1.c(A, outcome=1), t2.c(A, outcome=1), t3.c(A, outcome=1), t4.c(A, outcome=1),
    columns=c("test", "user.self", "relative"),
    replications=3)
}

compare.exec.times(random_matrix(100, 1000, 0.10))   # t4.c quickest, 47 times speedup
compare.exec.times(random_matrix(1000, 100, 0.10))   # t4.c quickest, 25 times speedup
compare.exec.times(random_matrix(1000, 1000, 0.10))  # t4.c quickest, 37 times speedup

# Most realistic for my data:
compare.exec.times(random_matrix(6000, 400, 0.04))   # t4.c quickest, 30 times speedup

                  test user.self  relative
1   t1(A, outcome = 1)    35.372 30.145038
5 t1.c(A, outcome = 1)     8.591  7.329092
2   t2(A, outcome = 1)    14.598 12.761662
6 t2.c(A, outcome = 1)    14.413 12.587786
3   t3(A, outcome = 1)     1.579  1.743851
7 t3.c(A, outcome = 1)     1.608  1.818490
4   t4(A, outcome = 1)     1.092  1.120441
8 t4.c(A, outcome = 1)     0.894  1.000000
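For readers who do not have Berend's earlier post in front of them: the .c variants above are presumably byte-compiled copies of t1-t4. A minimal sketch of how such compiled copies are typically created with the base compiler package (assuming t1-t4 are defined as earlier in the thread):

library(compiler)
# byte-compile each candidate function; the compiled copy is called like the original
t1.c <- cmpfun(t1)
t2.c <- cmpfun(t2)
t3.c <- cmpfun(t3)
t4.c <- cmpfun(t4)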