robertfeldt
2012-Feb-24 16:59 UTC
[R] Speeding up "accumulation" code in large matrix calc?
Hi,
I have R code like so:
num.columns.back.since.last.occurence <- function(m, outcome) {
  nrows <- dim(m)[1]
  ncols <- dim(m)[2]
  res <- matrix(rep.int(0, nrows*ncols), nrow=nrows)
  for (row in 1:nrows) {
    for (col in 2:ncols) {
      res[row,col] <- if (m[row,col-1]==outcome) {0} else {1+res[row,col-1]}
    }
  }
  res
}
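For concreteness, here is what the function computes on a tiny example (the matrix values below are just for illustration, not from my data):

m <- rbind(c(0, 1, 0, 0, 1),
           c(1, 0, 0, 1, 0))
num.columns.back.since.last.occurence(m, 1)
# row 1: 0 1 0 1 2
# row 2: 0 0 1 2 0
# i.e. entry (i,j) counts the columns since outcome 1 last occurred in row i
# to the left of column j (and is j-1 if it has not occurred yet)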
However, on the very large matrices I apply this to, the execution times are a
problem. I would appreciate any help rewriting this with more
"standard"/native R functions to speed things up.
Petr Savicky
2012-Feb-24 18:50 UTC
[R] Speeding up "accumulation" code in large matrix calc?
On Fri, Feb 24, 2012 at 08:59:44AM -0800, robertfeldt wrote:
> [...] I would appreciate any help rewriting this with more
> "standard"/native R functions to speed things up.

Hi.

If the number of columns is large (so the rows are long), then the
following can be more efficient.

oneRow <- function(x, outcome) {
  n <- length(x)
  y <- c(0, cumsum(x[-n] == outcome))
  ave(x, y, FUN = function(z) seq.int(along=z) - 1)
}

# random matrix
A <- matrix((runif(49) < 0.2) + 0, nrow=7)

# the required transformation
B <- t(apply(A, 1, oneRow, outcome=1))

# verify
all(num.columns.back.since.last.occurence(A, 1) == B)
[1] TRUE

This solution performs a loop over rows (in apply), so if the number of
rows is large and the number of columns is not, then a solution that
loops over columns may be better.

Hope this helps.

Petr Savicky.
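As an aside, a short walk-through may help to see why the cumsum/ave trick inside oneRow works; the row values here are purely illustrative and not part of the original message:

x <- c(0, 1, 0, 0, 1)              # one row of the matrix
n <- length(x)
y <- c(0, cumsum(x[-n] == 1))      # group id, increases after each occurrence of 1
y                                  # 0 0 1 1 1
# within each group, number the positions from 0: that is the "columns back" count
ave(x, y, FUN = function(z) seq.int(along = z) - 1)
# 0 1 0 1 2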
Petr Savicky
2012-Feb-24 19:02 UTC
[R] Speeding up "accumulation" code in large matrix calc?
On Fri, Feb 24, 2012 at 08:59:44AM -0800, robertfeldt wrote:
> [...] I would appreciate any help rewriting this with more
> "standard"/native R functions to speed things up.

Hi.

If the number of rows is large and the number of columns is not, then try
the following.

# random matrix
A <- matrix((runif(49) < 0.2) + 0, nrow=7)
outcome <- 1

# transformation
B <- array(0, dim=dim(A))
curr <- B[, 1]
for (i in seq.int(from=2, length=ncol(A)-1)) {
  curr <- ifelse(A[, i-1] == outcome, 0, 1 + curr)
  B[, i] <- curr
}

# verify
all(num.columns.back.since.last.occurence(A, 1) == B)
[1] TRUE

Hope this helps.

Petr Savicky.
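Building on the shape-based advice in these two replies, here is a minimal sketch (not from the thread; the name num.cols.back.auto is made up, and oneRow is the helper from the first reply) of a wrapper that picks the row- or column-oriented variant depending on the matrix shape:

num.cols.back.auto <- function(m, outcome) {
  if (ncol(m) >= nrow(m)) {
    # long rows, relatively few of them: vectorize along each row
    t(apply(m, 1, oneRow, outcome = outcome))
  } else {
    # many rows, relatively few columns: vectorize down the columns
    res <- array(0, dim = dim(m))
    curr <- res[, 1]
    for (i in seq_len(ncol(m))[-1]) {
      curr <- ifelse(m[, i - 1] == outcome, 0, 1 + curr)
      res[, i] <- curr
    }
    res
  }
}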
robertfeldt
2012-Feb-25 04:47 UTC
[R] Speeding up "accumulation" code in large matrix calc?
Wow! Thanks to both Petr and Berend for this extensive help.
I learned a lot not only about this specific case but about R in general
from studying your answers.
The compiled version of t4 gives the most consistently quick results, and for
my case (about 6000 rows and 500 columns, with a probability of 0.04 for the
sought-for outcome) I see speedups of 30-40 times over my original. See
below for details.
Excellent help, thank you!
# t1-t4 as above in thread and then compiled to t1.c-t4.c ...
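# (Not shown in this thread excerpt: presumably the compiled variants
#  t1.c-t4.c were created with cmpfun() from the standard compiler
#  package, roughly as sketched below.)
library(compiler)
t1.c <- cmpfun(t1)
t2.c <- cmpfun(t2)
t3.c <- cmpfun(t3)
t4.c <- cmpfun(t4)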
random_matrix <- function(nrows, ncols, probabilityOfOne) {
  matrix((runif(nrows*ncols) < probabilityOfOne) + 0, nrow=nrows)
}
library(rbenchmark)   # benchmark() is provided by the rbenchmark package
compare.exec.times <- function(A) {
  benchmark(t1(A, outcome=1),
            t2(A, outcome=1),
            t3(A, outcome=1),
            t4(A, outcome=1),
            t1.c(A, outcome=1),
            t2.c(A, outcome=1),
            t3.c(A, outcome=1),
            t4.c(A, outcome=1),
            columns=c("test", "user.self", "relative"),
            replications=3)
}
compare.exec.times(random_matrix(100, 1000, 0.10))   # t4.c quickest, 47 times speedup
compare.exec.times(random_matrix(1000, 100, 0.10))   # t4.c quickest, 25 times speedup
compare.exec.times(random_matrix(1000, 1000, 0.10))  # t4.c quickest, 37 times speedup

# Most realistic for my data:
compare.exec.times(random_matrix(6000, 400, 0.04))   # t4.c quickest, 30 times speedup
                  test user.self  relative
1   t1(A, outcome = 1)    35.372 30.145038
5 t1.c(A, outcome = 1)     8.591  7.329092
2   t2(A, outcome = 1)    14.598 12.761662
6 t2.c(A, outcome = 1)    14.413 12.587786
3   t3(A, outcome = 1)     1.579  1.743851
7 t3.c(A, outcome = 1)     1.608  1.818490
4   t4(A, outcome = 1)     1.092  1.120441
8 t4.c(A, outcome = 1)     0.894  1.000000