robertfeldt
2012-Feb-24 16:59 UTC
[R] Speeding up "accumulation" code in large matrix calc?
Hi,

I have R code like so:

num.columns.back.since.last.occurence <- function(m, outcome) {
  nrows <- dim(m)[1];
  ncols <- dim(m)[2];
  res <- matrix(rep.int(0, nrows*ncols), nrow=nrows);
  for(row in 1:nrows) {
    for(col in 2:ncols) {
      res[row,col] <- if(m[row,col-1]==outcome) {0} else {1+res[row,col-1]}
    }
  }
  res;
}

but on the very large matrices I apply this to, the execution times are a problem. I would appreciate any help rewriting this with more "standard"/native R functions to speed things up.
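To make the intended behaviour concrete, a small worked example (the matrix values are chosen only for illustration): for each cell, the result is the number of columns since the previous column in that row last equalled outcome.

m <- rbind(c(1, 0, 0, 1, 0),
           c(0, 0, 1, 0, 0))
num.columns.back.since.last.occurence(m, outcome=1)
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    0    0    1    2    0
# [2,]    0    1    2    0    1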
Petr Savicky
2012-Feb-24 18:50 UTC
[R] Speeding up "accumulation" code in large matrix calc?
On Fri, Feb 24, 2012 at 08:59:44AM -0800, robertfeldt wrote:
> but on the very large matrices I apply this to, the execution times
> are a problem. I would appreciate any help rewriting this with more
> "standard"/native R functions to speed things up.

Hi.

If the number of columns is large, so that the rows are long, then the following can be more efficient.

oneRow <- function(x, outcome) {
  n <- length(x)
  y <- c(0, cumsum(x[-n] == outcome))
  ave(x, y, FUN = function(z) seq.int(along=z) - 1)
}

# random matrix
A <- matrix((runif(49) < 0.2) + 0, nrow=7)

# the required transformation
B <- t(apply(A, 1, oneRow, outcome=1))

# verify
all(num.columns.back.since.last.occurence(A, 1) == B)

[1] TRUE

This solution performs a loop over rows (in apply), so if the number of rows is large and the number of columns is not, then a solution that uses a loop over columns may be better.

Hope this helps.

Petr Savicky.
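To see what the cumsum/ave combination is doing, a single-row illustration (the example vector is made up):

x <- c(0, 1, 0, 0, 1, 0)
n <- length(x)
# group label: how many times the outcome (here 1) occurred in earlier positions
y <- c(0, cumsum(x[-n] == 1))
y
# [1] 0 0 1 1 1 2
# within each group, count positions since the group started
ave(x, y, FUN = function(z) seq.int(along=z) - 1)
# [1] 0 1 0 1 2 0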
Petr Savicky
2012-Feb-24 19:02 UTC
[R] Speeding up "accumulation" code in large matrix calc?
On Fri, Feb 24, 2012 at 08:59:44AM -0800, robertfeldt wrote:
> but on the very large matrices I apply this to, the execution times
> are a problem. I would appreciate any help rewriting this with more
> "standard"/native R functions to speed things up.

Hi.

If the number of rows is large and the number of columns is not, then try the following.

# random matrix
A <- matrix((runif(49) < 0.2) + 0, nrow=7)
outcome <- 1

# transformation
B <- array(0, dim=dim(A))
curr <- B[, 1]
for (i in seq.int(from=2, length=ncol(A)-1)) {
  curr <- ifelse(A[, i-1] == outcome, 0, 1 + curr)
  B[, i] <- curr
}

# verify
all(num.columns.back.since.last.occurence(A, 1) == B)

[1] TRUE

Hope this helps.

Petr Savicky.
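For a rough feel for the difference, both versions can be timed on a larger matrix, e.g. 6000 x 400 with a low probability of the outcome (sizes chosen only for illustration; timings depend on the machine):

set.seed(1)
A <- matrix((runif(6000*400) < 0.04) + 0, nrow=6000)

# original double loop
system.time(num.columns.back.since.last.occurence(A, 1))

# column-loop version from above
system.time({
  B <- array(0, dim=dim(A))
  curr <- B[, 1]
  for (i in seq.int(from=2, length=ncol(A)-1)) {
    curr <- ifelse(A[, i-1] == 1, 0, 1 + curr)
    B[, i] <- curr
  }
})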
robertfeldt
2012-Feb-25 04:47 UTC
[R] Speeding up "accumulation" code in large matrix calc?
Wow! Thanks to both Petr and Berend for this extensive help. I learned a lot, not only about this specific case but about R in general, from studying your answers.

The compiled version of t4 (t4.c below) seems to give the most consistently quick results, and for my case (about 6000 rows and 500 columns, with a probability of 0.04 for the sought-for outcome) I see speedups of 30-40 times over my original. See below for details. Excellent help, thank you!

# t1-t4 as above in thread and then compiled to t1.c-t4.c ...

random_matrix <- function(nrows, ncols, probabilityOfOne) {
  matrix((runif(nrows*ncols) < probabilityOfOne) + 0, nrow=nrows);
}

library(rbenchmark)  # benchmark() comes from the rbenchmark package

compare.exec.times <- function(A) {
  benchmark(
    t1(A, outcome=1), t2(A, outcome=1), t3(A, outcome=1), t4(A, outcome=1),
    t1.c(A, outcome=1), t2.c(A, outcome=1), t3.c(A, outcome=1), t4.c(A, outcome=1),
    columns=c("test", "user.self", "relative"),
    replications=3)
}

compare.exec.times(random_matrix(100, 1000, 0.10))   # t4.c quickest, 47 times speedup
compare.exec.times(random_matrix(1000, 100, 0.10))   # t4.c quickest, 25 times speedup
compare.exec.times(random_matrix(1000, 1000, 0.10))  # t4.c quickest, 37 times speedup

# Most realistic for my data:
compare.exec.times(random_matrix(6000, 400, 0.04))   # t4.c quickest, 30 times speedup

                  test user.self  relative
1   t1(A, outcome = 1)    35.372 30.145038
5 t1.c(A, outcome = 1)     8.591  7.329092
2   t2(A, outcome = 1)    14.598 12.761662
6 t2.c(A, outcome = 1)    14.413 12.587786
3   t3(A, outcome = 1)     1.579  1.743851
7 t3.c(A, outcome = 1)     1.608  1.818490
4   t4(A, outcome = 1)     1.092  1.120441
8 t4.c(A, outcome = 1)     0.894  1.000000
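For readers who do not have Berend's earlier post in front of them: the .c variants above are presumably byte-compiled copies of t1-t4. A minimal sketch of how such compiled copies are typically created with the base compiler package (assuming t1-t4 are defined as earlier in the thread):

library(compiler)
# byte-compile each candidate function; the compiled copy is called like the original
t1.c <- cmpfun(t1)
t2.c <- cmpfun(t2)
t3.c <- cmpfun(t3)
t4.c <- cmpfun(t4)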