thr3ads.net - R help - [R] Vectorization/Speed Problem [Nov 2007]


Tom Johnson

2007-Nov-20 23:42 UTC

[R] Vectorization/Speed Problem

Hi,

I cannot find a 'vectorized' solution to this 'for loop' kind of problem.
Do you see a vectorized, fast-running solution?

Objective:
Take the value of X at each timepoint and calculate the corresponding value
of Y.  Leading 0's and all 1's in X are copied to Y unchanged; for any other
0, Y is 1 plus the number of consecutive 0's since the most recent 1
(counting the current one).  The frequency and distribution of 1's in X vary
widely; a vector of 10k timepoints may contain runs of ~100 repeated 0's or
1's.

Example:
time 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
X    0   1   0   1   0   1   0   0   1   1   1   0   0   0   . .
Y    0   1   2   1   2   1   2   3   1   1   1   2   3   4   . .
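
The rule in the Objective can be pinned down with a plain loop, which is slow but useful as a reference for checking faster versions. This is a sketch under one reading of the rule: Y is 0 for every leading 0, 1 wherever X is 1, and otherwise one more than the previous Y. The function name y_loop is made up for illustration and does not appear in the original post.

```r
# Reference implementation (hypothetical helper, not from the original post).
# Assumes x is a numeric vector containing only 0s and 1s.
y_loop <- function(x) {
  y <- numeric(length(x))
  seen_one <- FALSE                 # becomes TRUE once the first 1 has appeared
  for (i in seq_along(x)) {
    if (x[i] == 1) {
      seen_one <- TRUE
      y[i] <- 1                     # every 1 maps to 1
    } else if (seen_one) {
      y[i] <- y[i - 1] + 1          # count up through a run of 0s after a 1
    }                               # leading 0s keep the initial value 0
  }
  y
}

x <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0)
y_loop(x)
```

On the example vector this reproduces the Y row shown above.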

What I have done:
My for() and apply()-related standard solutions are too slow.  They are 6
times slower than my prototype, vectorized code which uses cumsum().
However(!)... my results are inaccurate and I can't correct them without
introducing a for()!  Here is my shot at a vectorized solution, as far as I
can take it.

Preliminary Vectorized Code:
X <- matrix(sample(c(1,0,0,0,0), 500, replace = TRUE), 25, 20, byrow=TRUE)
colnames(X) <- c(paste("a", 1:20, sep=""))
noX <- X; noX[X!=0] <- 0; cumX <- noX; cumNoX <- noX; Y1 <- noX;
Y2 <- X; Y3 <- X

for (e in 1:ncol(X)) {
	cumX[,e] <- cumsum(X[,e])
	noX[X[,e] < 1 & cumsum(X[,e]) > 0 ,e] <- 1
	cumNoX[,e] <- cumsum(noX[,e])
	}
Y1[cumNoX > 0] <- cumNoX[cumNoX > 0] + 1
Y2[X == 0 & noX > 0] <- Y1[X == 0 & noX > 0]
Y3 <- Y2
Y3[cumX > 1 & noX > 0] <- Y2[cumX > 1 & noX > 0] -
cumX[cumX > 1 & noX > 0]
X; Y3

Your help would be greatly appreciated!  I'm stuck.
Thank you,

Tom Johnson

Gabor Grothendieck

2007-Nov-21 01:02 UTC


[R] Vectorization/Speed Problem

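Let x be the input vector and cx be its cumulative sum.  match(cx, cx)
gives, for each position, the index where that cumulative sum first
occurred, so seq_along(cx) - match(cx, cx) gives increasing sequences
starting at 0.  For those after the leading zeros we start them at 1 by
adding cummax(x), which is 0 until the first 1 and 1 from then on.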

x <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0) # input

cx <- cumsum(x)
seq_along(cx) - match(cx, cx) + cummax(x)
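
To use the expression above on a matrix like the X in the original post, it can be wrapped in a function and applied column by column. This is only a sketch: the name y_vec is invented here, and multiplying by cummax(x) instead of adding it is a variation on the expression above, chosen on the assumption that every leading 0 (not only the first) should map to 0. For an input with at most one leading 0, such as the example, both forms give the same result.

```r
# Hypothetical wrapper; assumes x is a numeric vector containing only 0s and 1s.
y_vec <- function(x) {
  cx <- cumsum(x)                             # constant from each 1 until the next 1
  # steps since the most recent 1 (or since position 1 before any 1 has occurred)
  since_one <- seq_along(cx) - match(cx, cx)
  (since_one + 1) * cummax(x)                 # zero out everything before the first 1
}

x <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0)
y_vec(x)

# Column-wise use on a 0/1 matrix shaped like the one in the original post
set.seed(1)  # arbitrary seed, only to make the sketch repeatable
X <- matrix(sample(c(1, 0, 0, 0, 0), 500, replace = TRUE), 25, 20, byrow = TRUE)
Y <- apply(X, 2, y_vec)
```

apply() still loops over the columns internally, but the work within each column is fully vectorized, so the cost of the loop is 20 function calls instead of one iteration per timepoint.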

