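Let x be the input vector and cx its cumulative sum. Then
seq_along(cx) - match(cx, cx) gives each position's offset from the
first position holding the same value of cx, i.e. sequences 0, 1, 2, ...
that restart at every 1. Adding cummax(x), which is 0 before the first 1
and 1 from then on, makes the sequences after the leading zeros start at 1.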
x <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0)  # input
cx <- cumsum(x)                                   # running count of 1s
seq_along(cx) - match(cx, cx) + cummax(x)         # 0 1 2 1 2 1 2 3 1 1 1 2 3 4
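For the example input, the intermediate vectors can be printed side by side to see the idea at work. This is only an illustrative sketch; the names `first` and `pos` are introduced here and are not part of the reply above.

```r
x  <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0)  # example input from the question
cx <- cumsum(x)               # running count of 1s; each value spans one 1 and the 0s after it
                              # (value 0 spans the leading 0s)
first <- match(cx, cx)        # index of the first position holding the same value of cx
pos   <- seq_along(cx) - first  # 0, 1, 2, ... within each stretch of equal cx
y     <- pos + cummax(x)      # cummax(x) is 0 before the first 1 and 1 from then on

print(rbind(x, cx, first, pos, cummax = cummax(x), y))

# compare with the Y row given in the question
stopifnot(identical(y, c(0, 1, 2, 1, 2, 1, 2, 3, 1, 1, 1, 2, 3, 4)))
```

Every 1 starts a new stretch with pos = 0, so it ends up as 1, and the zeros after it follow as 2, 3, ...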
On Nov 20, 2007 6:42 PM, Tom Johnson <tjohnson at covad.net> wrote:
> Hi,
>
> I cannot find a 'vectorized' solution to this 'for loop' kind of problem.
> Do you see a vectorized, fast-running solution?
>
> Objective:
> Take the value of X at each timepoint and calculate the corresponding value
> of Y. Leading 0's and all 1's of X are copied to Y unchanged; otherwise Y
> counts up from the last 1, increasing by 1 at each successive 0. The
> frequency and distribution of X vary widely, and X may have ~100 repeated
> 0's or 1's in a vector of 10k timepoints.
>
> Example:
> time  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
> X     0  1  0  1  0  1  0  0  1  1  1  0  0  0 ...
> Y     0  1  2  1  2  1  2  3  1  1  1  2  3  4 ...
>
> What I have done:
> My for() and apply()-related standard solutions are too slow. They are 6
> times slower than my prototype, vectorized code which uses cumsum().
> However(!)... my results are inaccurate and I can't correct them without
> introducing a for()! Here is my shot at a vectorized solution, as far as I
> can take it.
>
> Preliminary Vectorized Code:
> X <- matrix(sample(c(1,0,0,0,0), 500, replace = TRUE), 25, 20, byrow=TRUE)
> colnames(X) <- c(paste("a", 1:20, sep=""))
> noX <- X; noX[X!=0] <- 0
> cumX <- noX; cumNoX <- noX; Y1 <- noX
> Y2 <- X; Y3 <- X
>
> for (e in 1:ncol(X)) {
> cumX[,e] <- cumsum(X[,e])
> noX[X[,e] < 1 & cumsum(X[,e]) > 0 ,e] <- 1
> cumNoX[,e] <- cumsum(noX[,e])
> }
> Y1[cumNoX > 0] <- cumNoX[cumNoX > 0] + 1
> Y2[X == 0 & noX > 0] <- Y1[X == 0 & noX > 0]
> Y3 <- Y2
> Y3[cumX > 1 & noX > 0] <- Y2[cumX > 1 & noX > 0] -
>   cumX[cumX > 1 & noX > 0]
> X; Y3
>
> Your help would be greatly appreciated! I'm stuck.
> Thank you,
>
> Tom Johnson
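The question works on a 25 x 20 matrix rather than a single vector, and its rule copies every leading 0 to Y as 0. The one-liner above counts a run of several leading zeros up as 0, 1, 2, ... (the example vector has only one leading zero, so it is unaffected), and random columns like the ones in the question often start with several zeros. Below is a sketch, not checked against real data, of a variant that multiplies by (cx > 0) to force leading zeros to 0, applied column by column with apply() and compared with a plain loop written from the stated rule. The helper names `loop_y` and `vec_y` are hypothetical, introduced only for this illustration.

```r
# Hypothetical helpers for illustration only (not from the original thread).

# Plain loop written from the rule in the question: leading 0s and every 1 are
# copied from X; each later 0 is one more than the value before it.
loop_y <- function(x) {
  y <- numeric(length(x))
  seen_one <- FALSE
  for (i in seq_along(x)) {
    if (x[i] == 1) {
      seen_one <- TRUE
      y[i] <- 1
    } else if (seen_one) {
      y[i] <- y[i - 1] + 1
    }  # a leading 0 keeps y[i] = 0
  }
  y
}

# Vectorized variant of the one-liner: the + 1 maps each 1 to 1 and the 0s
# after it to 2, 3, ...; multiplying by (cx > 0) sets every leading 0 to 0.
vec_y <- function(x) {
  cx <- cumsum(x)
  (seq_along(cx) - match(cx, cx) + 1) * (cx > 0)
}

# With several leading 0s the original one-liner counts them up; the variant does not.
x0  <- c(0, 0, 0, 1, 0)
cx0 <- cumsum(x0)
print(seq_along(cx0) - match(cx0, cx0) + cummax(x0))  # 0 1 2 1 2
print(vec_y(x0))                                      # 0 0 0 1 2

# Matrix set up as in the question; apply() handles one column (time series) at a time.
set.seed(1)  # arbitrary seed, only for reproducibility
X <- matrix(sample(c(1, 0, 0, 0, 0), 500, replace = TRUE), 25, 20, byrow = TRUE)
colnames(X) <- paste("a", 1:20, sep = "")
Y <- apply(X, 2, vec_y)
stopifnot(identical(Y, apply(X, 2, loop_y)))
```

apply() still iterates over the columns in R, but each call is vectorized over the time points, which is the long dimension in the question.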