Dear all,

I need to find the lengths of the runs of TRUE values in a logical vector (see Example 1). I found a possible solution that works well, but when I use it on a larger data set, performance drops substantially (Example 2).

Example 1

set.seed(111)
x <- sample(c(T,F),50, replace=T)
system.time(cetnost <- as.numeric(table(which(x)-cumsum(x[which(x)]))))
[1] 0.00 0.00 0.03   NA   NA
cetnost
 [1] 1 3 2 5 1 4 1 1 1 3 1 1 2

Example 2

x <- sample(c(T,F),40321*51, replace=T)
dd <- matrix(x,40321,51)
system.time(cetnost <- lapply(dd,function(x) as.numeric(table(which(x)-cumsum(x[which(x)])))))
Timing stopped at: 750.63 1 775.6 NA NA

Please give me any hint on how to improve performance, or advise a different (and more efficient) solution.

R 1.8.0, Windows 2000, 512 MB memory, Pentium 4

Thank you in advance.

Petr Pikal
petr.pikal at precheza.cz
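For readers puzzling over the one-liner in Example 1, the following is a small sketch of why it works, using a short hand-made vector (an assumption for illustration, not data from the post). Within a run of TRUE values, the positions returned by which(x) and the running count from cumsum() both increase by one, so their difference stays constant inside a run and jumps between runs. table() then counts how many positions share each difference.

```r
# Hand-made example vector (an assumption for illustration, not from the post)
x <- c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)

pos   <- which(x)         # positions of TRUE:        2 3 5 6 7 9
count <- cumsum(x[pos])   # running count of TRUEs:   1 2 3 4 5 6
key   <- pos - count      # constant within each run: 1 1 2 2 2 3

# Counting how often each key occurs gives the run lengths
run_lengths <- as.numeric(table(key))
print(run_lengths)        # 2 3 1
```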
On Fri, 14 Nov 2003, Petr Pikal wrote:

> Example 1
> set.seed(111)
> x <- sample(c(T,F),50, replace=T)
> system.time(cetnost <- as.numeric(table(which(x)-cumsum(x[which(x)]))))
> [1] 0.00 0.00 0.03 NA NA
> cetnost
> [1] 1 3 2 5 1 4 1 1 1 3 1 1 2

Have you looked at rle()?

> rlex <- rle(x)
> str(rlex)
List of 2
 $ lengths: int [1:27] 2 1 1 3 1 2 2 5 1 1 ...
 $ values : logi [1:27] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE ...
 - attr(*, "class")= chr "rle"
> rlex$lengths[rlex$values]
 [1] 1 3 2 5 1 4 1 1 1 3 1 1 2
> cetnost
 [1] 1 3 2 5 1 4 1 1 1 3 1 1 2

rle() is interpreted too, like your solution, so I'm not sure how it will scale.

> Example 2
> x<-sample(c(T,F),40321*51, replace=T)
> dd<-matrix(x,40321,51)
> system.time(cetnost <- lapply(dd,function(x) as.numeric(table(which(x)-
> cumsum(x[which(x)])))))
> Timing stopped at: 750.63 1 775.6 NA NA

--
Roger Bivand
Economic Geography Section, Department of Economics,
Norwegian School of Economics and Business Administration,
Breiviksveien 40, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no
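The rle() suggestion above can be wrapped in a small function. This is only a sketch: the name true_run_lengths is hypothetical (it does not appear in the thread), the example vector is hand-made, and the function assumes the input contains no NA values.

```r
# Hypothetical helper (not from the thread): lengths of runs of TRUE via rle().
# Assumes x is a logical vector without NA values.
true_run_lengths <- function(x) {
  r <- rle(x)            # run lengths and the value of each run
  r$lengths[r$values]    # keep only the runs whose value is TRUE
}

# Hand-made example vector (an assumption for illustration)
x <- c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, TRUE)
print(true_run_lengths(x))                # 2 3 1

# A vector with no TRUE values gives an empty integer vector
print(true_run_lengths(c(FALSE, FALSE)))  # integer(0)
```

Unlike the table()-based version, this returns integers rather than doubles, which matters only if the results are compared with identical().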
Hello,

Thanks to all who responded (Patrick Burns, Roger Bivand and especially Peter Dalgaard).

I do not know why I used lapply() when apply() does the job as well. My actual data are in a data frame (the same size as in Example 2), and with apply(), rle() is about an order of magnitude quicker.

Thanks again,
Petr

On 14 Nov 2003 at 13:35, Petr Pikal wrote:

> Dear all
>
> I need to find a length of true sequences in logical vector (see
> example 1). I found a possible solution which is good but if I use it
> on a larger data set I experience a substantial decrease in
> performance (example 2).
> [...]

Petr Pikal
petr.pikal at precheza.cz
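A rough sketch of the per-column approach mentioned above, under some assumptions: the runs are wanted column by column (the thread does not say whether rows or columns were meant), the helper name true_run_lengths is hypothetical, and a small random matrix stands in for the 40321 x 51 data. It also shows why the original lapply(dd, ...) call was slow: on a matrix, lapply() visits every cell rather than every column.

```r
# Hypothetical helper (not from the thread): lengths of TRUE runs via rle()
true_run_lengths <- function(x) {
  r <- rle(x)
  r$lengths[r$values]
}

# Small stand-in for the 40321 x 51 matrix in Example 2
set.seed(1)
dd <- matrix(sample(c(TRUE, FALSE), 200 * 5, replace = TRUE), 200, 5)

# lapply() on a matrix calls the function once per cell (200 * 5 calls here)
print(length(lapply(dd, identity)))   # 1000

# One call per column instead; this always returns a list
by_col <- lapply(seq_len(ncol(dd)), function(j) true_run_lengths(dd[, j]))

# For a data frame, lapply() already iterates over columns
df <- as.data.frame(dd)
by_col_df <- lapply(df, true_run_lengths)
```

apply(dd, 2, true_run_lengths) does the same job, but it may simplify the result to a matrix when every column happens to yield the same number of runs, so the lapply() form is used here to get a list in all cases.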