Hi, I am seeking help on designing an algorithm to identify the locations of stretches of 1s in a vector of 0s and 1s. Below is a simple example:

> dat <- as.data.frame(cbind(a = c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T),
+                            b = c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
> dat
   a  b
1  0  4
2  0 12
3  1 13
4  1 16
5  1 18
6  1 20
7  0 28
8  0 30
9  1 34
10 1 46
11 0 47
12 1 49
13 1 61
14 1 73
15 1 77
16 0 84
17 0 87
18 0 90
19 0 95
20 1 97

In this dataset, "b" is sorted and denotes the location of each number in "a". I would like to find the starting and ending locations of each stretch of 1s within "a", and also count the number of 1s in each stretch. The result from the algorithm should be:

stretch start end No.of.1s
      1    13  20        4
      2    34  46        2
      3    49  77        4
      4    97  97        1

I can imagine that for loops would do the job, but I feel that is not a clever way to do it. Is there an efficient algorithm that can do this fast?

Thanks for any suggestions.
John
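For comparison, the for-loop approach John mentions might look roughly like this. It is only a sketch: `find_stretches_loop` is a hypothetical helper, not something from the thread, and it assumes `a` holds 0/1 (or logical) values and `b` the matching sorted locations. It makes a single pass over `a` and records the location where each stretch of 1s begins and ends.

```r
# Hypothetical helper (not from the thread): one pass over the 0/1 vector `a`,
# recording the location in `b` where each stretch of 1s starts and ends.
find_stretches_loop <- function(a, b) {
  start <- end <- count <- numeric(0)
  in_run <- FALSE
  for (i in seq_along(a)) {
    if (a[i] == 1) {
      if (!in_run) {                # a new stretch of 1s begins here
        start <- c(start, b[i])
        count <- c(count, 0)
        in_run <- TRUE
      }
      k <- length(start)            # index of the current stretch
      end[k] <- b[i]                # extend the current stretch
      count[k] <- count[k] + 1
    } else {
      in_run <- FALSE               # a 0 closes any open stretch
    }
  }
  data.frame(stretch = seq_along(start), start = start, end = end,
             No.of.1s = count)
}
```

Calling `find_stretches_loop(dat$a, dat$b)` on the example data should give the four-row table above. Growing vectors with `c()` inside the loop gets slow for very long inputs, which is one reason to prefer a vectorized approach.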
try this: ?rle

Carl

******
From: array chip <arrayprofile_at_yahoo.com>
Date: Thu, 06 Jan 2011 14:16:38 -0800 (PST)
> Hi, I am seeking help on designing an algorithm to identify the locations of stretches of 1s in a vector of 0s and 1s. [...]
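Following the ?rle suggestion, here is a minimal sketch of how the run-length encoding can be turned into the full table John asked for. `find_stretches_rle` is a hypothetical helper name, and the sketch assumes `a` is a 0/1 (or logical) vector with `b` the same length. The cumulative sum of the run lengths gives the index of the last element of each run; subtracting the run length and adding 1 gives the first. Keeping only the runs whose value is 1 and indexing `b` with those positions yields start, end and count.

```r
# Hypothetical helper: build the start/end/count table from rle().
find_stretches_rle <- function(a, b) {
  r <- rle(as.vector(a))
  ends_idx   <- cumsum(r$lengths)             # index of last element of each run
  starts_idx <- ends_idx - r$lengths + 1      # index of first element of each run
  ones <- r$values == 1                       # keep only the runs of 1s
  data.frame(stretch  = seq_len(sum(ones)),
             start    = b[starts_idx[ones]],
             end      = b[ends_idx[ones]],
             No.of.1s = r$lengths[ones])
}

find_stretches_rle(dat$a, dat$b)
```

For the example data this gives starts 13, 34, 49, 97, ends 20, 46, 77, 97 and counts 4, 2, 4, 1. Everything is vectorized, so there is no explicit loop.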
On 06-Jan-11 22:16:38, array chip wrote:
> Hi, I am seeking help on designing an algorithm to identify the
> locations of stretches of 1s in a vector of 0s and 1s. [...]

The basic information you need can be got using rle() ("run length encoding"). See '?rle'. In your example:

  rle(dat$a)
  # Run Length Encoding
  #   lengths: int [1:8] 2 4 2 2 1 4 4 1
  #   values : num [1:8] 0 1 0 1 0 1 0 1
  ## Note: F -> 0, T -> 1

The following has a somewhat twisted logic at the end, and may be flawed, but you can probably adapt it!

  L <- rle(dat$a)$lengths
  V <- rle(dat$a)$values
  pos <- c(1,cumsum(L))
  V1 <- c(-1,V)
  1+pos[V1==0]
  # [1]  3  9 12 20
  ## Positions in the series dat$a where each run of "T" (i.e. 1)
  ## starts

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Jan-11                                       Time: 22:57:44
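As far as I can tell, the `1+pos[V1==0]` rule relies on every run of 1s being preceded by a run of 0s: it misses a stretch that starts at position 1, and if the series ends with a 0 it also returns n+1, a position past the end. One way round this is a different technique from rle(): pad the 0/1 vector with a 0 at each end and take `diff()`, so that +1 marks the first element of a stretch of 1s and -1 marks the element just after its last one. `find_stretches_diff` is a hypothetical helper name; the sketch assumes `a` is 0/1 (or logical) and `b` the same length.

```r
# Hypothetical helper: locate stretches of 1s via diff() on a zero-padded vector.
find_stretches_diff <- function(a, b) {
  x <- as.integer(a == 1)
  d <- diff(c(0L, x, 0L))          # +1 where a stretch starts, -1 just after it ends
  starts_idx <- which(d == 1)      # index of the first 1 in each stretch
  ends_idx   <- which(d == -1) - 1 # index of the last 1 in each stretch
  data.frame(stretch  = seq_along(starts_idx),
             start    = b[starts_idx],
             end      = b[ends_idx],
             No.of.1s = ends_idx - starts_idx + 1)
}

find_stretches_diff(dat$a, dat$b)
find_stretches_diff(c(1, 0, 1, 1), c(5, 6, 7, 8))  # stretches at both ends
```

Because of the padding, a stretch touching either end of the series is handled like any other; the second call should report stretches 5 to 5 and 7 to 8.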
Seemingly Similar Threads
- Calculating lengths of runs of 0 or 1 sequences in meteorological data
- bug in na.contiguous? Doesn't give the first tied stretch if it is at the start
- Another newbie question
- how can I convert a long to wide matrix?
- help needed to find zero areas in a vector