thr3ads.net - R help - [R] algorithm help [Jan 2011]

array chip

2011-Jan-06 22:16 UTC

[R] algorithm help

Hi, I am seeking help on designing an algorithm to identify the locations of
stretches of 1s in a vector of 0s and 1s. Below is a simple example:

> dat<-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T),
+   b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
> dat
   a  b
1  0  4
2  0 12
3  1 13
4  1 16
5  1 18
6  1 20
7  0 28
8  0 30
9  1 34
10 1 46
11 0 47
12 1 49
13 1 61
14 1 73
15 1 77
16 0 84
17 0 87
18 0 90
19 0 95
20 1 97

In this dataset, "b" is sorted and denotes the location of each number in "a".
I would like to find the starting and ending locations of each stretch of 1s
within "a", and also count the number of 1s in each stretch. I hope the results
from the algorithm would be:

stretch   start   end   No.of.1s
1         13      20    4
2         34      46    2
3         49      77    4
4         97      97    1

I can imagine that for loops can do the job, but I feel it's not a clever way
to do this. Is there an efficient algorithm that can do this fast?

Thanks for any suggestions.

John


      

Carl Witthoft

2011-Jan-06 22:41 UTC


try this:

?rle


Carl
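
As a minimal illustration of what rle() returns, here is a sketch on a toy
vector. The vector x is made up for this example and is not from the original
post:

```r
# Hypothetical toy vector, not from the original post
x <- c(0, 0, 1, 1, 1, 0, 1)

r <- rle(x)
r$lengths   # length of each run of identical values: 2 3 1 1
r$values    # the value repeated in each run:         0 1 0 1

# inverse.rle() rebuilds the original vector from the encoding
identical(inverse.rle(r), x)   # TRUE
```

Each run of identical values becomes one entry in lengths and one in values,
so the runs of 1s are the entries where values equals 1.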




(Ted Harding)

2011-Jan-06 22:57 UTC


On 06-Jan-11 22:16:38, array chip wrote:
> Hi, I am seeking help on designing an algorithm to identify the
> locations of stretches of 1s in a vector of 0s and 1s. Below is
> a simple example:
> 
>> dat<-as.data.frame(cbind(a=c(F,F,T,T,T,T,F,F,T,T,F,T,T,T,T,F,F,F,F,T)
>   ,b=c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)))
> 
>> dat
>    a  b
> 1  0  4
> 2  0 12
> 3  1 13
> 4  1 16
> 5  1 18
> 6  1 20
> 7  0 28
> 8  0 30
> 9  1 34
> 10 1 46
> 11 0 47
> 12 1 49
> 13 1 61
> 14 1 73
> 15 1 77
> 16 0 84
> 17 0 87
> 18 0 90
> 19 0 95
> 20 1 97
> 
> In this dataset, "b" is sorted and denotes the location for each
> number in "a". 
> So I would like to find the starting & ending locations for each
> stretch of 1s within "a", also counting the number of 1s in each
> stretch as well.
> Hope the results from the algorithm would be:
> 
> stretch   start   end   No.of.1s
> 1         13      20    4
> 2         34      46    2
> 3         49      77    4
> 4         97      97    1
> 
> I can imagine using for loops can do the job, but I feel it's not a
> clever way to do this. Is there an efficient algorithm that can do
> this fast?
> 
> Thanks for any suggestions.
> John

The basic information you need can be got using rle() ("run length
encoding"). See '?rle'. In your example:

  rle(dat$a)
  # Run Length Encoding
  #   lengths: int [1:8] 2 4 2 2 1 4 4 1
  #   values : num [1:8] 0 1 0 1 0 1 0 1
  ## Note: F -> 0, T -> 1

The following has a somewhat twisted logic at the end, and may
be flawed, but you can probably adapt it!

  L <- rle(dat$a)$lengths
  V <- rle(dat$a)$values
  pos <- c(1,cumsum(L))
  V1 <- c(-1,V)
  1+pos[V1==0]
  # [1]  3  9 12 20
  ## Positions in the series dat$a where each run of "T" (i.e. 1)
  ##   starts
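
One possible way to adapt this into the full table John asked for is sketched
below. It is only a sketch, not necessarily what either poster had in mind:
the names ends, starts, ones and res are introduced here for illustration, and
the data frame is rebuilt directly with 0/1 values rather than through
cbind().

```r
# Data from the original post (a: 0/1 flags, b: sorted locations)
dat <- data.frame(
  a = c(0,0,1,1,1,1,0,0,1,1,0,1,1,1,1,0,0,0,0,1),
  b = c(4,12,13,16,18,20,28,30,34,46,47,49,61,73,77,84,87,90,95,97)
)

r <- rle(dat$a)
ends   <- cumsum(r$lengths)       # row index where each run ends
starts <- ends - r$lengths + 1    # row index where each run starts
ones   <- r$values == 1           # TRUE for the runs of 1s

# Look up the locations in b at the start and end rows of each run of 1s
res <- data.frame(
  stretch  = seq_len(sum(ones)),
  start    = dat$b[starts[ones]],
  end      = dat$b[ends[ones]],
  No.of.1s = r$lengths[ones]
)
res
#   stretch start end No.of.1s
# 1       1    13  20        4
# 2       2    34  46        2
# 3       3    49  77        4
# 4       4    97  97        1
```

Because the start rows are derived from the end rows and the run lengths, a
series that begins with a 1 is handled as well. The pos/V1 construction above
would miss that first run.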

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.harding at wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Jan-11                                       Time: 22:57:44
------------------------------ XFMail ------------------------------
