Gregory Gentlemen
2010-Jan-27  17:31 UTC
[R] Function for describing segments in sequential data
Dear R-users,
Say that I have a sequence of zeroes and ones:
x <- c(1,1,1,0,0,0,0,1,1,1,0,0,0,0,1,1,1,0,0,0,0)
The sequences of ones represent segments and I want to report the starting and
endpoints of these segments. For example, in 'x', the first segment
starts at location 1 and ends at 3, and the second segment starts at location 8
and ends at location 10. Is there an efficient way of doing this in R without
having to write a bunch of if-else conditions? I know the rle function will
report the lengths of the segments but not their endpoints.
Thanks in advance.
Gregory Gentlemen
William Dunlap
2010-Jan-27  17:46 UTC
You can use expressions based on cumsum(rle(x)$lengths) or, more directly, on the following functions, which do part of what rle() does:

isFirstInRun <- function(x) c(TRUE, x[-1] != x[-length(x)])
isLastInRun <- function(x) c(x[-1] != x[-length(x)], TRUE)

E.g.,

> which(isFirstInRun(x) & x == 1) # starting positions of runs of 1's
[1]  1  8 15
> which(isLastInRun(x) & x == 1) # ending positions of runs of 1's
[1]  3 10 17

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
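Bill's first suggestion, expressions based on cumsum(rle(x)$lengths), is not spelled out above. A minimal sketch of one way to do it, assuming x holds only 0s and 1s: the cumulative sum of the run lengths gives the last index of every run, and subtracting each run's length and adding one gives its first index. The variable names are illustrative only.

```r
x <- c(1,1,1,0,0,0,0,1,1,1,0,0,0,0,1,1,1,0,0,0,0)

r      <- rle(x)                  # run values and run lengths
ends   <- cumsum(r$lengths)       # last index of each run (of any value)
starts <- ends - r$lengths + 1    # first index of each run

is_one    <- r$values == 1        # keep only the runs of 1's
seg_start <- starts[is_one]
seg_end   <- ends[is_one]

seg_start  # 1 8 15
seg_end    # 3 10 17
```

Unlike the which()-based version, this works on the compressed run representation, so the final subsetting step only touches one element per run.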
Steve Lianoglou
2010-Jan-27  17:48 UTC
Hi,

How about something like this:

start <- which(diff(c(0,x)) == 1) ## Append first 0 for a "bookend"
end <- which(diff(x) == -1)

start is: 1, 8, 15
end is: 3, 10, 17

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
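The end line above relies on x finishing with a 0, as it does in this example. If the last run of 1's reaches the end of the vector, diff(x) never drops to -1 for it and that endpoint is silently missed. A sketch that adds a 0 bookend on both sides, assuming x is a 0/1 vector; run_bounds() is a hypothetical helper name, not something from the thread:

```r
# Hypothetical helper: start/end indices of runs of 1's in a 0/1 vector.
run_bounds <- function(x) {
  d <- diff(c(0, x, 0))            # pad both ends with 0 so every run opens and closes
  list(start = which(d == 1),      # 0 -> 1 step at i: a run starts at x[i]
       end   = which(d == -1) - 1L) # 1 -> 0 step at i: the run ended at x[i - 1]
}

x <- c(1,1,1,0,0,0,0,1,1,1,0,0,0,0,1,1,1,0,0,0,0)
run_bounds(x)             # start 1 8 15, end 3 10 17

x2 <- c(0, 1, 1, 0, 1)    # last run touches the end of the vector
which(diff(x2) == -1)     # 3 only: the final run's end is missed
run_bounds(x2)            # start 2 5, end 3 5
```

Padding both sides keeps the start and end vectors the same length, so they can be paired up directly.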
Charles C. Berry
2010-Jan-27  19:36 UTC
If this is more than a small one-off problem, you might try this:

> require(IRanges) # from BioConductor
> IRanges(Rle(x)==1) # n.b. Rle != rle
IRanges of length 3
    start end width
[1]     1   3     3
[2]     8  10     3
[3]    15  17     3

HTH,

Chuck

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu
UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/
La Jolla, San Diego 92093-0901