Hi, Algorithm question: I have two sets of "intervals", where an interval is an ordered pair [a,b] of two numbers. Is there an efficient way in R to generate the intersection of two lists of same? For concreteness: I'm representing a set of intervals with a data.frame: > list1 = as.data.frame(list(open=c(1,5), close=c(2,10))) > list1 open close 1 1 2 2 5 10 > list2 = as.data.frame(list(open=c(1.5,3), close=c(2.5,10))) > list2 open close 1 1.5 2.5 2 3.0 10.0 How do I get the intersection which would be something like: open close 1 1.5 2.0 2 5.0 10.0 I wonder if there's some ready-built functionality that might help me out. I'm new to R and am still learning to vectorize my code and my thinking. Or maybe there's a package for interval arithmetic that I can just pull off the shelf. Thanks, -tom -- Thomas Meyer
On 4/15/2009 8:59 AM, Thomas Meyer wrote:> Hi, > > Algorithm question: I have two sets of "intervals", where an interval is > an ordered pair [a,b] of two numbers. Is there an efficient way in R to > generate the intersection of two lists of same? > > For concreteness: I'm representing a set of intervals with a data.frame: > > > list1 = as.data.frame(list(open=c(1,5), close=c(2,10))) > > list1 > open close > 1 1 2 > 2 5 10 > > > list2 = as.data.frame(list(open=c(1.5,3), close=c(2.5,10))) > > list2 > open close > 1 1.5 2.5 > 2 3.0 10.0 > > How do I get the intersection which would be something like: > open close > 1 1.5 2.0 > 2 5.0 10.0 > > I wonder if there's some ready-built functionality that might help me > out. I'm new to R and am still learning to vectorize my code and my > thinking. Or maybe there's a package for interval arithmetic that I can > just pull off the shelf.pmax and pmin should be helpful. I don't know how you want to represent empty intersections, but I'd just compute the pmax of the lower bounds, the pmin of the upper bounds, and then if the new upper bound is less than the new lower bound, treat it as empty. Duncan Murdoch
one way is: list1 <- as.data.frame(list(open=c(1,5), close=c(2,10))) list2 <- as.data.frame(list(open=c(1.5,3), close=c(2.5,10))) data.frame( open = pmax(list1$open, list2$open), close = pmin(list1$close, list2$close) ) I hope it helps. Best, Dimitris Thomas Meyer wrote:> Hi, > > Algorithm question: I have two sets of "intervals", where an interval is > an ordered pair [a,b] of two numbers. Is there an efficient way in R to > generate the intersection of two lists of same? > > For concreteness: I'm representing a set of intervals with a data.frame: > > > list1 = as.data.frame(list(open=c(1,5), close=c(2,10))) > > list1 > open close > 1 1 2 > 2 5 10 > > > list2 = as.data.frame(list(open=c(1.5,3), close=c(2.5,10))) > > list2 > open close > 1 1.5 2.5 > 2 3.0 10.0 > > How do I get the intersection which would be something like: > open close > 1 1.5 2.0 > 2 5.0 10.0 > > I wonder if there's some ready-built functionality that might help me > out. I'm new to R and am still learning to vectorize my code and my > thinking. Or maybe there's a package for interval arithmetic that I can > just pull off the shelf. > > Thanks, > > -tom >-- Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus University Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014
Not of the self but still not complicated: list1 <- data.frame(open=c(1,5), close=c(2,10)) list2 <- data.frame(open=c(1.5,3), close=c(2.5,10)) Intersec <- data.frame(Open = pmax(list1$open, list2$open), Close pmin(list1$close, list2$close)) Intersec[Intersec$Open > Intersec$Close, ] <- NA Intersec HTH, Thierry ------------------------------------------------------------------------ ---- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Oorspronkelijk bericht----- Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] Namens Thomas Meyer Verzonden: woensdag 15 april 2009 14:59 Aan: r-help at r-project.org Onderwerp: [R] Intersection of two sets of intervals Hi, Algorithm question: I have two sets of "intervals", where an interval is an ordered pair [a,b] of two numbers. Is there an efficient way in R to generate the intersection of two lists of same? For concreteness: I'm representing a set of intervals with a data.frame: > list1 = as.data.frame(list(open=c(1,5), close=c(2,10))) > list1 open close 1 1 2 2 5 10 > list2 = as.data.frame(list(open=c(1.5,3), close=c(2.5,10))) > list2 open close 1 1.5 2.5 2 3.0 10.0 How do I get the intersection which would be something like: open close 1 1.5 2.0 2 5.0 10.0 I wonder if there's some ready-built functionality that might help me out. I'm new to R and am still learning to vectorize my code and my thinking. Or maybe there's a package for interval arithmetic that I can just pull off the shelf. Thanks, -tom -- Thomas Meyer ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
There is a very nice "intervals" package in CRAN. It is impressively efficient even for intersections of many millions of intervals. If I remember correctly, it is purely in-core, so on a 32-bit R you'll be limited to something like 100 million intervals. Is that enough for your application? -s On Wed, Apr 15, 2009 at 8:59 AM, Thomas Meyer <tm35 at cornell.edu> wrote:> Hi, > > Algorithm question: I have two sets of "intervals", where an interval is an > ordered pair [a,b] of two numbers. Is there an efficient way in R to > generate the intersection of two lists of same? > > For concreteness: I'm representing a set of intervals with a data.frame: > >> list1 = as.data.frame(list(open=c(1,5), close=c(2,10))) >> list1 > ?open close > 1 ? ?1 ? ? 2 > 2 ? ?5 ? ?10 > >> list2 = as.data.frame(list(open=c(1.5,3), close=c(2.5,10))) >> list2 > ?open close > 1 ?1.5 ? 2.5 > 2 ?3.0 ?10.0 > > How do I get the intersection which would be something like: > ?open close > 1 ?1.5 ? 2.0 > 2 ?5.0 ?10.0 > > I wonder if there's some ready-built functionality that might help me out. > I'm new to R and am still learning to vectorize my code and my thinking. Or > maybe there's a package for interval arithmetic that I can just pull off the > shelf. > > Thanks, > > -tom > > -- > Thomas Meyer > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Stavros, you are quite correct -- I discovered that the hard way a little while ago when testing my two-line solution. Use of pmin/pmax don't handle, for instance, cases where more than one interval in one set is wholly contained by an interval in the other. (I have a mis-posted msg awaiting moderator approval in R-help with a concrete example.) Your tip to check out the intervals pkg looks promising. FYI, there the general intersection is computed as the complement of the union of the complements, i.e. A*B = (A'+B')' , aka DeMorgan. Thanks for the help, -tom On 4/15/2009 11:27 AM, Stavros Macrakis wrote:> I can see how pmax/pmin would easily allow the intersection of > corresponding elements of two sequences of intervals, but I don't see > how they help in intersecting two *sets* of intervals, which was the > original problem statement. > > -s > > > > On Wed, Apr 15, 2009 at 9:14 AM, ONKELINX, Thierry > <Thierry.ONKELINX at inbo.be> wrote: >> Not of the self but still not complicated: >> >> list1 <- data.frame(open=c(1,5), close=c(2,10)) >> list2 <- data.frame(open=c(1.5,3), close=c(2.5,10)) >> >> Intersec <- data.frame(Open = pmax(list1$open, list2$open), Close >> pmin(list1$close, list2$close)) >> Intersec[Intersec$Open > Intersec$Close, ] <- NA >> Intersec >> >> HTH, >> >> Thierry >> >> ------------------------------------------------------------------------ >> ---- >> ir. Thierry Onkelinx >> Instituut voor natuur- en bosonderzoek / Research Institute for Nature >> and Forest >> Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, >> methodology and quality assurance >> Gaverstraat 4 >> 9500 Geraardsbergen >> Belgium >> tel. + 32 54/436 185 >> Thierry.Onkelinx at inbo.be >> www.inbo.be >> >> To call in the statistician after the experiment is done may be no more >> than asking him to perform a post-mortem examination: he may be able to >> say what the experiment died of. >> ~ Sir Ronald Aylmer Fisher >> >> The plural of anecdote is not data. >> ~ Roger Brinner >> >> The combination of some data and an aching desire for an answer does not >> ensure that a reasonable answer can be extracted from a given body of >> data. >> ~ John Tukey >> >> -----Oorspronkelijk bericht----- >> Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] >> Namens Thomas Meyer >> Verzonden: woensdag 15 april 2009 14:59 >> Aan: r-help at r-project.org >> Onderwerp: [R] Intersection of two sets of intervals >> >> Hi, >> >> Algorithm question: I have two sets of "intervals", where an interval is >> >> an ordered pair [a,b] of two numbers. Is there an efficient way in R to >> generate the intersection of two lists of same? >> >> For concreteness: I'm representing a set of intervals with a data.frame: >> >> > list1 = as.data.frame(list(open=c(1,5), close=c(2,10))) >> > list1 >> open close >> 1 1 2 >> 2 5 10 >> >> > list2 = as.data.frame(list(open=c(1.5,3), close=c(2.5,10))) >> > list2 >> open close >> 1 1.5 2.5 >> 2 3.0 10.0 >> >> How do I get the intersection which would be something like: >> open close >> 1 1.5 2.0 >> 2 5.0 10.0 >> >> I wonder if there's some ready-built functionality that might help me >> out. I'm new to R and am still learning to vectorize my code and my >> thinking. Or maybe there's a package for interval arithmetic that I can >> just pull off the shelf. >> >> Thanks, >> >> -tom >> >> -- >> Thomas Meyer >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >> Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer >> en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is >> door een geldig ondertekend document. The views expressed in this message >> and any annex are purely those of the writer and may not be regarded as stating >> an official position of INBO, as long as the message is not confirmed by a duly >> signed document. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >>
The pmin/pmax approach fails if an open interval in the first list intersects with more than one open interval in the second. You can deal with that by a sorting trick that gives you the number of intervals each time point is in and then selecting the time points when the number of intervals covering it rises to 2 and falls below 2. This version ignores the possibility that intervals in one list might overlap each other.> f2 <-function(list1, list2) { n1 <- nrow(list1) n2 <- nrow(list2) times <- c(list1$open, list2$open, list1$close, list2$close) o <- order(times) times <- times[o] nOpen <- cumsum(rep(c(1, 1, -1, -1), c(n1, n2, n1, n2))[o]) bothOpen <- nOpen == 2 # ==1 for eitherOpen change <- diff(c(FALSE, bothOpen)) data.frame(open = times[change == 1], close = times[change =-1]) }> print(list3 <- data.frame(open = c(1, 5), close = c(4, 10)))open close 1 1 4 2 5 10> list2open close 1 1.5 2.5 2 3.0 10.0> f2(list2,list3)open close 1 1.5 2.5 2 3.0 4.0 3 5.0 10.0 Bill Dunlap TIBCO Software Inc - Spotfire Division wdunlap tibco.com ------------------------------------------------------------------- [R] Intersection of two sets of intervals Thomas Meyer tm35 at cornell.edu Wed Apr 15 16:52:48 CEST 2009 Previous message: [R] Intersection of two sets of intervals Next message: [R] Intersection of two sets of intervals Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] pmax/pmin did the trick nicely -- the right-size tool I was hoping for. Thanks to all, -tom On 4/15/2009 9:14 AM, ONKELINX, Thierry wrote:> Not of the self but still not complicated: > > list1 <- data.frame(open=c(1,5), close=c(2,10)) > list2 <- data.frame(open=c(1.5,3), close=c(2.5,10)) > > Intersec <- data.frame(Open = pmax(list1$open, list2$open), Close > pmin(list1$close, list2$close)) > Intersec[Intersec$Open > Intersec$Close, ] <- NA > Intersec > > HTH, > > Thierry > >------------------------------------------------------------------------> ---- > ir. Thierry Onkelinx > Instituut voor natuur- en bosonderzoek / Research Institute for Nature > and Forest > Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, > methodology and quality assurance > Gaverstraat 4 > 9500 Geraardsbergen > Belgium > tel. + 32 54/436 185 > Thierry.Onkelinx at inbo.be > www.inbo.be > > To call in the statistician after the experiment is done may be nomore> than asking him to perform a post-mortem examination: he may be ableto> say what the experiment died of. > ~ Sir Ronald Aylmer Fisher > > The plural of anecdote is not data. > ~ Roger Brinner > > The combination of some data and an aching desire for an answer doesnot> ensure that a reasonable answer can be extracted from a given body of > data. > ~ John Tukey > > -----Oorspronkelijk bericht----- > Van: r-help-bounces at r-project.org [mailto:r-help-bounces atr-project.org]> Namens Thomas Meyer > Verzonden: woensdag 15 april 2009 14:59 > Aan: r-help at r-project.org > Onderwerp: [R] Intersection of two sets of intervals > > Hi, > > Algorithm question: I have two sets of "intervals", where an intervalis> > an ordered pair [a,b] of two numbers. Is there an efficient way in Rto> generate the intersection of two lists of same? > > For concreteness: I'm representing a set of intervals with adata.frame:> > > list1 = as.data.frame(list(open=c(1,5), close=c(2,10))) > > list1 > open close > 1 1 2 > 2 5 10 > > > list2 = as.data.frame(list(open=c(1.5,3), close=c(2.5,10))) > > list2 > open close > 1 1.5 2.5 > 2 3.0 10.0 > > How do I get the intersection which would be something like: > open close > 1 1.5 2.0 > 2 5.0 10.0 > > I wonder if there's some ready-built functionality that might help me > out. I'm new to R and am still learning to vectorize my code and my > thinking. Or maybe there's a package for interval arithmetic that Ican> just pull off the shelf. > > Thanks, > > -tom >