thr3ads.net - R help - [R] fusion of overlapping intervals [Nov 2012]

Hermann Norpois

2012-Nov-05 17:14 UTC

[R] fusion of overlapping intervals

Hello,

I have start and end coordinates from different experiments (DNase
hypersensitivity data) and now I would like to combine overlapping
intervals. For instance (see my test data below) (2) 30-52 and (3) 49-101
are combined to 30-101. But 49-101 and 70-103 would not be combined because
they are on different chromosomes (chr a and chr b).
Does anybody have an idea?
Thanks
Hermann
> df
  chr start end
1   a     5  10
2   a    30  52
3   a    49 101
4   b    70 103
5   b   100 130
6   b   129 140
> dput(df)
structure(list(chr = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("a",
"b"), class = "factor"), start = c(5, 30, 49, 70, 100, 129),
    end = c(10, 52, 101, 103, 130, 140)), .Names = c("chr", "start",
"end"), row.names = c(NA, -6L), class = "data.frame")

Martin Morgan

2012-Nov-05 17:23 UTC

[R] fusion of overlapping intervals

On 11/05/2012 09:14 AM, Hermann Norpois wrote:
> Hello,
>
> I have start and end coordinates from different experiments (DNase
> hypersensitivity data) and now I would like to combine overlapping
> intervals. For instance (see my test data below) (2) 30-52 and (3) 49-101
> are combined to 30-101. But 49-101 and 70-103 would not be combined because
> they are on different chromosomes (chr a and chr b).
> Does anybody have an idea?

This data is very naturally handled by the "GRanges" class in Bioconductor's
GenomicRanges package:

   source("http://bioconductor.org/biocLite.R")
   biocLite("GenomicRanges")
   library(GenomicRanges)

   gr = GRanges(rep(c("a", "b"), each=3),
                IRanges(c(5, 30, 49, 70, 100, 129),
                        c(10, 52, 101, 103, 130, 140)),
                strand="*")

and then

 > reduce(gr)
GRanges with 3 ranges and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]        a [ 5,  10]      *
   [2]        a [30, 101]      *
   [3]        b [70, 140]      *
   ---
   seqlengths:
     a  b
    NA NA
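
For readers starting from the data frame in the original post, the round trip can be sketched as follows. This is an illustration rather than part of the original reply: it assumes a GenomicRanges version that provides makeGRangesFromDataFrame(), and the name merged_df is made up for the example. On current Bioconductor releases the package is installed with BiocManager::install("GenomicRanges") instead of biocLite(). The code is guarded so that it does nothing when the package is not installed.

```r
# Sketch only: build a GRanges from the poster's data frame, merge, convert back.
# Assumes GenomicRanges is installed; skipped silently otherwise.
if (requireNamespace("GenomicRanges", quietly = TRUE)) {
  library(GenomicRanges)

  df <- data.frame(chr   = rep(c("a", "b"), each = 3),
                   start = c(5, 30, 49, 70, 100, 129),
                   end   = c(10, 52, 101, 103, 130, 140))

  # "chr" holds the sequence (chromosome) names
  gr <- makeGRangesFromDataFrame(df, seqnames.field = "chr")

  # reduce() merges overlapping ranges within each sequence
  merged_df <- as.data.frame(reduce(gr))
  print(merged_df[, c("seqnames", "start", "end")])
}
```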

There are vignettes

   vignette(package="GenomicRanges")

and additional training material, e.g.,

   http://bioconductor.org/help/course-materials/2012/CSC2012/

If you pursue this solution then please follow up with questions on the
Bioconductor mailing list

   http://bioconductor.org/help/mailing-list/

Martin
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

arun

2012-Nov-05 20:26 UTC

[R] fusion of overlapping intervals

Hi,

Maybe you should check this link
(http://r.789695.n4.nabble.com/R-overlapping-intervals-td810061.html).


dat1 <- structure(list(chr = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("a",
"b"), class = "factor"), start = c(5, 30, 49, 70, 100, 129),
    end = c(10, 52, 101, 103, 130, 140)), .Names = c("chr", "start",
"end"), row.names = c(NA, -6L), class = "data.frame")

Using Jim's code:
fun1 <- function(x) {
  # flag every position covered by at least one interval
  x1 <- x2 <- logical(max(x[, 2], x[, 3]))
  x1[unlist(mapply(seq, x[, 2], x[, 3]))] <- TRUE
  x2[unlist(mapply(seq, x[, 2], x[, 3]))] <- TRUE
  # runs of TRUE are the merged intervals
  r <- rle(x1 & x2)
  offset <- cumsum(r$lengths)
  cbind(offset[r$values] - r$lengths[r$values] + 1, offset[r$values])
}

# apply per chromosome; the chromosome label is taken from the list names
list1 <- split(dat1, dat1$chr)
res <- do.call(rbind, lapply(names(list1), function(nm)
  data.frame(chr = nm, fun1(list1[[nm]]))))
rownames(res) <- 1:nrow(res)
colnames(res) <- colnames(dat1)
res
#  chr start end
#1   a     5  10
#2   a    30 101
#3   b    70 140
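
One caveat with the position-flagging approach is that it allocates a logical vector as long as the largest end coordinate, which may be heavy for real genomic positions. A possible alternative, sketched here in base R only, sorts the intervals and sweeps through them once per chromosome. The helper name merge_intervals() is made up for this illustration and does not come from the thread. As written it merges only intervals that share at least one position; fun1() above (and reduce()) also fuse directly adjacent ones, which could be matched by comparing against run_end + 1.

```r
# Sketch only: sort-and-sweep merge of overlapping intervals per chromosome.
# merge_intervals() is a hypothetical helper, not code from the thread.
merge_intervals <- function(d) {
  pieces <- lapply(split(d, d$chr, drop = TRUE), function(x) {
    x <- x[order(x$start, x$end), ]
    # running maximum of the end coordinates seen so far
    run_end <- cummax(x$end)
    # a new group starts when a start lies beyond every earlier end
    new_group <- c(TRUE, x$start[-1] > run_end[-nrow(x)])
    grp <- cumsum(new_group)
    data.frame(chr   = x$chr[1],
               start = as.vector(tapply(x$start, grp, min)),
               end   = as.vector(tapply(x$end, grp, max)))
  })
  res <- do.call(rbind, pieces)
  rownames(res) <- NULL
  res
}

dat1 <- data.frame(chr   = factor(rep(c("a", "b"), each = 3)),
                   start = c(5, 30, 49, 70, 100, 129),
                   end   = c(10, 52, 101, 103, 130, 140))
res2 <- merge_intervals(dat1)
print(res2)
```

On the test data this should give the same three intervals as above (a 5-10, a 30-101, b 70-140).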

A.K.




