thr3ads.net - R help - [R] Is that an efficient way to find the overlapped , upstream and downstream ranges for a bunch of ranges [Apr 2016]

何尧

2016-Apr-05 17:27 UTC

[R] Is that an efficient way to find the overlapped , upstream and downstream ranges for a bunch of ranges

I have a set of genes (~50,000) from the whole genome, which I have read in as
genomic ranges (a GRanges object).

Each range (gene) can be seen as an observation with three columns: chromosome,
start and end, like this:

       seqnames start end width strand
gene1     chr1     1   5     5      +
gene2     chr1    10  15     6      +
gene3     chr1    12  17     6      +
gene4     chr1    20  25     6      +
gene5     chr1    30  40    11      +

I am wondering whether there is an efficient way to find the overlapping,
upstream and downstream genes for each gene in the GRanges object.

For example, assuming all_genes_gr is a GRanges object of ~50,000 genes, the
result I want looks like this:

gene_name  upstream_gene  downstream_gene  overlapped_gene
gene1      NA             gene2            NA
gene2      gene1          gene4            gene3
gene3      gene1          gene4            gene2
gene4      gene3          gene5            NA

Currently, the strategy I use is this:

library(GenomicRanges)
find_overlapped_gene <- function(idx, all_genes_gr) {
  #cat(idx, "\n")
  curr_gene <- all_genes_gr[idx]
  other_genes <- all_genes_gr[-idx]
  n <- countOverlaps(curr_gene, other_genes)
  gene <- subsetByOverlaps(curr_gene, other_genes)
  return(list(n, gene))
}

system.time(lapply(1:100, function(idx) find_overlapped_gene(idx, all_genes_gr)))

However, for 100 genes this takes ~8 s according to system.time(). That means
that for 50,000 genes it would take about an hour (~4,000 s) just to find the
overlapping genes.

Is there an algorithm or strategy that does this efficiently, say 50,000 genes
in ~10 min or even less?


David Winsemius

2016-Apr-06 01:21 UTC


> On Apr 5, 2016, at 10:27 AM, 何尧 <heyao at pku.edu.cn> wrote:
>
> [...]
>
> I am wondering whether there is an efficient way to find the overlapping,
> upstream and downstream genes for each gene in the GRanges object.

The data.table package (on CRAN) and the IRanges package (on Bioconductor)
provide efficient, well-established approaches to these problems.
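
As a minimal sketch of the vectorized approach these packages allow (not code from this thread, and assuming the GenomicRanges package from Bioconductor is installed), the per-gene loop can be replaced by single calls to findOverlaps(), follow() and precede() on the whole object. The toy data below is copied from the question. The names hits, partners and result are illustrative, and overlapping partners are collapsed into a comma-separated string as one possible output layout.

```r
library(GenomicRanges)

# Toy data copied from the question (five genes on chr1, all on the + strand).
all_genes_gr <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(1, 10, 12, 20, 30),
                   end   = c(5, 15, 17, 25, 40),
                   names = paste0("gene", 1:5)),
  strand = "+"
)
gene_names <- names(all_genes_gr)

# One call finds every overlapping pair; drop each gene's hit on itself.
hits <- findOverlaps(all_genes_gr, all_genes_gr)
hits <- hits[queryHits(hits) != subjectHits(hits)]

# Collapse the overlapping partners of each gene into one string (NA if none).
overlapped <- rep(NA_character_, length(all_genes_gr))
partners <- split(gene_names[subjectHits(hits)], queryHits(hits))
overlapped[as.integer(names(partners))] <-
  vapply(partners, paste, character(1), collapse = ",")

# follow() gives the nearest non-overlapping range that each gene follows
# (its upstream neighbour); precede() gives the nearest one it precedes
# (its downstream neighbour). Both are strand-aware and return NA if none.
upstream_idx   <- follow(all_genes_gr, all_genes_gr)
downstream_idx <- precede(all_genes_gr, all_genes_gr)

result <- data.frame(
  gene_name       = gene_names,
  upstream_gene   = gene_names[upstream_idx],
  downstream_gene = gene_names[downstream_idx],
  overlapped_gene = overlapped,
  stringsAsFactors = FALSE
)
print(result)
```

On the toy data this should reproduce the table requested in the question, with one extra row for gene5. Because each function is called once on the whole object rather than once per gene, the cost no longer grows with the number of R-level iterations, which is where the loop in the question spends most of its time.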

> [...]
>
> Is there an algorithm or strategy that does this efficiently, say 50,000
> genes in ~10 min or even less?

I suspect this would run much faster than that for such a small dataset.

-- 
David.

David Winsemius
Alameda, CA, USA

Michael Lawrence

2016-Apr-11 14:57 UTC


For the sake of posterity, this question was asked and answered here:
https://support.bioconductor.org/p/80448

