Yao He
2016-Apr-05 17:29 UTC
[R] Is there an efficient way to find the overlapped, upstream and downstream ranges for a bunch of ranges
I have a bunch of genes (nearly 50000) from the whole genome, read in as a GenomicRanges object. Each range (gene) can be seen as an observation with three main columns: chromosome, start and end, like this:

          seqnames  start  end  width  strand
    gene1     chr1      1    5      5       +
    gene2     chr1     10   15      6       +
    gene3     chr1     12   17      6       +
    gene4     chr1     20   25      6       +
    gene5     chr1     30   40     11       +

I am wondering whether there is an efficient way to find the *overlapped, upstream and downstream genes for each gene in the GRanges*.

For example, assuming all_genes_gr is a GRanges object of ~50000 genes, the result I want looks like this:

    gene_name  upstream_gene  downstream_gene  overlapped_gene
    gene1      NA             gene2            NA
    gene2      gene1          gene4            gene3
    gene3      gene1          gene4            gene2
    gene4      gene3          gene5            NA

Currently, the strategy I use is:

    library(GenomicRanges)

    # For the gene at position idx, count and collect the other genes that overlap it
    find_overlapped_gene <- function(idx, all_genes_gr) {
      # cat(idx, "\n")
      curr_gene <- all_genes_gr[idx]
      other_genes <- all_genes_gr[-idx]
      n <- countOverlaps(curr_gene, other_genes)
      # other_genes goes first so that the overlapping genes are returned,
      # not curr_gene itself
      gene <- subsetByOverlaps(other_genes, curr_gene)
      return(list(n, gene))
    }

    system.time(lapply(1:100, function(idx) find_overlapped_gene(idx, all_genes_gr)))

However, for 100 genes this takes nearly 8 s according to system.time(). That means that with 50000 genes it would take about an hour just to find the overlapped genes.

Is there any algorithm or strategy to do this efficiently, perhaps 50000 genes in ~10 min or even less?

Yao He
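One possible vectorized approach, given here as a minimal sketch rather than a definitive answer: instead of looping over genes, call findOverlaps(), follow() and precede() from GenomicRanges once on the whole object. The toy all_genes_gr below is rebuilt from the five example genes, and the names overlapped and result are illustrative only. The sketch assumes that "upstream" and "downstream" are meant relative to strand, and that the nearest non-overlapping gene is wanted.

```r
library(GenomicRanges)

# Toy data rebuilt from the five example genes in the post
all_genes_gr <- GRanges(
  seqnames = "chr1",
  ranges = IRanges(start = c(1, 10, 12, 20, 30), end = c(5, 15, 17, 25, 40)),
  strand = "+"
)
names(all_genes_gr) <- paste0("gene", 1:5)
gene_names <- names(all_genes_gr)

# All pairwise overlaps in a single call, then drop each gene's hit on itself
hits <- findOverlaps(all_genes_gr, all_genes_gr)
hits <- hits[queryHits(hits) != subjectHits(hits)]

# Collect the names of overlapping genes per query gene (NA when there are none)
by_gene <- split(
  gene_names[subjectHits(hits)],
  factor(queryHits(hits), levels = seq_along(all_genes_gr))
)
overlapped <- unname(vapply(
  by_gene,
  function(x) if (length(x) > 0) paste(x, collapse = ",") else NA_character_,
  character(1)
))

# follow(): nearest non-overlapping gene upstream of each gene
# precede(): nearest non-overlapping gene downstream of each gene
# Both are strand-aware by default and return NA when no such gene exists
upstream_idx <- follow(all_genes_gr, all_genes_gr)
downstream_idx <- precede(all_genes_gr, all_genes_gr)

result <- data.frame(
  gene_name = gene_names,
  upstream_gene = gene_names[upstream_idx],
  downstream_gene = gene_names[downstream_idx],
  overlapped_gene = overlapped,
  stringsAsFactors = FALSE
)
print(result)
```

On the five example genes this should reproduce the table requested above, with gene5 getting gene4 as its upstream gene. Because every step is a single vectorized call, the cost should no longer grow with one R-level function call per gene. If upstream and downstream are meant purely by coordinate rather than by direction of transcription, passing ignore.strand = TRUE to follow() and precede() is one option to consider.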