thr3ads.net - R help - [R] Assigning genes to CBS segmented output: [Oct 2011]

Angel Russo

2011-Oct-04 21:44 UTC

[R] Assigning genes to CBS segmented output:

Hi All,

I have CBS segmentation output for 10 tumor samples from each of 2
different tumors.

Now I urgently need to assign genes to each segment (listing all genes present
in that segment), after having removed all the CNVs from the segment data. The
format of the data is:

Sample  Chromosome      Start   End     Num_Probes      Segment_Mean
Sample1A-TA  1       51598   76187   15      -1.115

Could anyone suggest an R library, code, or method that I can use to quickly
assign genes to the CBS output?

Thanks so much,
Angel

Martin Morgan

2011-Oct-04 23:35 UTC

[R] Assigning genes to CBS segmented output:

On 10/04/2011 02:44 PM, Angel Russo wrote:
> Hi All,
>
> I have an CBS segmentation algorithm output for 10 tumor samples each from 2
> different tumors.
>
> Now, I am in an urgent need to assign gene (followed by all genes present)
> that belong to a particular segment after I removed all the CNVs from
> segment data. The format of the data is:
>
> Sample  Chromosome      Start   End     Num_Probes      Segment_Mean
> Sample1A-TA  1       51598   76187   15      -1.115

Hi Angel -- In Bioconductor

   http://bioconductor.org

you can create, for a given model organism, a data frame of known Entrez genes
and their begin / end locations. Start by installing the necessary software
and data packages

   source("http://bioconductor.org/biocLite.R")
   biocLite(c("org.Hs.eg.db", "GenomicRanges"))
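
On current Bioconductor releases the installer is BiocManager::install()
rather than biocLite(). A minimal sketch of the equivalent installation,
assuming a recent R with internet access (the package names are the ones
from this reply):

```r
# Install the BiocManager helper from CRAN if it is not already present
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the annotation and ranges packages used in this thread
BiocManager::install(c("org.Hs.eg.db", "GenomicRanges"))
```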

Then load the annotation package with the genic coordinates and merge the
start and end tables

   library(org.Hs.eg.db)
   anno = merge(toTable(org.Hs.egCHRLOC), toTable(org.Hs.egCHRLOCEND))

leading to

 > head(anno)
     gene_id Chromosome start_location end_location
1     10000          1     -243666483   -244006553
2     10000          1     -243666483   -244006553
3     10000          1     -243651534   -244006553
4     10000          1     -243651534   -244006553
5 100008586          X       49217770     49223847
6 100008586          X       49217770     49332715
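
A note on the negative coordinates in that output: in the org.*.egCHRLOC maps
a leading minus sign marks a gene on the antisense strand, which is why the
code further down wraps locations in abs(). The repeated rows per gene are
presumably multiple annotated locations combined by merge(). A small sketch
of separating strand from position, using a few of the rows printed above
(the new column names 'strand', 'start' and 'end' are my own choice):

```r
# A few rows copied from the head(anno) output in the reply
anno <- data.frame(
  gene_id        = c("10000", "10000", "100008586"),
  Chromosome     = c("1", "1", "X"),
  start_location = c(-243666483, -243651534, 49217770),
  end_location   = c(-244006553, -244006553, 49223847),
  stringsAsFactors = FALSE
)

# The sign encodes the strand; the absolute value is the position
anno$strand <- ifelse(anno$start_location < 0, "-", "+")
anno$start  <- abs(anno$start_location)
anno$end    <- abs(anno$end_location)

print(anno[, c("gene_id", "Chromosome", "strand", "start", "end")])
```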

For the simple question 'which genes are located on chromosome A 
starting at X and going to Y' you could

   subset(anno, Chromosome=="A" &
                  abs(start_location) > X &
                    abs(end_location) < Y)
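
As a worked instance of that lookup, here is a sketch on made-up data (the
gene ids "a", "b", "c" and their coordinates are hypothetical; the segment
boundaries are the ones from the example row in the question). Note that the
strict inequalities keep only genes lying entirely inside the segment; a
second filter shows one way to also keep genes that merely overlap it:

```r
# Hypothetical annotation in the same layout as 'anno'
anno <- data.frame(
  gene_id        = c("a", "b", "c"),
  Chromosome     = c("1", "1", "2"),
  start_location = c(52000, -70000, 60000),
  end_location   = c(60000, -90000, 61000),
  stringsAsFactors = FALSE
)

chr <- "1"    # chromosome of the segment
X   <- 51598  # segment start
Y   <- 76187  # segment end

# Genes entirely inside the segment (the filter from the reply)
inside <- subset(anno, Chromosome == chr &
                       abs(start_location) > X &
                       abs(end_location) < Y)

# Genes overlapping the segment at all, including partial overlaps
touching <- subset(anno, Chromosome == chr &
                         abs(start_location) <= Y &
                         abs(end_location) >= X)

print(inside$gene_id)
print(touching$gene_id)
```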

This could also be done through the 'biomaRt' package or GenomicFeatures
/ TxDb packages. To get this for many segments, first filter 'anno' to remove
funky genes, e.g., those that have negative length(!)

   idx = with(anno, abs(start_location) > abs(end_location))
   anno = anno[!idx,]

Convert this to a GRanges object:

   library(GenomicRanges)
   gr = with(anno, GRanges(Chromosome,
                     IRanges(abs(start_location), abs(end_location)),
                     names=gene_id))

Convert your CBS result into a GRanges:

   seg = with(CBS, GRanges(Chromosome, IRanges(Start, End)))
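
The line above assumes a data frame called 'CBS' with the columns from the
question. A minimal sketch of getting there from tab-delimited text, using
the single example row that was posted. I am assuming the file is
tab-separated, and the "chr" stripping is only a guess at a common mismatch:
the chromosome labels have to match those in 'anno' ("1", "X", ...) for the
overlap step to pair anything up.

```r
# The header and example row from the question, as tab-delimited text
cbs_text <- "Sample\tChromosome\tStart\tEnd\tNum_Probes\tSegment_Mean
Sample1A-TA\t1\t51598\t76187\t15\t-1.115"

# For a real file, replace 'text = cbs_text' with the file path
CBS <- read.delim(text = cbs_text, stringsAsFactors = FALSE)

# Use character labels without a "chr" prefix, as in 'anno'
CBS$Chromosome <- sub("^chr", "", as.character(CBS$Chromosome))

str(CBS)
```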

Then find overlaps:

   olap = findOverlaps(gr, seg)

Here 'gr' is called the 'query' and 'seg' is called the 'subject'.
queryHits(olap) and subjectHits(olap) give equal-length vectors
describing which queries overlap which subjects. You could group gene
names by segment with

   split(names(gr)[queryHits(olap)], subjectHits(olap))
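
To make the grouping step concrete without needing the Bioconductor packages,
here is a base-R sketch on toy data that mimics what findOverlaps() reports
by default (any overlap between closed intervals on the same chromosome).
The gene ids and coordinates are invented, and the 'Genes' column is my own
addition. One thing it illustrates: split() on the hit indices alone drops
segments that contain no genes, so a factor with one level per segment is
used to keep them.

```r
# Toy gene annotation (hypothetical ids and coordinates)
genes <- data.frame(
  gene_id    = c("g1", "g2", "g3", "g4"),
  Chromosome = c("1", "1", "1", "2"),
  start      = c(50000, 70000, 90000, 60000),
  end        = c(60000, 80000, 95000, 65000),
  stringsAsFactors = FALSE
)

# Toy segments in the CBS layout from the question
CBS <- data.frame(
  Sample       = rep("Sample1A-TA", 3),
  Chromosome   = c("1", "1", "2"),
  Start        = c(51598, 100000, 10000),
  End          = c(76187, 120000, 20000),
  Num_Probes   = c(15, 20, 12),
  Segment_Mean = c(-1.115, 0.2, 0.05),
  stringsAsFactors = FALSE
)

# A gene and a segment overlap when they share a chromosome and
# neither one ends before the other starts
overlap <- outer(genes$Chromosome, CBS$Chromosome, "==") &
  outer(genes$start, CBS$End,   "<=") &
  outer(genes$end,   CBS$Start, ">=")
hits <- which(overlap, arr.ind = TRUE)  # columns: "row" = gene, "col" = segment

# One factor level per segment, so segments without genes are kept
seg_index <- factor(hits[, "col"], levels = seq_len(nrow(CBS)))
genes_by_segment <- split(genes$gene_id[hits[, "row"]], seg_index)

# Attach the gene lists to the segment table as a ';'-separated string
CBS$Genes <- vapply(genes_by_segment, paste, character(1), collapse = ";")
print(CBS)
```

The outer() comparison is quadratic in the number of genes and segments, so
it is only meant to show the logic; for real data findOverlaps() is the
tool to use.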

An important issue is to use the same genome build for annotations as 
you used for segmentation. Hope that helps / provides some hints for 
getting from A to B.

Martin

-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
