thr3ads.net - R help - [R] finding most highly transcribed genes - ranking, sorting and subsets? [Dec 2007]

If this information is useful, please help other people find it:
Share via:

alison waller

2007-Dec-06 19:03 UTC

[R] finding most highly transcribed genes - ranking, sorting and subsets?

Hello,

 

I am not only interested in finding out which genes are the most highly up-
or down-regulated (which I have done using the linear models and Bayesian
statistics in Limma), but I also want to know which genes are consistently
highly transcribed (ie. they have a high intensity in the channel of
interest eg. Cy5 or Cy3 across the set of experiments).  I might have missed
a straight forward way to do this, or a valuable function, but I've been
using my own methods and going around in circles.

 

So far I've normalized within and between arrays, then returned the RG
values using RG<-RG.MA, then I ranked each R and G values for each array as
below.

rankRG<-RG

rankRG$R[,1]<-rank(rankRG$R[,1])

rankRG$R[,2]<-rank(rankRG$R[,2]) .. and so on for 6 columns(ie. arrays, as
well as the G's)

 

then I thought I could pull out a subset of rankRG using something like;

topRG<-rankRG

topRG$R<-subset(topRG$R,topRG$R[,1]<500&topRG$R[,2]<500&topRG$R[,5]<500)

 

However, this just returned me a matrix with one row of $R (the ranks were
<500 for columns 1,2, and 5 and greater than 500 for 3,4,and 6).  However, I
can't believe that there is only one gene that is in the top 500 for R
intensitiy among those three arrays.

 

Am I doing something wrong?  Can someone think of a better way of doing
this?

 

Thanks

 

Alison

 

 

******************************************
Alison S. Waller  M.A.Sc.
Doctoral Candidate
awaller@chem-eng.utoronto.ca
416-978-4222 (lab)
Department of Chemical Engineering
Wallberg Building
200 College st.
Toronto, ON
M5S 3E5

  

 


	[[alternative HTML version deleted]]

alison waller

2007-Dec-07 16:17 UTC

head link

[R] finding most highly transcribed genes - ranking, sorting and subsets?

Thanks so much Martin,

This method is definitely more straightforward.  And you are right I don't
think I was doing anything wrong before. However, I thought that rank, would
rank the highest 1st, however after looking at the results using your
methods, I realized it ranks the lowest number 1.  So I modified it for
rank>18500.  And now I'm getting 300 rows for which the intensity is
consistenly high.

However, I am still laking some information.  For the results I can get a
matrix of 300 rows and the corresponding intensities (from m) or rank (from
h), but what I really want is the name of the original row, which
corresponds to a specific spot on the array).

I did msubset<-m[hrows,] and as mentioned I just get the rows numbered
1-300, while I want to essentially pickout the 300 rows from the original
19,000 rows maintaing the original row designation as it corresponds to a
specific gene.

Thanks again for any suggestions,

Alison

-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org] 
Sent: Thursday, December 06, 2007 4:06 PM
To: alison waller
Subject: Re: [R] finding most highly transcribed genes - ranking, sorting
and subsets?

Hi Alison --

I'm not sure where your problem is coming from, but R can help you to
more efficiently do your task. Skipping the bioc terminology and data
structures, you have a matrix
> m <- matrix(runif(100000), ncol=10)
you'd like to determine the rank of values in each column
> r <- apply(m, 2, rank)
identfiy those with high rank
> h <- r < 500
and find the rows for which the rank is always high
> hrows <- apply(h, 1, all)
you can then use hrows to subset your original matrix (m[hrows,]) or
otherwise, e.g., how many rows with high rank
> sum(hrows)[1] 0

or perhaps the distribution of the number of columns in which high
ranking genes occur.
> table(apply(h, 1, sum))
   0    1    2    3    4 
5996 3132  765  100    7 

Martin

"alison waller" <alison.waller at utoronto.ca> writes:
> Hello,
>
>  
>
> I am not only interested in finding out which genes are the most highly
up-> or down-regulated (which I have done using the linear models and Bayesian
> statistics in Limma), but I also want to know which genes are consistently
> highly transcribed (ie. they have a high intensity in the channel of
> interest eg. Cy5 or Cy3 across the set of experiments).  I might have
missed> a straight forward way to do this, or a valuable function, but I've
been
> using my own methods and going around in circles.
>
>  
>
> So far I've normalized within and between arrays, then returned the RG
> values using RG<-RG.MA, then I ranked each R and G values for each array
as> below.
>
> rankRG<-RG
>
> rankRG$R[,1]<-rank(rankRG$R[,1])
>
> rankRG$R[,2]<-rank(rankRG$R[,2]) .. and so on for 6 columns(ie. arrays,
as
> well as the G's)
>
>  
>
> then I thought I could pull out a subset of rankRG using something like;
>
> topRG<-rankRG
>
>
topRG$R<-subset(topRG$R,topRG$R[,1]<500&topRG$R[,2]<500&topRG$R[,5]<500)
>
>  
>
> However, this just returned me a matrix with one row of $R (the ranks were
> <500 for columns 1,2, and 5 and greater than 500 for 3,4,and 6). 
However,
I> can't believe that there is only one gene that is in the top 500 for R
> intensitiy among those three arrays.
>
>  
>
> Am I doing something wrong?  Can someone think of a better way of doing
> this?
>
>  
>
> Thanks
>
>  
>
> Alison
>
>  
>
>  
>
> ******************************************
> Alison S. Waller  M.A.Sc.
> Doctoral Candidate
> awaller at chem-eng.utoronto.ca
> 416-978-4222 (lab)
> Department of Chemical Engineering
> Wallberg Building
> 200 College st.
> Toronto, ON
> M5S 3E5
>
>   
>
>  
>
>
> 	[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.
-- 
Dr. Martin Morgan, PhD
Computational Biology Shared Resource Director
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793

Maybe Matching Threads

Search for more maybe matching threads

R help - Dec 2007 - finding most highly transcribed genes - ranking, sorting and subsets?

[R] finding most highly transcribed genes - ranking, sorting and subsets?

[R] finding most highly transcribed genes - ranking, sorting and subsets?

Maybe Matching Threads