alison waller
2007-Dec-06 19:03 UTC
[R] finding most highly transcribed genes - ranking, sorting and subsets?
Hello, I am not only interested in finding out which genes are the most highly up- or down-regulated (which I have done using the linear models and Bayesian statistics in Limma), but I also want to know which genes are consistently highly transcribed (ie. they have a high intensity in the channel of interest eg. Cy5 or Cy3 across the set of experiments). I might have missed a straight forward way to do this, or a valuable function, but I've been using my own methods and going around in circles. So far I've normalized within and between arrays, then returned the RG values using RG<-RG.MA, then I ranked each R and G values for each array as below. rankRG<-RG rankRG$R[,1]<-rank(rankRG$R[,1]) rankRG$R[,2]<-rank(rankRG$R[,2]) .. and so on for 6 columns(ie. arrays, as well as the G's) then I thought I could pull out a subset of rankRG using something like; topRG<-rankRG topRG$R<-subset(topRG$R,topRG$R[,1]<500&topRG$R[,2]<500&topRG$R[,5]<500) However, this just returned me a matrix with one row of $R (the ranks were <500 for columns 1,2, and 5 and greater than 500 for 3,4,and 6). However, I can't believe that there is only one gene that is in the top 500 for R intensitiy among those three arrays. Am I doing something wrong? Can someone think of a better way of doing this? Thanks Alison ****************************************** Alison S. Waller M.A.Sc. Doctoral Candidate awaller@chem-eng.utoronto.ca 416-978-4222 (lab) Department of Chemical Engineering Wallberg Building 200 College st. Toronto, ON M5S 3E5 [[alternative HTML version deleted]]
alison waller
2007-Dec-07 16:17 UTC
[R] finding most highly transcribed genes - ranking, sorting and subsets?
Thanks so much Martin, This method is definitely more straightforward. And you are right I don't think I was doing anything wrong before. However, I thought that rank, would rank the highest 1st, however after looking at the results using your methods, I realized it ranks the lowest number 1. So I modified it for rank>18500. And now I'm getting 300 rows for which the intensity is consistenly high. However, I am still laking some information. For the results I can get a matrix of 300 rows and the corresponding intensities (from m) or rank (from h), but what I really want is the name of the original row, which corresponds to a specific spot on the array). I did msubset<-m[hrows,] and as mentioned I just get the rows numbered 1-300, while I want to essentially pickout the 300 rows from the original 19,000 rows maintaing the original row designation as it corresponds to a specific gene. Thanks again for any suggestions, Alison -----Original Message----- From: Martin Morgan [mailto:mtmorgan at fhcrc.org] Sent: Thursday, December 06, 2007 4:06 PM To: alison waller Subject: Re: [R] finding most highly transcribed genes - ranking, sorting and subsets? Hi Alison -- I'm not sure where your problem is coming from, but R can help you to more efficiently do your task. Skipping the bioc terminology and data structures, you have a matrix> m <- matrix(runif(100000), ncol=10)you'd like to determine the rank of values in each column> r <- apply(m, 2, rank)identfiy those with high rank> h <- r < 500and find the rows for which the rank is always high> hrows <- apply(h, 1, all)you can then use hrows to subset your original matrix (m[hrows,]) or otherwise, e.g., how many rows with high rank> sum(hrows)[1] 0 or perhaps the distribution of the number of columns in which high ranking genes occur.> table(apply(h, 1, sum))0 1 2 3 4 5996 3132 765 100 7 Martin "alison waller" <alison.waller at utoronto.ca> writes:> Hello, > > > > I am not only interested in finding out which genes are the most highlyup-> or down-regulated (which I have done using the linear models and Bayesian > statistics in Limma), but I also want to know which genes are consistently > highly transcribed (ie. they have a high intensity in the channel of > interest eg. Cy5 or Cy3 across the set of experiments). I might havemissed> a straight forward way to do this, or a valuable function, but I've been > using my own methods and going around in circles. > > > > So far I've normalized within and between arrays, then returned the RG > values using RG<-RG.MA, then I ranked each R and G values for each arrayas> below. > > rankRG<-RG > > rankRG$R[,1]<-rank(rankRG$R[,1]) > > rankRG$R[,2]<-rank(rankRG$R[,2]) .. and so on for 6 columns(ie. arrays, as > well as the G's) > > > > then I thought I could pull out a subset of rankRG using something like; > > topRG<-rankRG > > topRG$R<-subset(topRG$R,topRG$R[,1]<500&topRG$R[,2]<500&topRG$R[,5]<500) > > > > However, this just returned me a matrix with one row of $R (the ranks were > <500 for columns 1,2, and 5 and greater than 500 for 3,4,and 6). However,I> can't believe that there is only one gene that is in the top 500 for R > intensitiy among those three arrays. > > > > Am I doing something wrong? Can someone think of a better way of doing > this? > > > > Thanks > > > > Alison > > > > > > ****************************************** > Alison S. Waller M.A.Sc. > Doctoral Candidate > awaller at chem-eng.utoronto.ca > 416-978-4222 (lab) > Department of Chemical Engineering > Wallberg Building > 200 College st. > Toronto, ON > M5S 3E5 > > > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html> and provide commented, minimal, self-contained, reproducible code.-- Dr. Martin Morgan, PhD Computational Biology Shared Resource Director Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793