Hello, I am new to R and have a problem I have had some trouble with.

Basically, I have a list of some 80000 genes, with expression levels measured at various time points/conditions. I also have subsets of these, usually only a few hundred genes in size, that are known to be associated with some biological process.

What I want to do is correlate the entire list with my subset and then return the gene names and correlations for all correlations above a certain threshold. The idea is that these would be good candidate genes to look at more closely for possible interaction with said biological process.

What functions should I use to correlate the subset with the whole set, keeping in mind that I need to keep track of gene names in both sets? And how would I then extract the gene pairs whose correlation exceeds a certain threshold from what I presume would be something like an 80000 x 200 matrix? It would also probably be good if I could remove the subset genes from the whole gene list before computation, so as to avoid useless correlations of 1 (each subset gene with itself).

I'll keep trying on my own, but any help would be appreciated!

--
View this message in context: http://r.789695.n4.nabble.com/Help-with-correlation-matrices-thresholding-tp4636697.html
Sent from the R help mailing list archive at Nabble.com.
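A minimal sketch of the subset-against-the-rest computation asked about above, under these assumptions: the expression data sit in a numeric matrix with one column per gene and the gene IDs as column names (cor() correlates columns, so a matrix with genes in rows would need t() first), and the subset is given as a character vector of those IDs. The function name subset.cor, its arguments, and the toy data are hypothetical, not from any package. Passing two matrices to cor() returns only the ncol(rest) x ncol(sub) block of correlations, so the full genes-by-genes matrix is never built, and the gene names travel along as dimnames.

```r
# Hypothetical helper: correlate every non-subset gene with every subset gene
# and return the pairs whose absolute correlation is at least r.
#   expr        numeric matrix, one column per gene, column names = gene IDs
#   subset.ids  character vector of gene IDs in the process-associated subset
#   r           absolute correlation threshold
#   ...         passed on to cor(), e.g. method = "spearman"
subset.cor <- function(expr, subset.ids, r, ...) {
  in.sub <- colnames(expr) %in% subset.ids
  # Remove the subset genes from the "whole" set, so no gene is ever
  # correlated with itself (the useless correlations of 1)
  rest <- expr[, !in.sub, drop = FALSE]
  sub  <- expr[,  in.sub, drop = FALSE]
  # Rows = candidate genes, columns = subset genes; dimnames are kept
  cc <- cor(rest, sub, ...)
  # (row, column) positions of entries at or above the threshold;
  # abs() keeps strong negative correlations too, and NA entries
  # (e.g. from genes with constant expression) are skipped by which()
  hits <- which(abs(cc) >= r, arr.ind = TRUE)
  data.frame(candidate   = rownames(cc)[hits[, 1]],
             subset.gene = colnames(cc)[hits[, 2]],
             value       = cc[hits],
             stringsAsFactors = FALSE)
}

# Hypothetical toy data: 6 conditions (rows) x 4 genes (columns)
expr <- cbind(g1 = 1:6,
              g2 = c(2, 4, 6, 8, 10, 12),   # rises exactly with g1
              g3 = c(1, -1, 1, -1, 1, -1),  # only weakly related to g1
              g4 = c(6, 5, 4, 3, 2, 1))     # falls exactly as g1 rises
res <- subset.cor(expr, subset.ids = "g1", r = 0.9)
res
```

With g1 as the "subset" and r = 0.9, this keeps g2 (correlation 1) and g4 (correlation -1) and drops g3 (correlation about -0.29). Removing abs() from the which() call would keep only positive correlations.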
Try this as a starting point; dat is your whole data set (one column per gene), and r is the correlation coefficient threshold:

spec.cor <- function(dat, r, ...) {
    x <- cor(dat, ...)                       # correlations between all columns (genes)
    x[upper.tri(x, TRUE)] <- NA              # keep each pair once; drop the diagonal of 1s
    i <- which(abs(x) >= r, arr.ind = TRUE)  # positions with |correlation| >= r (NAs ignored)
    # one row per pair: the two gene names and their correlation
    data.frame(matrix(colnames(x)[as.vector(i)], ncol = 2),
               value = x[i])
}
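To show what spec.cor() returns, here it is run on a hypothetical three-gene toy data set with genes as columns, which is what cor() expects. The function body is repeated from the reply above only so the snippet runs on its own. Note that on the full 80000-gene set the intermediate cor(dat) matrix would be 80000 x 80000, i.e. 6.4e9 doubles or roughly 51 GB of memory, so correlating against only the subset columns (cor(x, y) accepts two matrices) is likely to be more practical there.

```r
# Same function as in the reply above, with comments
spec.cor <- function(dat, r, ...) {
  x <- cor(dat, ...)                       # correlations between all columns (genes)
  x[upper.tri(x, TRUE)] <- NA              # keep each pair once; drop the diagonal
  i <- which(abs(x) >= r, arr.ind = TRUE)  # positions with |correlation| >= r
  data.frame(matrix(colnames(x)[as.vector(i)], ncol = 2),
             value = x[i])
}

# Hypothetical toy data: 6 conditions (rows) x 3 genes (columns)
dat <- data.frame(g1 = 1:6,
                  g2 = c(3, 5, 7, 9, 11, 13),   # exactly 2 * g1 + 1
                  g3 = c(1, -1, 1, -1, 1, -1))  # alternating pattern
res <- spec.cor(dat, r = 0.9)
res
```

Only the g1/g2 pair (correlation 1) passes the 0.9 cutoff; g3 correlates with both at about -0.29. Because only the lower triangle is searched, the first name column holds the row gene of each hit (here g2) and the second the column gene (g1).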
I suggest you post this on the Bioconductor list instead, as it would seem much more relevant to your concerns.

-- Bert

(In reply to Drinniol <drinniol@gmail.com>, Mon, Jul 16, 2012 at 1:55 PM.)
--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm