I am in need of someone's help in correlating gene expression. I'm somewhat new to R, and can't seem to find anyone local to help me with what I think is a simple problem.

I need to obtain Pearson and Spearman correlation coefficients, and the corresponding p-values, for all of the genes in my dataset that correlate to one specific gene of interest. I'm working with Affymetrix Mouse 430 2.0 arrays, so I've got about 45,000 probesets (rows; with the 1st column containing identifiers) and 30 biological replicates (columns; with the top row containing the header information).

I've looked through several intro manuals and the R help files.

I know that "cor(x,y, use ="everything", method = c("pearson"))" can help obtain the coefficients.

I also know that "cor.test()" is supposed to test the significance of a single correlation coefficient.

I've also found the Bioconductor package "genefilter" / "genefinder" that looks for correlations to a given gene (although I can't get it to work).

So far I've been able to:

#Read in the csv file
data<-read.csv("my data.csv")

#Check the dimensions, names, class, fix(data) to ensure the file was loaded properly
dim(data)
names(data)
class(data)
fix(data)

#So far I've been able to successfully correlate the entire 'column' matrix through:
x <- data[,2:30]
y <- data[,2:30]

corr.data<-cor(x,y, use = "everything", method = c("pearson"))

write.csv(corr.data, file = "correlation of my data by columns.csv")

-----------------------------------

Now if I try to run the 'cor.test()' function on the same matrix, I get an error message: 'x' must be a numeric vector. This I don't understand. And this is not my goal, but rather me trying to learn how to go about doing correlation analysis in R.

I've also tried transposing the data.frame using "as.data.frame(t(data))", and doing so gives the same error message as above.

Can anyone help me figure out how to conduct a correlation analysis for a specific gene/probeset, and help me understand why I get the above error message? I know it is probably a simple analysis that is just over my head right now since I'm still new to R. But I can't figure it out and have been trying a bunch of different variations for the past week.

Thank you in advance for your help.
I do not know the Bioconductor packages you mentioned, but the corr.test function in the psych package, or the rcorr function in the Hmisc package, should do the work.

Also note that the c() in method=c("pearson") is redundant. Just write method="pearson" instead (or nothing, since this is the default for both functions).

HTH,
Denes

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi

r-help-bounces at r-project.org wrote on 09.04.2011 19:24:38:

> Now if I try and run the 'cor.test()' function on the same matrix, I get and
> error message with 'x' must be a numeric vector. This I don't understand.

The cor.test help page says:

x, y: numeric vectors of data values. 'x' and 'y' must have the same length.

However, your data[,2:30] is most probably a data frame; see str(data[,2:30]).

To be able to use cor.test you need to call it on two columns, like

cor.test(data[,2], data[,3])

or do it in a loop (untested):

result <- matrix(NA, 20, 20)
for (i in 2:19) {
  for (j in (i+1):20) {
    # (i+1):20 needs the parentheses; a matrix cell holds one number, so keep the p-value
    result[i,j] <- cor.test(data[,i], data[,j])$p.value
  }
}

But most probably there are other ways.

Regards
Petr
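A minimal base-R sketch of the one-gene-versus-all case asked about in the original post, under stated assumptions: the data are simulated, and the names `mat`, `target`, `others` and `cor_one_vs_all` are hypothetical and not from the thread. It passes cor.test() one numeric vector per probeset, which is what avoids the "'x' must be a numeric vector" error.

```r
# Simulated stand-in for the real file: first column holds probeset IDs,
# the remaining 30 columns hold expression values (all values are made up).
set.seed(1)
n_genes <- 200
n_samples <- 30
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)
data <- data.frame(ID = paste0("probe_", seq_len(n_genes)), expr)

# Numeric matrix of expression values with probeset IDs as row names.
mat <- as.matrix(data[, -1])
rownames(mat) <- data$ID

target <- "probe_1"                       # hypothetical gene of interest
x <- mat[target, ]                        # plain numeric vector for cor.test()
others <- mat[rownames(mat) != target, ]  # every other probeset

# Run cor.test() of the target against each row of 'm' for one method.
cor_one_vs_all <- function(m, x, method) {
  res <- apply(m, 1, function(y) {
    ct <- cor.test(x, y, method = method, exact = FALSE)
    c(estimate = unname(ct$estimate), p.value = ct$p.value)
  })
  t(res)                                  # one row per probeset
}

pearson  <- cor_one_vs_all(others, x, "pearson")
spearman <- cor_one_vs_all(others, x, "spearman")

results <- data.frame(
  ID         = rownames(others),
  pearson_r  = pearson[, "estimate"],
  pearson_p  = pearson[, "p.value"],
  spearman_r = spearman[, "estimate"],
  spearman_p = spearman[, "p.value"]
)

# Strongest Pearson associations first (display only; 'results' is unchanged).
head(results[order(results$pearson_p), ])
```

With roughly 45,000 probesets this makes tens of thousands of cor.test() calls per method, which is slow but feasible; the p-values would normally also need a multiple-testing adjustment such as p.adjust().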
On Sat, Apr 9, 2011 at 10:24 AM, Sean Farris <farrissp2 at vcu.edu> wrote:

> I am in need of someone's help in correlating gene expression. I'm somewhat
> new to R, and can't seem to find anyone local to help me with what I think
> is a simple problem.
>
> I need to obtain pearson and spearman correlation coefficients, and
> corresponding p-values for all of the genes in my dataset that correlate to
> one specific gene of interest. I'm working with mouse Affymetrix Mouse 430
> 2.0 arrays, so I've got about 45,000 probesets (rows; with 1st column
> containing identifiers) and 30 biological replicates (columns; with the top
> row containing the header information).

Sean,

I'm the maintainer of the package WGCNA that does correlation network analysis of gene expression data. I recommend you check out the package and the tutorials at

http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/index.html

The package contains a couple of useful functions for correlation p-values. Unlike cor.test, which only takes two vectors (not matrices), you can use the function corAndPvalue to calculate Pearson correlations and the corresponding p-values for matrices. If you already have the correlation matrix pre-calculated AND you have no missing data (i.e., a constant number of observations), you can also use corPvalueStudent to calculate the p-values.

We don't use Spearman correlations much (we prefer the biweight midcorrelation, functions bicor and bicorAndPvalue, as a robust alternative to Pearson correlation), but you can approximate the Spearman p-values by the Student p-values (that are used for Pearson correlations). Statisticians who read this, please don't execute me for this suggestion :)

To use the function cor(), you need to transpose the data so that genes are in columns and samples in rows. Just be aware that to correlate all probe sets at a time you need a 40k+ times 40k+ matrix to hold the result. Only a large computer (at least 32GB of memory, possibly needing 64GB) will be able to handle such a matrix and the necessary manipulations. The WGCNA package contains methods to construct co-expression networks on such big sets if necessary.

HTH,
Peter
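The transposition and the Student p-value mentioned above can be sketched in base R as follows. This is an illustration under stated assumptions, not the WGCNA implementation: the data are simulated, and `genes_in_cols`, `target` and `p_from_r` are hypothetical names. Correlating a single target gene against everything needs only a vector of results, not the full gene-by-gene matrix.

```r
# Simulated expression matrix: probesets in rows, 30 samples in columns.
set.seed(2)
n_genes <- 500
n_samples <- 30
mat <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
              dimnames = list(paste0("probe_", seq_len(n_genes)), NULL))

target <- "probe_1"          # hypothetical gene of interest
genes_in_cols <- t(mat)      # samples in rows, genes in columns, as cor() expects

# One call correlates the target with every probeset (including itself).
r   <- drop(cor(genes_in_cols[, target], genes_in_cols, method = "pearson"))
rho <- drop(cor(genes_in_cols[, target], genes_in_cols, method = "spearman"))

# Two-sided Student t p-value for a correlation from n complete observations.
p_from_r <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

p_pearson  <- p_from_r(r, n_samples)
p_spearman <- p_from_r(rho, n_samples)   # the approximation described above

# Size of one full 45,000 x 45,000 matrix of doubles (8 bytes each), in GB.
full_matrix_gb <- 45000^2 * 8 / 1e9
```

A single copy of the full matrix comes to about 16 GB by this arithmetic, so the 32GB figure leaves room for roughly one working copy.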