Hi, I was trying to evaluate the pearson correlation and the p-values for an nxm matrix, where each row represents a vector. One way to do it would be to iterate through each row, and find its correlation value( and the p-value) with respect to the other rows. Is there some function by which I can use the matrix as input? Ideally, the output would be an nxn matrix, containing the p-values between the respective vectors. I have tried cor.test for the iterations, but couldn't find a function that would take the matrix as input. Thanks for the help. Dren --------------------------------- [[alternative HTML version deleted]]
Dear Dren, How about the following? cor.pvalues <- function(X){ nc <- ncol(X) res <- matrix(0, nc, nc) for (i in 2:nc){ for (j in 1:(i - 1)){ res[i, j] <- res[j, i] <- cor.test(X[,i], X[,j])$p.value } } res } What one then does with all of those non-independent test is another question, I guess. I hope this helps, John -------------------------------- John Fox Department of Sociology McMaster University Hamilton, Ontario Canada L8S 4M4 905-525-9140x23604 http://socserv.mcmaster.ca/jfox --------------------------------> -----Original Message----- > From: r-help-bounces at stat.math.ethz.ch > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dren Scott > Sent: Friday, April 15, 2005 4:33 PM > To: r-help at stat.math.ethz.ch > Subject: [R] Pearson corelation and p-value for matrix > > Hi, > > I was trying to evaluate the pearson correlation and the > p-values for an nxm matrix, where each row represents a > vector. One way to do it would be to iterate through each > row, and find its correlation value( and the p-value) with > respect to the other rows. Is there some function by which I > can use the matrix as input? Ideally, the output would be an > nxn matrix, containing the p-values between the respective vectors. > > I have tried cor.test for the iterations, but couldn't find a > function that would take the matrix as input. > > Thanks for the help. > > Dren > > > > > --------------------------------- > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html
We can be a bit sneaky and `borrow' code from cor.test.default: cor.pval <- function(x, alternative="two-sided", ...) { corMat <- cor(x, ...) n <- nrow(x) df <- n - 2 STATISTIC <- sqrt(df) * corMat / sqrt(1 - corMat^2) p <- pt(STATISTIC, df) p <- if (alternative == "less") { p } else if (alternative == "greater") { 1 - p } else 2 * pmin(p, 1 - p) p } The test:> system.time(c1 <- cor.pvals(X), gcFirst=TRUE)[1] 13.19 0.01 13.58 NA NA> system.time(c2 <- cor.pvalues(X), gcFirst=TRUE)[1] 6.22 0.00 6.42 NA NA> system.time(c3 <- cor.pval(X), gcFirst=TRUE)[1] 0.07 0.00 0.07 NA NA Cheers, Andy> From: John Fox > > Dear Mark, > > I think that the reflex of trying to avoid loops in R is > often mistaken, and > so I decided to try to time the two approaches (on a 3GHz Windows XP > system). > > I discovered, first, that there is a bug in your function -- > you appear to > have indexed rows instead of columns; fixing that: > > cor.pvals <- function(mat) > { > cols <- expand.grid(1:ncol(mat), 1:ncol(mat)) > matrix(apply(cols, 1, > function(x) cor.test(mat[, x[1]], mat[, > x[2]])$p.value), > ncol = ncol(mat)) > } > > > My function is cor.pvalues and yours cor.pvals. This is for a > data matrix > with 1000 observations on 100 variables: > > > R <- diag(100) > > R[upper.tri(R)] <- R[lower.tri(R)] <- .5 > > library(mvtnorm) > > X <- rmvnorm(1000, sigma=R) > > dim(X) > [1] 1000 100 > > > > system.time(cor.pvalues(X)) > [1] 5.53 0.00 5.53 NA NA > > > > system.time(cor.pvals(X)) > [1] 12.66 0.00 12.66 NA NA > > > > I frankly didn't expect the advantage of my approach to be > this large, but > there it is. > > Regards, > John > > -------------------------------- > John Fox > Department of Sociology > McMaster University > Hamilton, Ontario > Canada L8S 4M4 > 905-525-9140x23604 > http://socserv.mcmaster.ca/jfox > -------------------------------- > > > -----Original Message----- > > From: Marc Schwartz [mailto:MSchwartz at MedAnalytics.com] > > Sent: Friday, April 15, 2005 7:08 PM > > To: John Fox > > Cc: 'Dren Scott'; R-Help > > Subject: RE: [R] Pearson corelation and p-value for matrix > > > > Here is what might be a slightly more efficient way to get to John's > > question: > > > > cor.pvals <- function(mat) > > { > > rows <- expand.grid(1:nrow(mat), 1:nrow(mat)) > > matrix(apply(rows, 1, > > function(x) cor.test(mat[x[1], ], mat[x[2], > > ])$p.value), > > ncol = nrow(mat)) > > } > > > > HTH, > > > > Marc Schwartz > > > > On Fri, 2005-04-15 at 18:26 -0400, John Fox wrote: > > > Dear Dren, > > > > > > How about the following? > > > > > > cor.pvalues <- function(X){ > > > nc <- ncol(X) > > > res <- matrix(0, nc, nc) > > > for (i in 2:nc){ > > > for (j in 1:(i - 1)){ > > > res[i, j] <- res[j, i] <- cor.test(X[,i], > X[,j])$p.value > > > } > > > } > > > res > > > } > > > > > > What one then does with all of those non-independent test > > is another > > > question, I guess. > > > > > > I hope this helps, > > > John > > > > > > -----Original Message----- > > > > From: r-help-bounces at stat.math.ethz.ch > > > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of > Dren Scott > > > > Sent: Friday, April 15, 2005 4:33 PM > > > > To: r-help at stat.math.ethz.ch > > > > Subject: [R] Pearson corelation and p-value for matrix > > > > > > > > Hi, > > > > > > > > I was trying to evaluate the pearson correlation and > the p-values > > > > for an nxm matrix, where each row represents a vector. > > One way to do > > > > it would be to iterate through each row, and find its > correlation > > > > value( and the p-value) with respect to the other rows. > Is there > > > > some function by which I can use the matrix as input? > > Ideally, the > > > > output would be an nxn matrix, containing the p-values > > between the > > > > respective vectors. > > > > > > > > I have tried cor.test for the iterations, but couldn't find a > > > > function that would take the matrix as input. > > > > > > > > Thanks for the help. > > > > > > > > Dren > > > > > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > >
> From: John Fox > > Dear Andy, > > That's clearly much better -- and illustrates an effective > strategy for > vectorizing (or "matricizing") a computation. I think I'll > add this to my > list of programming examples. It might be a little dangerous > to pass ... > through to cor(), since someone could specify > type="spearman", for example.Ah, yes, the "..." isn't likely to help here! Also, it will only work correctly if there are no NA's, for example (or else the degree of freedom would be wrong). Best, Andy> Thanks, > John > > -------------------------------- > John Fox > Department of Sociology > McMaster University > Hamilton, Ontario > Canada L8S 4M4 > 905-525-9140x23604 > http://socserv.mcmaster.ca/jfox > -------------------------------- > > > -----Original Message----- > > From: Liaw, Andy [mailto:andy_liaw at merck.com] > > Sent: Friday, April 15, 2005 9:51 PM > > To: 'John Fox'; MSchwartz at medanalytics.com > > Cc: 'R-Help'; 'Dren Scott' > > Subject: RE: [R] Pearson corelation and p-value for matrix > > > > We can be a bit sneaky and `borrow' code from cor.test.default: > > > > cor.pval <- function(x, alternative="two-sided", ...) { > > corMat <- cor(x, ...) > > n <- nrow(x) > > df <- n - 2 > > STATISTIC <- sqrt(df) * corMat / sqrt(1 - corMat^2) > > p <- pt(STATISTIC, df) > > p <- if (alternative == "less") { > > p > > } else if (alternative == "greater") { > > 1 - p > > } else 2 * pmin(p, 1 - p) > > p > > } > > > > The test: > > > > > system.time(c1 <- cor.pvals(X), gcFirst=TRUE) > > [1] 13.19 0.01 13.58 NA NA > > > system.time(c2 <- cor.pvalues(X), gcFirst=TRUE) > > [1] 6.22 0.00 6.42 NA NA > > > system.time(c3 <- cor.pval(X), gcFirst=TRUE) > > [1] 0.07 0.00 0.07 NA NA > > > > Cheers, > > Andy > > > > > From: John Fox > > > > > > Dear Mark, > > > > > > I think that the reflex of trying to avoid loops in R is often > > > mistaken, and so I decided to try to time the two > approaches (on a > > > 3GHz Windows XP system). > > > > > > I discovered, first, that there is a bug in your function -- you > > > appear to have indexed rows instead of columns; fixing that: > > > > > > cor.pvals <- function(mat) > > > { > > > cols <- expand.grid(1:ncol(mat), 1:ncol(mat)) > > > matrix(apply(cols, 1, > > > function(x) cor.test(mat[, x[1]], mat[, > > > x[2]])$p.value), > > > ncol = ncol(mat)) > > > } > > > > > > > > > My function is cor.pvalues and yours cor.pvals. This is > for a data > > > matrix with 1000 observations on 100 variables: > > > > > > > R <- diag(100) > > > > R[upper.tri(R)] <- R[lower.tri(R)] <- .5 > > > > library(mvtnorm) > > > > X <- rmvnorm(1000, sigma=R) > > > > dim(X) > > > [1] 1000 100 > > > > > > > > system.time(cor.pvalues(X)) > > > [1] 5.53 0.00 5.53 NA NA > > > > > > > > system.time(cor.pvals(X)) > > > [1] 12.66 0.00 12.66 NA NA > > > > > > > > > > I frankly didn't expect the advantage of my approach to be > > this large, > > > but there it is. > > > > > > Regards, > > > John > > > > > > -------------------------------- > > > John Fox > > > Department of Sociology > > > McMaster University > > > Hamilton, Ontario > > > Canada L8S 4M4 > > > 905-525-9140x23604 > > > http://socserv.mcmaster.ca/jfox > > > -------------------------------- > > > > > > > -----Original Message----- > > > > From: Marc Schwartz [mailto:MSchwartz at MedAnalytics.com] > > > > Sent: Friday, April 15, 2005 7:08 PM > > > > To: John Fox > > > > Cc: 'Dren Scott'; R-Help > > > > Subject: RE: [R] Pearson corelation and p-value for matrix > > > > > > > > Here is what might be a slightly more efficient way to > > get to John's > > > > question: > > > > > > > > cor.pvals <- function(mat) > > > > { > > > > rows <- expand.grid(1:nrow(mat), 1:nrow(mat)) > > > > matrix(apply(rows, 1, > > > > function(x) cor.test(mat[x[1], ], mat[x[2], > > > > ])$p.value), > > > > ncol = nrow(mat)) > > > > } > > > > > > > > HTH, > > > > > > > > Marc Schwartz > > > > > > > > On Fri, 2005-04-15 at 18:26 -0400, John Fox wrote: > > > > > Dear Dren, > > > > > > > > > > How about the following? > > > > > > > > > > cor.pvalues <- function(X){ > > > > > nc <- ncol(X) > > > > > res <- matrix(0, nc, nc) > > > > > for (i in 2:nc){ > > > > > for (j in 1:(i - 1)){ > > > > > res[i, j] <- res[j, i] <- cor.test(X[,i], > > > X[,j])$p.value > > > > > } > > > > > } > > > > > res > > > > > } > > > > > > > > > > What one then does with all of those non-independent test > > > > is another > > > > > question, I guess. > > > > > > > > > > I hope this helps, > > > > > John > > > > > > > > > > -----Original Message----- > > > > > > From: r-help-bounces at stat.math.ethz.ch > > > > > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of > > > Dren Scott > > > > > > Sent: Friday, April 15, 2005 4:33 PM > > > > > > To: r-help at stat.math.ethz.ch > > > > > > Subject: [R] Pearson corelation and p-value for matrix > > > > > > > > > > > > Hi, > > > > > > > > > > > > I was trying to evaluate the pearson correlation and > > > the p-values > > > > > > for an nxm matrix, where each row represents a vector. > > > > One way to do > > > > > > it would be to iterate through each row, and find its > > > correlation > > > > > > value( and the p-value) with respect to the other rows. > > > Is there > > > > > > some function by which I can use the matrix as input? > > > > Ideally, the > > > > > > output would be an nxn matrix, containing the p-values > > > > between the > > > > > > respective vectors. > > > > > > > > > > > > I have tried cor.test for the iterations, but > couldn't find a > > > > > > function that would take the matrix as input. > > > > > > > > > > > > Thanks for the help. > > > > > > > > > > > > Dren > > > > > > > > > > > > > > ______________________________________________ > > > R-help at stat.math.ethz.ch mailing list > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide! > > > http://www.R-project.org/posting-guide.html > > > > > > > > > > > > > > > > > -------------------------------------------------------------- > > ---------------- > > Notice: This e-mail message, together with any attachments, > > contains information of Merck & Co., Inc. (One Merck Drive, > > Whitehouse Station, New Jersey, USA 08889), and/or its > > affiliates (which may be known outside the United States as > > Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as > > Banyu) that may be confidential, proprietary copyrighted > > and/or legally privileged. It is intended solely for the use > > of the individual or entity named on this message. If you > > are not the intended recipient, and have received this > > message in error, please notify us immediately by reply > > e-mail and then delete it from your system. > > -------------------------------------------------------------- > > ---------------- > > >
I believe this will do: cor.pval2 <- function(x, alternative="two-sided") { corMat <- cor(x, use=if (any(is.na(x))) "pairwise.complete.obs" else "all") df <- crossprod(!is.na(x)) - 2 STATISTIC <- sqrt(df) * corMat / sqrt(1 - corMat^2) p <- pt(STATISTIC, df) p <- if (alternative == "less") { p } else if (alternative == "greater") { 1 - p } else 2 * pmin(p, 1 - p) p } Some test:> set.seed(1) > x <- matrix(runif(2e3 * 1e2), 2e3) > system.time(res1 <- cor.pval(t(x)), gcFirst=TRUE)[1] 17.28 0.77 18.16 NA NA> system.time(res2 <- cor.pval2(t(x)), gcFirst=TRUE)[1] 19.51 1.05 20.70 NA NA> max(abs(res1 - res2))[1] 0> x[c(1, 3, 6), c(2, 4)] <- NA > x[30, 61] <- NA > system.time(res2 <- cor.pval2(t(x)), gcFirst=TRUE)[1] 24.48 0.71 25.28 NA NA This is a bit slower because of the extra computation for "df". One can try to save some computation by only computing with the lower (or upper) triangular part. Cheers, Andy> From: John Fox > > Dear Dren, > > Since cor(), on which Andy's solution is based, can compute > pairwise-present > correlations, you could adapt his function -- you'll have to > adjust the df > for each pair. Alternatively, you could probably save some > time (programming > time + computer time) by just using my solution: > > > R <- diag(100) > > R[upper.tri(R)] <- R[lower.tri(R)] <- .5 > > library(mvtnorm) > > X <- rmvnorm(6000, sigma=R) > > system.time(for (i in 1:50) cor.pvalues(X), gc=TRUE) > [1] 518.19 1.11 520.23 NA NA > > I know that time is money, but nine minutes (on my machine) > probably won't > bankrupt anyone. > > Regards, > John > > -------------------------------- > John Fox > Department of Sociology > McMaster University > Hamilton, Ontario > Canada L8S 4M4 > 905-525-9140x23604 > http://socserv.mcmaster.ca/jfox > -------------------------------- > > > -----Original Message----- > > From: r-help-bounces at stat.math.ethz.ch > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Dren Scott > > Sent: Monday, April 18, 2005 12:41 PM > > To: 'R-Help' > > Subject: RE: [R] Pearson corelation and p-value for matrix > > > > Hi all, > > > > Thanks Andy, Mark and John for all the help. I really > > appreciate it. I'm new to both R and statistics, so please > > excuse any gaffes on my part. > > > > Essentially what I'm trying to do, is to evaluate for each > > row, how many other rows would have a p-value < 0.05. So, > > after I get my N x N p-value matrix, I'll just filter out > > values that are > 0.05. > > > > Each of my datasets (6000 rows x 100 columns) would consist > > of some NA's. The iterative procedure (cor.pvalues) proposed > > by John would yield the values, but it would take an > > inordinately long time (I have 50 of these datasets to > > process). The solution proposed by Andy, although fast, would > > not be able to incorporate the NA's. > > > > Is there any workaround for the NA's? Or possibly do you > > think I could try something else? > > > > Thanks very much. > > > > Dren > > > > > > "Liaw, Andy" <andy_liaw at merck.com> wrote: > > > From: John Fox > > > > > > Dear Andy, > > > > > > That's clearly much better -- and illustrates an > effective strategy > > > for vectorizing (or "matricizing") a computation. I think > I'll add > > > this to my list of programming examples. It might be a little > > > dangerous to pass ... > > > through to cor(), since someone could specify > type="spearman", for > > > example. > > > > Ah, yes, the "..." isn't likely to help here! Also, it will > > only work correctly if there are no NA's, for example (or > > else the degree of freedom would be wrong). > > > > Best, > > Andy > > > > > Thanks, > > > John > > > > > > -------------------------------- > > > John Fox > > > Department of Sociology > > > McMaster University > > > Hamilton, Ontario > > > Canada L8S 4M4 > > > 905-525-9140x23604 > > > http://socserv.mcmaster.ca/jfox > > > -------------------------------- > > > > > > > -----Original Message----- > > > > From: Liaw, Andy [mailto:andy_liaw at merck.com] > > > > Sent: Friday, April 15, 2005 9:51 PM > > > > To: 'John Fox'; MSchwartz at medanalytics.com > > > > Cc: 'R-Help'; 'Dren Scott' > > > > Subject: RE: [R] Pearson corelation and p-value for matrix > > > > > > > > We can be a bit sneaky and `borrow' code from cor.test.default: > > > > > > > > cor.pval <- function(x, alternative="two-sided", ...) { > corMat <- > > > > cor(x, ...) n <- nrow(x) df <- n - 2 STATISTIC <- > > sqrt(df) * corMat > > > > / sqrt(1 - corMat^2) p <- pt(STATISTIC, df) p <- if > > (alternative == > > > > "less") { p } else if (alternative == "greater") { > > > > 1 - p > > > > } else 2 * pmin(p, 1 - p) > > > > p > > > > } > > > > > > > > The test: > > > > > > > > > system.time(c1 <- cor.pvals(X), gcFirst=TRUE) > > > > [1] 13.19 0.01 13.58 NA NA > > > > > system.time(c2 <- cor.pvalues(X), gcFirst=TRUE) > > > > [1] 6.22 0.00 6.42 NA NA > > > > > system.time(c3 <- cor.pval(X), gcFirst=TRUE) > > > > [1] 0.07 0.00 0.07 NA NA > > > > > > > > Cheers, > > > > Andy > > > > > > > > > From: John Fox > > > > > > > > > > Dear Mark, > > > > > > > > > > I think that the reflex of trying to avoid loops in R > is often > > > > > mistaken, and so I decided to try to time the two > > > approaches (on a > > > > > 3GHz Windows XP system). > > > > > > > > > > I discovered, first, that there is a bug in your > > function -- you > > > > > appear to have indexed rows instead of columns; fixing that: > > > > > > > > > > cor.pvals <- function(mat) > > > > > { > > > > > cols <- expand.grid(1:ncol(mat), 1:ncol(mat)) > > matrix(apply(cols, > > > > > 1, > > > > > function(x) cor.test(mat[, x[1]], mat[, > x[2]])$p.value), ncol = > > > > > ncol(mat)) } > > > > > > > > > > > > > > > My function is cor.pvalues and yours cor.pvals. This is > > > for a data > > > > > matrix with 1000 observations on 100 variables: > > > > > > > > > > > R <- diag(100) > > > > > > R[upper.tri(R)] <- R[lower.tri(R)] <- .5 > > > > > > library(mvtnorm) > > > > > > X <- rmvnorm(1000, sigma=R) > > > > > > dim(X) > > > > > [1] 1000 100 > > > > > > > > > > > > system.time(cor.pvalues(X)) > > > > > [1] 5.53 0.00 5.53 NA NA > > > > > > > > > > > > system.time(cor.pvals(X)) > > > > > [1] 12.66 0.00 12.66 NA NA > > > > > > > > > > > > > > > > I frankly didn't expect the advantage of my approach to be > > > > this large, > > > > > but there it is. > > > > > > > > > > Regards, > > > > > John > > > > > > > > > > -------------------------------- > > > > > John Fox > > > > > Department of Sociology > > > > > McMaster University > > > > > Hamilton, Ontario > > > > > Canada L8S 4M4 > > > > > 905-525-9140x23604 > > > > > http://socserv.mcmaster.ca/jfox > > > > > -------------------------------- > > > > > > > > > > > -----Original Message----- > > > > > > From: Marc Schwartz [mailto:MSchwartz at MedAnalytics.com] > > > > > > Sent: Friday, April 15, 2005 7:08 PM > > > > > > To: John Fox > > > > > > Cc: 'Dren Scott'; R-Help > > > > > > Subject: RE: [R] Pearson corelation and p-value for matrix > > > > > > > > > > > > Here is what might be a slightly more efficient way to > > > > get to John's > > > > > > question: > > > > > > > > > > > > cor.pvals <- function(mat) > > > > > > { > > > > > > rows <- expand.grid(1:nrow(mat), 1:nrow(mat)) > > matrix(apply(rows, > > > > > > 1, > > > > > > function(x) cor.test(mat[x[1], ], mat[x[2], > > ])$p.value), ncol = > > > > > > nrow(mat)) } > > > > > > > > > > > > HTH, > > > > > > > > > > > > Marc Schwartz > > > > > > > > > > > > On Fri, 2005-04-15 at 18:26 -0400, John Fox wrote: > > > > > > > Dear Dren, > > > > > > > > > > > > > > How about the following? > > > > > > > > > > > > > > cor.pvalues <- function(X){ > > > > > > > nc <- ncol(X) > > > > > > > res <- matrix(0, nc, nc) > > > > > > > for (i in 2:nc){ > > > > > > > for (j in 1:(i - 1)){ > > > > > > > res[i, j] <- res[j, i] <- cor.test(X[,i], > > > > > X[,j])$p.value > > > > > > > } > > > > > > > } > > > > > > > res > > > > > > > } > > > > > > > > > > > > > > What one then does with all of those non-independent test > > > > > > is another > > > > > > > question, I guess. > > > > > > > > > > > > > > I hope this helps, > > > > > > > John > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: r-help-bounces at stat.math.ethz.ch > > > > > > > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of > > > > > Dren Scott > > > > > > > > Sent: Friday, April 15, 2005 4:33 PM > > > > > > > > To: r-help at stat.math.ethz.ch > > > > > > > > Subject: [R] Pearson corelation and p-value for matrix > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > I was trying to evaluate the pearson correlation and > > > > > the p-values > > > > > > > > for an nxm matrix, where each row represents a vector. > > > > > > One way to do > > > > > > > > it would be to iterate through each row, and find its > > > > > correlation > > > > > > > > value( and the p-value) with respect to the other rows. > > > > > Is there > > > > > > > > some function by which I can use the matrix as input? > > > > > > Ideally, the > > > > > > > > output would be an nxn matrix, containing the p-values > > > > > > between the > > > > > > > > respective vectors. > > > > > > > > > > > > > > > > I have tried cor.test for the iterations, but > > > couldn't find a > > > > > > > > function that would take the matrix as input. > > > > > > > > > > > > > > > > Thanks for the help. > > > > > > > > > > > > > > > > Dren > > > > > > > > > > > > > > > > > > > > > > ______________________________________________ > > > > > R-help at stat.math.ethz.ch mailing list > > > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > > > PLEASE do read the posting guide! > > > > > http://www.R-project.org/posting-guide.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------------------------------------------------------- > > > > ---------------- > > > > Notice: This e-mail message, together with any > > attachments, contains > > > > information of Merck & Co., Inc. (One Merck Drive, Whitehouse > > > > Station, New Jersey, USA 08889), and/or its affiliates > > (which may be > > > > known outside the United States as Merck Frosst, Merck > > Sharp & Dohme > > > > or MSD and in Japan, as > > > > Banyu) that may be confidential, proprietary copyrighted and/or > > > > legally privileged. It is intended solely for the use of the > > > > individual or entity named on this message. If you are not the > > > > intended recipient, and have received this message in > > error, please > > > > notify us immediately by reply e-mail and then delete it > > from your > > > > system. > > > > -------------------------------------------------------------- > > > > ---------------- > > > > > > > > > > > > > > > > > -------------------------------------------------------------- > > ---------------- > > > > -------------------------------------------------------------- > > ---------------- > > > > > > --------------------------------- > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at stat.math.ethz.ch mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide! > > http://www.R-project.org/posting-guide.html > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > >
> From: John Fox > > Dear Andy, > > Very nice! (My point was that if this is a one-time thing, for Dren to > puzzle over it is probably more time-consuming than simply doing it > inefficiently.)There's a memory/speed tradeoff. The loop avoids creating multiple 6000x6000 matrices that my version would require, and one of those would take up 274MB! I could not run my function on a 6000x6000 case on my laptop (nor rcorr). The example I showed was 2000x2000. Also, as Frank pointed out, it's not flexible. As John said, it's good for one-time use. For more general consumption, one would want to write more flexible and robust code, in which case one might very well re-invent rcorr... Cheers, Andy> Regards, > John > > -------------------------------- > John Fox > Department of Sociology > McMaster University > Hamilton, Ontario > Canada L8S 4M4 > 905-525-9140x23604 > http://socserv.mcmaster.ca/jfox > -------------------------------- > > > -----Original Message----- > > From: Liaw, Andy [mailto:andy_liaw at merck.com] > > Sent: Monday, April 18, 2005 2:40 PM > > To: 'John Fox'; 'Dren Scott' > > Cc: 'R-Help' > > Subject: RE: [R] Pearson corelation and p-value for matrix > > > > I believe this will do: > > > > cor.pval2 <- function(x, alternative="two-sided") { > > corMat <- cor(x, use=if (any(is.na(x))) > > "pairwise.complete.obs" else > > "all") > > df <- crossprod(!is.na(x)) - 2 > > STATISTIC <- sqrt(df) * corMat / sqrt(1 - corMat^2) > > p <- pt(STATISTIC, df) > > p <- if (alternative == "less") { > > p > > } else if (alternative == "greater") { > > 1 - p > > } else 2 * pmin(p, 1 - p) > > p > > } > > > > Some test: > > > > > set.seed(1) > > > x <- matrix(runif(2e3 * 1e2), 2e3) > > > system.time(res1 <- cor.pval(t(x)), gcFirst=TRUE) > > [1] 17.28 0.77 18.16 NA NA > > > system.time(res2 <- cor.pval2(t(x)), gcFirst=TRUE) > > [1] 19.51 1.05 20.70 NA NA > > > max(abs(res1 - res2)) > > [1] 0 > > > x[c(1, 3, 6), c(2, 4)] <- NA > > > x[30, 61] <- NA > > > system.time(res2 <- cor.pval2(t(x)), gcFirst=TRUE) > > [1] 24.48 0.71 25.28 NA NA > > > > This is a bit slower because of the extra computation for > > "df". One can try to save some computation by only computing > > with the lower (or upper) triangular part. > > > > Cheers, > > Andy > > > > > From: John Fox > > > > > > Dear Dren, > > > > > > Since cor(), on which Andy's solution is based, can compute > > > pairwise-present correlations, you could adapt his function > > -- you'll > > > have to adjust the df for each pair. Alternatively, you > > could probably > > > save some time (programming time + computer time) by just > using my > > > solution: > > > > > > > R <- diag(100) > > > > R[upper.tri(R)] <- R[lower.tri(R)] <- .5 > > > > library(mvtnorm) > > > > X <- rmvnorm(6000, sigma=R) > > > > system.time(for (i in 1:50) cor.pvalues(X), gc=TRUE) > > > [1] 518.19 1.11 520.23 NA NA > > > > > > I know that time is money, but nine minutes (on my machine) > > probably > > > won't bankrupt anyone. > > > > > > Regards, > > > John > > > > > > -------------------------------- > > > John Fox > > > Department of Sociology > > > McMaster University > > > Hamilton, Ontario > > > Canada L8S 4M4 > > > 905-525-9140x23604 > > > http://socserv.mcmaster.ca/jfox > > > -------------------------------- > > > > > > > -----Original Message----- > > > > From: r-help-bounces at stat.math.ethz.ch > > > > [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of > Dren Scott > > > > Sent: Monday, April 18, 2005 12:41 PM > > > > To: 'R-Help' > > > > Subject: RE: [R] Pearson corelation and p-value for matrix > > > > > > > > Hi all, > > > > > > > > Thanks Andy, Mark and John for all the help. I really > > appreciate it. > > > > I'm new to both R and statistics, so please excuse any > > gaffes on my > > > > part. > > > > > > > > Essentially what I'm trying to do, is to evaluate for > > each row, how > > > > many other rows would have a p-value < 0.05. So, after I > > get my N x > > > > N p-value matrix, I'll just filter out values that are > 0.05. > > > > > > > > Each of my datasets (6000 rows x 100 columns) would > > consist of some > > > > NA's. The iterative procedure (cor.pvalues) proposed by > > John would > > > > yield the values, but it would take an inordinately > long time (I > > > > have 50 of these datasets to process). The solution proposed by > > > > Andy, although fast, would not be able to incorporate the NA's. > > > > > > > > Is there any workaround for the NA's? Or possibly do > you think I > > > > could try something else? > > > > > > > > Thanks very much. > > > > > > > > Dren > > > > > > > > > > > > "Liaw, Andy" <andy_liaw at merck.com> wrote: > > > > > From: John Fox > > > > > > > > > > Dear Andy, > > > > > > > > > > That's clearly much better -- and illustrates an > > > effective strategy > > > > > for vectorizing (or "matricizing") a computation. I think > > > I'll add > > > > > this to my list of programming examples. It might be a little > > > > > dangerous to pass ... > > > > > through to cor(), since someone could specify > > > type="spearman", for > > > > > example. > > > > > > > > Ah, yes, the "..." isn't likely to help here! Also, it > will only > > > > work correctly if there are no NA's, for example (or else > > the degree > > > > of freedom would be wrong). > > > > > > > > Best, > > > > Andy > > > > > > > > > Thanks, > > > > > John > > > > > > > > > > -------------------------------- > > > > > John Fox > > > > > Department of Sociology > > > > > McMaster University > > > > > Hamilton, Ontario > > > > > Canada L8S 4M4 > > > > > 905-525-9140x23604 > > > > > http://socserv.mcmaster.ca/jfox > > > > > -------------------------------- > > > > > > > > > > > -----Original Message----- > > > > > > From: Liaw, Andy [mailto:andy_liaw at merck.com] > > > > > > Sent: Friday, April 15, 2005 9:51 PM > > > > > > To: 'John Fox'; MSchwartz at medanalytics.com > > > > > > Cc: 'R-Help'; 'Dren Scott' > > > > > > Subject: RE: [R] Pearson corelation and p-value for matrix > > > > > > > > > > > > We can be a bit sneaky and `borrow' code from > > cor.test.default: > > > > > > > > > > > > cor.pval <- function(x, alternative="two-sided", ...) { > > > corMat <- > > > > > > cor(x, ...) n <- nrow(x) df <- n - 2 STATISTIC <- > > > > sqrt(df) * corMat > > > > > > / sqrt(1 - corMat^2) p <- pt(STATISTIC, df) p <- if > > > > (alternative => > > > > > "less") { p } else if (alternative == "greater") { > > > > > > 1 - p > > > > > > } else 2 * pmin(p, 1 - p) > > > > > > p > > > > > > } > > > > > > > > > > > > The test: > > > > > > > > > > > > > system.time(c1 <- cor.pvals(X), gcFirst=TRUE) > > > > > > [1] 13.19 0.01 13.58 NA NA > > > > > > > system.time(c2 <- cor.pvalues(X), gcFirst=TRUE) > > > > > > [1] 6.22 0.00 6.42 NA NA > > > > > > > system.time(c3 <- cor.pval(X), gcFirst=TRUE) > > > > > > [1] 0.07 0.00 0.07 NA NA > > > > > > > > > > > > Cheers, > > > > > > Andy > > > > > > > > > > > > > From: John Fox > > > > > > > > > > > > > > Dear Mark, > > > > > > > > > > > > > > I think that the reflex of trying to avoid loops in R > > > is often > > > > > > > mistaken, and so I decided to try to time the two > > > > > approaches (on a > > > > > > > 3GHz Windows XP system). > > > > > > > > > > > > > > I discovered, first, that there is a bug in your > > > > function -- you > > > > > > > appear to have indexed rows instead of columns; > fixing that: > > > > > > > > > > > > > > cor.pvals <- function(mat) > > > > > > > { > > > > > > > cols <- expand.grid(1:ncol(mat), 1:ncol(mat)) > > > > matrix(apply(cols, > > > > > > > 1, > > > > > > > function(x) cor.test(mat[, x[1]], mat[, > > > x[2]])$p.value), ncol > > > > > > > ncol(mat)) } > > > > > > > > > > > > > > > > > > > > > My function is cor.pvalues and yours cor.pvals. This is > > > > > for a data > > > > > > > matrix with 1000 observations on 100 variables: > > > > > > > > > > > > > > > R <- diag(100) > > > > > > > > R[upper.tri(R)] <- R[lower.tri(R)] <- .5 > > > > > > > > library(mvtnorm) > > > > > > > > X <- rmvnorm(1000, sigma=R) > > > > > > > > dim(X) > > > > > > > [1] 1000 100 > > > > > > > > > > > > > > > > system.time(cor.pvalues(X)) > > > > > > > [1] 5.53 0.00 5.53 NA NA > > > > > > > > > > > > > > > > system.time(cor.pvals(X)) > > > > > > > [1] 12.66 0.00 12.66 NA NA > > > > > > > > > > > > > > > > > > > > > > I frankly didn't expect the advantage of my approach to be > > > > > > this large, > > > > > > > but there it is. > > > > > > > > > > > > > > Regards, > > > > > > > John > > > > > > > > > > > > > > -------------------------------- John Fox Department of > > > > > > > Sociology McMaster University Hamilton, Ontario > > Canada L8S 4M4 > > > > > > > 905-525-9140x23604 > > > > > > > http://socserv.mcmaster.ca/jfox > > > > > > > -------------------------------- > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Marc Schwartz [mailto:MSchwartz at MedAnalytics.com] > > > > > > > > Sent: Friday, April 15, 2005 7:08 PM > > > > > > > > To: John Fox > > > > > > > > Cc: 'Dren Scott'; R-Help > > > > > > > > Subject: RE: [R] Pearson corelation and p-value > for matrix > > > > > > > > > > > > > > > > Here is what might be a slightly more efficient way to > > > > > > get to John's > > > > > > > > question: > > > > > > > > > > > > > > > > cor.pvals <- function(mat) > > > > > > > > { > > > > > > > > rows <- expand.grid(1:nrow(mat), 1:nrow(mat)) > > > > matrix(apply(rows, > > > > > > > > 1, > > > > > > > > function(x) cor.test(mat[x[1], ], mat[x[2], > > > > ])$p.value), ncol = > > > > > > > > nrow(mat)) } > > > > > > > > > > > > > > > > HTH, > > > > > > > > > > > > > > > > Marc Schwartz > > > > > > > > > > > > > > > > On Fri, 2005-04-15 at 18:26 -0400, John Fox wrote: > > > > > > > > > Dear Dren, > > > > > > > > > > > > > > > > > > How about the following? > > > > > > > > > > > > > > > > > > cor.pvalues <- function(X){ > > > > > > > > > nc <- ncol(X) > > > > > > > > > res <- matrix(0, nc, nc) > > > > > > > > > for (i in 2:nc){ > > > > > > > > > for (j in 1:(i - 1)){ > > > > > > > > > res[i, j] <- res[j, i] <- cor.test(X[,i], > > > > > > > X[,j])$p.value > > > > > > > > > } > > > > > > > > > } > > > > > > > > > res > > > > > > > > > } > > > > > > > > > > > > > > > > > > What one then does with all of those > > non-independent test > > > > > > > > is another > > > > > > > > > question, I guess. > > > > > > > > > > > > > > > > > > I hope this helps, > > > > > > > > > John > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > From: r-help-bounces at stat.math.ethz.ch > > > > > > > > > > [mailto:r-help-bounces at stat.math.ethz.ch] > On Behalf Of > > > > > > > Dren Scott > > > > > > > > > > Sent: Friday, April 15, 2005 4:33 PM > > > > > > > > > > To: r-help at stat.math.ethz.ch > > > > > > > > > > Subject: [R] Pearson corelation and p-value > for matrix > > > > > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > I was trying to evaluate the pearson correlation and > > > > > > > the p-values > > > > > > > > > > for an nxm matrix, where each row represents > > a vector. > > > > > > > > One way to do > > > > > > > > > > it would be to iterate through each row, > and find its > > > > > > > correlation > > > > > > > > > > value( and the p-value) with respect to the > > other rows. > > > > > > > Is there > > > > > > > > > > some function by which I can use the matrix > as input? > > > > > > > > Ideally, the > > > > > > > > > > output would be an nxn matrix, containing > the p-values > > > > > > > > between the > > > > > > > > > > respective vectors. > > > > > > > > > > > > > > > > > > > > I have tried cor.test for the iterations, but > > > > > couldn't find a > > > > > > > > > > function that would take the matrix as input. > > > > > > > > > > > > > > > > > > > > Thanks for the help. > > > > > > > > > > > > > > > > > > > > Dren > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ______________________________________________ > > > > > > > R-help at stat.math.ethz.ch mailing list > > > > > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > > > > > PLEASE do read the posting guide! > > > > > > > http://www.R-project.org/posting-guide.html > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------------------------------------------------------- > > > > > > ---------------- > > > > > > Notice: This e-mail message, together with any > > > > attachments, contains > > > > > > information of Merck & Co., Inc. (One Merck Drive, > Whitehouse > > > > > > Station, New Jersey, USA 08889), and/or its affiliates > > > > (which may be > > > > > > known outside the United States as Merck Frosst, Merck > > > > Sharp & Dohme > > > > > > or MSD and in Japan, as > > > > > > Banyu) that may be confidential, proprietary > > copyrighted and/or > > > > > > legally privileged. It is intended solely for the > use of the > > > > > > individual or entity named on this message. If you > > are not the > > > > > > intended recipient, and have received this message in > > > > error, please > > > > > > notify us immediately by reply e-mail and then delete it > > > > from your > > > > > > system. > > > > > > > -------------------------------------------------------------- > > > > > > ---------------- > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -------------------------------------------------------------- > > > > ---------------- > > > > > > > > -------------------------------------------------------------- > > > > ---------------- > > > > > > > > > > > > --------------------------------- > > > > > > > > > > > > [[alternative HTML version deleted]] > > > > > > > > ______________________________________________ > > > > R-help at stat.math.ethz.ch mailing list > > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > > PLEASE do read the posting guide! > > > > http://www.R-project.org/posting-guide.html > > > > > > ______________________________________________ > > > R-help at stat.math.ethz.ch mailing list > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide! > > > http://www.R-project.org/posting-guide.html > > > > > > > > > > > > > > > > > -------------------------------------------------------------- > > ---------------- > > Notice: This e-mail message, together with any attachments, > > contains information of Merck & Co., Inc. (One Merck Drive, > > Whitehouse Station, New Jersey, USA 08889), and/or its > > affiliates (which may be known outside the United States as > > Merck Frosst, Merck Sharp & Dohme or MSD and in Japan, as > > Banyu) that may be confidential, proprietary copyrighted > > and/or legally privileged. It is intended solely for the use > > of the individual or entity named on this message. If you > > are not the intended recipient, and have received this > > message in error, please notify us immediately by reply > > e-mail and then delete it from your system. > > -------------------------------------------------------------- > > ---------------- > > > >