Tan, Richard
2009-Feb-12 16:19 UTC
[R] get top 50 correlated item from a correlation matrix for each item
Hi,

I have a correlation matrix of about 3000 items, i.e., a 3000 x 3000 matrix. For each of the 3000 items, I want to get the top 50 items that have the highest correlation with it (excluding itself) and generate a data frame with 3 columns ("ID", "ID2", "cor"), where ID lists the 3000 items, each repeated 50 times, ID2 lists the top 50 items correlated with ID, and cor is the correlation between ID and ID2. I know I can do it with two for loops, but that is very time-consuming, since a correlation matrix is generated for each month of the past 20 years. Is there a better way to do it?

Regards,
Richard
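For reference, the straightforward approach can be written with a single loop over items instead of two nested loops. This is a minimal sketch, assuming a square, symmetric correlation matrix with the same item order along rows and columns; the function name `top_k_loop()` and the simulated toy matrix are hypothetical, chosen only for illustration.

```r
# Hypothetical helper (not from the thread): for each item j, rank the other
# items by their correlation with j and keep the k largest.
top_k_loop <- function(cor.mat, k = 50) {
  n <- ncol(cor.mat)
  res <- vector("list", n)
  for (j in seq_len(n)) {
    x <- cor.mat[, j]
    x[j] <- -Inf                                  # exclude the item itself
    top <- order(x, decreasing = TRUE)[seq_len(k)]
    res[[j]] <- data.frame(ID = j, ID2 = top, cor = cor.mat[top, j])
  }
  do.call(rbind, res)                             # stack the per-item results
}

# Assumed toy data: a 40 x 40 correlation matrix from random normals
set.seed(1)
cor.mat <- cor(matrix(rnorm(200 * 40), 200, 40))
out <- top_k_loop(cor.mat, k = 5)
head(out)
```

Setting the item's own entry to -Inf, rather than dropping the first sorted element, keeps an item out of its own list even if some other item is perfectly correlated with it.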
Dimitris Rizopoulos
2009-Feb-12 17:10 UTC
A possible vectorized solution is the following:

    cor.mat <- cor(matrix(rnorm(100*1000), 1000, 100))  # toy 100 x 100 correlation matrix
    p <- 30  # how many top items
    n <- ncol(cor.mat)
    cmat <- col(cor.mat)  # column index of every cell
    # order by column (ascending, via -cmat), then by decreasing correlation
    # within each column; subtracting n * (column - 1) turns the linear
    # positions returned by order() into row indices within each column
    ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
    dim(ind) <- dim(cor.mat)
    ind <- ind[seq(2, p + 1), ]  # row 1 is the item itself (cor = 1), so skip it
    out <- cbind(ID = c(col(ind)), ID2 = c(ind))
    as.data.frame(cbind(out, cor = cor.mat[out]))

I hope it helps.

Best,
Dimitris

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
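Because the question involves one correlation matrix per month over 20 years, the vectorized approach above can be wrapped in a function and applied to a list of matrices. This is a sketch rather than Dimitris's code verbatim: the names `top_cor()` and `monthly`, the added `month` column, and the simulated matrices are assumptions for illustration.

```r
# Hypothetical wrapper around the vectorized column-wise ordering trick.
top_cor <- function(cor.mat, p = 50) {
  n <- ncol(cor.mat)
  cmat <- col(cor.mat)
  # Sort column by column, largest correlation first, then convert the
  # linear positions from order() into row indices within each column
  ind <- order(-cmat, cor.mat, decreasing = TRUE) - (n * cmat - n)
  dim(ind) <- dim(cor.mat)
  ind <- ind[seq(2, p + 1), , drop = FALSE]  # row 1 is the item itself
  out <- cbind(ID = c(col(ind)), ID2 = c(ind))
  data.frame(out, cor = cor.mat[out])
}

# Assumed setup: one correlation matrix per month, stored in a list
set.seed(42)
monthly <- lapply(1:3, function(m) cor(matrix(rnorm(500 * 20), 500, 20)))
res <- do.call(rbind, lapply(seq_along(monthly), function(m)
  cbind(month = m, top_cor(monthly[[m]], p = 5))))
head(res)
```

As in the original, skipping row 1 assumes each item's self-correlation of 1 sorts first; if two items can be perfectly correlated, excluding by position may drop the wrong entry.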
Tan, Richard
2009-Feb-12 17:26 UTC
Works like a charm, thank you!
JLucke at ria.buffalo.edu
2009-Feb-12 17:31 UTC
A solution using a toy example:

    library(MASS)  # provides mvrnorm()
    r <- cor(mvrnorm(1000, mu = rep(0, 10), Sigma = diag(10)))  # assume a 10 x 10 matrix
    j <- i <- 1:dim(r)[1]                # generate matrix indices
    lt <- outer(i, j, '>')               # get boolean lower triangle
    sort(r[lt], decreasing = TRUE)[1:5]  # extract top 5 correlations (over all pairs)

Joseph F. Lucke
Senior Statistician
Research Institute on Addictions
University at Buffalo
SUNY
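The lower-triangle approach above returns only the correlation values. A sketch that also keeps the item indices, giving the ID/ID2/cor layout the question asks for, might look like this. It swaps `MASS::mvrnorm()` for base `rnorm()` (with `Sigma = diag(10)` the columns are independent standard normals either way) and `lower.tri()` for the `outer()` call; like the original, it ranks pairs over the whole matrix rather than per item.

```r
# Assumed toy data: a 10 x 10 correlation matrix from independent normals
set.seed(7)
r <- cor(matrix(rnorm(1000 * 10), 1000, 10))
lt <- lower.tri(r)                          # same mask as outer(i, j, '>')
idx <- which(lt, arr.ind = TRUE)            # (row, col) of each lower-triangle cell
vals <- r[lt]                               # same column-major order as idx
top <- order(vals, decreasing = TRUE)[1:5]  # positions of the 5 largest pairs
top.pairs <- data.frame(ID = idx[top, "col"], ID2 = idx[top, "row"],
                        cor = vals[top])
top.pairs
```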