Tan, Richard
2009-Feb-23 17:59 UTC
[R] Get top cluster for each item in a correlation matrix
Hi,

I posted a question a few days ago and got extremely helpful responses: https://stat.ethz.ch/pipermail/r-help/2009-February/188225.html. Now I have a somewhat related question.

I have a correlation matrix of about 3000 items, with 1 on the diagonal (for example, cor.mat <- cor(matrix(rnorm(3000*1000), 1000, 3000))). For each item in the matrix, I want to find the cluster that contains the item's own correlation of 1, i.e., the cluster with the highest correlation coefficients. I then want to generate a data frame with three columns ("ID", "ID2", "cor"), where in each row ID is one of the 3000 items, ID2 is the ID of an item within that top cluster, and cor is the correlation between ID and ID2. The clustering method is fanny, with the number of clusters set to 60.

It is very time-consuming to do it with a for loop like this:

library(cluster)  # provides fanny()
out <- NULL
for (i in 1:ncol(cor.mat)) {
  f <- fanny(cor.mat[, i], 60)
  # items that fall in the same cluster as item i itself
  same <- f$clustering == f$clustering[i]
  temp <- cbind(ID = i, ID2 = which(same), cor = cor.mat[same, i])
  out <- rbind(out, temp)
}
out

Is there a better way to do it?

Thanks,
Richard
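One possible restructuring of the loop above is sketched below. It assumes the `cluster` package is installed, and it uses a smaller synthetic matrix (100 items) and fewer clusters (k = 4) than the post so that it runs quickly; `top.cluster` is a hypothetical helper name. Instead of calling `rbind()` on the growing result at every iteration, each item's rows are built independently with `lapply()` and combined once with `do.call(rbind, ...)`. This avoids re-copying the accumulated result on every pass, but the 3000 `fanny()` calls themselves remain the dominant cost.

```r
library(cluster)  # provides fanny()

# Assumption: small synthetic example (100 items) instead of 3000
set.seed(1)
n.items <- 100
cor.mat <- cor(matrix(rnorm(n.items * 50), 50, n.items))

# Hypothetical helper: cluster column i of the correlation matrix and
# return the items that share a cluster with item i itself (the cluster
# containing its self-correlation of 1)
top.cluster <- function(i, cm, k) {
  f <- fanny(cm[, i], k)
  same <- f$clustering == f$clustering[i]
  data.frame(ID = i, ID2 = which(same), cor = cm[same, i])
}

k <- 4  # assumption: far fewer clusters than the 60 used in the post

# Build one data frame per item, then combine them in a single rbind
out <- do.call(rbind, lapply(seq_len(ncol(cor.mat)), top.cluster,
                             cm = cor.mat, k = k))
head(out)
```

Each item appears in its own top cluster with a correlation of 1, so every ID contributes at least one row.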