Hello,

I need to calculate the correlation for all pairwise combinations in a very large matrix. I have 25,000 elements and need to calculate the pairwise correlation with a different set of 5,000 elements. I have written code that works, but it is extremely slow: at the current rate, it will take a few weeks to finish running. I'm looking for suggestions on performing the calculations more efficiently. Here's what I currently have:

Cor.matrix <- matrix(NA_real_, nrow(genes1), nrow(genes2))  # preallocate the result
for (j in seq_len(nrow(genes2))) {
  for (i in seq_len(nrow(genes1))) {
    # expression profile of one gene from each set
    peak1 <- as.vector(t(expression.data[genes1[i, 1], ]))
    peak2 <- as.vector(t(expression.data[genes2[j, 1], ]))
    Cor.matrix[i, j] <- cor(peak1, peak2, method = "spearman")
  }
}

Thanks so much.
Alayne
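One commonly used alternative, sketched here under stated assumptions: `cor()` accepts two matrices and returns the correlation of every column of the first with every column of the second, so the double loop can be replaced by a single call. The sketch assumes the expression data is a numeric matrix with genes as rows, samples as columns and gene IDs as row names; `expr`, `ids1` and `ids2` are hypothetical stand-ins (filled with simulated data) for `expression.data`, `genes1[, 1]` and `genes2[, 1]`. Note that the full 25,000 × 5,000 result holds 125 million values, about 1 GB as doubles.

```r
# Sketch: all gene-by-gene Spearman correlations in one cor() call.
# Assumption: genes are rows, samples are columns, gene IDs are row names.
set.seed(1)
n_samples <- 20
expr <- matrix(rnorm(50 * n_samples), nrow = 50,
               dimnames = list(paste0("g", 1:50), NULL))  # simulated data
ids1 <- paste0("g", 1:30)   # hypothetical stand-in for genes1[, 1]
ids2 <- paste0("g", 31:50)  # hypothetical stand-in for genes2[, 1]

# cor() correlates the columns of x with the columns of y,
# so transpose to make each gene a column.
x <- t(expr[ids1, , drop = FALSE])
y <- t(expr[ids2, , drop = FALSE])
cor_matrix <- cor(x, y, method = "spearman")
dim(cor_matrix)  # 30 20
```

Each entry `cor_matrix[a, b]` is the Spearman correlation between gene `a` from the first set and gene `b` from the second, with rows and columns labelled by gene ID.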
Actually, I've answered my own question. It turns out that transposing the expression matrix once, outside of the loop, significantly improves the speed. It now looks like the entire matrix should be calculated in a day or two, so I think this solution should be fine. I now have this:

expression.data <- t(expression.data)  # transpose once, before the loops
Cor.matrix <- matrix(NA_real_, nrow(genes1), nrow(genes2))  # preallocate the result
for (j in seq_len(nrow(genes2))) {
  for (i in seq_len(nrow(genes1))) {
    peak1 <- as.vector(expression.data[genes1[i, 1], ])
    peak2 <- as.vector(expression.data[genes2[j, 1], ])
    Cor.matrix[i, j] <- cor(peak1, peak2, method = "spearman")
  }
}

Thanks for reading.
Alayne
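The same idea (moving repeated work out of the loop) can be pushed further. Spearman correlation is the Pearson correlation of ranks, so each gene only needs to be ranked once rather than once per pair. The sketch below assumes numeric sample-by-gene matrices `x1` and `x2` with no missing values (hypothetical names); it ranks and standardizes every column once, then fills the result in blocks of columns of `x1` with a matrix product. `block_size` is an arbitrary illustrative parameter. The result should agree with `cor(x1, x2, method = "spearman")` up to floating-point error.

```r
# Sketch: Spearman correlation as Pearson correlation of ranks,
# with ranking and standardization done once, outside any loop.
# Assumption: x1, x2 are samples x genes numeric matrices without NAs.
standardize_ranks <- function(m) {
  r <- apply(m, 2, rank)                 # rank each column (ties averaged)
  r <- sweep(r, 2, colMeans(r))          # center each column
  sweep(r, 2, sqrt(colSums(r^2)), "/")   # scale each column to unit length
}

spearman_blocks <- function(x1, x2, block_size = 1000) {
  z1 <- standardize_ranks(x1)
  z2 <- standardize_ranks(x2)
  out <- matrix(NA_real_, ncol(z1), ncol(z2),
                dimnames = list(colnames(x1), colnames(x2)))
  for (s in seq(1, ncol(z1), by = block_size)) {
    idx <- s:min(s + block_size - 1, ncol(z1))
    # For centered, unit-length columns the dot product is the correlation.
    out[idx, ] <- crossprod(z1[, idx, drop = FALSE], z2)
  }
  out
}
```

A gene with identical values in every sample has zero length after centering, so its row or column comes out as NaN here, where `cor()` would return NA with a warning.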