Daren Tan
2008-Nov-26 04:55 UTC
[R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices
My two matrices are roughly the sizes of m1 and m2 below. I tried using two nested apply calls with cor.test to compute the correlation p-values. After more than an hour, the code is still running. Please help me make it more efficient.

m1 <- matrix(rnorm(100000), ncol=100)
m2 <- matrix(rnorm(10000000), ncol=100)

cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor.test(x,y)$p.value }) })
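For scale, the amount of work requested can be worked out from the matrix dimensions alone. The following is a minimal sketch, not part of the original post; the variable names are illustrative only. It counts the cor.test calls made by the nested apply and the size of the resulting p-value matrix.

```r
# Dimensions implied by the matrix() calls in the post (ncol = 100 for both).
n_col   <- 100
rows_m1 <- 100000   / n_col   # 1,000 rows in m1
rows_m2 <- 10000000 / n_col   # 100,000 rows in m2

# The nested apply runs cor.test once per (row of m1, row of m2) pair.
n_tests <- rows_m1 * rows_m2  # 1e8 calls

# The result holds one double (8 bytes) per pair.
result_mib <- n_tests * 8 / 2^20

cat("cor.test calls:", format(n_tests, big.mark = ","), "\n")
cat("result size (MiB):", round(result_mib), "\n")
```

So the double apply asks for 100 million tests and a result of roughly 763 MiB, before any per-call overhead is considered.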
jim holtman
2008-Nov-26 14:14 UTC
Your time is being taken up in cor.test because you are calling it 100,000,000 times (1,000 rows of m1 against 100,000 rows of m2). So grin and bear it with the amount of work you are asking it to do. Here I am only calling it 10,000 times:

> m1 <- matrix(rnorm(10000), ncol=100)
> m2 <- matrix(rnorm(10000), ncol=100)
> Rprof('/tempxx.txt')
> system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) { cor.test(x,y)$p.value }) }))
   user  system elapsed
   8.86    0.00    8.89
>

So my guess is that calling it 100,000,000 times will take 100,000,000 * 0.000886 seconds, or about 24.6 hours. If you run Rprof, you will see that it is spending most of its time there:

0 8.8 root
1. 8.8 apply
2. . 8.8 FUN
3. . . 8.8 apply
4. . . . 8.7 FUN
5. . . . . 8.6 cor.test
6. . . . . . 8.4 cor.test.default
7. . . . . . . 2.4 match.arg
8. . . . . . . . 1.7 eval
9. . . . . . . . . 1.4 deparse
10. . . . . . . . . . 0.6 .deparseOpts
11. . . . . . . . . . . 0.2 pmatch
11. . . . . . . . . . . 0.1 sum
10. . . . . . . . . . 0.5 %in%
11. . . . . . . . . . . 0.3 match
12. . . . . . . . . . . . 0.3 is.factor
13. . . . . . . . . . . . . 0.3 inherits
8. . . . . . . . 0.2 formals
9. . . . . . . . . 0.2 sys.function
7. . . . . . . 2.1 cor
8. . . . . . . . 1.1 match.arg
9. . . . . . . . . 0.7 eval
10. . . . . . . . . . 0.6 deparse
11. . . . . . . . . . . 0.3 .deparseOpts
12. . . . . . . . . . . . 0.1 pmatch
11. . . . . . . . . . . 0.2 %in%
12. . . . . . . . . . . . 0.2 match
13. . . . . . . . . . . . . 0.1 is.factor
14. . . . . . . . . . . . . . 0.1 inherits
9. . . . . . . . . 0.1 formals
8. . . . . . . . 0.5 stopifnot
9. . . . . . . . . 0.2 match.call
8. . . . . . . . 0.1 pmatch
8. . . . . . . . 0.1 is.data.frame
9. . . . . . . . . 0.1 inherits
7. . . . . . . 1.5 paste
8. . . . . . . . 1.4 deparse
9. . . . . . . . . 0.6 .deparseOpts
10. . . . . . . . . . 0.3 pmatch
10. . . . . . . . . . 0.1 any
9. . . . . . . . . 0.6 %in%
10. . . . . . . . . . 0.6 match
11. . . . . . . . . . . 0.5 is.factor
12. . . . . . . . . . . . 0.4 inherits
13. . . . . . . . . . . . . 0.2 mode
7. . . . . . . 0.4 switch
8. . . . . . . . 0.1 qnorm
7. . . . . . . 0.2 pt
5. . . . . 0.1 $

--
Jim Holtman
Cincinnati, OH

What is the problem that you are trying to solve?
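The profile above attributes most of the time to per-call bookkeeping inside cor.test (match.arg, deparse, paste) and very little to the actual arithmetic (cor, pt). One possible way around that, not proposed in the thread, is to compute all Pearson correlations with a single cor() call and convert them to two-sided p-values with the t distribution. The sketch below makes several assumptions: the helper name cor_pvalues is hypothetical, the default Pearson two-sided test is what is wanted, and the data contain no missing values.

```r
# Hypothetical helper: p-values for every (row of a, row of b) pair.
# Assumes Pearson correlation, a two-sided test, and no NA values.
cor_pvalues <- function(a, b) {
  n <- ncol(a)                      # observations per row
  stopifnot(ncol(b) == n, n > 2)
  r  <- cor(t(b), t(a))             # nrow(b) x nrow(a), same layout as the double apply
  df <- n - 2
  t_stat <- r * sqrt(df / (1 - r^2))
  2 * pt(-abs(t_stat), df)          # two-sided p-value from the t distribution
}

# Small check against the double apply with cor.test.
set.seed(1)
a <- matrix(rnorm(500), ncol = 100)   # 5 rows
b <- matrix(rnorm(700), ncol = 100)   # 7 rows

fast <- cor_pvalues(a, b)
slow <- apply(a, 1, function(x) {
  apply(b, 1, function(y) cor.test(x, y)$p.value)
})

print(all.equal(fast, slow, check.attributes = FALSE))
```

For matrices of the size in the original post, the full result has 100 million entries, so it may be necessary to pass the rows of the larger matrix in blocks rather than all at once.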
Jorge Ivan Velez
2008-Nov-26 15:16 UTC
Hi Daren,

Here is another approach that is a little bit faster, using matrices modelled on your originals (note that m2 below is 100 times smaller than yours, and that the tests pair columns rather than rows). My session info is at the end. I'm using a 2.4 GHz Core 2 Duo processor and 3 GB of RAM.

# Data
set.seed(123)
m1 <- matrix(rnorm(100000), ncol=100)
m2 <- matrix(rnorm(100000), ncol=100)
colnames(m1) <- paste('m1_', 1:100, sep="")
colnames(m2) <- paste('m2_', 1:100, sep="")

# Combinations: every (column of m1, column of m2) pair
combs <- expand.grid(colnames(m1), colnames(m2))

# ---------------
# Option 1: cor.test on each pair
# ---------------
system.time(pvalues1 <- apply(combs, 1, function(x) cor.test(m1[,x[1]], m2[,x[2]])$p.value))
#   user  system elapsed
#   8.12    0.01    8.20

# ---------------
# Option 2: rcorr from Hmisc on each pair
# ---------------
require(Hmisc)
system.time(pvalues2 <- apply(combs, 1, function(x) rcorr(m1[,x[1]], m2[,x[2]])$P[2]))
#   user  system elapsed
#   7.00    0.00    7.02

HTH,
Jorge

# ------------- Session Info ----------------------------
R version 2.8.0 Patched (2008-11-08 r46864)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base
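Option 1 above returns the p-values as one long vector, ordered the way expand.grid orders its rows (the first argument varies fastest). A minimal sketch of how that vector might be reshaped into a labelled matrix follows. It is an illustration rather than part of the original reply: the matrices are deliberately tiny, stringsAsFactors = FALSE is added so the column names stay character, and pmat is a made-up name.

```r
set.seed(123)
m1 <- matrix(rnorm(200), ncol = 4)    # 50 observations, 4 columns
m2 <- matrix(rnorm(150), ncol = 3)    # 50 observations, 3 columns
colnames(m1) <- paste("m1_", 1:4, sep = "")
colnames(m2) <- paste("m2_", 1:3, sep = "")

# All (column of m1, column of m2) pairs; the m1 name varies fastest.
combs <- expand.grid(colnames(m1), colnames(m2), stringsAsFactors = FALSE)

# Same per-pair call as Option 1.
pvalues1 <- apply(combs, 1, function(x) cor.test(m1[, x[1]], m2[, x[2]])$p.value)

# Column-wise fill matches the expand.grid order:
# rows index columns of m1, columns index columns of m2.
pmat <- matrix(pvalues1, nrow = ncol(m1),
               dimnames = list(colnames(m1), colnames(m2)))

# Spot check one entry against a direct cor.test call.
print(all.equal(pmat["m1_2", "m2_3"], cor.test(m1[, 2], m2[, 3])$p.value))
```

To pair rows instead, as in the original post, the same pattern could be applied to t(m1) and t(m2).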