thr3ads.net - R help - [R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices [Nov 2008]

Daren Tan

2008-Nov-26 04:55 UTC

[R] Very slow: using double apply and cor.test to compute correlation p.values for 2 matrices

My two matrices are roughly the size of m1 and m2 below. I tried using two
nested apply calls with cor.test to compute the correlation p-values. After
more than an hour, the code is still running. Please help me make it more
efficient.
 
m1 <- matrix(rnorm(100000), ncol=100)
m2 <- matrix(rnorm(10000000), ncol=100)

cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1, function(y) {
cor.test(x,y)$p.value }) })

jim holtman

2008-Nov-26 14:14 UTC

Your time is being taken up in cor.test because you are calling it
100,000,000 times (1,000 rows of m1 against 100,000 rows of m2).  So grin
and bear it with the amount of work you are asking it to do.

Here I am only calling it 10,000 times (100 rows against 100 rows):
> m1 <- matrix(rnorm(10000), ncol=100)
> m2 <- matrix(rnorm(10000), ncol=100)
> Rprof('/tempxx.txt')
> system.time(cor.pvalues <- apply(m1, 1, function(x) { apply(m2, 1,
function(y) { cor.test(x,y)$p.value }) }))
   user  system elapsed
   8.86    0.00    8.89
>
so my guess is that calling it 100,000,000 times will take about
100,000,000 * 0.000886 seconds, or roughly 24.6 hours.
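
An estimate like this can be sanity-checked by counting the calls from the matrix dimensions. The sketch below assumes that the 8.86 s timing covers every pair of rows in the two 100 x 100 test matrices, and that the cost per cor.test call stays the same at full size; the variable names are illustrative only.

```r
# Both apply() calls iterate over rows, so the number of cor.test calls
# is nrow(m1) * nrow(m2). Every matrix in the thread has 100 columns.
n_col <- 100
calls_timed <- (10000 / n_col) * (10000 / n_col)       # 100 x 100 rows
calls_full  <- (100000 / n_col) * (10000000 / n_col)   # 1,000 x 100,000 rows

elapsed_timed <- 8.86                       # seconds, from the timing above
per_call  <- elapsed_timed / calls_timed    # seconds per cor.test call
est_hours <- per_call * calls_full / 3600   # extrapolated run time in hours

calls_timed   # 10000
calls_full    # 1e+08
```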

If you run Rprof, you will see that it is spending most of its time there:

  0   8.8 root
  1.    8.8 apply
  2. .    8.8 FUN
  3. . .    8.8 apply
  4. . . .    8.7 FUN
  5. . . . .    8.6 cor.test
  6. . . . . .    8.4 cor.test.default
  7. . . . . . .    2.4 match.arg
  8. . . . . . . .    1.7 eval
  9. . . . . . . . .    1.4 deparse
 10. . . . . . . . . .    0.6 .deparseOpts
 11. . . . . . . . . . .    0.2 pmatch
 11. . . . . . . . . . .    0.1 sum
 10. . . . . . . . . .    0.5 %in%
 11. . . . . . . . . . .    0.3 match
 12. . . . . . . . . . . .    0.3 is.factor
 13. . . . . . . . . . . . .    0.3 inherits
  8. . . . . . . .    0.2 formals
  9. . . . . . . . .    0.2 sys.function
  7. . . . . . .    2.1 cor
  8. . . . . . . .    1.1 match.arg
  9. . . . . . . . .    0.7 eval
 10. . . . . . . . . .    0.6 deparse
 11. . . . . . . . . . .    0.3 .deparseOpts
 12. . . . . . . . . . . .    0.1 pmatch
 11. . . . . . . . . . .    0.2 %in%
 12. . . . . . . . . . . .    0.2 match
 13. . . . . . . . . . . . .    0.1 is.factor
 14. . . . . . . . . . . . . .    0.1 inherits
  9. . . . . . . . .    0.1 formals
  8. . . . . . . .    0.5 stopifnot
  9. . . . . . . . .    0.2 match.call
  8. . . . . . . .    0.1 pmatch
  8. . . . . . . .    0.1 is.data.frame
  9. . . . . . . . .    0.1 inherits
  7. . . . . . .    1.5 paste
  8. . . . . . . .    1.4 deparse
  9. . . . . . . . .    0.6 .deparseOpts
 10. . . . . . . . . .    0.3 pmatch
 10. . . . . . . . . .    0.1 any
  9. . . . . . . . .    0.6 %in%
 10. . . . . . . . . .    0.6 match
 11. . . . . . . . . . .    0.5 is.factor
 12. . . . . . . . . . . .    0.4 inherits
 13. . . . . . . . . . . . .    0.2 mode
  7. . . . . . .    0.4 switch
  8. . . . . . . .    0.1 qnorm
  7. . . . . . .    0.2 pt
  5. . . . .    0.1 $
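
For reference, a profile of this kind can be collected with base R alone. The sketch below is an assumed workflow, not necessarily the commands used for the listing above: it writes to a temporary file (prof_file is a made-up name), stops sampling with Rprof(NULL), and summarises with summaryRprof(), which prints flat per-function tables rather than a call tree.

```r
set.seed(1)
m1 <- matrix(rnorm(10000), ncol = 100)
m2 <- matrix(rnorm(10000), ncol = 100)

prof_file <- tempfile(fileext = ".out")   # illustrative location for the samples
Rprof(prof_file)                          # start sampling the call stack
cor.pvalues <- apply(m1, 1, function(x) {
  apply(m2, 1, function(y) cor.test(x, y)$p.value)
})
Rprof(NULL)                               # stop profiling

prof <- summaryRprof(prof_file)
head(prof$by.total, 10)                   # functions ranked by total time
```

The by.total table counts time spent in a function and everything it calls, so cor.test.default and its helpers (match.arg, deparse, paste) should appear near the top if the run behaves like the one above.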


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Jorge Ivan Velez

2008-Nov-26 15:16 UTC

Hi Daren,
Here is another approach that is a little bit faster. It starts from your
original matrices, except that m2 is cut down to 1000 x 100 and the tests run
over pairs of columns (100 x 100 = 10,000 tests). My session info is at the
end. I'm using a 2.4 GHz Core 2 Duo processor and 3 GB of RAM.

# Data
set.seed(123)
m1 <- matrix(rnorm(100000), ncol=100)
m2 <- matrix(rnorm(100000), ncol=100)
colnames(m1) <- paste('m1_', 1:100, sep="")
colnames(m2) <- paste('m2_', 1:100, sep="")

# Combinations: one row per (m1 column, m2 column) pair
combs <- expand.grid(colnames(m1), colnames(m2))

# ---------------
# Option 1: cor.test on each pair of columns
# ---------------
system.time(pvalues1 <- apply(combs, 1, function(x)
    cor.test(m1[, x[1]], m2[, x[2]])$p.value))
#   user  system elapsed
#   8.12    0.01    8.20

# ---------------
# Option 2: rcorr from Hmisc; P[2] is the off-diagonal p-value
# ---------------
library(Hmisc)
system.time(pvalues2 <- apply(combs, 1, function(x)
    rcorr(m1[, x[1]], m2[, x[2]])$P[2]))
#   user  system elapsed
#   7.00    0.00    7.02
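
Both options above still run one test per pair, so the per-call overhead remains. A possible alternative, not proposed in the thread, is to get all correlations from a single cor() call and convert them to two-sided p-values with the t statistic used for the Pearson test, t = r * sqrt((n - 2) / (1 - r^2)) on n - 2 degrees of freedom. The helper name cor_pvalues is hypothetical, and the sketch assumes complete data (no NAs) and the default two-sided Pearson test.

```r
# Hypothetical helper: p-values for every column of a against every column of b.
cor_pvalues <- function(a, b) {
  stopifnot(nrow(a) == nrow(b))
  n <- nrow(a)                             # observations per column
  r <- cor(a, b)                           # ncol(a) x ncol(b) correlations
  tstat <- r * sqrt((n - 2) / (1 - r^2))   # t statistic for each pair
  2 * pt(-abs(tstat), df = n - 2)          # two-sided p-values, same shape as r
}

# Small stand-in data so the comparison below finishes quickly.
set.seed(123)
a <- matrix(rnorm(50 * 8), ncol = 8)
b <- matrix(rnorm(50 * 6), ncol = 6)
p_fast <- cor_pvalues(a, b)

# Reference values from cor.test, one call per pair, in the same 8 x 6 layout.
p_ref <- sapply(seq_len(ncol(b)), function(j)
  sapply(seq_len(ncol(a)), function(i) cor.test(a[, i], b[, j])$p.value))

all.equal(p_fast, p_ref)
```

To correlate rows instead, as in the original question, the matrices can be transposed first: cor_pvalues(t(m1), t(m2)). At the original sizes the result alone is a 1,000 x 100,000 matrix of doubles, about 800 MB, so on a machine with limited RAM it may be necessary to pass m2 in blocks of rows.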


HTH,

Jorge


# -------------  Session Info ----------------------------
R version 2.8.0 Patched (2008-11-08 r46864)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base



