>> I have a function that applies a wilcoxon test to 12 sets of about a
>> quarter of a million pairs

> ... and let me guess: everything is significant to an almost arbitrary
> value of \alpha?

:-) For each of a quarter of a million sets, I do a wilcoxon between two
pairs each containing twenty numbers... I do this 12 times...

>> (and takes about 3 hours). I've replaced the inner loop I had
>> originally with a function call via mapply, and also considered
>> different approximations of the wilcoxon, rather than that which is
>> implemented in wilcox.test, but that makes little difference (and if
>> anything slows things down :-).

> Are you using wilcox.test(), or a self-written function?

I started off with wilcox.test, and tried to speed it up by hacking out
as much from the original function as possible, since a fair proportion
of the code being executed was logic deciding between the different
flavours of the test (I didn't expect that to make much difference, but
tried it anyway). There was also a call to pnorm(), which I replaced
with a numeric approximation. This is not inlined into the original
function.

Crispin

--------------------------------------------------------

This email is confidential and intended solely for the use o...{{dropped}}
Crispin Miller wrote:

>>> I have a function that applies a wilcoxon test to 12 sets of about a
>>> quarter of a million pairs
>>
>> ... and let me guess: everything is significant to an almost arbitrary
>> value of \alpha?
>
> :-) For each of a quarter of a million sets, I do a wilcoxon between
> two pairs each containing twenty numbers... I do this 12 times...

Ah. Sorry for misunderstanding.

>>> (and takes about 3 hours). I've replaced the inner loop I had
>>> originally with a function call via mapply, and also considered
>>> different approximations of the wilcoxon, rather than that which is
>>> implemented in wilcox.test, but that makes little difference (and if
>>> anything slows things down :-).
>>
>> Are you using wilcox.test(), or a self-written function?
>
> I started off with wilcox.test, and tried to speed it up by hacking
> out as much from the original function as possible, since a fair
> proportion of the code being executed was logic deciding between the
> different flavours of the test (I didn't expect that to make much
> difference, but tried it anyway). There was also a call to pnorm(),
> which I replaced with a numeric approximation. This is not inlined
> into the original function.

In that case you cannot do very much except for writing it in a
language like C / Fortran, if you really need the speed.

Uwe Ligges

> Crispin

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Yup, my mistake for not being very clear! Alas the computer is already
very fast - looks like it's C for me...

Crispin

> -----Original Message-----
> From: Patrick Burns [mailto:pburns at pburns.seanet.com]
> Sent: 07 October 2003 12:00
> Cc: Crispin Miller
> Subject: Re: FW: [R] Optimising code
>
> I had misunderstood also. I agree with Uwe: either move to C or buy a
> faster computer -- or at least another one.
>
> Pat
>
> Uwe Ligges wrote:
> [...]
If you are only interested in calculating the wilcoxon statistic (as
one would when calculating permutation p-values, for example), the
following should suffice:

length.na <- function(x){
    tmp <- !(is.na(x) | is.infinite(x))
    length(x[tmp])
}

rank.na <- function(x){
    y <- x[!is.na(x)]
    x[!is.na(x)] <- rank(y)
    x
}

wstats.fn <- function(data=NULL, g1=NULL, g2=NULL){
    temp <- t(apply(data, 1, rank.na))
    n1 <- apply(temp[, g1], 1, length.na)
    n2 <- apply(temp[, g2], 1, length.na)
    ## rank sum of group 1 minus n1*(n1+1)/2, i.e. wilcox.test's W
    wstats <- rowSums(temp[, g1], na.rm=TRUE) - n1*(n1 + 1)/2
    wstats
}

m  <- matrix(rnorm(2500*100), ncol=100)
w1 <- wstats.fn(data=m, g1=1:50, g2=51:100)
w2 <- apply(m, 1, function(x) wilcox.test(x[1:50], x[51:100])$stat)

On my machine, it takes 2.2s to calculate w1 compared to 15.6s for w2.
rank.na and length.na are not necessary if your data do not have
missing values. Since you are doing this a large enough number of
times, it might be wise to code it in C. My C version of this is at
least 20-40 times faster than the one in R, but like the one above it
does not return p-values.

-- Adaikalavan Ramasamy

-----Original Message-----
From: Uwe Ligges [mailto:ligges at statistik.uni-dortmund.de]
Sent: Tuesday, October 07, 2003 6:18 PM
To: Crispin Miller
Cc: R-help (E-mail)
Subject: Re: FW: [R] Optimising code

[...]