>> I have a function that applies a wilcoxon test to 12 sets of about a
>> quarter of a million pairs

> ... and let me guess: everything is significant to an almost arbitrary
> value of \alpha?

:-) For each of a quarter of a million sets, I do a wilcoxon between two
pairs each containing twenty numbers... I do this 12 times...

>> (and takes about 3 hours). I've replaced the inner loop I had
>> originally with a function call via mapply, and also considered
>> different approximations of the wilcoxon, rather than that which is
>> implemented in wilcox.test, but that makes little difference (and if
>> anything slows things down :-).

> Are you using wilcox.test(), or a self-written function?

I started off with wilcox.test, and tried to speed it up by hacking out
as much from the original function as possible, since a fair proportion
of the code being executed was logic deciding between the different
flavours of the test (I didn't expect that to make much difference, but
tried it anyway). There was also a call to pnorm(), which I replaced
with a numeric approximation. This is not inlined into the original
function.

Crispin

--------------------------------------------------------

This email is confidential and intended solely for the use o...{{dropped}}
Crispin Miller wrote:

>>> I have a function that applies a wilcoxon test to 12 sets of about a
>>> quarter of a million pairs
>>
>> ... and let me guess: everything is significant to an almost arbitrary
>> value of \alpha?
>
> :-) For each of a quarter of a million sets, I do a wilcoxon between
> two pairs each containing twenty numbers... I do this 12 times...

Ah. Sorry for misunderstanding.

>>> (and takes about 3 hours). I've replaced the inner loop I had
>>> originally with a function call via mapply, and also considered
>>> different approximations of the wilcoxon, rather than that which is
>>> implemented in wilcox.test, but that makes little difference (and if
>>> anything slows things down :-).
>>
>> Are you using wilcox.test(), or a self-written function?
>
> I started off with wilcox.test, and tried to speed it up by hacking
> out as much from the original function as possible, since a fair
> proportion of the code being executed was logic deciding between the
> different flavours of the test (I didn't expect that to make much
> difference, but tried it anyway). There was also a call to pnorm(),
> which I replaced with a numeric approximation. This is not inlined
> into the original function.

In that case you cannot do very much except for writing it in a
language like C / Fortran, if you really need the speed.

Uwe Ligges

> Crispin

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Yup, my mistake for not being very clear! Alas the computer is already
very fast - looks like it's C for me...

Crispin

> -----Original Message-----
> From: Patrick Burns [mailto:pburns at pburns.seanet.com]
> Sent: 07 October 2003 12:00
> Cc: Crispin Miller
> Subject: Re: FW: [R] Optimising code
>
> I had misunderstood also. I agree with Uwe: either move to C or buy a
> faster computer -- or at least another one.
>
> Pat
>
> Uwe Ligges wrote:
> [...]
If you are only interested in calculating the wilcoxon statistic (as
one would when calculating permutation p-values, for example), the
following should suffice:

length.na <- function(x){
    tmp <- !(is.na(x) | is.infinite(x))
    length(x[tmp])
}

rank.na <- function(x){
    y <- x[!is.na(x)]
    x[!is.na(x)] <- rank(y)
    x
}

wstats.fn <- function(data=NULL, g1=NULL, g2=NULL){
    temp <- t(apply(data, 1, rank.na))
    n1 <- apply(temp[, g1], 1, length.na)
    n2 <- apply(temp[, g2], 1, length.na)
    ## rank sum of group 1 minus n1*(n1+1)/2, i.e. wilcox.test's W
    wstats <- rowSums(temp[, g1], na.rm=TRUE) - n1*(n1 + 1)/2
    wstats
}

m  <- matrix(rnorm(2500*100), ncol=100)
w1 <- wstats.fn(data=m, g1=1:50, g2=51:100)
w2 <- apply(m, 1, function(x) wilcox.test(x[1:50], x[51:100])$stat)

On my machine, it takes 2.2s to calculate w1 compared to 15.6s for w2.
rank.na and length.na are not necessary if your data do not have
missing values. Since you are doing this a large enough number of
times, it might be wise to code it in C. My C version of this is at
least 20-40 times faster than the one in R, but like the one above it
does not return p-values.

-- Adaikalavan Ramasamy

-----Original Message-----
From: Uwe Ligges [mailto:ligges at statistik.uni-dortmund.de]
Sent: Tuesday, October 07, 2003 6:18 PM
To: Crispin Miller
Cc: R-help (E-mail)
Subject: Re: FW: [R] Optimising code

[...]