Hi, I believe I have significantly improved [dpq]wilcox functions by implementing Harding's algorithm: Harding, E.F. (1984): An Efficient, Minimal-storage Procedure for Calculating the Mann-Whitney U, Generalized U and Similar Distributions, App. Statist., 33, 1-6 Results on my computer show (against R-2.9.1):> system.time( dwilcox( 800, 800, 80) )user system elapsed 0.240 0.180 0.443> system.time( dwilcox( (1:2800), 80, 80) )user system elapsed 0.290 0.020 0.30> system.time( dwilcox( (1:2800), 160, 160) )user system elapsed 0.810 0.060 0.868> system.time( dwilcox( (1:28000), 210, 210) )user system elapsed 17.700 0.600 18.313 RAM: ~ 700MB> system.time( pwilcox( 21000, 211, 211) )user system elapsed 18.110 0.640 18.762> system.time( a <- qwilcox( 0.43, 200, 400) )^C Timing stopped at: 14.39 1.43 18.794 RAM: > 1.4GB at interrupt time> system.time( dwilcox(800, 800, 80) )user system elapsed 0.140 0.000 0.144> system.time( dwilcox( (1:2800), 80, 80) )user system elapsed 0.010 0.000 0.007> system.time( dwilcox( (1:2800), 160, 160) )user system elapsed 0.020 0.000 0.016> system.time( dwilcox( (1:28000), 210, 210) )user system elapsed 0.050 0.000 0.05 RAM: < 1MB> system.time( pwilcox( 21000, 211, 211) )user system elapsed 0.020 0.000 0.025> system.time( a <- qwilcox( 0.43, 200, 400) )user system elapsed 0.040 0.000 0.07 RAM: < 1MB There is no more need for wilcox_free at [dpq]wilcox in src/library/stats/distn.R (every other call after the first one with the same m,n will just read the results from the array so it will be really fast) and for #define WILCOX_MAX 50 in src/nmath/nmath.h p.s. modified files are in the attachment have fun, -- Ivo Ugrina << http://web.math.hr/~iugrina >> Teaching/Research Assistant at Department of Mathematics University of Zagreb, Croatia -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: distn.R URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20090709/330b9a86/attachment.pl> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: nmath.h URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20090709/330b9a86/attachment.h> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: wilcox.c URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20090709/330b9a86/attachment.c>
Ivo Ugrina wrote:> Hi, > > I believe I have significantly improved [dpq]wilcox > functions by implementing Harding's algorithm: > Harding, E.F. (1984): An Efficient, Minimal-storage Procedure > for Calculating the Mann-Whitney U, Generalized U and Similar > Distributions, App. Statist., 33, 1-6 > > Results on my computer show (against R-2.9.1): > >> system.time( dwilcox( 800, 800, 80) ) > user system elapsed > 0.240 0.180 0.443 >> system.time( dwilcox( (1:2800), 80, 80) ) > user system elapsed > 0.290 0.020 0.30 >> system.time( dwilcox( (1:2800), 160, 160) ) > user system elapsed > 0.810 0.060 0.868 >> system.time( dwilcox( (1:28000), 210, 210) ) > user system elapsed > 17.700 0.600 18.313 > RAM: ~ 700MB >> system.time( pwilcox( 21000, 211, 211) ) > user system elapsed > 18.110 0.640 18.762 >> system.time( a <- qwilcox( 0.43, 200, 400) ) > ^C > Timing stopped at: 14.39 1.43 18.794 > RAM: > 1.4GB at interrupt time > >> system.time( dwilcox(800, 800, 80) ) > user system elapsed > 0.140 0.000 0.144 >> system.time( dwilcox( (1:2800), 80, 80) ) > user system elapsed > 0.010 0.000 0.007 >> system.time( dwilcox( (1:2800), 160, 160) ) > user system elapsed > 0.020 0.000 0.016 >> system.time( dwilcox( (1:28000), 210, 210) ) > user system elapsed > 0.050 0.000 0.05 > RAM: < 1MB >> system.time( pwilcox( 21000, 211, 211) ) > user system elapsed > 0.020 0.000 0.025 >> system.time( a <- qwilcox( 0.43, 200, 400) ) > user system elapsed > 0.040 0.000 0.07 > RAM: < 1MB > > There is no more need for > wilcox_free at [dpq]wilcox in src/library/stats/distn.R > (every other call after the first one with the same m,n > will just read the results from the array so it will be > really fast) and for > #define WILCOX_MAX 50 in src/nmath/nmath.h > > p.s. modified files are in the attachment > > have fun, >So is this an improvement or not? -- Ivo Ugrina << http://web.math.hr/~iugrina >> Teaching/Research Assistant at Department of Mathematics University of Zagreb, Croatia
Hi Ivo, Excerpts from Ivo Ugrina's message of Thu Jul 09 17:05:27 +0200 2009:> I believe I have significantly improved [dpq]wilcox > functions by implementing Harding's algorithm: > Harding, E.F. (1984): An Efficient, Minimal-storage Procedure > for Calculating the Mann-Whitney U, Generalized U and Similar > Distributions, App. Statist., 33, 1-6I've looked at your code and it is indeed quite a bit faster. Sadly in some cases it produces inaccurate results. See for example: R 2.9.1: > sum(dwilcox(1:(500*100), 500, 100)) [1] 1 R 2.9.1 + your patch: > sum(dwilcox(1:(500*100), 500, 100)) [1] 1.001377 > sum(dwilcox(1:(500*200), 500, 200)) [1] -132443.2 > sum(dwilcox(1:(1000*200), 1000, 200)) [1] 3.412391e+13 The last two examples run out of memory on my machine in an unpatched R. I think if you can sort out the numerical issuses your patch would be interesting since it is indeed quite a bit faster than the current implementation. Cheers, Olaf Mersmann