On Mon, 16 May 2005, Jim BRINDLE wrote:
> Hello,
>
> I am hoping someone could shed some light into the Wilcoxon Rank Sum Test
> for me? In looking through Stats references, the Mann-Whitney U-test and
> the Wilcoxon Rank Sum Test are statistically equivalent.
Yes, but not numerically: the two statistics differ by a constant (constant
with respect to the data values, that is; it is a function of the sample
sizes only).
> When using the
> following dataset:
>
> m <- c(2.0863,2.1340,2.1008,1.9565,2.0413,NA,NA)
> f <- c(1.8938,1.9709,1.8613,2.0836,1.9485,2.0630,1.9143)
>
> and the wilcox.test command as below:
>
> wilcox.test(m,f, paired = FALSE, alternative = c("two.sided"))
>
> I get a test statistic (W) of 30. When I perform this test by hand
> utilizing the methodology laid out in Ch. 6 of Ott & Longnecker I get a
> value of 45. Any insight or good reference(s) as to the algorithm R is
> using or this issue in general would be most appreciated.
I don't know that book but the R help page does have references. Also,
?pwilcox says
     This distribution is obtained as follows.  Let 'x' and 'y' be two
     random, independent samples of size 'm' and 'n'.  Then the Wilcoxon
     rank sum statistic is the number of all pairs '(x[i], y[j])' for
     which 'y[j]' is not greater than 'x[i]'.  This statistic takes
     values between '0' and 'm * n', and its mean and variance are 'm *
     n / 2' and 'm * n * (m + n + 1) / 12', respectively.
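As a rough check of that definition against the data in the question, the pairs can be counted directly. This is only a sketch: the name W_pairs is made up for illustration, and the NAs are dropped from m by hand.

```r
# Data from the question, with the two NAs in m removed by hand
m <- c(2.0863, 2.1340, 2.1008, 1.9565, 2.0413)
f <- c(1.8938, 1.9709, 1.8613, 2.0836, 1.9485, 2.0630, 1.9143)

# Count the pairs (m[i], f[j]) for which f[j] is not greater than m[i].
# outer() builds the 5 x 7 logical matrix of all comparisons.
W_pairs <- sum(outer(m, f, ">="))
W_pairs   # 30, the value reported by wilcox.test
```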
Your samples have length 5 (after removing NAs) and 7, with no ties. The R
code can be read via

    getAnywhere("wilcox.test.default")

and is essentially

    r <- rank(c(x,y))
    sum(r[seq(along = x)]) - n.x * (n.x + 1)/2

I guess your reference just uses the first term, the rank sum of the first
sample: with n.x = 5 the second term is 5 * 6/2 = 15, and 30 + 15 = 45.
Another way of looking at this is whether the statistic starts at 0 or at
the smallest possible rank sum, n.x * (n.x + 1)/2 (ranks from rank() start
at 1): R's definition shifts the rank sum so that its minimum is 0.
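To make the relation between the two numbers concrete, here is a small sketch that follows the rank-based form quoted above on the data from the question. The variable names (rank_sum, W) are mine, not those used inside wilcox.test.default.

```r
m <- c(2.0863, 2.1340, 2.1008, 1.9565, 2.0413, NA, NA)
f <- c(1.8938, 1.9709, 1.8613, 2.0836, 1.9485, 2.0630, 1.9143)

x <- m[!is.na(m)]          # drop the NAs, leaving 5 values
y <- f
n.x <- length(x)

r <- rank(c(x, y))                  # ranks 1..12 in the pooled sample
rank_sum <- sum(r[seq_along(x)])    # rank sum of the first sample: 45
W <- rank_sum - n.x * (n.x + 1)/2   # subtract the minimum possible sum (15): 30

# Should agree with what wilcox.test reports
W_test <- unname(wilcox.test(x, y, paired = FALSE,
                             alternative = "two.sided")$statistic)
c(rank_sum = rank_sum, W = W, W_test = W_test)
```

So the hand calculation of 45 and R's 30 describe the same test; they differ only by the constant n.x * (n.x + 1)/2 = 15.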
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595