thr3ads.net - R devel - [Rd] Inconsistencies in wilcox.test [Dec 2019]

If this information is useful, please help other people find it:
Share via:

Karolis Koncevičius

2019-Dec-07 18:55 UTC

[Rd] Inconsistencies in wilcox.test

Hello,

Writing to share some things I've found about wilcox.test() that seem a
a bit inconsistent.

1. Inf values are not removed if paired=TRUE

# returns different results (Inf is removed):
wilcox.test(c(1,2,3,4), c(0,9,8,7))
wilcox.test(c(1,2,3,4), c(0,9,8,Inf))

# returns the same result (Inf is left as value with highest rank):
wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)

2. tolerance issues with paired=TRUE.

wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
# ...
# Warning:  cannot compute exact p-value with ties

wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
# ...
# no warning

3. Always 'x observations are missing' when paired=TRUE

wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
# ...
# Error:  not enough (finite) 'x' observations

4. No indication if normal approximation was used:

# different numbers, but same "method" name
wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)


 From all of these I am pretty sure the 1st one is likely unintended,
so attaching a small patch to adjust it. Can also try patching others if 
consensus is reached that the behavioiur has to be modified.

Kind regards,
Karolis Koncevi?ius.

---

Index: wilcox.test.R
==================================================================---
wilcox.test.R  (revision 77540)
+++ wilcox.test.R  (working copy)
@@ -42,7 +42,7 @@
          if(paired) {
              if(length(x) != length(y))
                  stop("'x' and 'y' must have the same
length")
-            OK <- complete.cases(x, y)
+            OK <- is.finite(x) & is.finite(y)
              x <- x[OK] - y[OK]
              y <- NULL
          }

Ben Bolker

2019-Dec-07 19:41 UTC

head link

[Rd] Inconsistencies in wilcox.test

Your second issue seems like a more or less unavoidable floating-point
computation issue.  The paired test operates by computing differences
between corresponding values of x and y.

  It's not impossible to try to detect "almost-ties" (by testing
for
differences less than, say, sqrt(.Machine$double.eps)), but it's
delicate and somewhat subjective/problem-dependent.

  Example:

options(digits=20)> unique(c(4,3,2)-c(3,2,1))
[1] 1> unique(c(0.4,0.3,0.2)-c(0.3,0.2,0.1))[1] 0.100000000000000033307 0.099999999999999977796 0.100000000000000005551

On 2019-12-07 1:55 p.m., Karolis Koncevi?ius wrote:> Hello,
> 
> Writing to share some things I've found about wilcox.test() that seem a
> a bit inconsistent.
> 
> 1. Inf values are not removed if paired=TRUE
> 
> # returns different results (Inf is removed):
> wilcox.test(c(1,2,3,4), c(0,9,8,7))
> wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
> 
> # returns the same result (Inf is left as value with highest rank):
> wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
> wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)
> 
> 2. tolerance issues with paired=TRUE.
> 
> wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
> # ...
> # Warning:? cannot compute exact p-value with ties
> 
> wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
> # ...
> # no warning
> 
> 3. Always 'x observations are missing' when paired=TRUE
> 
> wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
> # ...
> # Error:? not enough (finite) 'x' observations
> 
> 4. No indication if normal approximation was used:
> 
> # different numbers, but same "method" name
> wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
> wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
> 
> 
> From all of these I am pretty sure the 1st one is likely unintended,
> so attaching a small patch to adjust it. Can also try patching others if
> consensus is reached that the behavioiur has to be modified.
> 
> Kind regards,
> Karolis Koncevi?ius.
> 
> ---
> 
> Index: wilcox.test.R
> ==================================================================> ---
wilcox.test.R? (revision 77540)
> +++ wilcox.test.R? (working copy)
> @@ -42,7 +42,7 @@
> ???????? if(paired) {
> ???????????? if(length(x) != length(y))
> ???????????????? stop("'x' and 'y' must have the same
length")
> -??????????? OK <- complete.cases(x, y)
> +??????????? OK <- is.finite(x) & is.finite(y)
> ???????????? x <- x[OK] - y[OK]
> ???????????? y <- NULL
> ???????? }
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

Karolis Koncevičius

2019-Dec-07 20:38 UTC

head link

[Rd] Inconsistencies in wilcox.test

Thank you for responding, and so quickly at that.

Yes, I do understand that this is a floating point issue.
However, since wilcox.test() works on ranks this might be a bit 
dangerous in my opinion. Maybe more so than for magnitude based tests.
Any small precision error will be ranked and it becomes a matter of 
errors being systematically >0 or <0 in one group.

Here is one example that I do not like:

x <- seq(0.9, 0.2, -0.1)
y <- seq(0.8, 0.1, -0.1)
wilcox.test(x, y, paired=TRUE, mu=0.1)

   Wilcoxon signed rank test with continuity correction

   data:  x and y
   V = 0, p-value = 0.01471
   alternative hypothesis: true location shift is not equal to 0.1
   # ... Warning, due to some precision deviations being duplicated ...

x-y
[1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

sign(x-y - 0.1)
[1] -1 -1 -1 -1 -1 -1 -1  0


t.test() uses .Machine$double.eps with stderr and avoids this issue:

t.test(x, y, paired=TRUE, mu=0.1)
   Error in t.test.default(x, y, paired = TRUE, mu = 0.1) :
     data are essentially constant

On 2019-12-07 14:41, Ben Bolker wrote:>
> Your second issue seems like a more or less unavoidable floating-point
>computation issue.  The paired test operates by computing differences
>between corresponding values of x and y.
>
>  It's not impossible to try to detect "almost-ties" (by
testing for
>differences less than, say, sqrt(.Machine$double.eps)), but it's
>delicate and somewhat subjective/problem-dependent.
>
>  Example:
>
>options(digits=20)
>> unique(c(4,3,2)-c(3,2,1))
>[1] 1
>> unique(c(0.4,0.3,0.2)-c(0.3,0.2,0.1))
>[1] 0.100000000000000033307 0.099999999999999977796 0.100000000000000005551
>
>On 2019-12-07 1:55 p.m., Karolis Koncevi?ius wrote:
>> Hello,
>>
>> Writing to share some things I've found about wilcox.test() that
seem a
>> a bit inconsistent.
>>
>> 1. Inf values are not removed if paired=TRUE
>>
>> # returns different results (Inf is removed):
>> wilcox.test(c(1,2,3,4), c(0,9,8,7))
>> wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
>>
>> # returns the same result (Inf is left as value with highest rank):
>> wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
>> wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)
>>
>> 2. tolerance issues with paired=TRUE.
>>
>> wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
>> # ...
>> # Warning:? cannot compute exact p-value with ties
>>
>> wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
>> # ...
>> # no warning
>>
>> 3. Always 'x observations are missing' when paired=TRUE
>>
>> wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
>> # ...
>> # Error:? not enough (finite) 'x' observations
>>
>> 4. No indication if normal approximation was used:
>>
>> # different numbers, but same "method" name
>> wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
>> wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
>>
>>
>> From all of these I am pretty sure the 1st one is likely unintended,
>> so attaching a small patch to adjust it. Can also try patching others
if
>> consensus is reached that the behavioiur has to be modified.
>>
>> Kind regards,
>> Karolis Koncevi?ius.
>>
>> ---
>>
>> Index: wilcox.test.R
>>
==================================================================>> ---
wilcox.test.R? (revision 77540)
>> +++ wilcox.test.R? (working copy)
>> @@ -42,7 +42,7 @@
>> ???????? if(paired) {
>> ???????????? if(length(x) != length(y))
>> ???????????????? stop("'x' and 'y' must have the
same length")
>> -??????????? OK <- complete.cases(x, y)
>> +??????????? OK <- is.finite(x) & is.finite(y)
>> ???????????? x <- x[OK] - y[OK]
>> ???????????? y <- NULL
>> ???????? }
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>______________________________________________
>R-devel at r-project.org mailing list
>https://stat.ethz.ch/mailman/listinfo/r-devel

Martin Maechler

2019-Dec-07 20:59 UTC

head link

[Rd] Inconsistencies in wilcox.test

>>>>> Karolis Koncevi?ius 
>>>>>     on Sat, 7 Dec 2019 20:55:36 +0200 writes:
    > Hello,
    > Writing to share some things I've found about wilcox.test() that
seem a
    > a bit inconsistent.

    > 1. Inf values are not removed if paired=TRUE

    > # returns different results (Inf is removed):
    > wilcox.test(c(1,2,3,4), c(0,9,8,7))
    > wilcox.test(c(1,2,3,4), c(0,9,8,Inf))

    > # returns the same result (Inf is left as value with highest rank):
    > wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
    > wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)

Now which of the two cases do you consider correct ?

IHMO, the 2nd one is correct: it is exactly one property of the
*robustness* of the wilcoxon test and very desirable that any
(positive) outlier is treated the same as just "the largest
value" and the test statistic (and hence the p-value) should not
change.

So I think the first case is wrong, notably if modified, (such
that the last  y  is the overall maximal value (slightly larger sample):
> wilcox.test(1:7, 1/8+ c(9:4, 12))
	Wilcoxon rank sum test

data:  1:7 and 1/8 + c(9:4, 12)
W = 6, p-value = 0.01748
alternative hypothesis: true location shift is not equal to 0
> wilcox.test(1:7, 1/8+ c(9:4, 10000))
	Wilcoxon rank sum test

data:  1:7 and 1/8 + c(9:4, 10000)
W = 6, p-value = 0.01748
alternative hypothesis: true location shift is not equal to 0
> wilcox.test(1:7, 1/8+ c(9:4, Inf))
	Wilcoxon rank sum test

data:  1:7 and 1/8 + c(9:4, Inf)
W = 6, p-value = 0.03497
alternative hypothesis: true location shift is not equal to 0

The  Inf  case should definitely give the same as the  10'000 case.
That's exactly one property of a robust statistic.

Thank you, Karolis, this is pretty embarrassing to only be detected now after
25+ years of R in use ...

The correct fix starts with replacing the  is.finite()  by   !is.na() and keep
the 'Inf' in the rank computations...
(but then probably also deal with the case of more than one Inf, notably the 
Inf - Inf  "exception" which is not triggered by your example...)


---

Ben addressed the "rounding" / numerical issues unavoidable for
the other problems.

    > 2. tolerance issues with paired=TRUE.

    > wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
    > # ...
    > # Warning:  cannot compute exact p-value with ties

    > wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
    > # ...
    > # no warning

    > 3. Always 'x observations are missing' when paired=TRUE

    > wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
    > # ...
    > # Error:  not enough (finite) 'x' observations

    > 4. No indication if normal approximation was used:

    > # different numbers, but same "method" name
    > wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
    > wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)


    > From all of these I am pretty sure the 1st one is likely unintended,
    > so attaching a small patch to adjust it. Can also try patching others
if
    > consensus is reached that the behavioiur has to be modified.

    > Kind regards,
    > Karolis Koncevi?ius.

    > ---

    > Index: wilcox.test.R
    > ==================================================================   
> --- wilcox.test.R  (revision 77540)
    > +++ wilcox.test.R  (working copy)
    > @@ -42,7 +42,7 @@
    > if(paired) {
    > if(length(x) != length(y))
    > stop("'x' and 'y' must have the same length")
    > -            OK <- complete.cases(x, y)
    > +            OK <- is.finite(x) & is.finite(y)
    > x <- x[OK] - y[OK]
    > y <- NULL
    > }

    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel

Karolis Koncevičius

2019-Dec-07 21:18 UTC

head link

[Rd] Inconsistencies in wilcox.test

Thank you for a fast response. Nice to see this mailing list being so 
alive.

Regarding Inf issue: I agree with your assessment that Inf should not be 
removed. The code gave me an impression that Inf values were 
intentionally removed (since is.finite() was used everywhere, except for 
paired case). I will try to adjust my patch according to your feedback.

One more thing: it seems like you assumed that issues 2:4 are all 
related to machine precision, which is not the case - only 2nd issue is.
Just wanted to draw this to your attention, in case you might have some 
feedback and guidelines about issues 3 and 4 as well.



On 2019-12-07 21:59, Martin Maechler wrote:>>>>>> Karolis Koncevi?ius
>>>>>>     on Sat, 7 Dec 2019 20:55:36 +0200 writes:
>
>    > Hello,
>    > Writing to share some things I've found about wilcox.test()
that seem a
>    > a bit inconsistent.
>
>    > 1. Inf values are not removed if paired=TRUE
>
>    > # returns different results (Inf is removed):
>    > wilcox.test(c(1,2,3,4), c(0,9,8,7))
>    > wilcox.test(c(1,2,3,4), c(0,9,8,Inf))
>
>    > # returns the same result (Inf is left as value with highest rank):
>    > wilcox.test(c(1,2,3,4), c(0,9,8,7), paired=TRUE)
>    > wilcox.test(c(1,2,3,4), c(0,9,8,Inf), paired=TRUE)
>
>Now which of the two cases do you consider correct ?
>
>IHMO, the 2nd one is correct: it is exactly one property of the
>*robustness* of the wilcoxon test and very desirable that any
>(positive) outlier is treated the same as just "the largest
>value" and the test statistic (and hence the p-value) should not
>change.
>
>So I think the first case is wrong, notably if modified, (such
>that the last  y  is the overall maximal value (slightly larger sample):
>
>> wilcox.test(1:7, 1/8+ c(9:4, 12))
>
>	Wilcoxon rank sum test
>
>data:  1:7 and 1/8 + c(9:4, 12)
>W = 6, p-value = 0.01748
>alternative hypothesis: true location shift is not equal to 0
>
>> wilcox.test(1:7, 1/8+ c(9:4, 10000))
>
>	Wilcoxon rank sum test
>
>data:  1:7 and 1/8 + c(9:4, 10000)
>W = 6, p-value = 0.01748
>alternative hypothesis: true location shift is not equal to 0
>
>> wilcox.test(1:7, 1/8+ c(9:4, Inf))
>
>	Wilcoxon rank sum test
>
>data:  1:7 and 1/8 + c(9:4, Inf)
>W = 6, p-value = 0.03497
>alternative hypothesis: true location shift is not equal to 0
>
>The  Inf  case should definitely give the same as the  10'000 case.
>That's exactly one property of a robust statistic.
>
>Thank you, Karolis, this is pretty embarrassing to only be detected now
after 25+ years of R in use ...
>
>The correct fix starts with replacing the  is.finite()  by   !is.na() and
keep the 'Inf' in the rank computations...
>(but then probably also deal with the case of more than one Inf, notably the
Inf - Inf  "exception" which is not triggered by your example...)
>
>
>---
>
>Ben addressed the "rounding" / numerical issues unavoidable for
>the other problems.
>
>    > 2. tolerance issues with paired=TRUE.
>
>    > wilcox.test(c(4, 3, 2), c(3, 2, 1), paired=TRUE)
>    > # ...
>    > # Warning:  cannot compute exact p-value with ties
>
>    > wilcox.test(c(0.4,0.3,0.2), c(0.3,0.2,0.1), paired=TRUE)
>    > # ...
>    > # no warning
>
>    > 3. Always 'x observations are missing' when paired=TRUE
>
>    > wilcox.test(c(1,2), c(NA_integer_,NA_integer_), paired=TRUE)
>    > # ...
>    > # Error:  not enough (finite) 'x' observations
>
>    > 4. No indication if normal approximation was used:
>
>    > # different numbers, but same "method" name
>    > wilcox.test(rnorm(10), exact=FALSE, correct=FALSE)
>    > wilcox.test(rnorm(10), exact=TRUE, correct=FALSE)
>
>
>    > From all of these I am pretty sure the 1st one is likely
unintended,
>    > so attaching a small patch to adjust it. Can also try patching
others if
>    > consensus is reached that the behavioiur has to be modified.
>
>    > Kind regards,
>    > Karolis Koncevi?ius.
>
>    > ---
>
>    > Index: wilcox.test.R
>    >
==================================================================>    >
--- wilcox.test.R  (revision 77540)
>    > +++ wilcox.test.R  (working copy)
>    > @@ -42,7 +42,7 @@
>    > if(paired) {
>    > if(length(x) != length(y))
>    > stop("'x' and 'y' must have the same
length")
>    > -            OK <- complete.cases(x, y)
>    > +            OK <- is.finite(x) & is.finite(y)
>    > x <- x[OK] - y[OK]
>    > y <- NULL
>    > }
>
>    > ______________________________________________
>    > R-devel at r-project.org mailing list
>    > https://stat.ethz.ch/mailman/listinfo/r-devel

Pavel S. Ruzankin

2019-Dec-09 02:10 UTC

head link

[Rd] Inconsistencies in wilcox.test

I'd like to ask the developers to include some exact computation for 
ties into wilcox.test(). Just try

wilcox.test(c(1,1,5),c(10,11))

wilcox.test(c(1,2,5),c(10,11))

The p-values differ significantly.

But if I try

library(exactRankTests)

wilcox.exact(c(1,1,5),c(10,11))

wilcox.exact(c(1,2,5),c(10,11))

then the p-values coincide as expected. (R is used by many 
non-mathematicians/non-engineers who pay no attention at warnings.)

Reasonably Related Threads

Search for more apparently analagous threads

R devel - Dec 2019 - Inconsistencies in wilcox.test

[Rd] Inconsistencies in wilcox.test

[Rd] Inconsistencies in wilcox.test

[Rd] Inconsistencies in wilcox.test

[Rd] Inconsistencies in wilcox.test

[Rd] Inconsistencies in wilcox.test

[Rd] Inconsistencies in wilcox.test

Reasonably Related Threads