thr3ads.net - R devel - [Rd] Incorrect behavior of ks.test and psmirnov functions with exact=TRUE [Mar 2023]

If this information is useful, please help other people find it:
Share via:

Alexey Sergushichev

2023-Mar-01 00:19 UTC

[Rd] Incorrect behavior of ks.test and psmirnov functions with exact=TRUE

HI,

I've noticed what I think is an incorrect behavior of stats::psmirnov
function and consequently of ks.test when run in an exact mode.

For example:
psmirnov(1, sizes=c(50, 50), z=1:100, two.sided = FALSE, lower.tail = F,
exact=TRUE)

produces 2.775558e-15

However, the exact value should be 1/combination(100, 50), which is
9.9e-30. While the absolute error is small, the relative error is huge, and
it is not fixed by setting option log.p=T

To compare, SciPy has a correct implementation in scipy.stats.ks_2samp:
scipy.stats.ks_2samp(list(range(1,51)), list(range(51, 101)),
alternative="greater", method="exact")
returns 9.911653021418333e-30.

I've tried to dig in a bit and the problem comes down to how the final
value is calculated in psmirnov function:

    if (log.p & !lower.tail)
        return(log1p(-ret/exp(logdenom)))
    if (!log.p & !lower.tail)
        return(1 - ret/exp(logdenom))

There exp(logdenom) is a relatively good (but not perfect) approximation of
combination(100, 50) = 1.008913e+29, ret is also a good approximation of
combination(100, 50)-1 = 1.008913e+29 but there is not enough double
precision for 1 - ret/exp(logdenom) to capture 1/combination(100, 50).

I don't have time to provide a fix, at least not now, but I think this
behavior (good absolute error, but poor relative error for small values)
should at least be mentioned in the manual of the methods psmirnov and/or
ks.test

Best,
Alexey Sergushichev

	[[alternative HTML version deleted]]

Kurt Hornik

2023-Mar-29 08:44 UTC

head link

[Rd] Incorrect behavior of ks.test and psmirnov functions with exact=TRUE

>>>>> Alexey Sergushichev writes:
Thanks.  This is now fixed for the upcoming 4.3.0 release.

Best
-k
> HI,
> I've noticed what I think is an incorrect behavior of stats::psmirnov
> function and consequently of ks.test when run in an exact mode.
> For example:
> psmirnov(1, sizes=c(50, 50), z=1:100, two.sided = FALSE, lower.tail = F,
> exact=TRUE)
> produces 2.775558e-15
> However, the exact value should be 1/combination(100, 50), which is
> 9.9e-30. While the absolute error is small, the relative error is huge, and
> it is not fixed by setting option log.p=T
> To compare, SciPy has a correct implementation in scipy.stats.ks_2samp:
> scipy.stats.ks_2samp(list(range(1,51)), list(range(51, 101)),
> alternative="greater", method="exact")
> returns 9.911653021418333e-30.
> I've tried to dig in a bit and the problem comes down to how the final
> value is calculated in psmirnov function:
>     if (log.p & !lower.tail)
>         return(log1p(-ret/exp(logdenom)))
>     if (!log.p & !lower.tail)
>         return(1 - ret/exp(logdenom))
> There exp(logdenom) is a relatively good (but not perfect) approximation of
> combination(100, 50) = 1.008913e+29, ret is also a good approximation of
> combination(100, 50)-1 = 1.008913e+29 but there is not enough double
> precision for 1 - ret/exp(logdenom) to capture 1/combination(100, 50).
> I don't have time to provide a fix, at least not now, but I think this
> behavior (good absolute error, but poor relative error for small values)
> should at least be mentioned in the manual of the methods psmirnov and/or
> ks.test
> Best,
> Alexey Sergushichev
> 	[[alternative HTML version deleted]]
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

Apparently Analagous Threads

Search for more reasonably related threads

R devel - Mar 2023 - Incorrect behavior of ks.test and psmirnov functions with exact=TRUE

[Rd] Incorrect behavior of ks.test and psmirnov functions with exact=TRUE

[Rd] Incorrect behavior of ks.test and psmirnov functions with exact=TRUE

Apparently Analagous Threads