Karolis Koncevičius
2023-Apr-21 08:32 UTC
[Rd] Confusion about ks.test() handling of ties and exact vs approximate results
Hello,

Today I was investigating ks.test() with two numerical arguments (x and y) and was left a bit confused about the policy behind handling ties. I might be missing something, so sorry in advance, but here is what confuses me.

The documentation states: "The presence of ties always generates a warning, since continuous distributions do not generate them"

But when I run a test with ties there is no warning:

    ks.test(1:4, 4:7)

However, when I specify that I do not want an exact test, a warning appears saying that the computation will be approximate:

    ks.test(1:4, 4:7, exact=FALSE)
    # Warning: p-value will be approximate in the presence of ties

But doesn't specifying exact=FALSE already make the test approximate?

I tried inspecting the source code for guidance but was also left a bit puzzled. In ks.test.R, under the if(is.numeric(y)) clause, there is a variable called TIES that is set and changed but never used anywhere. Here are examples:

    line 55    TIES <- FALSE
    line 61    TIES <- TRUE
    line 74    if (TIES)
    line 75        z <- w

But this z variable is not used later in the code. It looks to me that the TIES variable can be deleted without affecting anything else.

What I gathered from the investigation is that ties are probably now handled by psmirnov(), and that for numeric x and y the computations are exact even with ties. However, I am a bit puzzled about the warning for approximate values when exact = FALSE is set anyway.

So my question: is everything currently OK with the code and the documentation?
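The two calls described above can be reproduced while recording which warnings each one raises. This is only a sketch: run_with_warnings() is a hypothetical helper written for this illustration, not part of R, and the method strings and warnings you see depend on the installed R version.

```r
# Hypothetical helper (not part of R): evaluate an expression and return
# its value together with the messages of any warnings it signalled.
run_with_warnings <- function(expr) {
  msgs <- character()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(value = value, warnings = msgs)
}

x <- 1:4
y <- 4:7   # the value 4 occurs in both samples, so the pooled data has a tie

default_call <- run_with_warnings(ks.test(x, y))
approx_call  <- run_with_warnings(ks.test(x, y, exact = FALSE))

# Which variant of the test was run, and what (if anything) was warned about
default_call$value$method
default_call$warnings
approx_call$value$method
approx_call$warnings
```

Comparing the `method` strings and the collected warnings side by side makes it easy to check the behaviour against what the help page says.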
Martin Maechler
2023-Apr-21 12:56 UTC
[Rd] Confusion about ks.test() handling of ties and exact vs approximate results
>>>>> Karolis Koncevičius
>>>>>     on Fri, 21 Apr 2023 11:32:41 +0300 writes:

> Hello,
> Today I was investigating ks.test() with two numerical arguments (x and y) and was left a bit confused about the policy behind handling ties.
> I might be missing something, so sorry in advance, but here is what confuses me:
> The documentation states: "The presence of ties always generates a warning, since continuous distributions do not generate them"

Indeed, that has not been correct for quite a while, I think. The current default is `exact = NULL`, and that will be made into TRUE in certain circumstances, notably for all(*) small data situations.
--
*) The help page gives details.

> But when I run a test with ties there is no warning:
> ks.test(1:4, 4:7)

and indeed the printed output explicitly says that the *exact* test was used.

> However, when I specify that I do not want an exact test, there appears a warning saying that the computation will be approximate:
> ks.test(1:4, 4:7, exact=FALSE)
> # Warning: p-value will be approximate in the presence of ties
> But isn't specifying exact=FALSE already makes the test approximate?

Yes, but I think the idea is that you'd look twice and see that in this case it is recommended to also use simulate.p.value = TRUE.

> I tried inspecting the source code for guidance but also was left a bit puzzled. In ks.test.R under if(is.numeric(y)) clause there is a variable called TIES that is set and changed, but is never used anywhere. Here are examples:
> line 55 TIES <- FALSE
> line 61 TIES <- TRUE
> line 74 if (TIES)
> line 75 z <- w
> But later this z variable is not used as a variable in the code. It looks to me that this TIES variable can be deleted without affecting anything else.

That is correct. It is indeed a remnant from before the recent improvements and psmirnov().
[TIES is used in the other branch in the same ks.test.default() function]
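To see how the `exact` and `simulate.p.value` settings mentioned above play out on the tied example, the variants can be run side by side. This is a minimal sketch; it assumes an R version whose ks.test() supports `simulate.p.value` and `B` for the two-sample case. The names `fits` and `summary_table` are made up for this illustration, and the method labels and p-values will depend on the R version in use.

```r
x <- 1:4
y <- 4:7   # tie: the value 4 appears in both samples

set.seed(1)  # only relevant for the Monte Carlo variant

# Warnings are suppressed on the approximate variants only to keep the
# comparison quiet; drop suppressWarnings() to see them.
fits <- list(
  default    = ks.test(x, y),                 # exact = NULL, resolved inside ks.test()
  exact      = ks.test(x, y, exact = TRUE),
  asymptotic = suppressWarnings(ks.test(x, y, exact = FALSE)),
  simulated  = suppressWarnings(
    ks.test(x, y, exact = FALSE, simulate.p.value = TRUE, B = 2000)
  )
)

# One row per variant: the reported method, the statistic D, and the p-value
summary_table <- data.frame(
  method  = vapply(fits, function(f) f$method, character(1)),
  D       = vapply(fits, function(f) unname(f$statistic), numeric(1)),
  p.value = vapply(fits, function(f) f$p.value, numeric(1))
)
print(summary_table)
```

The statistic D is the same in every row, since the settings only affect how the p-value is obtained; the `method` column shows which computation ks.test() reports having used.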