thr3ads.net - R devel - [Rd] ks.test doesn't compute correct empirical distribution if there are ties in the data (PR#1007) [Jul 2001]

mcdowella@mcdowella.demon.co.uk

2001-Jul-01 05:53 UTC

[Rd] ks.test doesn't compute correct empirical distribution if there are ties in the data (PR#1007)

Full_Name: Andrew Grant McDowell
Version: R 1.1.1 (but source in 1.3.0 looks fishy as well)
OS: Windows 2K Professional (Consumer)
Submission from: (NULL) (194.222.243.209)


In article <xeQ_6.1949$xd.353840@typhoon.snet.net>,
johnt@tman.dnsalias.com writes
>Can someone help?  In R, I am generating a vector of 1000 samples from 
>Bin (100, 0.25).  I then do a Kolmogorov Smirnov test to test if the 
>vector has been drawn from a population of Bin (100, 0.25).  I would
>expect a reasonably high p-value.....
>
>Either I am doing something wrong in R, or I am misunderstanding how this
>test should work (both quite possible)...
>
>
>Thanks,
>JT..
>
>
>
>> #### 1000 random samples from binomial dist with mean =.25, n=100...
>> o<-rbinom (1000, 100, .25)
>> mean (o);
>[1] 25.178
>> var (o);
>[1] 19.61193
>> ks.test (o, "pbinom", 100, .25);
>
>        One-sample Kolmogorov-Smirnov test 
>
>data:  o 
>D = 0.0967, p-value = 1.487e-08 
>alternative hypothesis: two.sided
>
>
>
>p-value is mighty small, leading me to reject the null hypothesis that
>the sample has been drawn from the Bin(100, 0.25) distribution!!!
>
>
>
Some more oddities:
> o<-rbinom(10000, 1, 0.25)
> ks.test(o, "pbinom", 1, 0.25)
         One-sample Kolmogorov-Smirnov test 

data:  o 
D = 0.75, p-value = < 2.2e-16 
alternative hypothesis: two.sided 
> length(o[o==0])
[1] 7491
> length(o[o==1])
[1] 2509
> o<-rep(0,10000)
> ks.test(o, "pbinom", 1, 0.25)
         One-sample Kolmogorov-Smirnov test 

data:  o 
D = 0.75, p-value = < 2.2e-16 
alternative hypothesis: two.sided 
> length(o[o==0])
[1] 10000
> length(o[o==1])
[1] 0

Here zeroing out the data does not change the reported D value.

After playing about with
ks.test(c(rep(0, x), rep(1, 1000-x)), "pbinom", 1, p)
for a bit, I conjecture that ks.test() takes no account
whatsoever of ties, but merely sorts the input values
and looks for max (position/N - pbinom(value, 1, p)).
Anybody got the source handy?
-- 
A. G. McDowell
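
The conjecture above can be tried out by hand. The following is only a sketch: conjectured_d is a hypothetical helper, not a function in R, and it assumes the two-sided statistic compares pbinom at each sorted value with the step heights (position - 1)/N and position/N.

```r
# Hypothetical helper (not part of R): the statistic as conjectured above.
# Sort the data, evaluate pbinom at every sorted value, and take the largest
# gap to the step heights (position - 1)/N and position/N.
conjectured_d <- function(x, size, prob) {
  n <- length(x)
  f <- pbinom(sort(x), size, prob)
  max(f - (0:(n - 1)) / n, (1:n) / n - f)
}

x_mixed <- c(rep(0, 750), rep(1, 250))  # 750 zeros, 250 ones
x_zero  <- rep(0, 1000)                 # zeros only

d_mixed <- conjectured_d(x_mixed, 1, 0.25)
d_zero  <- conjectured_d(x_zero, 1, 0.25)
print(c(d_mixed, d_zero))
```

Under these assumptions both data sets give the same value, 0.75 up to rounding, which agrees with the D reported by ks.test() in the transcripts above. In both cases the maximum comes from position 1, where pbinom(0, 1, 0.25) = 0.75 is compared with a step height of 0.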

After 30 minutes of download, the relevant part of ks.test.R would appear to be

        METHOD <- "One-sample Kolmogorov-Smirnov test"
        n <- length(x)
        x <- y(sort(x), ...) - (0 : (n-1)) / n
        STATISTIC <- switch(alternative,
                            "two.sided" = max(c(x, 1/n - x)),
                            "greater" = max(1/n - x),
                            "less" = max(x))

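For comparison with the lines quoted above, here is a minimal sketch of the gap between the two step functions themselves, evaluated only at the support points 0..size. support_d is a hypothetical helper, not part of R, and the sketch assumes an R installation in which ecdf() is available without loading an extra package. It illustrates the distance only; the usual Kolmogorov-Smirnov p-value is derived for a continuous distribution and would not apply to this number as it stands.

```r
# Hypothetical helper (not part of R): largest gap between the empirical
# distribution function of x and pbinom, evaluated on the support 0..size.
support_d <- function(x, size, prob) {
  k <- 0:size
  max(abs(ecdf(x)(k) - pbinom(k, size, prob)))
}

s_mixed <- support_d(c(rep(0, 750), rep(1, 250)), 1, 0.25)  # 750 zeros, 250 ones
s_zero  <- support_d(rep(0, 1000), 1, 0.25)                 # zeros only
print(c(s_mixed, s_zero))
```

On this measure the 750/250 sample sits essentially on top of pbinom (gap about 0), while the all-zero sample is 0.25 away, so the two samples are told apart. In the quoted lines, by contrast, the first sorted value always contributes y(0) - 0/n = 0.75 for these data.
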
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To:
r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
