mcdowella@mcdowella.demon.co.uk
2001-Jul-01  05:53 UTC
[Rd] ks.test doesn't compute correct empirical distribution if there are ties in the data (PR#1007)
Full_Name: Andrew Grant McDowell
Version: R 1.1.1 (but source in 1.3.0 looks fishy as well)
OS: Windows 2K Professional (Consumer)
Submission from: (NULL) (194.222.243.209)

In article <xeQ_6.1949$xd.353840@typhoon.snet.net>, johnt@tman.dnsalias.com writes
>Can someone help? In R, I am generating a vector of 1000 samples from
>Bin (1000, 0.25). I then do a Kolmogorov Smirnov test to test if the
>vector has been drawn from a population of Bin (1000, 0.25). I would
>expect a reasonably high p-value.....
>
>Either I am doing something wrong in R, or I am misunderstanding how this
>test should work (both quite possible)...
>
>
>Thanks,
>JT..
>
>> #### 1000 random samples from binomial dist with mean =.25, n=100...
>> o<-rbinom (1000, 100, .25)
>> mean (o);
>[1] 25.178
>> var (o);
>[1] 19.61193
>> ks.test (o, "pbinom", 100, .25);
>
>        One-sample Kolmogorov-Smirnov test
>
>data: o
>D = 0.0967, p-value = 1.487e-08
>alternative hypothesis: two.sided
>
>p-value is mighty small, leading me to reject the null hypothesis that
>the sample has been drawn from the Bin(100, 0.25) distribution!!!

Some more oddities:

> o<-rbinom(10000, 1, 0.25)
> ks.test(o, "pbinom", 1, 0.25)

        One-sample Kolmogorov-Smirnov test

data: o
D = 0.75, p-value = < 2.2e-16
alternative hypothesis: two.sided

> length(o[o==0])
[1] 7491
> length(o[o==1])
[1] 2509
> o<-rep(0,10000)
> ks.test(o, "pbinom", 1, 0.25)

        One-sample Kolmogorov-Smirnov test

data: o
D = 0.75, p-value = < 2.2e-16
alternative hypothesis: two.sided

> length(o[o==0])
[1] 10000
> length(o[o==1])
[1] 0

Here zeroing out the data does not change the reported D value.

After playing about with ks.test(c(rep(0, x), rep(1, 1000-x)), "pbinom", 1, p)
for a bit I conjecture that ks.test() takes no account whatsoever of ties,
but merely sorts the input values and looks for
max (position/N - pbinom(value, 1, p)). Anybody got the source handy?
--
A. G. McDowell

After 30 minutes of download, the relevant part of ks.test.R would appear
to be

    METHOD <- "One-sample Kolmogorov-Smirnov test"
    n <- length(x)
    x <- y(sort(x), ...) - (0 : (n-1)) / n
    STATISTIC <- switch(alternative,
                        "two.sided" = max(c(x, 1/n - x)),
                        "greater" = max(1/n - x),
                        "less" = max(x))

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
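
The effect of ties can be checked directly with a small sketch. The function naive_ks_stat below only restates the two-sided branch of the ks.test.R fragment quoted above. The function discrete_ks_stat is a hypothetical comparison helper, not part of R: it assumes the null distribution is discrete with a known finite support (passed in as the argument `support`, a name introduced here for illustration) and that every observation lies in that support, and it takes the largest gap between the empirical CDF and the null CDF over the support points.

```r
# Two-sided statistic as computed by the quoted ks.test.R fragment:
# every sorted observation is compared against (i-1)/n and i/n,
# whether or not it is tied with its neighbours.
naive_ks_stat <- function(x, cdf, ...) {
  n <- length(x)
  d <- cdf(sort(x), ...) - (0:(n - 1)) / n
  max(c(d, 1 / n - d))
}

# Hypothetical tie-aware comparison for a discrete null distribution.
# Assumption: `support` lists every point where the null CDF jumps and
# all data values lie in it, so both step functions change only there.
discrete_ks_stat <- function(x, cdf, support, ...) {
  Fn <- ecdf(x)  # empirical CDF; tied values produce one larger jump
  max(abs(Fn(support) - cdf(support, ...)))
}

# Deterministic data with the same counts as the session above.
o <- c(rep(0, 7491), rep(1, 2509))
naive_ks_stat(o, pbinom, 1, 0.25)             # 0.75, as reported by ks.test
discrete_ks_stat(o, pbinom, 0:1, 1, 0.25)     # about 0.0009

# All zeros: the naive statistic is unchanged, the tie-aware one is not.
z <- rep(0, 10000)
naive_ks_stat(z, pbinom, 1, 0.25)             # 0.75
discrete_ks_stat(z, pbinom, 0:1, 1, 0.25)     # 0.25
```

Under these assumptions the naive value of 0.75 comes entirely from the first tied zero, where pbinom(0, 1, 0.25) = 0.75 is compared against 0/n. This only illustrates the statistic; it says nothing about the p-value, since the usual Kolmogorov distribution assumes a continuous null.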
