mcdowella@mcdowella.demon.co.uk
2001-Jul-03 04:45 UTC
(PR#1007) [Rd] ks.test doesn't compute correct empirical distribution if there are ties in the data
In message <Pine.GSO.4.31.0107010731110.7616-100000@auk.stats>, Prof Brian D Ripley <ripley@stats.ox.ac.uk> writes
>
>You do realize that the Kolmogorov tests (and the Kolmogorov-Smirnov
>extension) assume continuous distributions, so the distribution theory
>is not valid in this case?
>
>S-PLUS does stop you doing this:
>
>> ks.gof(o, dist="binomial", size=100, prob=0.25)
>Problem in not.cont1(ttest = d.test, nx = nx, alt.ex..: For testing
>discrete distributions when sample size > 50, use the
> Chi-square test
>

Thank you for your prompt reply to my bug report. While I agree that the distribution theory for the Kolmogorov tests assumes a continuous distribution, I would like to request a modification to the existing routines. Its purpose would be to provide a result that represents a conservative test when the underlying distribution is discrete.

This would be in accord with p. 432 of the 3rd edition of "Practical Nonparametric Statistics", by Conover, and section 25.38 of "Kendall's Advanced Theory of Statistics, 6th Edition, Vol 2A", by Stuart, Ord, and Arnold, both of which refer to Noether (1963) "Note on the Kolmogorov Statistic in the discrete case", Metrika, 7, 115. Users reared on these and similar textbooks would be less surprised at the behaviour of R if this modification were made, whereas users who do not attempt to apply the Kolmogorov-Smirnov test to discrete distributions would not notice any difference.

It would also be in accord with the behaviour of R in the two-sample case, where the effect of the existing code seems to be to provide a conservative test (since the statistic returned is no larger than might be returned under any possible tie-breaking), coupled with a warning (to which I would have no objection in the one-sample case).

It seems to me that the following modification would suffice: replace

    x <- y(sort(x), ...) - (0 : (n-1)) / n

with

    x <- sort(x)
    untied <- c(x[1:n-1] != x[2:n], TRUE)
    x <- y(x, ...) - (0 : (n-1)) / n
    x <- x[untied]

Users dealing with data derived from continuous distributions would not see any difference, because (except with very small probability due to floating point inaccuracy) they would never produce tied data.

-- 
A. G. McDowell
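
As an illustration only, the proposed modification can be sketched as a standalone function that computes the two-sided statistic with and without the tie handling. The name ks_stat, its drop_ties argument, and the sample vector o are hypothetical and are not part of R or of the original report; the neighbour comparison is written here as x[-n] != x[-1], and the final max() over d and 1/n - d is an assumption about how the two-sided statistic is formed from the differences.

```r
# Hypothetical helper (not part of R): two-sided one-sample statistic,
# optionally keeping only the last observation of each run of ties.
ks_stat <- function(x, cdf, ..., drop_ties = TRUE) {
  n <- length(x)
  x <- sort(x)
  # TRUE where an observation differs from its successor; the last
  # observation is always kept.
  untied <- c(x[-n] != x[-1], TRUE)
  # Hypothesised CDF minus the empirical CDF just below each point.
  d <- cdf(x, ...) - (0:(n - 1)) / n
  if (drop_ties) d <- d[untied]
  # Assumed two-sided form: largest deviation in either direction.
  max(c(d, 1 / n - d))
}

# Made-up tied data tested against a binomial(100, 0.25) distribution.
o <- c(20, 22, 22, 25, 25, 25, 27, 30)
with_ties_dropped <- ks_stat(o, pbinom, size = 100, prob = 0.25)
existing_behaviour <- ks_stat(o, pbinom, size = 100, prob = 0.25,
                              drop_ties = FALSE)
```

Because the modified statistic is a maximum over a subset of the values used by the existing code, it can never exceed the existing statistic, and for data without ties the two coincide.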