Is the Kolmogorov-Smirnov test valid on both continuous and discrete data? I don't think so, and the example below helped me understand why.

A suggestion on testing the discrete data would be appreciated.

Thanks,

a <- rnorm(1000, 10, 1)    # normal distribution a
b <- rnorm(1000, 12, 1.5)  # normal distribution b
c <- rnorm(1000, 8, 1)     # normal distribution c
d <- rnorm(1000, 12, 2.5)  # normal distribution d

par(mfrow = c(2, 2), las = 1)
ahist <- hist(a, breaks = 1:25, prob = TRUE, ylim = c(0, 0.4)); box()  # histogram of a
bhist <- hist(b, breaks = 1:25, prob = TRUE, ylim = c(0, 0.4)); box()  # histogram of b
chist <- hist(c, breaks = 1:25, prob = TRUE, ylim = c(0, 0.4)); box()  # histogram of c
dhist <- hist(d, breaks = 1:25, prob = TRUE, ylim = c(0, 0.4)); box()  # histogram of d

# Kolmogorov-Smirnov on continuous data
ks.test(c(a, b), c(c, d), alternative = "two.sided")

# Kolmogorov-Smirnov on discrete data (the binned histogram densities)
ks.test(c(ahist$density, bhist$density), c(chist$density, dhist$density),
        alternative = "two.sided")
The KS test was designed for continuous variables. The vcd package has tools for exploring categorical variables and distributions.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of tsippel
> Sent: Friday, February 18, 2011 7:52 PM
> To: r-help at r-project.org
> Subject: [R] Kolmogorov-smirnov test
>
> Is the kolmogorov-smirnov test valid on both continuous and discrete data?
> I don't think so, and the example below helped me understand why.
>
> A suggestion on testing the discrete data would be appreciated.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
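
For readers who want a base-R starting point for the binned case: one common option (swapped in here for illustration, it is not one of the vcd tools mentioned in the reply above) is a chi-squared test of homogeneity on the bin counts. The sketch below assumes the two pooled samples from the original question, bins them on shared breaks, and tests the resulting 2 x k table. The seed and the break points are arbitrary choices made for this example, with open-ended tail bins so that the expected counts do not get too small.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Two pooled samples, mirroring c(a, b) and c(c, d) from the question
x <- c(rnorm(1000, 10, 1), rnorm(1000, 12, 1.5))
y <- c(rnorm(1000, 8, 1), rnorm(1000, 12, 2.5))

# Shared bins; the tails are pooled into open-ended bins (assumed break points)
breaks <- c(-Inf, 6:17, Inf)

# 2 x k contingency table of bin counts, one row per sample
counts <- rbind(x = table(cut(x, breaks)),
                y = table(cut(y, breaks)))

# Drop bins that are empty in both samples (they carry no information)
counts <- counts[, colSums(counts) > 0, drop = FALSE]

# Chi-squared test of homogeneity: do both samples share one bin distribution?
res <- chisq.test(counts)
print(res)
```

Unlike passing `ahist$density` to `ks.test`, this works on the counts themselves, so the sample size enters the test.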
It's designed for continuous distributions. See the first sentence here:
http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

K-S is conservative on discrete distributions: the p-values it reports are too large, so the actual type I error rate falls below the nominal level.

On Sat, Feb 19, 2011 at 1:52 PM, tsippel <tsippel at gmail.com> wrote:
> Is the kolmogorov-smirnov test valid on both continuous and discrete data?
> I don't think so, and the example below helped me understand why.
>
> A suggestion on testing the discrete data would be appreciated.
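
A small simulation can make the "conservative" remark concrete. The sketch below is an illustration under stated assumptions, not part of the original reply: it draws both samples from the same distribution, applies the two-sample `ks.test` with the asymptotic p-value (`exact = FALSE`), and records how often the test rejects at the 5% level. The sample size, number of replicates, seed, and the deliberately extreme binary distribution are all arbitrary choices made for this example.

```r
set.seed(42)     # arbitrary seed, for reproducibility only
n_sims <- 2000   # number of simulated null data sets (assumed)
n <- 50          # size of each sample (assumed)

# Fraction of null simulations rejected at the 5% level.
# `draw` is a function that returns k observations from the null distribution.
reject_rate <- function(draw) {
  p <- replicate(n_sims, suppressWarnings(
    ks.test(draw(n), draw(n), exact = FALSE)$p.value))
  mean(p < 0.05)
}

# Continuous null: no ties
rate_continuous <- reject_rate(function(k) rnorm(k))

# Heavily discrete null: every observation is 0 or 1, so ties are everywhere
rate_discrete <- reject_rate(function(k) rbinom(k, size = 1, prob = 0.5))

print(c(continuous = rate_continuous, discrete = rate_discrete))
```

If the test is conservative on discrete data, the rejection rate in the discrete case should sit well below the nominal 0.05, which also means reduced power against real differences.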
Taylor Arnold and I have developed a package ks.test (available on R-Forge in beta version) that modifies stats::ks.test to handle discrete null distributions for one-sample tests. We also have a draft of a paper we could provide (email us). The package uses methodology of Conover (1972) and Gleser (1985) to provide exact p-values. It also corrects an algorithmic problem with stats::ks.test in the calculation of the test statistic. This is not a bug, per se, because it was never intended to be used this way. We will submit this new function for inclusion in package stats once we're done testing.

So, for example:

# With the default ks.test (ouch):
> stats::ks.test(c(0,1), ecdf(c(0,1)))

        One-sample Kolmogorov-Smirnov test

data:  c(0, 1)
D = 0.5, p-value = 0.5
alternative hypothesis: two-sided

# With our new function (what you would want in this toy example):
> ks.test::ks.test(c(0,1), ecdf(c(0,1)))

        One-sample Kolmogorov-Smirnov test

data:  c(0, 1)
D = 0, p-value = 1
alternative hypothesis: two-sided

Original Message:
Date: Mon, 28 Feb 2011 21:31:26 +1100
From: Glen Barnett <glnbrntt at gmail.com>
To: tsippel <tsippel at gmail.com>
Cc: r-help at r-project.org
Subject: Re: [R] Kolmogorov-smirnov test
Message-ID: <AANLkTikcjigrgJuOtkOZqFXFqatiN6arZJvT_apPiVCj at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

It's designed for continuous distributions. See the first sentence here:
http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

K-S is conservative on discrete distributions

On Sat, Feb 19, 2011 at 1:52 PM, tsippel <tsippel at gmail.com> wrote:
> Is the kolmogorov-smirnov test valid on both continuous and discrete data?
> I don't think so, and the example below helped me understand why.
>
> A suggestion on testing the discrete data would be appreciated.
>
> Thanks,

--
John W. Emerson (Jay)
Associate Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay
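
To see where the D = 0.5 in the toy example above can come from, the sketch below computes the statistic two ways in base R. It is only an illustration of the test statistic and does not reproduce the package's code or the Conover (1972) p-value method. The helper name `ks_stat_discrete` is hypothetical. The first calculation uses the usual order-statistic formula, which is valid when the null CDF is continuous. The second takes the supremum of |Fn - F| directly, which for two right-continuous step functions is attained at one of their jump points.

```r
x <- c(0, 1)
null_cdf <- ecdf(c(0, 1))   # discrete null: mass 1/2 at 0 and at 1
n <- length(x)

# Order-statistic formula, appropriate for a continuous null CDF:
# D = max_i max( F(x_(i)) - (i - 1)/n,  i/n - F(x_(i)) )
f_sorted <- null_cdf(sort(x))
d_continuous_formula <- max(f_sorted - (0:(n - 1)) / n,
                            (1:n) / n - f_sorted)

# Hypothetical helper: sup |Fn - F| for a step-function null, evaluated at
# every jump point of either the empirical CDF or the null CDF.
ks_stat_discrete <- function(x, null_cdf, support) {
  emp_cdf <- ecdf(x)
  pts <- sort(unique(c(x, support)))
  max(abs(emp_cdf(pts) - null_cdf(pts)))
}

d_discrete <- ks_stat_discrete(x, null_cdf, support = knots(null_cdf))

print(c(continuous_formula = d_continuous_formula, discrete = d_discrete))
```

Here the sample's empirical CDF coincides with the null CDF, so the direct supremum is 0, while the continuous-null formula compares F at each point with the empirical CDF just before its jump and reports 0.5.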