Hi, I was trying to draw an empirical distribution function with uniform confidence bands. So I tried to find a way to calculate values of the Kolmogorov-Smirnov Distribution but failed. I guess it must be hidden somewhere (since the ks-test is implemented), but I was unable to find it. Is there any way to do this? Thanks Leif Boysen
kjetil brinchmann halvorsen
2003-Jul-21 21:23 UTC
[R] Confidence Band for empirical distribution function
On 21 Jul 2003 at 15:42, Leif.Boysen wrote: Here are some functions doing this using the package stepfun: ecdf.ksCI <- function(x, main = NULL, sub = NULL, xlab = deparse(substitute(x)), ...) { require(stepfun) xlab if(is.null(main)) main <- paste("ecdf(",deparse(substitute(x)),") + 95% K.S.bands", sep="") n <- length(x) if(is.null(sub)) sub <- paste("n = ", n) ec <- ecdf(x) xx <- get("x", envir=environment(ec))# = sort(x) yy <- get("y", envir=environment(ec)) D <- approx.ksD(n) yyu <- pmin(yy+D, 1) yyl <- pmax(yy-D, 0) ecu <- stepfun(xx, c(yyu, 1) ) ecl <- stepfun(xx, c(yyl, yyl[n]) ) ## Plots -- all calling plot.stepfun plot(ec, main = main, sub = sub, xlab = xlab, ...) plot(ecu, add=TRUE, verticals=TRUE, do.points=FALSE, col.hor="red" , col.vert="red", ...) plot(ecl, add=TRUE, verticals=TRUE, do.points=FALSE, col.hor="red", col.vert="red", ...) } approx.ksD <- function(n) { ## approximations for the critical level for Kolmogorov-Smirnov ## statistic D, ## for confidence level 0.95. Taken from Bickel & Doksum, table IX, ## p.483 ## and Lienert G.A.(1975) who attributes to Miller,L.H.(1956), JASA ifelse(n > 80, 1.358 /( sqrt(n) + .12 + .11/sqrt(n)),##Bickel&Doksum, table ##IX,p.483 splinefun(c(1:9, 10, 15, 10 * 2:8),# from Lienert c(.975, .84189, .70760, .62394, .56328,# 1:5 .51926, .48342, .45427, .43001, .40925,# 6:10 .33760, .29408, .24170, .21012,# 15,20,30,40 .18841, .17231, .15975, .14960)) (n)) } \name{ecdf.ksCI} \alias{ecdf.ksCI} \title{ Plotting the empirical distribution function together with confidence curves. } \description{ Plots the empirical distribution function for one- dimensional data, together with upper and lower confidence curves. Always uses pointwise confidence level of 95\%. } \usage{ ecdf.ksCI(x, main=NULL, sub=NULL, xlab = deparse(substitute(x)), ...) } %- maybe also `usage' for other objects documented here. \arguments{ \item{x}{ \code{x} numerical vector of observations. } \item{\dots}{ \code{\dots} arguments given to \code{\link{plot.stepfun}}.} } \details{ Uses the \code{\link{stepfun}} package. } \value{ Nothing. Used for its side effect, to produce a plot. } \references{ Peter J. Bickel & Kjell A. Doksum: Mathematical Statistics, Basic Ideas and Selected Topics. Holden-Day, 1977. } \author{ Kjetil Halvorsen } \seealso{ \code{\link{ecdf}} and \code{\link{plot.stepfun}} in package \code{\link{stepfun}}. } \examples{ ecdf.ksCI( rchisq(50,3) ) } \keyword{ hplot } \keyword{nonparametric} Kjetil Halvorsen> Hi, > > I was trying to draw an empirical distribution function with uniform > confidence bands. So I tried to find a way to calculate values of the > Kolmogorov-Smirnov Distribution but failed. > I guess it must be hidden somewhere (since the ks-test is implemented), > but I was unable to find it. > > Is there any way to do this? > > Thanks > > Leif Boysen > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Dear Leif, If you look at the definition of ks.test, you'll find the lines pkstwo <- function(x, tol = 1e-06) { if (is.numeric(x)) x <- as.vector(x) else stop("Argument x must be numeric") p <- rep(0, length(x)) p[is.na(x)] <- NA IND <- which(!is.na(x) & (x > 0)) if (length(IND) > 0) { p[IND] <- .C("pkstwo", as.integer(length(x)), p = as.double(x[IND]), as.double(tol), PACKAGE = "ctest")$p } return(p) } which calls C code to calculate the p-values given the test statistic. You'll find explanations on what this function does in the original C file src/library/ctest/src/ks.c I haven't tried that but I assume that you could use it to calculate p-values given the test-statistics yourself. Please also note that ks.test() returns the p-value as well. If you need quantiles, I assume you need to invert the cdf yourself, e.g. using uniroot(). HTH Thomas --- Thomas Hotz Research Associate in Medical Statistics University of Leicester United Kingdom Department of Epidemiology and Public Health 22-28 Princess Road West Leicester LE1 6TP Tel +44 116 252-5410 Fax +44 116 252-5423 Division of Medicine for the Elderly Department of Medicine The Glenfield Hospital Leicester LE3 9QP Tel +44 116 256-3643 Fax +44 116 232-2976> -----Original Message----- > From: Leif.Boysen [mailto:boysen at math.uni-goettingen.de] > Sent: 21 July 2003 14:42 > To: r-help at stat.math.ethz.ch > Subject: [R] Confidence Band for empirical distribution function > > > Hi, > > I was trying to draw an empirical distribution function with uniform > confidence bands. So I tried to find a way to calculate values of the > Kolmogorov-Smirnov Distribution but failed. > I guess it must be hidden somewhere (since the ks-test is > implemented), > but I was unable to find it. > > Is there any way to do this? > > Thanks > > Leif Boysen > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help >
Dear Kjetil, As I already mentioned, it appears that there isn't a function available calculating the quantiles directly (at least, it doesn't appear in the C source of ctest). So as I already suggested, uniroot (or a similar C routine which calls the corresponding C code directly) is probably the best you can do (apart from writing it completely yourself). I didn't program this using uniroot, but I'd certainly try the following for speed-up: - For symmetry reasons, you only need to compute half of the quantiles. - The quantiles depend smoothly on the probabilities (of your reference distribution). Therefore, calculating only a "few" for probabilities between 0 and 0.5, and using (e.g. linear) interpolation should be satisfying. I am sorry not be of more help. HTH anyway Thomas> -----Original Message----- > From: kjetil brinchmann halvorsen [mailto:kjetil at entelnet.bo] > Sent: 23 July 2003 15:46 > To: Hotz, T. > Subject: RE: [R] Confidence Band for empirical distribution function > > > On 22 Jul 2003 at 11:37, Hotz, T. wrote: > > > Dear Leif, > > > > If you look at the definition of ks.test, you'll find the lines > > > > pkstwo <- function(x, tol = 1e-06) { > > if (is.numeric(x)) > > x <- as.vector(x) > > else stop("Argument x must be numeric") > > p <- rep(0, length(x)) > > p[is.na(x)] <- NA > > IND <- which(!is.na(x) & (x > 0)) > > if (length(IND) > 0) { > > p[IND] <- .C("pkstwo", as.integer(length(x)), p = > as.double(x[IND]), > > as.double(tol), PACKAGE = "ctest")$p > > } > > return(p) > > } > > > > which calls C code to calculate the p-values given the test > statistic. > > You'll find explanations on what this function does in the > original C file > > src/library/ctest/src/ks.c > > > > I haven't tried that but I assume that you could use it to > calculate p-values > > given the test-statistics yourself. > > That could certainly be done, but what was asked for is the inverse, > which can be calculated, using for instance uniroot(). I tried that, > but it is to slow, .C() will be called repeatedly in a loop. For me > it took several minutes. > > Kjetil Halvorsen > > > > > Please also note that ks.test() returns the p-value as well. > > > > If you need quantiles, I assume you need to invert the cdf yourself, > > e.g. using uniroot(). > > > > HTH > > > > Thomas > > > > --- > > > > Thomas Hotz > > Research Associate in Medical Statistics > > University of Leicester > > United Kingdom > > > > Department of Epidemiology and Public Health > > 22-28 Princess Road West > > Leicester > > LE1 6TP > > Tel +44 116 252-5410 > > Fax +44 116 252-5423 > > > > Division of Medicine for the Elderly > > Department of Medicine > > The Glenfield Hospital > > Leicester > > LE3 9QP > > Tel +44 116 256-3643 > > Fax +44 116 232-2976 > > > > > > > -----Original Message----- > > > From: Leif.Boysen [mailto:boysen at math.uni-goettingen.de] > > > Sent: 21 July 2003 14:42 > > > To: r-help at stat.math.ethz.ch > > > Subject: [R] Confidence Band for empirical distribution function > > > > > > > > > Hi, > > > > > > I was trying to draw an empirical distribution function > with uniform > > > confidence bands. So I tried to find a way to calculate > values of the > > > Kolmogorov-Smirnov Distribution but failed. > > > I guess it must be hidden somewhere (since the ks-test is > > > implemented), > > > but I was unable to find it. > > > > > > Is there any way to do this? > > > > > > Thanks > > > > > > Leif Boysen > > > > > > ______________________________________________ > > > R-help at stat.math.ethz.ch mailing list > > > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > > > > > > > ______________________________________________ > > R-help at stat.math.ethz.ch mailing list > > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > > >