Dear All I am trying to replicate a numerical application (not computed on R) from an article. Using, ks.test() I computed the exact D value shown in the article but the p-values I obtain are quite different from the one shown in the article. The tests are performed on a sample of 37 values (please see "[0] DATA" below) for truncated Exponential, Pareto and truncated LogNormal (please see "[1] FUNCTIONS" below). You can find below the commands I use to compute the tests ("[2] COMMANDS") and the p-values presented in the article. Can anyone explain the difference between these two set of p-values? Maybe the explanation is linked to my second question: Can anyone confirm me that p-values calculated in the Kolmogrov-Smirnov g-o-f are independent from the family of the distribution assumed in H0? Anderson-Darling test is also performed in the article. would anyone have functions to calculate pvalues for exponential, lognormal, weibull, etc? Many thanks for your help, Franck Allaire R 1.6.2 ############### # [0] DATA # truncation point is 30 i.e. all values are above 30 ############### 39.7 582 439.9 47.1 83.9 41.2 893.1 192 106.2 216.7 1243.4 52.8 351.6 36.2 431.5 57.3 1602.1 822.2 260.1 58.7 6299.9 814.9 137.2 203.8 1263.5 53.7 1313 118.4 167.8 70.1 503.7 64.8 529.9 87.8 2465.4 317.9 2753.9 ############### # [1] FUNCTIONS ############### #EXP exptfit_function(x,b, plot.it = F, lty = 1){ if (mode(x) != "numeric") stop("need numeric data") x <- x[!is.na(x)] x <- sort(x) lambda <- length(x)/sum(x-b) if(plot.it) { lines(x, dexpt(x, lambda = lambda, b = b), lty = lty,col="red")} result <- list(lambda = lambda, b = b) class(result) <- "expt" result} dexpt<-function(x,lambda,b) dexp(x-b,rate=lambda) pexpt<-function(x,lambda,b) pexp(x-b,rate=lambda) qexpt<-function(p,lambda,b) qexp(p,rate=lambda)+b rexpt<-function(n,lambda,b) rexp(n,rate=lambda)+b #PARETO paretofit<-function(x,b, plot.it = F, lty = 1){ x <- sort(x) alpha <- length(x)/sum(log(x/b)) if(plot.it) { lines(x, dpareto(x, alpha = alpha, b = b), lty = lty,col="red")} result <- list(alpha = alpha, b = b) class(result) <- "pareto" result} dpareto<-function(x, alpha,b) ifelse(x<b,0,alpha*(b^alpha)/(x^(alpha+1))) ppareto<-function(x, alpha,b) ifelse(x<b,0,1-(b/x)^alpha) qpareto<-function(p, alpha,b) b*exp(-log(1-p)/alpha) rpareto<-function(n, alpha,b) qpareto(runif(n),alpha=alpha,b=b) #LN lnormtfit_function(x,b, plot.it = F, lty = 1){ if (mode(x) != "numeric") stop("need numeric data") x <- x[!is.na(x)] x <- sort(x) y <- log(x-b) n <- length(y) mu <- mean(y) sigma <- sd(y) if(plot.it) { lines(x, dlnormt(x, meanlog=mu,sdlog=sigma,b=b), lty = lty,col="red")} result <- list(mu=mu,sigma=sigma,b=b) class(result) <- "lnormt" result} dlnormt<-function(x,mu,sigma,b) dlnorm(x-b,meanlog=mu,sdlog=sigma) plnormt<-function(x,mu,sigma,b) plnorm(x-b,meanlog=mu,sdlog=sigma) qlnormt<-function(p,mu,sigma,b) qlnorm(p,meanlog=mu,sdlog=sigma)+b rlnormt<-function(n,mu,sigma,b) rlnorm(n,meanlog=mu,sdlog=sigma)+b ############### # [2] COMMANDS ############### # EXP tmp <- exptfit(data,b=30) ks.test(data,"pexpt",lambda=tmp$lambda,b=tmp$b) # Article: D= 0.2599 pvalue<<0.005 # PARETO tmp <- paretofit(data,b=30) ks.test(data, "ppareto", alpha=tmp$alpha,b=tmp$b) # Article: D=0.14586 pvalue=0.16 # LN tmp <- lnormtfit(data,b=30) ks.test(data,"plnormt",mu=tmp$mu,sigma=tmp$sigma,b=tmp$b) # Article: D=0.07939 pvalue>>0.15

You appear to be applying the KS test after estimating parameters. The distribution theory is for an iid sample from a known continuous distribution (and does not otherwise depend on the distribution). Since your H_0 is not pre-specified, that distribution theory is not correct. (Some corrections have been worked out for say ML fitting of exponential and normal distributions -- by Michael Stephens as I recall.) Also, your `truncated LogNormal'' does not appear to be truncated, rather to be shifted. That''s the same thing for an exponential (for a positive shift), but not for any other distribution. On Thu, 28 Aug 2003, franck allaire wrote:> Dear All > > I am trying to replicate a numerical application (not computed on R) from an > article. Using, ks.test() I computed the exact D value shown in the article > but the p-values I obtain are quite different from the one shown in the > article.And what is the H_0 and H_1 used in the article?> The tests are performed on a sample of 37 values (please see "[0] DATA" > below) for truncated Exponential, Pareto and truncated LogNormal (please see > "[1] FUNCTIONS" below). You can find below the commands I use to compute the > tests ("[2] COMMANDS") and the p-values presented in the article. > > Can anyone explain the difference between these two set of p-values? Maybe > the explanation is linked to my second question: Can anyone confirm me that > p-values calculated in the Kolmogrov-Smirnov g-o-f are independent from the > family of the distribution assumed in H0? > > Anderson-Darling test is also performed in the article. would anyone have > functions to calculate pvalues for exponential, lognormal, weibull, etc? > > Many thanks for your help, > > Franck Allaire > R 1.6.2 > > ############### > # [0] DATA # truncation point is 30 i.e. all values are above 30 > ############### > 39.7 > 582 > 439.9 > 47.1 > 83.9 > 41.2 > 893.1 > 192 > 106.2 > 216.7 > 1243.4 > 52.8 > 351.6 > 36.2 > 431.5 > 57.3 > 1602.1 > 822.2 > 260.1 > 58.7 > 6299.9 > 814.9 > 137.2 > 203.8 > 1263.5 > 53.7 > 1313 > 118.4 > 167.8 > 70.1 > 503.7 > 64.8 > 529.9 > 87.8 > 2465.4 > 317.9 > 2753.9 > > ############### > # [1] FUNCTIONS > ############### > > #EXP > exptfit_function(x,b, plot.it = F, lty = 1){ > > if (mode(x) != "numeric") > stop("need numeric data") > x <- x[!is.na(x)] > x <- sort(x) > lambda <- length(x)/sum(x-b) > if(plot.it) { lines(x, dexpt(x, lambda = lambda, b = b), lty = > lty,col="red")} > result <- list(lambda = lambda, b = b) > class(result) <- "expt" > result} > > dexpt<-function(x,lambda,b) dexp(x-b,rate=lambda) > pexpt<-function(x,lambda,b) pexp(x-b,rate=lambda) > qexpt<-function(p,lambda,b) qexp(p,rate=lambda)+b > rexpt<-function(n,lambda,b) rexp(n,rate=lambda)+b > > #PARETO > paretofit<-function(x,b, plot.it = F, lty = 1){ > > x <- sort(x) > alpha <- length(x)/sum(log(x/b)) > if(plot.it) { lines(x, dpareto(x, alpha = alpha, b = b), lty = > lty,col="red")} > result <- list(alpha = alpha, b = b) > class(result) <- "pareto" > result} > > dpareto<-function(x, alpha,b) ifelse(x<b,0,alpha*(b^alpha)/(x^(alpha+1))) > ppareto<-function(x, alpha,b) ifelse(x<b,0,1-(b/x)^alpha) > qpareto<-function(p, alpha,b) b*exp(-log(1-p)/alpha) > rpareto<-function(n, alpha,b) qpareto(runif(n),alpha=alpha,b=b) > > #LN > lnormtfit_function(x,b, plot.it = F, lty = 1){ > > if (mode(x) != "numeric") > stop("need numeric data") > x <- x[!is.na(x)] > x <- sort(x) > y <- log(x-b) > n <- length(y) > mu <- mean(y) > sigma <- sd(y) > if(plot.it) { lines(x, dlnormt(x, meanlog=mu,sdlog=sigma,b=b), lty > = lty,col="red")} > result <- list(mu=mu,sigma=sigma,b=b) > class(result) <- "lnormt" > result} > > dlnormt<-function(x,mu,sigma,b) dlnorm(x-b,meanlog=mu,sdlog=sigma) > plnormt<-function(x,mu,sigma,b) plnorm(x-b,meanlog=mu,sdlog=sigma) > qlnormt<-function(p,mu,sigma,b) qlnorm(p,meanlog=mu,sdlog=sigma)+b > rlnormt<-function(n,mu,sigma,b) rlnorm(n,meanlog=mu,sdlog=sigma)+b > > ############### > # [2] COMMANDS > ############### > > # EXP > tmp <- exptfit(data,b=30) > ks.test(data,"pexpt",lambda=tmp$lambda,b=tmp$b) > # Article: D= 0.2599 pvalue<<0.005 > > # PARETO > tmp <- paretofit(data,b=30) > ks.test(data, "ppareto", alpha=tmp$alpha,b=tmp$b) > # Article: D=0.14586 pvalue=0.16 > > # LN > tmp <- lnormtfit(data,b=30) > ks.test(data,"plnormt",mu=tmp$mu,sigma=tmp$sigma,b=tmp$b) > # Article: D=0.07939 pvalue>>0.15 > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > >-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595

With the Shifted Exponential test, H_0 is "data is a sample coming from a Shifted Exponential distribution with shift=30 and lambda = 0.001566907">You appear to be applying the KS test after estimating parameters.I do in order to define H_0 as explained above>The distribution theory is for an iid sample from a known continuous >distribution (and does not otherwise depend on the distribution).That''s what I thougth but this is not what I understand reading: http://www.isi.edu/~kclan/paper/ramp/node11.html>Since your H_0 is not pre-specified, that distribution theory is not >correct.please see above>(Some corrections have been worked out for say ML fitting of exponential >and normal distributions -- by Michael Stephens as I recall.) >Also, your `truncated LogNormal'' does not appear to be truncated, rather >to be shifted. That''s the same thing for an exponential (for a positive >shift), but not for any other distribution.Thanks for this clarification>And what is the H_0 and H_1 used in the article?H_1 is that the sample is not coming from the specified distribution in H_0

On Thu, 28 Aug 2003, Prof Brian Ripley wrote:> You appear to be applying the KS test after estimating parameters. The > distribution theory is for an iid sample from a known continuous > distribution (and does not otherwise depend on the distribution). Since > your H_0 is not pre-specified, that distribution theory is not correct. > (Some corrections have been worked out for say ML fitting of exponential > and normal distributions -- by Michael Stephens as I recall.)Just to amplify this comment a bit, I''m a little worried that the current documentation of of ks.test may make it appear that estimated parameters are ok, or that somehow the p-values computed are "corrected" in some way for their existence -- which I very much doubt. The standard reference on this sort of thing was Durbin''s (1973) SIAM monograph. There is a very nice approach due to Khmaladze (1981) based on the Doob-Meyer decomposition - this is the closest thing that I''m aware of for handling KS type tests with estimated parameters in a general context. url: www.econ.uiuc.edu/~roger/my.html Roger Koenker email rkoenker at uiuc.edu Department of Economics vox: 217-333-4558 University of Illinois fax: 217-244-6678 Champaign, IL 61820

On Thu, 28 Aug 2003, franck allaire wrote:> With the Shifted Exponential test, H_0 is "data is a sample coming from a > Shifted Exponential distribution with shift=30 and lambda = 0.001566907"You got that after looking at the data. H_0 has to be specified before looking at the data.> >You appear to be applying the KS test after estimating parameters. > > I do in order to define H_0 as explained aboveThat''s invalid.> >The distribution theory is for an iid sample from a known continuous > >distribution (and does not otherwise depend on the distribution). > > That''s what I thougth but this is not what I understand reading:Learn not to believe everything you read on the Web.> >Since your H_0 is not pre-specified, that distribution theory is not > >correct. > > please see aboveRather, please take careful note yourself.> >(Some corrections have been worked out for say ML fitting of exponential > >and normal distributions -- by Michael Stephens as I recall.) > >Also, your `truncated LogNormal'' does not appear to be truncated, rather > >to be shifted. That''s the same thing for an exponential (for a positive > >shift), but not for any other distribution. > > Thanks for this clarification > > >And what is the H_0 and H_1 used in the article? > > H_1 is that the sample is not coming from the specified distribution in H_0Not so: perhaps an iid sample from some other continuous distribution? -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595

On 28 Aug 2003 at 8:06, Roger Koenker wrote: But is it worth it to modify Kolmogorov-Smirnof fot estimated parameters? It has very low power anyhow. If the null hypothesis is "exponential distributio" (which is a scale family), what about using the quantile transformation twice new <- qnorm(pexp(old)) to transform from exponential to normal distribution and the applying shapiro.test ? Kjetil Halvorsen> > On Thu, 28 Aug 2003, Prof Brian Ripley wrote: > > > You appear to be applying the KS test after estimating parameters. The > > distribution theory is for an iid sample from a known continuous > > distribution (and does not otherwise depend on the distribution). Since > > your H_0 is not pre-specified, that distribution theory is not correct. > > (Some corrections have been worked out for say ML fitting of exponential > > and normal distributions -- by Michael Stephens as I recall.) > > Just to amplify this comment a bit, I''m a little worried that the > current documentation of of ks.test may make it appear that estimated > parameters are ok, or that somehow the p-values computed are > "corrected" in some way for their existence -- which I very much > doubt. The standard reference on this sort of thing was Durbin''s (1973) > SIAM monograph. There is a very nice approach due to Khmaladze (1981) > based on the Doob-Meyer decomposition - this is the closest thing > that I''m aware of for handling KS type tests with estimated parameters > in a general context. > > url: www.econ.uiuc.edu/~roger/my.html Roger Koenker > email rkoenker at uiuc.edu Department of Economics > vox: 217-333-4558 University of Illinois > fax: 217-244-6678 Champaign, IL 61820 > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help

On Fri, 29 Aug 2003, kjetil brinchmann halvorsen wrote:> But is it worth it to modify Kolmogorov-Smirnof fot estimated > parameters? It has very low power anyhow. If the null hypothesis is > "exponential distributio" (which is a scale family), what about using > the quantile transformation twice > > new <- qnorm(pexp(old))And if old is exponential, but not standard exponential? url: www.econ.uiuc.edu/~roger/my.html Roger Koenker email rkoenker at uiuc.edu Department of Economics vox: 217-333-4558 University of Illinois fax: 217-244-6678 Champaign, IL 61820

On 29 Aug 2003 at 20:19, Roger Koenker wrote:> On Fri, 29 Aug 2003, kjetil brinchmann halvorsen wrote: > > > But is it worth it to modify Kolmogorov-Smirnof fot estimated > > parameters? It has very low power anyhow. If the null hypothesis is > > "exponential distributio" (which is a scale family), what about using > > the quantile transformation twice > > > > new <- qnorm(pexp(old)) > > And if old is exponential, but not standard exponential?But the shapiro-wilk test is scale invariant, and the exponential family is a scale family, so we can standardize old first without destroying the validity of the shapiro-wilk test? Kjetil Halvorsen> > url: www.econ.uiuc.edu/~roger/my.html Roger Koenker > email rkoenker at uiuc.edu Department of Economics > vox: 217-333-4558 University of Illinois > fax: 217-244-6678 Champaign, IL 61820 > >