thr3ads.net - R help - [R] ks.test() [Aug 2003]

If this information is useful, please help other people find it:
Share via:

franck allaire

2003-Aug-28 11:50 UTC

[R] ks.test()

Dear All

I am trying to replicate a numerical application (not computed on R) from an 
article. Using, ks.test() I computed the exact D value shown in the article 
but the p-values I obtain are quite different from the one shown in the 
article.

The tests are performed on a sample of 37 values (please see "[0]
DATA"
below) for truncated Exponential, Pareto and truncated LogNormal (please see 
"[1] FUNCTIONS" below). You can find below the commands I use to
compute the
tests ("[2] COMMANDS") and the p-values presented in the article.

Can anyone explain the difference between these two set of p-values? Maybe 
the explanation is linked to my second question: Can anyone confirm me that 
p-values calculated in the Kolmogrov-Smirnov g-o-f are independent from the 
family of the distribution assumed in H0?

Anderson-Darling test is also performed in the article. would anyone have 
functions to calculate pvalues for exponential, lognormal, weibull, etc?

Many thanks for your help,

Franck Allaire
R 1.6.2

###############
# [0] DATA # truncation point is 30 i.e. all values are above 30
###############
39.7
582
439.9
47.1
83.9
41.2
893.1
192
106.2
216.7
1243.4
52.8
351.6
36.2
431.5
57.3
1602.1
822.2
260.1
58.7
6299.9
814.9
137.2
203.8
1263.5
53.7
1313
118.4
167.8
70.1
503.7
64.8
529.9
87.8
2465.4
317.9
2753.9

###############
# [1] FUNCTIONS
###############

#EXP
exptfit_function(x,b, plot.it = F, lty = 1){

          if (mode(x) != "numeric")
                stop("need numeric data")
          x <- x[!is.na(x)]
          x <- sort(x)
          lambda <- length(x)/sum(x-b)
          if(plot.it) { lines(x, dexpt(x, lambda = lambda, b = b), lty = 
lty,col="red")}
          result <- list(lambda = lambda, b = b)
          class(result) <- "expt"
          result}

dexpt<-function(x,lambda,b) dexp(x-b,rate=lambda)
pexpt<-function(x,lambda,b) pexp(x-b,rate=lambda)
qexpt<-function(p,lambda,b) qexp(p,rate=lambda)+b
rexpt<-function(n,lambda,b) rexp(n,rate=lambda)+b

#PARETO
paretofit<-function(x,b, plot.it = F, lty = 1){

          x <- sort(x)
          alpha <- length(x)/sum(log(x/b))
          if(plot.it) { lines(x, dpareto(x, alpha = alpha, b = b), lty = 
lty,col="red")}
          result <- list(alpha = alpha, b = b)
          class(result) <- "pareto"
          result}

dpareto<-function(x, alpha,b) ifelse(x<b,0,alpha*(b^alpha)/(x^(alpha+1)))
ppareto<-function(x, alpha,b) ifelse(x<b,0,1-(b/x)^alpha)
qpareto<-function(p, alpha,b) b*exp(-log(1-p)/alpha)
rpareto<-function(n, alpha,b) qpareto(runif(n),alpha=alpha,b=b)

#LN
lnormtfit_function(x,b, plot.it = F, lty = 1){

          if (mode(x) != "numeric")
                stop("need numeric data")
          x <- x[!is.na(x)]
          x <- sort(x)
          y <- log(x-b)
          n <- length(y)
          mu <- mean(y)
          sigma <- sd(y)
          if(plot.it) { lines(x, dlnormt(x, meanlog=mu,sdlog=sigma,b=b), lty 
= lty,col="red")}
          result <- list(mu=mu,sigma=sigma,b=b)
          class(result) <- "lnormt"
          result}

dlnormt<-function(x,mu,sigma,b) dlnorm(x-b,meanlog=mu,sdlog=sigma)
plnormt<-function(x,mu,sigma,b) plnorm(x-b,meanlog=mu,sdlog=sigma)
qlnormt<-function(p,mu,sigma,b) qlnorm(p,meanlog=mu,sdlog=sigma)+b
rlnormt<-function(n,mu,sigma,b) rlnorm(n,meanlog=mu,sdlog=sigma)+b

###############
# [2] COMMANDS
###############

# EXP
tmp <- exptfit(data,b=30)
ks.test(data,"pexpt",lambda=tmp$lambda,b=tmp$b)
# Article: D= 0.2599 pvalue<<0.005

# PARETO
tmp <- paretofit(data,b=30)
ks.test(data, "ppareto", alpha=tmp$alpha,b=tmp$b)
# Article: D=0.14586 pvalue=0.16

# LN
tmp <- lnormtfit(data,b=30)
ks.test(data,"plnormt",mu=tmp$mu,sigma=tmp$sigma,b=tmp$b)
# Article: D=0.07939 pvalue>>0.15

Prof Brian Ripley

2003-Aug-28 12:30 UTC

head link

[R] ks.test()

You appear to be applying the KS test after estimating parameters.  The 
distribution theory is for an iid sample from a known continuous 
distribution (and does not otherwise depend on the distribution).  Since 
your H_0 is not pre-specified, that distribution theory is not correct.
(Some corrections have been worked out for say ML fitting of exponential 
and normal distributions -- by Michael Stephens as I recall.)

Also, your `truncated LogNormal' does not appear to be truncated, rather
to be shifted.  That's the same thing for an exponential (for a positive
shift), but not for any other distribution.

On Thu, 28 Aug 2003, franck allaire wrote:
> Dear All
> 
> I am trying to replicate a numerical application (not computed on R) from
an
> article. Using, ks.test() I computed the exact D value shown in the article
> but the p-values I obtain are quite different from the one shown in the 
> article.
And what is the H_0 and H_1 used in the article?
> The tests are performed on a sample of 37 values (please see "[0]
DATA"
> below) for truncated Exponential, Pareto and truncated LogNormal (please
see
> "[1] FUNCTIONS" below). You can find below the commands I use to
compute the
> tests ("[2] COMMANDS") and the p-values presented in the article.
> 
> Can anyone explain the difference between these two set of p-values? Maybe 
> the explanation is linked to my second question: Can anyone confirm me that
> p-values calculated in the Kolmogrov-Smirnov g-o-f are independent from the
> family of the distribution assumed in H0?
> 
> Anderson-Darling test is also performed in the article. would anyone have 
> functions to calculate pvalues for exponential, lognormal, weibull, etc?
> 
> Many thanks for your help,
> 
> Franck Allaire
> R 1.6.2
> 
> ###############
> # [0] DATA # truncation point is 30 i.e. all values are above 30
> ###############
> 39.7
> 582
> 439.9
> 47.1
> 83.9
> 41.2
> 893.1
> 192
> 106.2
> 216.7
> 1243.4
> 52.8
> 351.6
> 36.2
> 431.5
> 57.3
> 1602.1
> 822.2
> 260.1
> 58.7
> 6299.9
> 814.9
> 137.2
> 203.8
> 1263.5
> 53.7
> 1313
> 118.4
> 167.8
> 70.1
> 503.7
> 64.8
> 529.9
> 87.8
> 2465.4
> 317.9
> 2753.9
> 
> ###############
> # [1] FUNCTIONS
> ###############
> 
> #EXP
> exptfit_function(x,b, plot.it = F, lty = 1){
> 
>           if (mode(x) != "numeric")
>                 stop("need numeric data")
>           x <- x[!is.na(x)]
>           x <- sort(x)
>           lambda <- length(x)/sum(x-b)
>           if(plot.it) { lines(x, dexpt(x, lambda = lambda, b = b), lty = 
> lty,col="red")}
>           result <- list(lambda = lambda, b = b)
>           class(result) <- "expt"
>           result}
> 
> dexpt<-function(x,lambda,b) dexp(x-b,rate=lambda)
> pexpt<-function(x,lambda,b) pexp(x-b,rate=lambda)
> qexpt<-function(p,lambda,b) qexp(p,rate=lambda)+b
> rexpt<-function(n,lambda,b) rexp(n,rate=lambda)+b
> 
> #PARETO
> paretofit<-function(x,b, plot.it = F, lty = 1){
> 
>           x <- sort(x)
>           alpha <- length(x)/sum(log(x/b))
>           if(plot.it) { lines(x, dpareto(x, alpha = alpha, b = b), lty = 
> lty,col="red")}
>           result <- list(alpha = alpha, b = b)
>           class(result) <- "pareto"
>           result}
> 
> dpareto<-function(x, alpha,b)
ifelse(x<b,0,alpha*(b^alpha)/(x^(alpha+1)))
> ppareto<-function(x, alpha,b) ifelse(x<b,0,1-(b/x)^alpha)
> qpareto<-function(p, alpha,b) b*exp(-log(1-p)/alpha)
> rpareto<-function(n, alpha,b) qpareto(runif(n),alpha=alpha,b=b)
> 
> #LN
> lnormtfit_function(x,b, plot.it = F, lty = 1){
> 
>           if (mode(x) != "numeric")
>                 stop("need numeric data")
>           x <- x[!is.na(x)]
>           x <- sort(x)
>           y <- log(x-b)
>           n <- length(y)
>           mu <- mean(y)
>           sigma <- sd(y)
>           if(plot.it) { lines(x, dlnormt(x, meanlog=mu,sdlog=sigma,b=b),
lty
> = lty,col="red")}
>           result <- list(mu=mu,sigma=sigma,b=b)
>           class(result) <- "lnormt"
>           result}
> 
> dlnormt<-function(x,mu,sigma,b) dlnorm(x-b,meanlog=mu,sdlog=sigma)
> plnormt<-function(x,mu,sigma,b) plnorm(x-b,meanlog=mu,sdlog=sigma)
> qlnormt<-function(p,mu,sigma,b) qlnorm(p,meanlog=mu,sdlog=sigma)+b
> rlnormt<-function(n,mu,sigma,b) rlnorm(n,meanlog=mu,sdlog=sigma)+b
> 
> ###############
> # [2] COMMANDS
> ###############
> 
> # EXP
> tmp <- exptfit(data,b=30)
> ks.test(data,"pexpt",lambda=tmp$lambda,b=tmp$b)
> # Article: D= 0.2599 pvalue<<0.005
> 
> # PARETO
> tmp <- paretofit(data,b=30)
> ks.test(data, "ppareto", alpha=tmp$alpha,b=tmp$b)
> # Article: D=0.14586 pvalue=0.16
> 
> # LN
> tmp <- lnormtfit(data,b=30)
> ks.test(data,"plnormt",mu=tmp$mu,sigma=tmp$sigma,b=tmp$b)
> # Article: D=0.07939 pvalue>>0.15
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 
> 
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

franck allaire

2003-Aug-28 13:06 UTC

head link

[R] ks.test()

With the Shifted Exponential test, H_0 is "data is a sample coming from a 
Shifted Exponential distribution with shift=30 and lambda = 0.001566907"
>You appear to be applying the KS test after estimating parameters.
I do in order to define H_0 as explained above
>The distribution theory is for an iid sample from a known continuous
>distribution (and does not otherwise depend on the distribution).
That's what I thougth but this is not what I understand reading:
http://www.isi.edu/~kclan/paper/ramp/node11.html
>Since your H_0 is not pre-specified, that distribution theory is not 
>correct.
please see above
>(Some corrections have been worked out for say ML fitting of exponential
>and normal distributions -- by Michael Stephens as I recall.)
>Also, your `truncated LogNormal' does not appear to be truncated, rather
>to be shifted.  That's the same thing for an exponential (for a positive
>shift), but not for any other distribution.
Thanks for this clarification
>And what is the H_0 and H_1 used in the article?
H_1 is that the sample is not coming from the specified distribution in H_0

Maybe Matching Threads

Search for more possibly parallel threads

R help - Aug 2003 - ks.test()

[R] ks.test()

[R] ks.test()

[R] ks.test()

Maybe Matching Threads