thr3ads.net - R help - [R] Problems with ks.test() [Jul 2011]


Jochen1980

2011-Jul-29 09:07 UTC

[R] Problems with ks.test()

Hi,

I have two vectors of data points and want to run ks.test() on them. If you
plot both vectors you will see that they agree quite well. Here is a picture:
http://www.jochen-bauer.net/downloads/kstest-r-help-list-plot.png

The plot shows one histogram with the Gumbel density function drawn over it.
I took the bin midpoints and the bin heights for vector1 and computed the
values of the distribution at all bin midpoints as vector2.

I pass these two vectors to ks.test(). Are those the right vectors if I want
to decide afterwards whether my experimental data is Gumbel-distributed?

Surprisingly, the p-value changes tremendously if I compute more digits from
my theoretical formula. If I round to 0 digits, p is 1; if I round to 4
digits, p drops to nearly 0. How can this happen? I thought more digits would
give more accurate results.

XXXX Case 0 digits: XXXXXXXXXXXXXXXXXXXXXXXXXXX 
  [1]   0   0   0   0   0  24  74  98 133 147 134 120  89  69  46  31  16   7
 [19]   7   3   2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 [37]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 [55]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 [73]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 [91]   0   0   0   0   0   0   0   0   0   0
  [1]   0   0   0   0   1  10  49 113 160 168 147 113  81  55  37  24  15  10
 [19]   6   4   2   2   1   1   0   0   0   0   0   0   0   0   0   0   0   0
 [37]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 [55]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 [73]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 [91]   0   0   0   0   0   0   0   0   0   0
[1] "Results"
[1] "Analysis of the input data"
[1] "Mean:  0.104537195"
[1] "Std. dev.:  0.0277657985898433"
[1] "Parameter estimation from the data, assuming a Gumbel distribution"
[1] "Mue:  0.0920411082987717"
[1] "Beta:  0.0216489043196013"
[1] "KS test ->  1000  values,  100  bins, x: bin midpoints, y1, y2: histogram heights"
[1] "KST D:  0.04"
[1] "KST P:  1"

XXX Case 4 digits: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
  [1]   0   0   0   0   0  24  74  98 133 147 134 120  89  69  46  31  16   7
 [19]   7   3   2   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 [37]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 [55]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 [73]   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
 [91]   0   0   0   0   0   0   0   0   0   0
  [1]   0.000   0.000   0.000   0.006   0.622  10.094  49.271 112.776 160.174
 [10] 168.419 146.527 113.137  81.026  55.344  36.690  23.870  15.347   9.793
 [19]   6.220   3.939   2.490   1.572   0.992   0.625   0.394   0.248   0.157
 [28]   0.099   0.062   0.039   0.025   0.016   0.010   0.006   0.004   0.002
 [37]   0.002   0.001   0.001   0.000   0.000   0.000   0.000   0.000   0.000
 [46]   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
 [55]   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
 [64]   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
 [73]   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
 [82]   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
 [91]   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000   0.000
[100]   0.000
[1] "Results"
[1] "Analysis of the input data"
[1] "Mean:  0.104537195"
[1] "Std. dev.:  0.0277657985898433"
[1] "Parameter estimation from the data, assuming a Gumbel distribution"
[1] "Mue:  0.0920411082987717"
[1] "Beta:  0.0216489043196013"
[1] "KS test ->  1000  values,  100  bins, x: bin midpoints, y1, y2: histogram heights"
[1] "KST D:  0.2"
[1] "KST P:  0.0366"
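
The Mue and Beta values printed in both runs are consistent with
method-of-moments estimates for a Gumbel distribution, beta = s * sqrt(6) / pi
and mu = mean - gamma * beta, where gamma is the Euler-Mascheroni constant.
The post does not show how they were computed, so this is an inference from
the numbers rather than the original code. A minimal R sketch (the variable
names are made up for illustration) that reproduces them from the printed mean
and standard deviation:

```r
# Mean and standard deviation as printed in the output above
m <- 0.104537195
s <- 0.0277657985898433

# Euler-Mascheroni constant; digamma(1) equals -gamma
euler_gamma <- -digamma(1)

# Method-of-moments relations for a Gumbel(mu, beta) distribution:
#   mean = mu + gamma * beta,   sd = beta * pi / sqrt(6)
beta_hat <- s * sqrt(6) / pi
mu_hat   <- m - euler_gamma * beta_hat

print(c(mu = mu_hat, beta = beta_hat), digits = 10)
```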

Thanks in advance for some help.
Jochen

--
View this message in context:
http://r.789695.n4.nabble.com/Problems-with-ks-test-tp3703469p3703469.html
Sent from the R help mailing list archive at Nabble.com.

Jochen1980

2011-Jul-30 09:28 UTC


Hi, I used a K-S function from another library (kstwo() from Numerical
Recipes, a mathematics book) to check this myself, and the same thing happens
there. I cannot understand why. I hope someone can 'enlighten' me.

--
View this message in context:
http://r.789695.n4.nabble.com/Problems-with-ks-test-tp3703469p3706072.html
Sent from the R help mailing list archive at Nabble.com.

Greg Snow

2011-Jul-30 18:19 UTC


What makes you think that the p-value of 1 is more accurate than the p-value of
0?  The K-S test will show significance for very small differences in
distributions when the sample size is big enough.

Also, it is not clear that you are using it correctly.  Generally you would just
give the raw data and the CDF to the function; don't worry about midpoints.
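
A minimal sketch of what passing the raw data and the CDF could look like in R.
The raw measurements are not in this thread, so the sample below is simulated;
pgumbel_sketch is a hand-written helper (base R has no Gumbel CDF), and the
parameter values are rounded from the Mue and Beta printed in the original
post. Note that ?ks.test asks for the parameters of a one-sample test to be
fixed in advance; if they are estimated from the same data, the p-value is
only a rough guide.

```r
# Gumbel CDF with location mu and scale beta (hypothetical helper)
pgumbel_sketch <- function(q, mu, beta) exp(-exp(-(q - mu) / beta))

set.seed(1)
mu   <- 0.092    # assumed location, rounded from the post
beta <- 0.0216   # assumed scale, rounded from the post

# Stand-in for the 1000 raw values: inverse-transform sampling from Gumbel
raw <- mu - beta * log(-log(runif(1000)))

# One-sample K-S test: raw data against the continuous CDF, no binning
res <- ks.test(raw, pgumbel_sketch, mu = mu, beta = beta)
print(res)
```

Here the test sees all 1000 continuous values, so neither the choice of bins
nor the rounding of a fitted curve enters the result.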

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Peter Ehlers

2011-Aug-01 08:25 UTC


(I'm replying to your original post because your follow-up omits the
context.)

The K-S test is designed for continuous distributions. You have far
too many zeros in your data to get anything reasonable out of the
test. For your data, the K-S statistic is the difference in the
(e)cdfs at zero. Your results just show that this can be sensitive
to the degree of rounding used for the theoretical cdf.
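
This can be checked against the vectors in the original post. The sketch below
retypes them from the printed output (variable names are made up, and the
"4 digits" vector is used as printed, with three decimals) and compares the
share of zeros in each vector with the D statistic from ks.test():

```r
# Histogram heights from the post: 84 of 100 entries are zero
counts <- c(rep(0, 5), 24, 74, 98, 133, 147, 134, 120, 89, 69, 46, 31, 16, 7,
            7, 3, 2, rep(0, 79))

# Fitted heights rounded to 0 digits: 80 zeros
fit0 <- c(rep(0, 4), 1, 10, 49, 113, 160, 168, 147, 113, 81, 55, 37, 24, 15,
          10, 6, 4, 2, 2, 1, 1, rep(0, 76))

# Fitted heights as printed in the "4 digits" case: 64 zeros
fit3 <- c(rep(0, 3), 0.006, 0.622, 10.094, 49.271, 112.776, 160.174,
          168.419, 146.527, 113.137, 81.026, 55.344, 36.690, 23.870, 15.347,
          9.793, 6.220, 3.939, 2.490, 1.572, 0.992, 0.625, 0.394, 0.248,
          0.157, 0.099, 0.062, 0.039, 0.025, 0.016, 0.010, 0.006, 0.004,
          0.002, 0.002, 0.001, 0.001, rep(0, 61))

# Difference of the two empirical CDFs evaluated at zero
gap0 <- ecdf(counts)(0) - ecdf(fit0)(0)
gap3 <- ecdf(counts)(0) - ecdf(fit3)(0)

# Two-sample K-S statistic; exact = FALSE requests the asymptotic p-value,
# and warnings about ties are suppressed because ties are the point here
ks0 <- suppressWarnings(ks.test(counts, fit0, exact = FALSE))
ks3 <- suppressWarnings(ks.test(counts, fit3, exact = FALSE))

print(c(gap_at_zero = gap0, D = unname(ks0$statistic), p = ks0$p.value))
print(c(gap_at_zero = gap3, D = unname(ks3$statistic), p = ks3$p.value))
```

In both cases D equals the gap at zero (0.04 and 0.2, matching the posted
"KST D" lines), so the result is driven entirely by how many fitted values
round to exactly zero.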

Peter Ehlers

