thr3ads.net - R help - [R] Pb with ks.test pvalue [Mar 2005]

Anthony Landrevie

2005-Mar-18 16:36 UTC

[R] Pb with ks.test pvalue

Hello,

While running normality tests in R and SAS, in order to demonstrate the
effectiveness of R to my company, I noticed that the Anderson-Darling,
Cramér-von Mises and Shapiro-Wilk test results are nearly the same in the
two environments, but the Kolmogorov-Smirnov p-value is very different.

Here is what I do:
> ks.test(w,pnorm,mean(w),sd(w))

        One-sample Kolmogorov-Smirnov test

data:  w
D = 0.2143, p-value = 0.3803
alternative hypothesis: two.sided

> w
 [1] 3837 3334 2208 1745 2576 3208 3746 3523 3430 3480 3116 3428 2184 2383 3500 3866 3542
[18] 3278

 

SAS results:

Kolmogorov-Smirnov D 0.214278 Pr > D 0.0271

Why is the p-value so high under R? Much higher than with other tests.

Best regards,

Anthony Landrevie (French Student)


Christoph Buser

2005-Mar-22 13:32 UTC

[R] Pb with ks.test pvalue

Dear Anthony

I don't know how SAS calculates the p-value, but in R the
p-value is calculated under the assumption that the parameters
of the reference distribution (the one you compare your sample
with) are known in advance and not estimated from the data.

In your example you estimate them from the data (by mean(w) and
sd(w)), and therefore the p-value is not reliable.
Because the mean and sd are estimated from the data themselves,
the theoretical distribution fits your data too well.
Hence the test is too conservative and the p-values are too large.
Maybe SAS applies a correction for the estimation of the parameters
and therefore gets smaller p-values, but this is pure
speculation, since I don't know how SAS does the
calculation.
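
One way to account for the estimated parameters is to simulate the null distribution of D with the mean and sd re-estimated for every simulated sample (the idea behind the Lilliefors test). The following is a minimal sketch under that assumption, not a statement of what SAS does; the helper ks_d, the seed and the number of replicates are illustrative choices that do not come from the thread.

```r
# Data from the original post.
w <- c(3837, 3334, 2208, 1745, 2576, 3208, 3746, 3523, 3430, 3480,
       3116, 3428, 2184, 2383, 3500, 3866, 3542, 3278)

# Hypothetical helper: KS statistic against a normal distribution whose
# mean and sd are estimated from the same sample (as in the original call).
ks_d <- function(x) {
  unname(ks.test(x, "pnorm", mean(x), sd(x))$statistic)
}

d_obs <- ks_d(w)

# Null distribution of D when the parameters are re-estimated for every
# simulated sample. With estimated mean and sd, D does not depend on the
# true parameters, so standard normal samples are sufficient.
set.seed(1)      # arbitrary seed (assumption)
n_sim <- 10000   # arbitrary number of replicates (assumption)
d_null <- replicate(n_sim, ks_d(rnorm(length(w))))

# Monte Carlo p-value: share of simulated D at least as large as observed.
p_mc <- mean(d_null >= d_obs)

d_obs
p_mc
```

The resulting p_mc can then be compared with the 0.3803 reported by ks.test and with the value reported by SAS.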

I ran a simulation: I created 10000 samples from a normal
distribution and applied ks.test to each of them in the same way.
By chance I expected around 500 significant results (at the
0.05 level) and got only 1 or 2.
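
A simulation of this kind could look roughly like the sketch below. The post does not give the sample size or the exact code, so n = 18 (the length of w), the seed and the variable names are assumptions made for illustration only.

```r
set.seed(42)     # arbitrary seed (assumption)
n_sim <- 10000   # number of simulated samples, as in the post
n     <- 18      # assumed sample size (length of w); not stated in the post
alpha <- 0.05    # significance level, as in the post

# For each simulated normal sample, run ks.test with the mean and sd
# estimated from that same sample, and keep the p-value.
p_values <- replicate(n_sim, {
  x <- rnorm(n)
  ks.test(x, "pnorm", mean(x), sd(x))$p.value
})

# Number and share of rejections at the chosen level. If the p-values
# were valid, about alpha * n_sim rejections would be expected.
n_reject <- sum(p_values < alpha)
n_reject
n_reject / n_sim
```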

I recommend using graphical methods (e.g. normal plots) to
check whether your data are normally distributed, instead of
testing it.
See also ?qqnorm or ?qqplot.
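
For the data in this thread, a normal Q-Q plot could be drawn as in the short sketch below; the plot title is an arbitrary choice.

```r
# Data from the original post.
w <- c(3837, 3334, 2208, 1745, 2576, 3208, 3746, 3523, 3430, 3480,
       3116, 3428, 2184, 2383, 3500, 3866, 3542, 3278)

# Sample quantiles against theoretical normal quantiles.
# qqnorm() also returns the plotted coordinates invisibly.
qq <- qqnorm(w, main = "Normal Q-Q plot of w")

# Reference line through the first and third quartiles.
qqline(w)
```

Points that bend away from the reference line, especially in the tails, suggest a departure from normality.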

Regards,

Christoph Buser

--------------------------------------------------------------
Christoph Buser <buser at stat.math.ethz.ch>
Seminar fuer Statistik, LEO C11
ETH (Federal Inst. Technology)	8092 Zurich	 SWITZERLAND
phone: x-41-1-632-5414		fax: 632-1228
http://stat.ethz.ch/~buser/
--------------------------------------------------------------
