thr3ads.net - R help - [R] About normality tests... [Jun 2010]

Ralf B

2010-Jun-23 18:05 UTC

[R] About normality tests...

Hi all,

I have two very large samples of data (10000+ data points) and would
like to perform normality tests on them. I know that p < .05 means that
a data set is considered not normal by either of the two tests. I am
also aware that large samples tend to lead more likely to normal
results (Andy Field, 2005).

I have a few questions to ensure that I am using them right.

1) The Shapiro-Wilk test requires providing mean and sd. Is it
correct to add here the mean and sd of the data itself (since I am
comparing to a normal distribution with the same parameters)?

mySD <- sd(mydata$myfield)
myMean <- mean(mydata$myfield)
shapiro.test(rnorm(100, mean = myMean, sd = mySD))

2) If I just want to test each distribution individually, I assume
that I am doing a one-sample Kolmogorov-Smirnov test. Is that correct?

3) If I simply want to know if normality exists or not, what should I
put for the parameter 'alternative'? Does it actually matter?

alternative = c("two.sided", "less", "greater")

Thank you,
Ralf

Peter Ehlers

2010-Jun-23 18:35 UTC

On 2010-06-23 12:05, Ralf B wrote:
> Hi all,
>
> I have two very large samples of data (10000+ data points) and would
> like to perform normality tests on them. I know that p < .05 means that
> a data set is considered not normal by either of the two tests. I am
> also aware that large samples tend to lead more likely to normal
> results (Andy Field, 2005).
That depends on what you mean by 'tend to lead ...'
>
> I have a few questions to ensure that I am using them right.
>
> 1) The Shapiro-Wilk test requires providing mean and sd. Is it
> correct to add here the mean and sd of the data itself (since I am
> comparing to a normal distribution with the same parameters)?
>
> mySD <- sd(mydata$myfield)
> myMean <- mean(mydata$myfield)
> shapiro.test(rnorm(100, mean = myMean, sd = mySD))
I don't think that your understanding of the S-W test is correct.
You would just do:

  shapiro.test(mydata$myfield)

to test for Normality. However, shapiro.test() won't accept
sample sizes greater than 5000. So use ks.test. Or use a
graphical method: I like qq.plot in the 'car' package.
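
A minimal sketch of the advice above (not Peter's own code): the data
vector goes to shapiro.test() directly, with no mean or sd supplied. The
vector x is simulated and only stands in for mydata$myfield; the seed,
the subsample size, and the use of base R's qqnorm() in place of the
'car' function (which, as far as I know, later versions of 'car' call
qqPlot) are assumptions made for illustration. Note also that ks.test()
expects pre-specified parameters, so plugging in the sample mean and sd
gives only an approximate, conservative p-value.

```r
# Hypothetical stand-in for mydata$myfield: 10,000 simulated values.
set.seed(1)
x <- rnorm(10000, mean = 50, sd = 10)

# shapiro.test() takes the data itself; it stops with an error when the
# sample size is outside the range 3 to 5000.
too_big <- inherits(try(shapiro.test(x), silent = TRUE), "try-error")

# One possible workaround: test a random subsample of 5000 points.
sw <- shapiro.test(sample(x, 5000))

# One-sample Kolmogorov-Smirnov test against a Normal distribution whose
# parameters are estimated from the same data (hence only approximate).
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

# Graphical check with base R; drawn only in an interactive session.
if (interactive()) {
  qqnorm(x)
  qqline(x)
}
```

Here too_big is TRUE because x has more than 5000 values, while sw and
ks are ordinary "htest" objects whose p.value element can be inspected.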
>
> 2) If I just want to test each distribution individually, I assume
> that I am doing a one-sample Kolmogorov-Smirnov test. Is that correct?
I don't understand this. What do you mean by 'test ...
individually'?
>
> 3) If I simply want to know if normality exists or not, what should I
> put for the parameter 'alternative'? Does it actually matter?
>
> alternative = c("two.sided", "less", "greater")
Leave it at the default 'two.sided' unless you have good
reason to suspect that the cdf lies above or below the Normal cdf.
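
To illustrate what the one-sided alternatives do in ks.test(), here is a
small sketch on simulated data. The sample y, its size, the seed, and the
shift of 0.3 are all hypothetical choices; y is drawn so that it tends to
be larger than a standard Normal, which puts its cdf below the Normal
cdf.

```r
set.seed(2)
# Hypothetical sample shifted to the right of the standard Normal.
y <- rnorm(1000, mean = 0.3)

# Default "two.sided": sensitive to departures in either direction.
p_two <- ks.test(y, "pnorm")$p.value

# "less": the alternative is that the cdf of y lies below the Normal cdf,
# which is the situation simulated here.
p_less <- ks.test(y, "pnorm", alternative = "less")$p.value

# "greater": the alternative is that the cdf of y lies above the Normal cdf.
p_greater <- ks.test(y, "pnorm", alternative = "greater")$p.value
```

With this setup p_two and p_less should both be small, while p_greater
should not be, so a one-sided choice only makes sense when the direction
of the departure is known in advance.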

   -Peter Ehlers
>
> Thank you,
> Ralf
>

Greg Snow

2010-Jun-23 19:00 UTC

Before doing normality tests, look at fortune(117) and fortune(234) (from the
'fortunes' package). If you still feel the need to have the computer print out
a p-value for a test of exact normality, then try SnowsPenultimateNormalityTest
in the TeachingDemos package. If you want a test that is more meaningful, then
look at vis.test (also in the TeachingDemos package).
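
One way to see why a p-value for exact normality can be unhelpful with
large samples is the base R sketch below. It does not call either
TeachingDemos function; the t distribution with 10 degrees of freedom,
the sample sizes, and the seed are assumptions chosen for illustration.
The same mildly heavy-tailed distribution is sampled at two sizes and
passed to shapiro.test().

```r
set.seed(3)
# Hypothetical data: t distribution with 10 degrees of freedom, which has
# somewhat heavier tails than the Normal.
small <- rt(50, df = 10)
large <- rt(5000, df = 10)

# Same distribution, same test, very different sample sizes.
p_small <- shapiro.test(small)$p.value
p_large <- shapiro.test(large)$p.value
```

With 5000 points the test has enough power to flag this departure, so
p_large should be very small; with 50 points the same departure will
typically go undetected. The p-value therefore reflects sample size as
much as the practical importance of the non-normality.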


-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
