R Users:
My question is probably more about elementary statistics than the
mechanics of using R, but I've been dabbling in R (version 2.2.0) and
used it recently to test some data.
I have a relatively small set of observations (n = 12) of arsenic
concentrations in background groundwater and wanted to test my
assumption of normality. I used the Shapiro-Wilk test (by calling
shapiro.test() in R) and I'm not sure how to interpret the output.
Here's the input/output from the R console:
> As = c(13, 17, 23, 9.5, 20, 15, 11, 17, 21, 14, 22, 13)
> shapiro.test(As)

        Shapiro-Wilk normality test

data:  As
W = 0.9513, p-value = 0.6555
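The printed report comes from an object of class "htest", R's standard container for test results, so W and the p-value can also be pulled out by name instead of being read off the console. A minimal sketch (the variable name `res` is just for illustration):

```r
# Data from the question: arsenic concentrations, n = 12
As <- c(13, 17, 23, 9.5, 20, 15, 11, 17, 21, 14, 22, 13)

# Keep the test result instead of only printing it
res <- shapiro.test(As)

class(res)       # "htest": R's standard class for hypothesis-test results
res$statistic    # named numeric (name "W"): the Shapiro-Wilk statistic
res$p.value      # the p-value shown in the printed report
res$method       # "Shapiro-Wilk normality test"
```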
How do I interpret this? I understand, from poking around the internet,
that the higher the W statistic, the "more normal" the data.
What is the null hypothesis - that the data is normally distributed?
What does the p-value tell me? A 65.55% chance of what - getting a
W statistic greater than or equal to 0.9513? (I picked this up from the
Dalgaard book, Introductory Statistics with R, but it's not really
sinking in with respect to how it applies to a Shapiro-Wilk test.)
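Because W = 1 corresponds to a perfect fit, "extreme" for this test means a *small* W, so the p-value is the probability, assuming the data really are normal, of a W less than or equal to the one observed. A rough Monte Carlo sketch of that idea follows; the seed and the 10,000 replicates are arbitrary choices, and R itself computes the p-value from an approximation (Royston's algorithm) rather than by simulation, so the two numbers should be close but not identical.

```r
As <- c(13, 17, 23, 9.5, 20, 15, 11, 17, 21, 14, 22, 13)
w_obs <- unname(shapiro.test(As)$statistic)

# W does not change with the mean or SD of the data, so standard normal
# samples of the same size represent the null hypothesis
set.seed(1)
w_null <- replicate(10000, shapiro.test(rnorm(length(As)))$statistic)

# Proportion of normal samples whose W is as small as, or smaller than, ours
p_mc <- mean(w_null <= w_obs)

c(simulated = p_mc, reported = shapiro.test(As)$p.value)
```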
The method description - retrieved using ?shapiro.test() - is a bit
light on details.
Thanks much.
-------------------------------------
Matthew C. Findley, CPSSc
Environmental Scientist
CH2M HILL
mfindley at ch2m.com
<Matthew.Findley at ch2m.com> writes:

> What is the null hypothesis - that the data is normally distributed?

Yup.

> What does the p-value tell me? 65.55% chance of what - getting
> W-statistic greater than or equal to 0.9513

*Smaller* or equal - W=1.0 is the "perfect fit". The W statistic is pretty much the Pearson correlation applied to the curve drawn by qqnorm(). (The exact definition of what goes on the x axis differs slightly, I believe.) A low p-value would indicate that the W is too extreme to be explained by chance variation - i.e. evidence against normal distribution. In the present case you have no evidence against normal distribution (beware that this is not evidence _for_ normality).

(Personally, I'm not too happy about these normality tests. They tend to lack power in small samples and in large samples they often reject distributions which are perfectly adequate for normal-theory analysis. Learning to evaluate a QQ plot seems a better idea.)

> The method description - retrieved using ?shapiro.test() - is a bit
> light on details.

There are references therein, though...

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
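The points in the reply above can be checked numerically. Strictly, W is a squared correlation between the sorted data and a set of normal-based weights, so the sketch compares W with the *squared* correlation of the coordinates qqnorm() plots (essentially the closely related Shapiro-Francia statistic); the two should be close but not identical. It then draws the QQ plot and, under an assumed mildly non-normal model (a t distribution with 10 degrees of freedom, chosen only for illustration), shows the sample-size effect on power: few rejections at n = 12, many at n = 5000, the largest sample shapiro.test() accepts.

```r
As <- c(13, 17, 23, 9.5, 20, 15, 11, 17, 21, 14, 22, 13)

# 1. W versus the correlation in the normal QQ plot.
#    qqnorm(plot.it = FALSE) returns the plotted coordinates:
#    x = theoretical normal quantiles, y = the data.
qq <- qqnorm(As, plot.it = FALSE)
r_sq <- cor(qq$x, qq$y)^2
W <- unname(shapiro.test(As)$statistic)
c(r_squared = r_sq, W = W)   # close, but not identical

# 2. The QQ plot itself: points lying near the line suggest approximate normality.
qqnorm(As)
qqline(As)

# 3. Power depends heavily on sample size. Data are drawn from a t
#    distribution with 10 df (an assumed "mild" departure from normality).
rejection_rate <- function(n, reps = 200, alpha = 0.05) {
  mean(replicate(reps, shapiro.test(rt(n, df = 10))$p.value < alpha))
}
set.seed(42)
rates <- c(n12 = rejection_rate(12), n5000 = rejection_rate(5000))
rates
```

The t model is only a stand-in; the same comparison can be rerun with any distribution considered "close enough" for the analysis at hand.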
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Matthew.Findley at ch2m.com
> Sent: Wednesday, July 12, 2006 11:14 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] shapiro.test() output
>
> What is the null hypothesis - that the data is normally distributed?
>
> What does the p-value tell me?

The null hypothesis: the data is normally distributed. If the p-value > alpha (the significance level), there is no evidence to reject the null hypothesis. Otherwise you reject it: your data is not normally distributed.
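The decision rule in this reply can be written as a small helper. `normality_decision` is a hypothetical name, and alpha = 0.05 is only a conventional default. The wording of the non-rejection branch follows the caveat raised earlier in the thread: a large p-value is no evidence against normality, not evidence for it.

```r
# Hypothetical helper: apply the p-value vs. alpha rule from the reply
normality_decision <- function(x, alpha = 0.05) {
  p <- shapiro.test(x)$p.value
  if (p > alpha) {
    "fail to reject H0: no evidence against normality"
  } else {
    "reject H0: evidence against normality"
  }
}

As <- c(13, 17, 23, 9.5, 20, 15, 11, 17, 21, 14, 22, 13)
normality_decision(As)

# Clearly skewed data, for contrast
set.seed(7)
normality_decision(rexp(200))
```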