R Users:

My question is probably more about elementary statistics than the
mechanics of using R, but I've been dabbling in R (version 2.2.0) and
used it recently to test some data.

I have a relatively small set of observations (n = 12) of arsenic
concentrations in background groundwater and wanted to test my
assumption of normality. I used the Shapiro-Wilk test (by calling
shapiro.test() in R) and I'm not sure how to interpret the output.
Here's the input/output from the R console:

> As = c(13, 17, 23, 9.5, 20, 15, 11, 17, 21, 14, 22, 13)
> shapiro.test(As)

        Shapiro-Wilk normality test

data:  As
W = 0.9513, p-value = 0.6555

How do I interpret this? I understand, from poking around the internet,
that the higher the W statistic, the "more normal" the data.

What is the null hypothesis - that the data is normally distributed?

What does the p-value tell me? A 65.55% chance of what - getting a
W statistic greater than or equal to 0.9513? (I picked this up from the
Dalgaard book, Introductory Statistics with R, but it's not really
sinking in with respect to how it applies to a Shapiro-Wilk test.)

The method description - retrieved using ?shapiro.test() - is a bit
light on details.

Thanks much.

-------------------------------------
Matthew C. Findley, CPSSc
Environmental Scientist
CH2M HILL
mfindley at ch2m.com

<Matthew.Findley at ch2m.com> writes:

> R Users:
>
> My question is probably more about elementary statistics than the
> mechanics of using R, but I've been dabbling in R (version 2.2.0) and
> used it recently to test some data.
>
> I have a relatively small set of observations (n = 12) of arsenic
> concentrations in background groundwater and wanted to test my
> assumption of normality. I used the Shapiro-Wilk test (by calling
> shapiro.test() in R) and I'm not sure how to interpret the output.
> Here's the input/output from the R console:
>
> > As = c(13, 17, 23, 9.5, 20, 15, 11, 17, 21, 14, 22, 13)
> > shapiro.test(As)
>
>         Shapiro-Wilk normality test
>
> data:  As
> W = 0.9513, p-value = 0.6555
>
> How do I interpret this? I understand, from poking around the internet,
> that the higher the W statistic the "more normal" the data.
>
> What is the null hypothesis - that the data is normally distributed?

Yup.

> What does the p-value tell me? 65.55% chance of what - getting
> W-statistic greater than or equal to 0.9513 (I picked this up from the
> Dalgaard book, Introductory Statistics with R, but its not really
> sinking in with respect to how it applies to a Shapiro-Wilk test).?

*Smaller* or equal - W=1.0 is the "perfect fit". The W statistic is
pretty much the Pearson correlation applied to the curve drawn by
qqnorm(). (The exact definition of what goes on the x axis differs
slightly, I believe.) A low p-value would indicate that the W is too
extreme to be explained by chance variation - i.e. evidence against
normal distribution. In the present case you have no evidence against
normal distribution (beware that this is not evidence _for_ normality).

(Personally, I'm not too happy about these normality tests. They tend
to lack power in small samples and in large samples they often reject
distributions which are perfectly adequate for normal-theory analysis.
Learning to evaluate a QQ plot seems a better idea.)

> The method description - retrieved using ?shapiro.test() - is a bit
> light on details.

There are references therein, though...

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907

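
For readers who want to check the two points in the reply above for themselves, here is a minimal sketch in base R. It is not part of the original messages. It assumes that squaring the correlation of the qqnorm() coordinates is the right scale for comparison with W (W is a squared-correlation-type quantity, and the Shapiro-Wilk weights are not the same as the qqnorm() plotting positions, so the two numbers should only be close, not equal). The seed and the number of simulated samples are arbitrary choices.

```r
# Arsenic concentrations from the original post
As <- c(13, 17, 23, 9.5, 20, 15, 11, 17, 21, 14, 22, 13)

# The test as run in the thread
sw <- shapiro.test(As)
w_obs <- unname(sw$statistic)

# Squared Pearson correlation of the points qqnorm() would draw.
# This only approximates W: the x-axis values differ slightly.
qq <- qqnorm(As, plot.it = FALSE)
r2 <- cor(qq$x, qq$y)^2

# Monte Carlo reading of the p-value: among normal samples of the
# same size, how often is W as small as or smaller than observed?
set.seed(1)      # arbitrary seed, for reproducibility only
n_sim <- 10000   # arbitrary number of simulated samples
w_null <- replicate(n_sim, unname(shapiro.test(rnorm(length(As)))$statistic))
p_sim <- mean(w_null <= w_obs)

cat("W from shapiro.test():      ", round(w_obs, 4), "\n")
cat("squared QQ correlation:     ", round(r2, 4), "\n")
cat("p-value from shapiro.test():", round(sw$p.value, 4), "\n")
cat("simulated P(W <= observed): ", round(p_sim, 4), "\n")

# The visual check recommended in the reply (interactive sessions only)
if (interactive()) {
  qqnorm(As)
  qqline(As)
}
```

The simulated proportion should land near the reported p-value, which is the sense in which the p-value is the chance of a W "smaller or equal" to the observed one when the data really are normal.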
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of
> Matthew.Findley at ch2m.com
> Sent: Wednesday, July 12, 2006 11:14 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] shapiro.test() output
>
> [...]
>
> W = 0.9513, p-value = 0.6555
>
> [...]
>
> What is the null hypothesis - that the data is normally distributed?
>
> What does the p-value tell me? 65.55% chance of what - getting
> W-statistic greater than or equal to 0.9513 [...]

The null hypothesis: the data are normally distributed. If the
p-value > \alpha (the significance level), there is not enough evidence
to reject the null hypothesis. Otherwise you reject it and conclude
that your data are not normally distributed.
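
The decision rule in this reply can be sketched in a few lines of R. This is an illustration, not part of the original message. The significance level of 0.05 is an assumption, since nobody in the thread fixes one, and the wording of the two outcomes is mine.

```r
# Arsenic concentrations from the original post
As <- c(13, 17, 23, 9.5, 20, 15, 11, 17, 21, 14, 22, 13)

alpha <- 0.05  # assumed significance level; the thread does not specify one

p <- shapiro.test(As)$p.value

# Compare the p-value with alpha, as described in the reply
decision <- if (p > alpha) {
  "fail to reject H0: no evidence against normality"
} else {
  "reject H0: evidence against normality"
}

cat(sprintf("p = %.4f, alpha = %.2f -> %s\n", p, alpha, decision))
```

With the p-value reported in the thread (0.6555), the first branch applies. As noted earlier in the thread, failing to reject is not evidence that the data are normal.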