Felipe Carrillo
2008-Aug-10 20:55 UTC
[R] detect if data is normal or skewed (without a boxplot)
Hello all:

Is there a way to detect in R whether a dataset is normally distributed or skewed without inspecting it graphically? The reason I want to do this is that I have developed an application in Visual Basic where Word, Access and Excel "talk" to each other, and I want to integrate R into this application to estimate confidence intervals on fish sizes (mm). I basically want to automate the process from Excel: if my data have a normal distribution, use t.test; if my data are skewed, use wilcox.test. Something like the pseudo code below:

    fishlength <- c(35,32,37,39,42,45,37,36,35,34,40,42,41,50)
    if fishlength = "normally distributed" then
        t.test(fishlength)
    else
        wilcox.test(fishlength)

I hope this isn't too confusing.

Felipe D. Carrillo
Supervisory Fishery Biologist
Department of the Interior
US Fish & Wildlife Service
California, USA
Ben Bolker
2008-Aug-11 16:42 UTC
[R] detect if data is normal or skewed (without a boxplot)
There's a whole package (nortest) devoted to tests of normality, BUT I would suggest that your procedure is not a good idea. It's often hard to detect non-normality, and "fail to reject" shouldn't mean "accept". If you're concerned about non-normality, you should probably just use the Wilcoxon test all the time (it has about 95% of the power of the t-test if the data are normal: http://en.wikipedia.org/wiki/Mann-Whitney_U ), or use robust statistics (e.g. rlm in the MASS package).

Ben Bolker
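A minimal sketch of what this advice could look like in R, using the fish lengths from the original post. It skips the normality pre-test and goes straight to the Wilcoxon procedure with conf.int = TRUE, which returns an interval for the (pseudo)median. The exact = FALSE argument is an assumption made here because the data contain ties, so the normal approximation is used rather than an exact interval. The rlm call is an intercept-only fit, one possible reading of "use robust statistics", and it runs only if MASS is installed.

```r
# Fish lengths (mm) from the original post
fishlength <- c(35, 32, 37, 39, 42, 45, 37, 36, 35, 34, 40, 42, 41, 50)

# One-sample Wilcoxon signed-rank procedure, with no normality pre-test.
# conf.int = TRUE requests an interval for the (pseudo)median.
# exact = FALSE (an assumption of this sketch): the data contain ties,
# so the normal approximation is used instead of an exact interval.
res <- wilcox.test(fishlength, conf.int = TRUE, conf.level = 0.95,
                   exact = FALSE)

# The p-value in 'res' tests location = 0 and is not of interest here;
# the estimate and the interval are the useful parts.
print(res$estimate)
print(res$conf.int)

# Robust alternative: intercept-only robust regression (Huber M-estimate
# of location by default). Guarded in case MASS is not installed.
if (requireNamespace("MASS", quietly = TRUE)) {
  fit <- MASS::rlm(fishlength ~ 1)
  print(coef(fit))
}
```

Note that the Wilcoxon interval is for the pseudomedian, which coincides with the mean only when the distribution is symmetric, so it may not be directly comparable to a t.test interval on skewed data.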
Felipe Carrillo
2008-Aug-11 17:55 UTC
[R] detect if data is normal or skewed (without a boxplot)
Thanks Jim and Ben for your replies.

Reading further about testing data for normality, I found shapiro.test. I understand that if the p-value is smaller than 0.05 then the data aren't normal; I just don't understand what the "W" means.

> Hi Felipe,
> Here's one way:
>
> library(nortest)
> if (sf.test(fishlength)$p.value > 0.05) {
>   t.test(fishlength)
> } else {
>   wilcox.test(fishlength)
> }
>
> Jim
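On the question about "W": it is the Shapiro-Wilk test statistic itself. It lies between 0 and 1. Values close to 1 mean the ordered sample lines up well with what a normal distribution would produce, and smaller values indicate a departure from normality. The p-value is computed from W and the sample size. The sketch below is only an illustration: it prints W for the fish lengths and then compares W for a simulated normal sample and a simulated right-skewed (exponential) sample. The simulated samples, the seed and the sample size of 200 are assumptions chosen for the example and are not from the thread.

```r
# Fish lengths (mm) from the original post
fishlength <- c(35, 32, 37, 39, 42, 45, 37, 36, 35, 34, 40, 42, 41, 50)

# shapiro.test returns an "htest" object; W is stored in $statistic
sw <- shapiro.test(fishlength)
w_fish <- unname(sw$statistic)
p_fish <- sw$p.value
print(sw)

# Illustration with simulated data (hypothetical samples, not from the
# thread): W is typically closer to 1 for a normal sample than for a
# strongly right-skewed one.
set.seed(1)
w_normal <- unname(shapiro.test(rnorm(200))$statistic)
w_skewed <- unname(shapiro.test(rexp(200))$statistic)
print(c(normal = w_normal, skewed = w_skewed))
```

As noted earlier in the thread, a large p-value here only means that normality was not rejected, which is weak evidence with 14 observations.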