Campbell, Desmond
2007-Oct-11 12:11 UTC
[R] test for whether dataset comes from a known MVN
Dear all,

I have a multivariate dataset containing 100,000 or more points. I want to find the p-value for the hypothesis that the points come from a particular multivariate normal distribution with mean vector u and covariance matrix s2. So:

H0: points ~ MVN(u, s2)
H1: points not ~ MVN(u, s2)

How do I find the p-value in R?

To me this is a likelihood ratio test problem. Under H0 the parameters are fixed at u and s2; under H1 they are free. I would like to be able to do this in R but don't know how. Can you advise?

Regards,
Desmond Campbell
Room C0.29, SGDP Centre
Institute of Psychiatry, De Crespigny Park, London SE5 8AF
Tel: +44 (0) 20 7848 0236
Fax: +44 (0) 20 7848 0866
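Desmond's likelihood-ratio framing can be written down directly if H1 is read as "MVN with unrestricted mean and covariance". That reading is an assumption: the alternative then covers only a wrong mean or covariance, not non-normal shapes. With xbar the sample mean, S the maximum-likelihood covariance (divisor n) and p the dimension, the statistic is

-2 log(Lambda) = n * [tr(s2^{-1} S) - log det(s2^{-1} S) - p + (xbar - u)' s2^{-1} (xbar - u)]

Under H0 this is asymptotically chi-square with p + p(p+1)/2 degrees of freedom (Wilks' theorem). The sketch below is a minimal illustration of that formula in dependency-free Python rather than R. The names mvn_lrt, chi2_sf, cholesky and forward_solve are hypothetical helpers, not an existing package API. The sketch whitens the points with the Cholesky factor of s2, which turns s2 into the identity without changing the statistic.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df) with integer df >= 1.

    Uses Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1) for the
    regularized upper incomplete gamma function, with h = x / 2.
    """
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)             # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))  # Q(1/2, h)
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def cholesky(a):
    """Lower-triangular L with L L' = a; a must be symmetric positive definite."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                v = a[i][i] - s
                if v <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(v)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L y = b for y, with L lower triangular."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def mvn_lrt(points, mu, sigma):
    """LRT of H0: mean = mu, cov = sigma against an MVN with free mean and cov.

    Returns (statistic, degrees of freedom, asymptotic chi-square p-value).
    Needs more points than dimensions so the sample covariance is invertible.
    """
    n, p = len(points), len(mu)
    L = cholesky(sigma)
    # Whiten: under H0 each z_i = L^{-1} (x_i - mu) is N(0, I).
    z = [forward_solve(L, [x[k] - mu[k] for k in range(p)]) for x in points]
    zbar = [sum(zi[k] for zi in z) / n for k in range(p)]
    # MLE covariance (divisor n) of the whitened points; its trace and
    # determinant equal those of sigma^{-1} S for the raw MLE covariance S.
    S = [[sum((zi[r] - zbar[r]) * (zi[c] - zbar[c]) for zi in z) / n
          for c in range(p)] for r in range(p)]
    Ls = cholesky(S)
    log_det = 2.0 * sum(math.log(Ls[k][k]) for k in range(p))
    trace = sum(S[k][k] for k in range(p))
    # -2 log(Lambda) = n [tr(sigma^{-1} S) - log det(sigma^{-1} S) - p
    #                     + (xbar - mu)' sigma^{-1} (xbar - mu)]
    stat = n * (trace - log_det - p + sum(m * m for m in zbar))
    df = p + p * (p + 1) // 2  # free mean entries + free covariance entries
    return stat, df, chi2_sf(stat, df)
```

Called as stat, df, pval = mvn_lrt(points, u, s2), with points given as a list of p-element lists. Pure Python keeps the sketch self-contained but is slow; at 100,000 points a vectorised version would be the practical choice. Because the alternative is still MVN, a non-normal sample with exactly the right mean and covariance would not be rejected. This test therefore complements, rather than replaces, a goodness-of-fit test.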
Campbell, Desmond wrote:
> I have a multivariate dataset containing 100,000 or more points.
> [...]
> How do I find the p-value in R?

Googling for "Shapiro-Wilk multivariate" brings up mshapiro.test() in the mvnormtest package. However, I would strongly suspect that, if your data are from the real world, you will reject the null hypothesis of multivariate normality when you have 100,000 points: the power to detect tiny (unimportant?) deviations from MVN will be very high.

cheers,
Ben Bolker
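Ben's power argument can be made concrete with a deliberately simplified, hypothetical case. Suppose a single coordinate's true mean is off by 0.01 standard deviations and is tested with a known-variance z-test. The z statistic grows like sqrt(n), so a shift that is invisible at n = 5000 becomes "significant" at n = 100,000. This minimal Python sketch uses an illustrative function name, not any library API. It reports the z obtained when the sample mean lands exactly on the true shift.

```python
import math


def typical_z_for_mean_shift(n, shift_in_sd):
    """z and two-sided p-value of a known-variance test of one coordinate's
    mean, when the sample mean equals a true shift of shift_in_sd SDs."""
    z = math.sqrt(n) * shift_in_sd
    p_two_sided = math.erfc(z / math.sqrt(2.0))  # P(|Z| > z) for Z ~ N(0, 1)
    return z, p_two_sided


for n in (5000, 100000):
    z, pval = typical_z_for_mean_shift(n, 0.01)
    print(f"n = {n}: z = {z:.2f}, two-sided p = {pval:.4f}")
```

With these inputs the first case gives z of about 0.71 (p about 0.48). The second gives z of about 3.16 (p about 0.0016), although the deviation is identical.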
Desmond Campbell
2007-Oct-11 18:28 UTC
[R] test for whether dataset comes from a known MVN
Dear Ben Bolker,

Thanks for replying and offering advice; unfortunately it doesn't solve my problem.

1) mshapiro.test() in the mvnormtest package appears to be applicable only to datasets containing 3 to 5000 samples, whereas my dataset contains 100,000 samples.

2) As you said in your email, if my data were from the real world then any test would be likely to reject the null hypothesis, because of the power of such a large dataset. However, my data are not from the real world. I am conducting validation studies, and if the program I am testing is working correctly then the dataset will be perfectly normally distributed.

Thanks anyway.

Regards,
Desmond Campbell
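The parameters are known, and in a validation study the null is expected to hold exactly. One option without a 5000-sample limit is therefore to reduce each point to its squared Mahalanobis distance d^2 = (x - u)' s2^{-1} (x - u), which under H0 follows a chi-square distribution with p degrees of freedom. Those distances can then be compared with that distribution using a one-sample Kolmogorov-Smirnov test. In R the analogous call would be ks.test(mahalanobis(x, u, s2), "pchisq", df = ncol(x)), since mahalanobis() returns squared distances. The Python sketch below spells out the same computation. It assumes large n, because it uses the asymptotic Kolmogorov distribution for the p-value. The names ks_known_mvn and kolmogorov_sf, and the other helpers, are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df) with integer df >= 1 (upper gamma recurrence)."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)             # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))  # Q(1/2, h)
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def cholesky(a):
    """Lower-triangular L with L L' = a; a must be symmetric positive definite."""
    p = len(a)
    L = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                v = a[i][i] - s
                if v <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(v)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L y = b for y, with L lower triangular."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def kolmogorov_sf(lam):
    """Asymptotic P(sqrt(n) * D > lam) for the one-sample KS statistic D."""
    if lam < 0.2:
        return 1.0  # the exact limit value is within 1e-12 of 1 here
    total = 0.0
    for k in range(1, 101):
        term = math.exp(-2.0 * k * k * lam * lam)
        total += term if k % 2 == 1 else -term
        if term < 1e-16:
            break
    return min(max(2.0 * total, 0.0), 1.0)


def ks_known_mvn(points, mu, sigma):
    """One-sample KS test of squared Mahalanobis distances against chi-square(p).

    Under H0 (points ~ MVN(mu, sigma)) each d_i^2 = (x_i - mu)' sigma^{-1} (x_i - mu)
    is chi-square with p degrees of freedom. Returns (D, asymptotic p-value).
    """
    n, p = len(points), len(mu)
    L = cholesky(sigma)
    # |L^{-1} (x - mu)|^2 equals the squared Mahalanobis distance of x from mu.
    d2 = sorted(
        sum(v * v for v in forward_solve(L, [x[k] - mu[k] for k in range(p)]))
        for x in points
    )
    D = 0.0
    for i, x in enumerate(d2, start=1):
        F = 1.0 - chi2_sf(x, p)  # chi-square(p) CDF at the i-th smallest distance
        D = max(D, i / n - F, F - (i - 1) / n)
    return D, kolmogorov_sf(math.sqrt(n) * D)
```

Because d^2 captures only the radial part of the distribution, some departures would go undetected. For example, points with the right radial law but spread unevenly over directions leave the distances unchanged. Applying the same KS check to each whitened coordinate against N(0, 1) is one way to cover that case.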
Maybe Matching Threads
- ks.test one-sample - where can I get a list of the strings specifying the distribution?
- save p-value in mshapiro.test(mvnormtest)
- Error using function MVN in package MCLUST: Fortran symbol name not in DLL for package
- Conditional Distribution of MVN variates
- multinormality