Dear R-Users,

How can I use chisq.test() as a goodness-of-fit test? After reading the man page, I have some doubts that this kind of test is available with this function. Am I wrong?

X2 = sum((O-E)^2/E)

O = empirical (observed) frequencies
E = expected frequencies calculated from the model (such as a normal distribution)

See:
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
for X2 used as a goodness-of-fit test.

Any help will be appreciated.
Thanks a lot. Bye.
Vito

====
Become builders of solutions

"The business of the statistician is to catalyze the scientific learning process."
George E. P. Box

Top 10 reasons to become a Statistician

1. Deviation is considered normal
2. We feel complete and sufficient
3. We are 'mean' lovers
4. Statisticians do it discretely and continuously
5. We are right 95% of the time
6. We can legally comment on someone's posterior distribution
7. We may not be normal, but we are transformable
8. We never have to say we are certain
9. We are honestly significantly different
10. No one wants our jobs

Visit the portal http://www.modugno.it/
and in particular the section on Palese
http://www.modugno.it/archivio/palese/
On 13-Jan-05 Vito Ricci wrote:

> Dear R-Users,
>
> How can I use chisq.test() as a goodness of fit test?
> Reading man-page I've some doubts that kind of test is
> available with this statement. Am I wrong?
>
> X2 = sum((O-E)^2/E)
>
> O=empirical frequencies
> E=expected freq. calculated with the model (such as
> normal distribution)
>
> See:
> http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
> for X2 used as a goodness of fit test.

It is not conspicuous in "?chisq.test", though it is in fact the case, that chisq.test() can perform the sort of test you are looking for. No doubt this is because so much of the help page is devoted to the contingency-table case.

However, if you use it in the form

  chisq.test(x, p = p)

where x is a vector of counts in "bins" and p is a vector, of the same length as x, of the probabilities that a random observation will fall in the various bins, then it is that sort of test. (The argument has to be named: the second positional argument of chisq.test() is y, so chisq.test(x, p) would cross-tabulate x against p instead of using p as the bin probabilities.)

So, for example, if you dissect the range of X into k intervals

  (-Inf,X1], (X1,X2], ... , (X[k-2],X[k-1]], (X[k-1],Inf),

let N1, N2, ... , Nk be the numbers of observations in these intervals, and let

  x = c(N1,...,Nk)
  p = c(pnorm(X1),
        pnorm(c(X2,...,X[k-1])) - pnorm(c(X1,...,X[k-2])),
        1-pnorm(X[k-1]))

then chisq.test(x, p = p) will test the goodness of fit of a fully specified normal distribution (here the standard normal, since pnorm() is called with its defaults). (Note that the above is schematic pseudo-R code, not real R code!)

However, this use of chisq.test() is limited (as far as I can see) to the case where no parameters have been estimated in choosing the distribution from which p is calculated, and so it will be based on the wrong number of degrees of freedom if the distribution is estimated from the data. I cannot see any provision in the documentation for chisq.test() for specifying either the degrees of freedom or the number of parameters estimated for p.

So in the latter case you are better off doing it directly. This is not more difficult, since the hard work is in calculating the elements of p.
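The schematic above can be turned into runnable R roughly as follows. This is only a sketch under stated assumptions: the data are simulated, the cut points are arbitrary, and the model being tested is the fully specified N(0,1), so no parameters are estimated.

```r
set.seed(1)                     # simulated data, for illustration only
obs <- rnorm(200)               # sample to be tested against N(0, 1)

# Interior cut points X1 < ... < X[k-1] (arbitrary choice); outer bins are open-ended
cuts   <- c(-1.5, -0.5, 0.5, 1.5)
breaks <- c(-Inf, cuts, Inf)

# x: observed counts N1, ..., Nk in the k = 5 right-closed bins
x <- as.vector(table(cut(obs, breaks = breaks)))

# p: bin probabilities under the fully specified N(0, 1) model
p <- diff(pnorm(breaks))

# p is passed by name; the second positional argument of chisq.test() is y
gof <- chisq.test(x, p = p)
gof
```

The reported degrees of freedom are k - 1 = 4, which is appropriate only because the mean and standard deviation were fixed in advance.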
After that, with E = N*p,

  X2 <- sum(((O-E)^2)/E)

has (approximately) the chi-squared distribution with df = k-1-r degrees of freedom, where k is the number of "bins" and r is the number of parameters that have been estimated. So the P-value is 1-pchisq(X2,df).

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861 [NB: New number!]
Date: 13-Jan-05 Time: 18:30:58
------------------------------ XFMail ------------------------------
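A sketch of the direct computation Ted describes, for a normal model with both parameters estimated. The data and cut points are made up for illustration. The degrees of freedom are taken as k - 1 - r (one is lost because the counts must sum to N, plus one per estimated parameter). For brevity the parameters are estimated from the raw sample with mean() and sd(), so the chi-squared reference distribution is only approximate; the next reply in the thread explains why.

```r
set.seed(2)                                   # simulated data, for illustration only
obs <- rnorm(300, mean = 10, sd = 2)
N   <- length(obs)

breaks <- c(-Inf, 7, 8.5, 10, 11.5, 13, Inf)  # k = 6 bins, arbitrary cut points
O <- as.vector(table(cut(obs, breaks = breaks)))  # observed counts
k <- length(O)

# r = 2 parameters estimated from the data (raw-sample estimates, see caveat above)
mu_hat    <- mean(obs)
sigma_hat <- sd(obs)
r <- 2

# Expected counts under the fitted normal distribution
p <- diff(pnorm(breaks, mean = mu_hat, sd = sigma_hat))
E <- N * p

X2      <- sum((O - E)^2 / E)                 # Pearson statistic
df      <- k - 1 - r
p_value <- pchisq(X2, df = df, lower.tail = FALSE)  # same as 1 - pchisq(X2, df)

c(X2 = X2, df = df, p.value = p_value)
```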
On Thu, 13 Jan 2005 18:23:37 +0100 (CET) Vito Ricci wrote:

> Dear R-Users,
>
> How can I use chisq.test() as a goodness of fit test?
> Reading man-page I've some doubts that kind of test is
> available with this statement. Am I wrong?
>
> X2 = sum((O-E)^2/E)
>
> O=empirical frequencies
> E=expected freq.

You can do

  chisq.test(O, p = E/sum(E))

but note that this assumes that the expected frequencies/probabilities are known (and not estimated).

> calculated with the model (such as normal distribution)

"Normal distribution" is not a fully specified model! If you estimate the parameters by ML from the raw (ungrouped) observations, the chi-squared inference will typically not be valid. Another approach would be to estimate the parameters by grouped ML or minimum Chi-squared instead. See also ?pearson.test from package nortest and the references therein.

For discrete distributions, this Chi-squared statistic is more natural (though not always without problems): see ?goodfit in package vcd.

Z

> See:
> http://www.itl.nist.gov/div898/handbook/eda/section3/eda35f.htm
> for X2 used as a goodness of fit test.
>
> Any help will be appreciated.
> Thank a lot. Bye.
> Vito
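A minimal sketch of the minimum Chi-squared idea mentioned above, assuming a normal model, simulated data and arbitrary cut points. The helper pearson_x2() is a hypothetical name introduced here, and optim() with its default Nelder-Mead method is just one convenient way to do the minimisation; it is not taken from the thread.

```r
set.seed(3)                                   # simulated data, for illustration only
obs    <- rnorm(300, mean = 10, sd = 2)
breaks <- c(-Inf, 7, 8.5, 10, 11.5, 13, Inf)  # k = 6 bins, arbitrary cut points
O <- as.vector(table(cut(obs, breaks = breaks)))
N <- sum(O)

# Pearson statistic as a function of theta = (mean, log(sd));
# the log scale keeps the standard deviation positive during optimisation
pearson_x2 <- function(theta) {
  p <- diff(pnorm(breaks, mean = theta[1], sd = exp(theta[2])))
  E <- N * p
  sum((O - E)^2 / E)
}

# Start from the raw-sample estimates and minimise X2 over the grouped data
start <- c(mean(obs), log(sd(obs)))
fit   <- optim(start, pearson_x2)

X2_min  <- fit$value
df      <- length(O) - 1 - 2                  # k - 1 - r with r = 2
p_value <- pchisq(X2_min, df = df, lower.tail = FALSE)

c(mean = fit$par[1], sd = exp(fit$par[2]), X2 = X2_min, df = df, p.value = p_value)
```

Because the parameters are chosen using only the bin counts, the k - 1 - r reference distribution is the appropriate asymptotic one here, unlike the case where mean() and sd() of the raw sample are plugged in.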