Louise Mair
2012-Feb-20 10:57 UTC
[R] chisq.test vs manual calculation - why are different results produced?
Hello,

I am trying to fit gamma, negative exponential and inverse power functions to a dataset, and then test whether the fit of each curve is good. To do this I have been advised to calculate predicted values for bins of data (I have grouped a continuous range of distances into 1 km bins), and then apply a chi-squared test. Example:

> data <- data.frame(distance=c(1,2,3,4,5,6,7), observed=c(43,13,10,6,2,1),predicted=c(28, 18, 10, 5 ,3, 1, 1))
> chisq.test(data$observed, data$predicted)

Which gives:

        Pearson's Chi-squared test

data:  data$observed and data$predicted
X-squared = 35, df = 25, p-value = 0.0882

Warning message:
In chisq.test(data$observed, data$predicted) :
  Chi-squared approximation may be incorrect

I understand the warning is due to having observed/predicted values of less than five. However, I am interested to know firstly why R uses such a large number of degrees of freedom (when by my understanding there should only be 4 df), and secondly whether using the following manual calculation is therefore inappropriate:

> X2 <- sum(((data$observed - data$predicted)^2)/data$predicted)
> 1-pchisq(X2,4)
[1] 0.04114223

If chi-squared is unsuitable, what other test can I use to determine whether my observed and predicted data come from the same distribution? The frequently recommended Fisher's test doesn't seem to be any more appropriate, as it requires values of greater than 5 for contingency tables larger than 2 x 2.

Thanks for your help.

Louise

[[alternative HTML version deleted]]
David Winsemius
2012-Feb-20 14:24 UTC
[R] chisq.test vs manual calculation - why are different results produced?
On Feb 20, 2012, at 5:57 AM, Louise Mair wrote:

> Hello,
>
> I am trying to fit gamma, negative exponential and inverse power functions
> to a dataset, and then test whether the fit of each curve is good. To do
> this I have been advised to calculate predicted values for bins of data (I
> have grouped a continuous range of distances into 1 km bins), and then apply
> a chi-squared test. Example:
>
>> data <- data.frame(distance=c(1,2,3,4,5,6,7),
>> observed=c(43,13,10,6,2,1),
>> predicted=c(28, 18, 10, 5 ,3, 1, 1))

There's an error with that code: observed has only six values while distance and predicted have seven, so data.frame() fails.

>> chisq.test(data$observed, data$predicted)
>
> Which gives:
>
>         Pearson's Chi-squared test
>
> data:  data$observed and data$predicted
> X-squared = 35, df = 25, p-value = 0.0882
>
> Warning message:
> In chisq.test(data$observed, data$predicted) :
>   Chi-squared approximation may be incorrect
>
> I understand the warning is due to having observed/predicted values of less
> than five. However, I am interested to know firstly why R uses such a large
> number of degrees of freedom (when by my understanding there should only be
> 4 df), and secondly whether using the following manual calculation is
> therefore inappropriate:

Read the Details section of the help page (?chisq.test), end of the second paragraph. You probably wanted:

chisq.test(cbind(data$observed, data$predicted))

>> X2 <- sum(((data$observed - data$predicted)^2)/data$predicted)
>> 1-pchisq(X2,4)
> [1] 0.04114223
>
> If chi-squared is unsuitable, what other test can I use to determine
> whether my observed and predicted data come from the same distribution? The
> frequently recommended Fisher's test doesn't seem to be any more
> appropriate, as it requires values of greater than 5 for contingency tables
> larger than 2 x 2.
>
> Thanks for your help.
>
> Louise
>
> [[alternative HTML version deleted]]

Plain text is requested as the mail format.

David Winsemius, MD
West Hartford, CT
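
A note on where df = 25 comes from, offered as a minimal sketch rather than a definitive analysis. When chisq.test() is given two vectors, it coerces both to factors and tests independence on their cross-tabulation, which is what the Details section of the help page describes. The snippet below assumes the seventh observed count, missing from the posted code, is 1. That value is a guess, although it reproduces both the X-squared = 35 with df = 25 and the manual p-value of 0.0411 quoted in the thread. The goodness-of-fit call with `p =` and `rescale.p = TRUE` is shown as one alternative, not necessarily what either poster intended, and the 4 df in the manual calculation is taken from the original post.

```r
# Counts from the original post. The posted 'observed' vector has only six
# entries; the seventh value (1) is an ASSUMPTION, chosen because it
# reproduces the results quoted in the thread.
observed  <- c(43, 13, 10, 6, 2, 1, 1)
predicted <- c(28, 18, 10, 5, 3, 1, 1)

# Two-vector form: both arguments are coerced to factors and cross-tabulated.
# Each vector has 6 distinct values, so the table is 6 x 6 and
# df = (6 - 1) * (6 - 1) = 25. This tests independence, not goodness of fit.
tab <- table(observed, predicted)
print(dim(tab))
two_vec <- suppressWarnings(chisq.test(observed, predicted))
print(two_vec$parameter)   # degrees of freedom
print(two_vec$statistic)   # X-squared

# Goodness-of-fit form: observed counts against expected proportions.
# rescale.p = TRUE rescales 'predicted' so that it sums to 1.
# Here df = number of bins - 1 = 6.
gof <- suppressWarnings(chisq.test(observed, p = predicted, rescale.p = TRUE))
print(gof$parameter)

# Manual calculation from the post, with df = 4 supplied by hand.
# lower.tail = FALSE is equivalent to 1 - pchisq(X2, 4).
X2 <- sum((observed - predicted)^2 / predicted)
p_manual <- pchisq(X2, df = 4, lower.tail = FALSE)
print(p_manual)
```

The goodness-of-fit form uses df = k - 1 for k bins and knows nothing about parameters estimated while fitting the curve, so a df reduced for fitted parameters has to be supplied by hand through pchisq(), as in the last lines. Under the assumed seventh count the observed values sum to 76 and the predicted values to 66, so the manual statistic and the rescaled goodness-of-fit statistic are not computed against the same expected counts.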