thr3ads.net - R help - [R] goodfit() in vcd package: computation of chi-squared [Jul 2009]

If this information is useful, please help other people find it:
Share via:

Boris.Vasiliev at forces.gc.ca

2009-Jul-23 14:26 UTC

[R] goodfit() in vcd package: computation of chi-squared

I have troubles understanding how goodfit() function in the vcd package
computes  the Pearson coefficient.  Can anybody provide more information
on the computation?

In particular, for HorseKicks data in vcd package, goodfit() yields
> oo <-
goodfit(HorseKicks,type="poisson",method="MinChisq")
> summary(oo)
         Goodness-of-fit test for poisson distribution

             X^2 df  P(> X^2)
Pearson 0.594649  3 0.8976563
Warning message:
In summary.goodfit(oo) : Chi-squared approximation may be incorrect

However, if I compute the Pearson coefficient by hand, I get a different
value for chi-squared
> with(oo, sum( (observed-fitted)^2/fitted ))[1] 0.694577

I am not sure where I am making a mistake.  Any suggestions?

Regards,
Boris.

Achim Zeileis

2009-Jul-24 07:37 UTC

head link

[R] goodfit() in vcd package: computation of chi-squared

On Thu, 23 Jul 2009, Boris.Vasiliev at forces.gc.ca wrote:
> I have troubles understanding how goodfit() function in the vcd package
> computes  the Pearson coefficient.
That's because it's conceptually hard to compute. The problem with the 
minimum Chi-squared approach is that for the asymptotics to work you need 
a fixed number of classes (or a number of classes that does not grow to 
fast). For count distributions, you always have an infinite number of 
expected classes so this does not really work. In goodfit() we try to work 
around this by collecting all probability weight in the last expected 
class. But this is only done in computation of the chi-squared statistic, 
not in the summary of fitted/expected for each observed count.

Note that this is still an ad hoc solution. If anyone had a pointer to 
some book/paper that addresses the problem of appropriate cut-offs in the 
goodness-of-fit testing of count distributions, I would be interested.

As we don't know a non-ad-hoc solution at the moment, we've just
recently
put in some more documentation and warnings for the MinChisq method. The 
version should be released soon.
> Can anybody provide more information on the computation?
>
> In particular, for HorseKicks data in vcd package, goodfit() yields
>
>> oo <-
goodfit(HorseKicks,type="poisson",method="MinChisq")
>> summary(oo)
>
>         Goodness-of-fit test for poisson distribution
>
>             X^2 df  P(> X^2)
> Pearson 0.594649  3 0.8976563
> Warning message:
> In summary.goodfit(oo) : Chi-squared approximation may be incorrect
>
> However, if I compute the Pearson coefficient by hand, I get a different
> value for chi-squared
>
>> with(oo, sum( (observed-fitted)^2/fitted ))
> [1] 0.694577
The larges observed count is 4 with observed = 1 and fitted = 0.64. But 
for computing the statistic we make the class "4" the class
"4+", so that

fitted <- sum(oo$observed) * diff(c(0, ppois(0:3, oo$par$lambda), 1))

which increases the fitted value for the last class to 0.73 (and leaves 
all others unchanged). And then

R> sum((oo$observed - fitted)^2/fitted)
[1] 0.594649

Best,
Z
> I am not sure where I am making a mistake.  Any suggestions?
>
> Regards,
> Boris.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

Possibly Parallel Threads

Search for more seemingly similar threads

R help - Jul 2009 - goodfit() in vcd package: computation of chi-squared

[R] goodfit() in vcd package: computation of chi-squared

[R] goodfit() in vcd package: computation of chi-squared

Possibly Parallel Threads