Dear R users, I am trying to fully understand the difference between estimating overdispersion with glm.nb() from MASS compared to glm(..., family quasipoisson). It seems that (i) the coefficient estimates are different and also (ii) the summary() method for glm.nb suggests that overdispersion is taken to be one: "Dispersion parameter for Negative Binomial(0.9695) family taken to be 1", which is not what I expected. The code I used is pasted below: x <- rep(seq(0,23,by=1),50); s <- rep(seq(1,2,length=50*24),1); tmp <- cbind.data.frame(y=rnbinom(length(tmp1),mu=8*(sin(2*pi*x/24)+2),size 1),x=x,s=s); tmp.glm.qp <- glm(y~factor(x)-1,data = tmp, family=quasipoisson, offset=log(s)); tmp.glm.nb <- glm.nb(y~factor(x)-1 +offset(log(s)),data = tmp); On a more advanced topic, I was furthermore hoping to compare models with a global estimate of overdispersion with one that allows overdispersion to be estimated separately for each level of the factor x. Can I achieve that in glm or do I need to employ a mixed effects model ? Thanks! Markus [[alternative HTML version deleted]]
Markus Loecher <mao.loecher <at> gmail.com> writes:> > Dear R users, > I am trying to fully understand the difference between estimating > overdispersion with glm.nb() from MASS compared to glm(..., family > quasipoisson). > It seems that (i) the coefficient estimates are different and also (ii) the > summary() method for glm.nb suggests that overdispersion is taken to be one: > "Dispersion parameter for Negative Binomial(0.9695) family taken to be 1", > which is not what I expected.Quasi-Poisson and negative binomial models serve approximately the same purpose (i.e. account for overdispersion in count data), but they are both conceptually and technically different. QP simply assumes a variance-mean relationship (var=phi*mean), while NB assumes an explicit likelihood model -- in addition, NB has a different var-mean relationship (var=mu*(1+mu/k)). When R tells you "dispersion parameter taken to be 1", I believe it means that in the expression var=phi*mu*(1+mu/theta), phi is 1 -- that is, there isn't an *additional* dispersion factor incorporated. (Fitting NB with a *known* overdispersion parameter theta is within the standard GLM framework, but estimating theta is an extension, so the terminology doesn't always fit nicely.) As plot(coef(tmp.glm.qp),coef(tmp.glm.nb)) would show you, the coefficients are different but not very different -- this is not terribly surprising considering that the two fits are using different statistical models.> On a more advanced topic, I was furthermore hoping to compare models with a > global estimate of overdispersion with one that allows overdispersion to be > estimated separately for each level of the factor x. Can I achieve that in > glm or do I need to employ a mixed effects model ?You can fit completely separate models for the different levels of x, but it's hard to (for example) fit a model with the same mean-effects parameters but different overdispersion. I don't think mixed-effect models will necessarily help you here. I was going to suggest bbmle (blatant plug!), but I haven't implemented offsets yet. The negbinomial family in the VGAM package might help you here ...