Anna Mill
2011-Jun-13 20:33 UTC
[R] glm with binomial errors - problem with overdispersion
Dear all,

I am new to R and my question may be trivial to you...

I am doing a GLM with binomial errors to compare proportions of species in different categories of seed sizes (4 categories) between 2 sites. In the model summary the residual deviance is much higher than the degree of freedom (Residual deviance: 153.74 on 4 degrees of freedom) and even after correcting for overdispersion by using a quasibinomial error structure instead of binomial the residual deviance does not change. Is this a data problem and I cannot use this statistic or is it because I do something wrong with R (see models attached)?

Thanks a lot for your help!
Anna

first model with binomial error structure:

> success<-c(14,43,44,1,13,28,56,8)
> failure<-c(88,59,58,101,92,77,49,97)
> "fragment"<-c(1,1,1,1,2,2,2,2)
> "type"<-c(1,2,3,4,1,2,3,4)
> y<-cbind(success,failure)
> model<-glm(y~fragment*type,binomial)
> summary(model)

Call:
glm(formula = y ~ fragment * type, family = binomial)

Deviance Residuals:
      1        2        3        4        5        6        7        8
-4.0175   3.3716   4.5052  -6.0071  -2.8063   0.5449   6.0414  -5.0184

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    0.04433    0.61072   0.073   0.9421
fragment      -0.65477    0.39001  -1.679   0.0932 .
type          -0.46664    0.23027  -2.027   0.0427 *
fragment:type  0.26636    0.14455   1.843   0.0654 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 157.96  on 7  degrees of freedom
Residual deviance: 153.74  on 4  degrees of freedom
AIC: 196.31

Number of Fisher Scoring iterations: 5

second model with quasibinomial error structure:

> summary(model2)

Call:
glm(formula = y ~ fragment * type, family = quasibinomial)

Deviance Residuals:
      1        2        3        4        5        6        7        8
-4.0175   3.3716   4.5052  -6.0071  -2.8063   0.5449   6.0414  -5.0184

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    0.04433    3.63550   0.012    0.991
fragment      -0.65477    2.32169  -0.282    0.792
type          -0.46664    1.37073  -0.340    0.751
fragment:type  0.26636    0.86048   0.310    0.772

(Dispersion parameter for quasibinomial family taken to be 35.43628)

    Null deviance: 157.96  on 7  degrees of freedom
Residual deviance: 153.74  on 4  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 5
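The two summaries above have identical coefficients and deviance and differ only in their standard errors. The following is a minimal R sketch of that relationship, not part of the original post. It assumes model2 was fitted with the call printed in its summary (the line defining model2 is not shown in the post), and the names phi, se_binomial and se_quasi are introduced here purely for illustration.

```r
# Data as posted; fragment and type are left numeric, as in the original fits.
success  <- c(14, 43, 44, 1, 13, 28, 56, 8)
failure  <- c(88, 59, 58, 101, 92, 77, 49, 97)
fragment <- c(1, 1, 1, 1, 2, 2, 2, 2)
type     <- c(1, 2, 3, 4, 1, 2, 3, 4)
y <- cbind(success, failure)

model  <- glm(y ~ fragment * type, family = binomial)
# Assumption: model2 reconstructed from the Call line of summary(model2).
model2 <- glm(y ~ fragment * type, family = quasibinomial)

# Point estimates and residual deviance are the same under both families.
same_coef     <- isTRUE(all.equal(coef(model), coef(model2)))
same_deviance <- isTRUE(all.equal(deviance(model), deviance(model2)))

# Dispersion estimate reported by summary() for the quasibinomial fit:
# Pearson chi-square divided by the residual degrees of freedom.
phi <- sum(residuals(model, type = "pearson")^2) / df.residual(model)

# Quasibinomial standard errors are the binomial ones scaled by sqrt(phi).
se_binomial <- coef(summary(model))[, "Std. Error"]
se_quasi    <- coef(summary(model2))[, "Std. Error"]
same_se     <- isTRUE(all.equal(se_quasi, se_binomial * sqrt(phi)))

print(c(same_coef = same_coef, same_deviance = same_deviance, same_se = same_se))
print(phi)
```

Under these assumptions phi should reproduce the dispersion parameter printed in the second summary, which is why only the standard errors, t values and p-values move between the two fits.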
Prof Brian Ripley
2011-Jun-14 06:13 UTC
[R] glm with binomial errors - problem with overdispersion
I presume you intended 'type' and 'fragment' to be factors (see below). Such a model would fit exactly. The additive model

> model <- glm(y ~ fragment+type, binomial)

is only modestly over-dispersed, and shows that 'fragment' has zero effect. Not 'a negligible effect', but no effect. So something really odd is going on: is this an exercise with artificial data? Otherwise you need to explain the exact balance between the two 'fragments' (each fragment has exactly 1/4 success) and your assumption of independent binomial sampling cannot be true.

Using a quasibinomial model does not change the deviance (see e.g. McCullagh and Nelder for the definitions, including of 'scaled deviance'), but it does change the standard errors.

On Mon, 13 Jun 2011, Anna Mill wrote:

> Dear all,
>
> I am new to R and my question may be trivial to you...
> I am doing a GLM with binomial errors to compare proportions of species in
> different categories of seed sizes (4 categories) between 2 sites.

You have types and fragments but no species and no sites. At least 'sites' should be a factor, as should 'categories of seed sizes'.

> In the model summary the residual deviance is much higher than the degree
> of freedom (Residual deviance: 153.74 on 4 degrees of freedom) and even
> after correcting for overdispersion by using a quasibinomial error structure
> instead of binomial the residual deviance does not change. Is this a data
> problem and I cannot use this statistic or is it because I do something
> wrong with R (see models attached)?
>
> Thanks a lot for your help!
> Anna

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
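The points in the reply can be checked with a short R sketch. This is an illustration added here, not code from either poster: it assumes that coding 'fragment' and 'type' with factor() is what was intended, and the names saturated, additive and prop_success are hypothetical.

```r
# Data as posted, with the two predictors coded as factors.
success  <- c(14, 43, 44, 1, 13, 28, 56, 8)
failure  <- c(88, 59, 58, 101, 92, 77, 49, 97)
fragment <- factor(c(1, 1, 1, 1, 2, 2, 2, 2))
type     <- factor(c(1, 2, 3, 4, 1, 2, 3, 4))
y <- cbind(success, failure)

# With factors, the interaction model has 8 parameters for 8 observations,
# so it is saturated: zero residual degrees of freedom and an exact fit.
saturated <- glm(y ~ fragment * type, family = binomial)
print(df.residual(saturated))
print(deviance(saturated))

# Additive model: 3 residual degrees of freedom remain.
additive <- glm(y ~ fragment + type, family = binomial)
print(coef(additive)["fragment2"])                  # estimated fragment effect
print(deviance(additive) / df.residual(additive))   # rough over-dispersion check

# Pooled proportion of successes within each fragment.
prop_success <- tapply(success, fragment, sum) /
  tapply(success + failure, fragment, sum)
print(prop_success)   # both fragments give 0.25
```

The fragment totals are 102 successes out of 408 and 105 out of 420, which is the exact 1/4 balance the reply refers to. Every cell within a fragment also has the same total (102 or 105), and together these make the estimated fragment coefficient in the additive model zero up to numerical precision.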