Dennis Hansen
2006-Apr-23 11:53 UTC
[R] Comparing GLMMs and GLMs with quasi-binomial errors?
Dear All,

I am analysing a dataset on levels of herbivory in seedlings in an experimental setup in a rainforest. I have seven classes/categories of seedling damage/herbivory that I want to analyse, modelling each separately.

There are twenty maternal trees, with eight groups of seedlings around each. Each tree has a TreeID, which I use as the random effect (blocking factor).

There are two fixed effects: DISTANCE - distance to maternal tree; two levels 'CLOSE' or 'AWAY' (four groups of seedlings each per tree), and PLATEAU - whether the maternal tree grows on the 'UPPER' plateau (bad soil) or 'LOWER' plateau (good soil).

In each group of seedlings, we randomly selected one seedling where we scored herbivory. Levels of herbivory for each of the seven herbivory categories were scored as the proportion of leaves attacked.

Obviously, I don't want to use a more complicated model than necessary - but I equally obviously want to take the random effect 'TreeID' into account. Hence, for each herbivory category, I initially fitted a GLMM using the 'glmmPQL' command from the MASS library (after using the 'cbind()' command on the two columns with total number of leaves per seedling and number of leaves attacked by that herbivory category) - and then compared these models to GLMs without the random effect.

## model example1: leaf mines GLMM
proportion.leafmines <- cbind(leaves.affected, total.leaves - leaves.affected)
leafminesGLMM <- glmmPQL(proportion.leafmines ~ PLATEAU * DISTANCE,
                         random = ~1 | TreeID, family = binomial(link = logit))
## AIC(leafminesGLMM) = 474.773

## model example2: leaf mines GLM
leafminesGLM <- glm(proportion.leafmines ~ PLATEAU * DISTANCE,
                    family = binomial(link = logit))
## AIC(leafminesGLM) = 207.9465

...and so on, for all seven herbivory categories.
In four of the cases, the AIC is much lower (as in the example above) for the GLMs than for the GLMMs - whereas in the three other cases, TreeID is clearly an important random factor, as the AIC values of the GLMs are much higher than the ones for the GLMMs. There is not a big difference in significance levels - some marginally significant ones now become significant, while some significant ones now become marginal.

However, there is one complication to simply using the AIC scores to evaluate which model is the best: for almost all the cases where the GLM has the lower AIC, the data are overdispersed, and I need to fit the model with a quasibinomial, rather than a binomial, error structure. BUT - using a GLM with quasibinomial error structure, I of course no longer get an AIC score...

So, my main question is: can I simply use the GLM with quasibinomial error structure instead of the GLMM if the GLM with binomial error structure has a lower AIC score than the GLMM?

Any input on how I can compare such models would be greatly appreciated!

Dennis

-----------------------------------------------------------
Dennis Marinus Hansen
Institute of Environmental Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland
tel: +41 (0) 44635 6122
fax: +41 (0) 44635 5711
www.uwinst.unizh.ch
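The overdispersion problem described in the post can be illustrated with a minimal sketch. Everything below runs on simulated data: the data frame `dat`, the seed, and all effect sizes are assumptions made up for illustration, and only the column names and the model formula follow the post. The sketch estimates the dispersion from the Pearson residuals of the binomial GLM, refits with a quasibinomial family, and shows that `AIC()` has nothing to report for the quasi fit.

```r
# Simulated stand-in for the seedling data; every number here is hypothetical.
set.seed(1)
n.trees <- 20
dat <- data.frame(
  TreeID   = factor(rep(seq_len(n.trees), each = 8)),
  PLATEAU  = factor(rep(c("UPPER", "LOWER"), each = 8 * n.trees / 2)),
  DISTANCE = factor(rep(rep(c("CLOSE", "AWAY"), each = 4), times = n.trees))
)
dat$total.leaves <- sample(5:15, nrow(dat), replace = TRUE)

# Tree-level shifts on the logit scale induce extra-binomial variation.
tree.effect <- rnorm(n.trees, sd = 1)[as.integer(dat$TreeID)]
p <- plogis(-1 + 0.5 * (dat$DISTANCE == "CLOSE") + tree.effect)
dat$leaves.affected <- rbinom(nrow(dat), size = dat$total.leaves, prob = p)

# Binomial GLM as in the post, ignoring TreeID.
fit.bin <- glm(cbind(leaves.affected, total.leaves - leaves.affected) ~
                 PLATEAU * DISTANCE,
               family = binomial(link = "logit"), data = dat)

# Pearson estimate of the dispersion parameter; values well above 1
# point to overdispersion relative to the binomial.
dispersion <- sum(residuals(fit.bin, type = "pearson")^2) / df.residual(fit.bin)

# Quasibinomial refit: same coefficients, standard errors scaled by
# sqrt(dispersion).
fit.quasi <- glm(cbind(leaves.affected, total.leaves - leaves.affected) ~
                   PLATEAU * DISTANCE,
                 family = quasibinomial(link = "logit"), data = dat)

AIC(fit.bin)    # finite: this is a maximum likelihood fit
AIC(fit.quasi)  # NA: the quasi family defines no likelihood
```

The dispersion computed by hand matches `summary(fit.quasi)$dispersion`, so either can be used to judge how far the data depart from binomial variation.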
Prof Brian Ripley
2006-Apr-24 05:52 UTC
[R] Comparing GLMMs and GLMs with quasi-binomial errors?
AIC is only valid for maximum likelihood fitting, so not for PQL and also not for quasi-binomial. In particular, who said AIC() was valid when applied to a glmmPQL fit? (Certainly not the book it supports!)

You say TreeID is a `blocking factor', but you have a treatment (PLATEAU) that is confounded with your blocks. That complicates things, but with a maximum of four blocks in each level of plateau, you have very little information on the difference in soil type if you believe TreeID has appreciable variability.

You can choose between the glm and glmmPQL fits by testing if the TreeID variance component is non-zero, although the distribution theory is only very approximate.

A much simpler analysis would be to analyse the two soil types separately with TreeID as a classic blocking factor (a fixed effect). That would tell you the effect of DISTANCE within each level of PLATEAU and provide enough info to do a t-test to compare the estimated effects. As for the effect of PLATEAU, just use the mean proportions for each tree and a two-sample t-test.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK, Fax: +44 1865 272595
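The simpler analysis suggested in the reply might look roughly like the sketch below. It is one reading of that advice, not the author's code. The data are simulated and every number is hypothetical. The helper `distance.effect`, the choice of a quasibinomial family for the within-plateau fits, and the use of unweighted per-seedling proportions for the tree means are all assumptions.

```r
# Simulated stand-in for the seedling data; every number here is hypothetical.
set.seed(2)
n.trees <- 20
dat <- data.frame(
  TreeID   = factor(rep(seq_len(n.trees), each = 8)),
  PLATEAU  = factor(rep(c("UPPER", "LOWER"), each = 8 * n.trees / 2)),
  DISTANCE = factor(rep(rep(c("CLOSE", "AWAY"), each = 4), times = n.trees),
                    levels = c("AWAY", "CLOSE"))
)
dat$total.leaves <- sample(5:15, nrow(dat), replace = TRUE)
tree.effect <- rnorm(n.trees, sd = 0.5)[as.integer(dat$TreeID)]
dat$leaves.affected <- rbinom(
  nrow(dat), size = dat$total.leaves,
  prob = plogis(-1 + 0.5 * (dat$DISTANCE == "CLOSE") + tree.effect)
)

# Step 1: within one plateau, estimate the DISTANCE effect with TreeID
# entered as a fixed blocking factor (hypothetical helper).
distance.effect <- function(d) {
  d <- droplevels(d)  # drop the trees that belong to the other plateau
  fit <- glm(cbind(leaves.affected, total.leaves - leaves.affected) ~
               TreeID + DISTANCE,
             family = quasibinomial, data = d)
  summary(fit)$coefficients["DISTANCECLOSE", c("Estimate", "Std. Error")]
}
eff <- sapply(split(dat, dat$PLATEAU), distance.effect)

# Step 2: compare the two DISTANCE effects; the statistic is the difference
# of the estimates divided by its approximate standard error.
diff.stat <- unname(
  (eff["Estimate", "UPPER"] - eff["Estimate", "LOWER"]) /
    sqrt(sum(eff["Std. Error", ]^2))
)

# Step 3: PLATEAU effect from one mean proportion per tree, compared with a
# two-sample t-test (Welch's version is the default in t.test).
dat$prop <- dat$leaves.affected / dat$total.leaves
tree.means <- aggregate(prop ~ TreeID + PLATEAU, data = dat, FUN = mean)
plateau.test <- t.test(prop ~ PLATEAU, data = tree.means)
```

Reducing each tree to a single number in step 3 makes the tree the unit of replication for PLATEAU, which is how the sketch handles the confounding of PLATEAU with TreeID pointed out in the reply. The reference distribution for `diff.stat` is only approximate.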