Kilo Bismarck <tweedie-d <at> web.de> writes:
> I am running an analysis on proportion data using binomial (quasibinomial
> family) error structure. My data comprise two continuous variables, body
> size and range size, as well as feeding guild, nest placement, nest
> type and foraging strata as factors. I hope to model with these variables
> the preference of primary forests (#successes) by certain bird species.
> My code therefore looks like:
>
> y<-cbind(n_forest,n_trials-n_forest)
> model<-glm(y~range+body+nstrata+ntype+forage+feed,
> family=quasibinomial(link=logit),data=dat)
>
> However plausible the approach may look, overdispersion is prevalent
> (dispersion estimated at 6.5). I read up on this and learned that in
> case of multiple factors, not all levels may yield good results with
> logistic regression (Crawley "The R Book"). I subsequently tried to
> analyse each feeding guild separately, but to no avail: overdispersion
> remains. Given the number of categorical variables in my study, is there
> a convenient way to handle the overdispersion? I was trying tree models
> to see the most influential variables but again, to no avail.
>
> BTW: It may well be that the data is just bad...
Sometimes overdispersion comes from a poorly fitting model,
sometimes it is "just there" (i.e. intrinsic or caused by a
non-measured predictor which you can't do anything about).
Examine your data and the fits of the model to your data for
outliers or obvious deviations from the model. If the fit generally
looks OK but there is just consistently more variation than expected
from the binomial distribution, then you can probably proceed
with your inferences from the quasibinomial model. (Do make
sure that you are not overfitting, i.e. if you are going to
fit a model with 12 or so parameters [I'm guessing here: it depends
on the numbers of levels in your categorical predictors], you
really need at least 120 (preferably more) observations ...)
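
A rough sketch of those checks in R, in case it helps. I don't have
your data, so the data frame below is simulated and every value in it
is made up; only the variable names n_forest, n_trials, range, body
and feed follow your post, and the other factors are left out for
brevity. It computes the dispersion estimate by hand, lists the
largest Pearson residuals, and counts observations per fitted
coefficient.

```r
# Simulated stand-in for the real data (hypothetical values throughout)
set.seed(1)
n <- 150
dat <- data.frame(
  range    = rnorm(n),
  body     = rnorm(n),
  feed     = factor(sample(c("fruit", "insect", "seed"), n, replace = TRUE)),
  n_trials = rep(20, n)
)
# Extra noise on the logit scale, so the counts are overdispersed
p <- plogis(0.5 * dat$range + rnorm(n, sd = 1))
dat$n_forest <- rbinom(n, dat$n_trials, p)

model <- glm(cbind(n_forest, n_trials - n_forest) ~ range + body + feed,
             family = quasibinomial(link = "logit"), data = dat)

# Dispersion estimate: Pearson chi-square over residual degrees of freedom
# (this is the value summary(model) reports for a quasibinomial fit)
pearson <- residuals(model, type = "pearson")
phi <- sum(pearson^2) / df.residual(model)

# Observations with the largest absolute Pearson residuals: candidates
# for outliers or for places where the model fits badly
worst <- head(order(abs(pearson), decreasing = TRUE), 5)
print(cbind(dat[worst, ], fitted = fitted(model)[worst],
            pearson = pearson[worst]))

# Rule-of-thumb check: roughly 10 or more observations per coefficient
n_par <- length(coef(model))
obs_per_par <- nrow(dat) / n_par
cat("dispersion:", phi, " parameters:", n_par,
    " obs per parameter:", obs_per_par, "\n")
```

Plotting the Pearson residuals against fitted(model), and against each
predictor, is the usual visual companion to the table of largest
residuals.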