Jonas Josefsson
2013-Nov-25  10:47 UTC
[R] Independent variable dependent on offset in GLMM
Hi!

I’m running glmer (lme4) models with biodiversity data and I’m having trouble understanding/finding information on how the offset() option is implemented. Explicitly, I’m wondering if the offset is only implemented on the dependent variable (as I think it is), or does it also affect independent variables in the model (was told this by a stat guy at our department)?

My data is inventories of birds (species richness and abundance) at the scale of whole farms. Thus, each observation has a different inventory area which I am accounting for in the model as: offset = log(INVAREA).

However, as a fixed effect in the model I’ve got the number of different crop types in the inventoried area. As this variable is also affected by inventoried area, I would like to account for this in some way, but I find it difficult to know the best way to do so.

Right now, I have made a linear quadratic function (using lm) of crop number ~ inventoried area + inventoried area^2 to describe how crop number increases with increasing sample size (area). Then, I have subtracted fitted values from observed number of crops and used this measure in the models. Is this a reasonable workaround?

Thanks,
Jonas

--------------------------------------------------------------------
Jonas Josefsson (PhD student)
Swedish University of Agricultural Sciences (SLU)
Department of Ecology
Box 7044
750 07 Uppsala
Sweden
Jonas.Josefsson@slu.se
0046 (0)18 672420
0046 (0)703 752366
Jonas Josefsson <jonas.josefsson <at> slu.se> writes:

> Hi!

(I was initially going to say that this question would probably be better on r-sig-mixed-models at r-project.org, but now that I've been through it I've changed my mind -- there aren't really any issues here that are specific to mixed models ... it's really mostly a *statistical* question rather than an R question, and as such might belong on a statistics forum such as http://stats.stackexchange.com ...)

> I'm running glmer (lme4) models with biodiversity data and I'm
> having trouble with understanding/finding information on how the
> offset() option is implemented.

> Explicitly, I'm wondering if the offset is only implemented on the
> dependent variable (as I think it is), or does it also affect
> independent variables in the model (was told this by a stat guy at
> our department)?

I'm not perfectly sure I understand your question, but as I understand it you are right and the stat guy in your department is wrong (but perhaps you misunderstood them?). The offset term is simply added to the linear predictor of the model, with its coefficient fixed at 1 rather than estimated; it does not transform or rescale any of the other predictors.

> My data is inventories of birds (species richness and abundance) at
> the scale of whole farms. Thus, each observation has a different
> inventory area which I am accounting for in the model as: offset =
> log(INVAREA).

It makes quite a bit of sense to model abundance as directly proportional to area (i.e., you are in effect modeling density rather than total counts, but accounting for changes in Poisson sampling variance appropriately). I'm not so sure it makes sense to model species richness as directly proportional to area. You might want to consider adding log(area) as a covariate rather than as an offset, which is then essentially assuming a power-law relationship between area and species richness (log(richness) = beta_0 + beta_a*log(area) -> richness = exp(beta_0)*area^beta_a, with the other predictors held fixed).

> However, as a fixed effect in the model I've got the number of
> different crop types in the inventoried area. As this variable is
> also affected by inventoried area, I would like to account for this
> in some way, but I find it difficult to know the best way to do so.

> Right now, I have made a linear quadratic function (using lm) of
> crop number ~ inventoried area + inventoried area^2 to describe how
> crop number increases with increasing sample size (area). Then, I
> have subtracted fitted values from observed number of crops and used
> this measure in the models. Is this a reasonable work around?

This doesn't make very much sense to me, but it will depend on your general model of what's going on. I would have guessed that abundance (for example) would depend on the number of crop types available, not on whether the number of crop types was higher than expected for a sample of a given area. I suppose it's possible, though.
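
To make the offset-versus-covariate distinction concrete, here is a minimal sketch in R on simulated data. Everything in it is an assumption made for illustration: the variable names (abund, ncrop, area), the parameter values, and the use of plain glm() instead of glmer(), chosen only because the point does not depend on the random effects. It is not the poster's model or data.

```r
# Simulated data only: all names and parameter values here are hypothetical.
set.seed(1)
n     <- 500
area  <- runif(n, min = 1, max = 50)          # inventoried area per farm
ncrop <- 1 + rpois(n, lambda = 0.1 * area)    # crop types, increasing with area

# Assumed truth: density (birds per unit area) depends on ncrop,
# so the expected count is area * density.
abund <- rpois(n, lambda = area * exp(-1 + 0.2 * ncrop))
d <- data.frame(abund, ncrop, area)

# (a) Offset: log(area) enters the linear predictor with coefficient fixed at 1.
fit_offset <- glm(abund ~ ncrop + offset(log(area)),
                  family = poisson, data = d)

# (b) Covariate: the coefficient of log(area) is estimated (power-law form).
fit_cov <- glm(abund ~ ncrop + log(area),
               family = poisson, data = d)

coef(fit_offset)   # intercept and ncrop only; no coefficient for the offset
coef(fit_cov)      # additionally estimates an exponent for area

# The offset leaves the predictor itself untouched in the design matrix.
all(model.matrix(fit_offset)[, "ncrop"] == d$ncrop)
```

In fit (a) the ncrop values used by the model are exactly the ones supplied; the offset only shifts the linear predictor by log(area). In fit (b) the estimated coefficient of log(area) indicates how far the response is from strict proportionality to area, which is the form that may suit species richness better.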