Matthias Gondan
2011-Dec-30 17:51 UTC
[R] Fwd: Re: Poisson GLM using non-integer response/predictors?
Hi,

Use an offset variable if you count occurrences of an event and want to model the observation time:

  glm(count ~ predictors + offset(log(observation_time)), family=poisson)

If you want to compare durations, look at library(survival), ?coxph

If tnoise_sqrt is the square root of tourist noise, your example seems incorrect, because it is a predictor, not the dependent variable:

  tnoise_sqrt ~ lengthfeeding_log

Best wishes,

Matthias

On 30.12.2011 16:29, Lucy Dablin wrote:
> Great lists, I always find them useful, thank you to everyone who
> contributes to them.
>
> My question is regarding non-integer values from some data I collected
> on parrots when using the poisson GLM. I observed the parrots on a daily
> basis to see if they were affected by tourist presence. My key predictors
> are tourist noise (averaged over a day period so decimal value, square
> root to adjust for skew), tourist number (the number of tourists at a
> site, square root), and the number of boats passing the site in a day
> (log). These are compared with predictors: total number of birds (count
> data, square root), average time devoted to foraging at site (log),
> species richness (sqrt), and the number of flushes per day. Apart from
> the last one they are all non-integer values. When I run a glm, for
> example:
>
>   parrots <- glm(tnoise_sqrt ~ lengthfeeding_log, family = poisson)
>   summary(parrots)
>
> there are warnings such as "27: In dpois(y, mu, log = TRUE) : non-integer
> x = 1.889822". I was advised to use the offset() function; however, this
> does not seem to correct the problem and I find the code confusing.
> What GLM approach should I be using for multiple non-integer predictors
> and non-integer responses? Does my GLM approach seem appropriate?
>
> Thank you for taking the time to consider this.
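A minimal sketch of the offset approach described in the reply, run on simulated data. The variable names (noise, observation_time, count) and all numbers are assumptions made up for illustration; they are not the poster's data.

```r
# Simulated example: events counted over unequal observation times.
# All names and values are hypothetical, not taken from the original post.
set.seed(1)
n <- 200
observation_time <- runif(n, 0.5, 4)       # hours of observation per day
noise <- runif(n, 0, 10)                   # untransformed predictor
rate <- exp(0.5 - 0.1 * noise)             # true event rate per hour
count <- rpois(n, rate * observation_time) # integer response

# offset(log(observation_time)) fixes the coefficient of log(time) at 1,
# so the remaining coefficients describe the rate per unit time.
fit <- glm(count ~ noise + offset(log(observation_time)), family = poisson)
coef(fit)
```

Because the response is a raw integer count, this fit does not raise the "non-integer x" warning from dpois().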
Ben Bolker
2011-Dec-30 19:50 UTC
[R] Fwd: Re: Poisson GLM using non-integer response/predictors?
Matthias Gondan <matthias-gondan <at> gmx.de> writes:

> Hi,
>
> Use offset variables if count occurrences of an event and you want to
> model the observation time.
>
>   glm(count ~ predictors + offset(log(observation_time)), family=poisson)
>
> If you want to compare durations, look at library(survival), ?coxph
>
> If tnoise_sqrt is the square root of tourist noise, your example seems
> incorrect, because it is a predictor, not the dependent variable
>
>   tnoise_sqrt ~ lengthfeeding_log
>
> Best wishes,
>
> Matthias
>
> On 30.12.2011 16:29, Lucy Dablin wrote:
> > Great lists, I always find them useful, thank you to
> > everyone who contributes to them.
> >
> > My question is regarding non-integer values from some data I
> > collected on parrots when using the poisson GLM. I observed the
> > parrots on a daily basis to see if they were affected by tourist
> > presence. My key predictors are tourist noise (averaged over a day
> > period so decimal value, square root to adjust for skew), tourist
> > number (the number of tourists at a site, square root), and the
> > number of boats passing the site in a day (log). These are
> > compared with predictors: total number of birds (count data,
> > square root), average time devoted to foraging at site (log),
> > species richness (sqrt), and the number of flushes per day. Apart
> > from the last one they are all non-integer values. When I run a
> > glm for example:

Your description sounds like you might already have transformed your predictors: generally speaking, you don't want to do that before running a GLM (the variance function incorporated in the GLM takes care of heteroscedasticity, and the link function takes care of nonlinearity in the response). I suspect you want total number of birds, number of flushes per day, and species richness to be modeled as Poisson (or negative binomial -- see ?glm.nb in the MASS package).
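The suggestion above could be sketched as follows: fit the raw (untransformed) counts with a Poisson GLM, check for overdispersion, and compare with a negative binomial fit from MASS::glm.nb. The data are simulated and the names birds and tourists are hypothetical.

```r
library(MASS)  # provides glm.nb

# Simulated overdispersed counts; names and values are hypothetical.
set.seed(2)
n <- 300
tourists <- rpois(n, 20)                # raw predictor, left untransformed
mu <- exp(3 - 0.03 * tourists)          # log link: linear on the log scale
birds <- rnbinom(n, size = 2, mu = mu)  # raw integer response

fit_pois <- glm(birds ~ tourists, family = poisson)
fit_nb <- glm.nb(birds ~ tourists)

# Residual deviance per degree of freedom; values well above 1 in the
# Poisson fit hint at overdispersion.
dispersion <- deviance(fit_pois) / df.residual(fit_pois)
dispersion

# Compare the two fits by AIC (lower is better).
AIC(fit_pois, fit_nb)
```

The ratio of residual deviance to residual degrees of freedom is only a rough diagnostic, used here as an assumption-laden shortcut rather than a formal test.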
Species richness *might* be binomial, or more complicated, if you are drawing from a limited species pool (e.g. if there are only 5 possible species and you sometimes see 4 or 5 of them in a day). Is the total number of birds really non-integer *before* you square-root transform it?

Time devoted to foraging at the site is most easily modeled as log-normal, i.e., log-transform as you have already done and use lm (unless the response includes zeros, in which case the log is undefined). Alternatively, it could be modeled as Gamma-distributed (although you may want to use a log link instead of the default inverse link).

As Matthias said, offsets are used for the specific case of non-uniform sampling effort (e.g. if you sampled different areas, or for different lengths of time, every day).

You may be interested in r-sig-ecology at r-project.org, which is an R mailing list specifically devoted to ecological questions.
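For the foraging-time response, the two options mentioned above might look like this on simulated, strictly positive durations. The names feeding_time and noise, and the parameter values, are assumptions for illustration only.

```r
# Simulated strictly positive durations; names and values are hypothetical.
set.seed(3)
n <- 150
noise <- runif(n, 0, 10)
feeding_time <- rlnorm(n, meanlog = 3 - 0.1 * noise, sdlog = 0.4)

# Option 1: log-normal model, i.e. ordinary lm on the log-transformed response.
fit_lognormal <- lm(log(feeding_time) ~ noise)

# Option 2: Gamma GLM with a log link instead of the default inverse link.
fit_gamma <- glm(feeding_time ~ noise, family = Gamma(link = "log"))

coef(fit_lognormal)
coef(fit_gamma)
```

With a log link, the slopes of both models are on the log scale. The intercepts generally differ, since lm models the mean of log(feeding_time) while the Gamma GLM models the log of the mean.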