I am trying to use Poisson regression to model count data with four explanatory variables: ratio, ordinal, nominal and dichotomous (x1, x2, x3 and x4). After playing around with the input for a bit, I have formed what I believe is a series of badly fitting models, probably due to overdispersion [1], e.g.

    model <- glm(y ~ x1 + x2, family = poisson(link = log), data = data1)

and I am looking for some general guidance on how to correct this in R.

[1] I believe this because (a) it is, as I'm sure you're aware, a possible reason for poor model fits; and (b) the following:

    tapply(data1$y, data1$x2, function(x) c(mean = mean(x), variance = var(x)))

seems to suggest that, whilst the variance does appear to be some function of the mean, there is a consistently large difference between the two.

--
View this message in context: http://r.789695.n4.nabble.com/Regression-Overdispersion-tp4702611.html
Sent from the R help mailing list archive at Nabble.com.
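A common way to put a number on the mean/variance comparison described above is the Pearson dispersion statistic of the fitted Poisson model. The following is only a minimal sketch: the real data1 was not posted, so the data here are simulated (negative binomial counts, an assumption made purely for illustration), and the variable names simply mirror those in the question.

```r
# Simulated stand-in for data1 (hypothetical; the original data were not posted)
set.seed(1)
n <- 200
data1 <- data.frame(
  x1 = runif(n, 0, 10),
  x2 = factor(sample(1:3, n, replace = TRUE))
)
# Negative binomial counts: variance = mu + mu^2 / size, i.e. overdispersed
data1$y <- rnbinom(n, mu = exp(0.5 + 0.1 * data1$x1), size = 1)

# Poisson fit as in the question
model <- glm(y ~ x1 + x2, family = poisson(link = "log"), data = data1)

# Pearson dispersion statistic: sum of squared Pearson residuals
# divided by the residual degrees of freedom
dispersion <- sum(residuals(model, type = "pearson")^2) / df.residual(model)
print(dispersion)
```

Under a well-specified Poisson model this statistic should be close to 1; values well above 1 point towards overdispersion, although a missing covariate or a wrong functional form can inflate it too.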
There are two straightforward ways of modelling overdispersion:

1) Use glm as in your example but specify family=quasipoisson.
2) Use glm.nb in the MASS package, which fits a negative binomial model.

On 1 February 2015 at 16:26, JvanDyne <e283851 at trbvm.com> wrote:
> I am trying to use Poisson regression to model count data with four
> explanatory variables: ratio, ordinal, nominal and dichotomous (x1, x2, x3
> and x4). [...]
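The two options above can be sketched as follows. This is an illustration under stated assumptions, not a recommendation for the poster's data: data1 is simulated here (hypothetical negative binomial counts), and only x1 and x2 are used, as in the example formula from the question.

```r
library(MASS)  # provides glm.nb

# Simulated stand-in for data1 (hypothetical; the original data were not posted)
set.seed(1)
n <- 200
data1 <- data.frame(
  x1 = runif(n, 0, 10),
  x2 = factor(sample(1:3, n, replace = TRUE))
)
data1$y <- rnbinom(n, mu = exp(0.5 + 0.1 * data1$x1), size = 1)

# Reference Poisson fit
m_pois <- glm(y ~ x1 + x2, family = poisson(link = "log"), data = data1)

# 1) Quasi-Poisson: same coefficient estimates as the Poisson fit,
#    but the dispersion is estimated and the standard errors are rescaled
m_quasi <- glm(y ~ x1 + x2, family = quasipoisson(link = "log"), data = data1)
print(summary(m_quasi)$dispersion)

# 2) Negative binomial: variance modelled as mu + mu^2 / theta
m_nb <- glm.nb(y ~ x1 + x2, data = data1)
print(m_nb$theta)
```

The quasi-Poisson fit has no true likelihood, so AIC is not available for it; the negative binomial fit does have one, which makes likelihood-based comparisons possible if they are needed.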
A third, and often preferable, way is to add an observation-level random effect:

    library(lme4)
    # One factor level per row, so every observation gets its own random intercept
    data1$obs <- factor(seq_len(nrow(data1)))
    model <- glmer(y ~ x1 + x2 + (1 | obs), family = poisson(link = log), data = data1)

See http://glmm.wikidot.com/faq and search for "individual-level random effects".

Cheers,
Rune

On 1 February 2015 at 19:55, David Barron <dnbarron at gmail.com> wrote:
> There are two straightforward ways of modelling overdispersion:
>
> 1) Use glm as in your example but specify family=quasipoisson.
> 2) Use glm.nb in the MASS package, which fits a negative binomial model.
On Feb 1, 2015, at 8:26 AM, JvanDyne wrote:

> I am trying to use Poisson regression to model count data with four
> explanatory variables: ratio, ordinal, nominal and dichotomous (x1, x2, x3
> and x4). [...]

This is possibly an interesting question, but at the moment it is both off-topic on R-help and probably deserving of a book chapter as an answer. There are simply no specifics. One place where it would be on-topic, and where, if tightened up with a specific example, it might prompt interesting and useful answers from a knowledgeable audience, is http://CrossValidated.com .

> Sent from the R help mailing list archive at Nabble.com.

The Nabble "archive" of R-help is neither an archive of any sort, since they arbitrarily delete postings, nor "the" R-help archive. Maybe if I unquote this four-line message, then Nabble users will see it, although usually it gets trimmed:

R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
David Winsemius
Alameda, CA, USA