Eiko Fried
2012-Oct-14 16:00 UTC
[R] Poisson Regression: questions about tests of assumptions
I would like to test in R what regression fits my data best. My dependent variable is a count, and has a lot of zeros.

And I would need some help to determine what model and family to use (poisson or quasipoisson, or zero-inflated poisson regression), and how to test the assumptions.

1) Poisson Regression: as far as I understand, the strong assumption is that dependent variable mean = variance. How do you test this? How close together do they have to be? Are unconditional or conditional mean and variance used for this? What do I do if this assumption does not hold?

2) I read that if variance is greater than mean we have overdispersion, and a potential way to deal with this is including more independent variables, or family=quasipoisson. Does this distribution have any other requirements or assumptions? What test do I use to see whether 1) or 2) fits better - simply anova(m1,m2)?

3) I also read that the negative-binomial distribution can be used when overdispersion appears. How do I do this in R? What is the difference to quasipoisson?

4) Zero-inflated Poisson Regression: I read that the Vuong test checks which model fits better.

  > vuong(model.poisson, model.zero.poisson)

Is that correct?

5) ats.ucla.edu has a section about zero-inflated Poisson regressions, and tests the zero-inflated model (a) against the standard Poisson model (b):

  > m.a <- zeroinfl(count ~ child + camper | persons, data = zinb)
  > m.b <- glm(count ~ child + camper, family = poisson, data = zinb)
  > vuong(m.a, m.b)

I don't understand what the "| persons" part of the first model does, and why you can compare these models if. I had expected the regression to be the same and just use a different family.

Thank you
T
Achim Zeileis
2012-Oct-14 16:13 UTC
[R] Poisson Regression: questions about tests of assumptions
On Sun, 14 Oct 2012, Eiko Fried wrote:

> I would like to test in R what regression fits my data best. My dependent
> variable is a count, and has a lot of zeros.
>
> And I would need some help to determine what model and family to use
> (poisson or quasipoisson, or zero-inflated poisson regression), and how to
> test the assumptions.
>
> 1) Poisson Regression: as far as I understand, the strong assumption is
> that dependent variable mean = variance. How do you test this? How close
> together do they have to be? Are unconditional or conditional mean and
> variance used for this? What do I do if this assumption does not hold?

There are various formal tests for this, e.g., dispersiontest() in package "AER". Alternatively, you can use a simple likelihood-ratio test (e.g., by means of lrtest() in "lmtest") between a Poisson and a negative binomial (NB) fit. The p-value can even be halved because the Poisson is on the border of the NB theta parameter range (theta = infinity).

However, overdispersion can already matter before it is detected by a significance test. Hence, if in doubt, I would simply use an NB model and you're on the safe side. And if the NB's estimated theta parameter turns out to be extremely large (say beyond 20 or 30), then you can still switch back to Poisson if you want.

> 2) I read that if variance is greater than mean we have overdispersion,
> and a potential way to deal with this is including more independent
> variables, or family=quasipoisson. Does this distribution have any other
> requirements or assumptions? What test do I use to see whether 1) or 2)
> fits better - simply anova(m1,m2)?

quasipoisson yields the same parameter estimates as the poisson, only the inference is adjusted appropriately.

> 3) I also read that negative-binomial distribution can be used when
> overdispersion appears. How do I do this in R?

glm.nb() in "MASS" is one of the standard options.

> What is the difference to quasipoisson?

The NB is a likelihood-based model while the quasipoisson is not associated with a likelihood (but has the same conditional mean equation).

> 4) Zero-inflated Poisson Regression: I read that using the vuong test
> checks what models fits better.
> > vuong (model.poisson, model.zero.poisson)
> Is that correct?

It's one of the possibilities.

> 5) ats.ucla.edu has a section about zero-inflated Poisson Regressions, and
> test the zeroinflated model (a) against the standard poisson model (b):
> > m.a <- zeroinfl(count ~ child + camper | persons, data = zinb)
> > m.b <- glm(count ~ child + camper, family = poisson, data = zinb)
> > vuong(m.a, m.b)
> I don't understand what the "| persons" part of the first model does, and
> why you can compare these models if. I had expected the regression to be
> the same and just use a different family.

I recommend you read the associated documentation. See
  vignette("countreg", package = "pscl")
For glm.nb() I recommend its accompanying documentation, namely the MASS book.

hth,
Z
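The comparison described in this reply can be sketched roughly as follows. This is only an illustration on simulated data: the variable names (x, y), the sample size and the simulation settings are assumptions and do not come from the thread. The likelihood-ratio statistic is computed by hand here instead of with lrtest(), so that only base R and MASS are needed.

```r
library(MASS)  # provides glm.nb()

# Simulated overdispersed counts (assumed settings, for illustration only)
set.seed(1)
n  <- 500
x  <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)
y  <- rnbinom(n, size = 1.5, mu = mu)  # NB draws with theta = 1.5

# Three fits with the same conditional mean equation
m_pois  <- glm(y ~ x, family = poisson)
m_quasi <- glm(y ~ x, family = quasipoisson)
m_nb    <- glm.nb(y ~ x)

# quasipoisson: same coefficients as poisson, only the inference changes
same_coef  <- isTRUE(all.equal(coef(m_pois), coef(m_quasi)))
dispersion <- summary(m_quasi)$dispersion  # values well above 1 suggest overdispersion

# Likelihood-ratio test, Poisson against NB (1 extra parameter: theta).
# The p-value is halved because Poisson sits on the boundary theta = infinity.
lr_stat <- as.numeric(2 * (logLik(m_nb) - logLik(m_pois)))
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE) / 2

print(c(dispersion = dispersion, theta = m_nb$theta,
        lr_stat = lr_stat, p_value = p_value))
```

A large estimated theta together with a dispersion close to 1 would point back towards the plain Poisson model, in line with the rule of thumb given above.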
Wensui Liu
2012-Oct-14 22:38 UTC
[R] Poisson Regression: questions about tests of assumptions
Just a side note on your 4th question: for a small sample, the Clarke test might be more appropriate than the Vuong test, and the calculation is so simple that even Excel can handle it :-)

On Sun, Oct 14, 2012 at 12:00 PM, Eiko Fried wrote:

> 4) Zero-inflated Poisson Regression: I read that using the vuong test
> checks what models fits better.
> > vuong (model.poisson, model.zero.poisson)
> Is that correct?
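To give an idea of how simple the calculation is, here is a minimal sketch of the Clarke test as a paired sign test on the pointwise log-likelihood contributions of two non-nested models. Everything in it is an assumption made for illustration: the data are simulated, the names (x1, x2, m_a, m_b) are hypothetical, and two Poisson models with different regressors stand in for the zero-inflated comparison from the thread. Both models have the same number of parameters, so no correction for model size is applied.

```r
# Simulated data: the truth depends on x1 only (assumed setup)
set.seed(42)
n  <- 1000
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rpois(n, lambda = exp(0.2 + 0.8 * x1))

# Two non-nested candidate models
m_a <- glm(y ~ x1, family = poisson)
m_b <- glm(y ~ x2, family = poisson)

# Log-likelihood contribution of each observation under each model
ll_a <- dpois(y, lambda = fitted(m_a), log = TRUE)
ll_b <- dpois(y, lambda = fitted(m_b), log = TRUE)

# Clarke test: count how often model A beats model B pointwise and
# compare that count with a Binomial(n, 0.5) reference distribution
d      <- ll_a - ll_b
n_pos  <- sum(d > 0)
clarke <- binom.test(n_pos, n = length(d), p = 0.5)

print(c(n_pos = n_pos, n = length(d), p_value = clarke$p.value))
```

If clearly more than half of the differences are positive, model A is preferred; if clearly fewer, model B. For zero-inflated fits the same idea applies once the per-observation log-likelihoods of both models are available.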