I have some questions regarding zero-inflated Poisson (ZIP) models.

I am using count data to analyze abundance trends of salamanders. However,
my surveys differ in the amount of effort (i.e. the number of people
searching and the amount of time; I am using a museum database, so not all
surveys were conducted by me). Therefore I need to account for effort. If I
divide the count (the response variable) by effort, it will have decimals
and not be usable in this model. So I decided to put this term into the
independent variable.

I am analyzing Historic vs. Current surveys.

Here is an example of my code:

  require(pscl)
  model <- zeroinfl(Sallys~Survey:Person.Hours, dist="poisson", EM=TRUE)
  summary(model)

Most of the results came out highly significant, including some that I
did not expect to be significant, so I am concerned about whether the model
is appropriate. I created a simulated database and ran a simple glm to see
if y/b ~ x is the same as y~x:b, and it is not (not surprisingly). Does
anyone have suggestions for how to adjust my model to allow for these
comparisons? I cannot use a glm with Poisson error because of
overdispersion and a lot of zeroes. I thought about either rounding up my
ratios or multiplying everything by 100 to eliminate the decimals but keep
the variation (I am not pleased with either of those options).

On another note, I am having a little trouble interpreting the results (I
think), although this may not matter if I cannot use the ZIP model. Are the
count model coefficients (poisson with log link) the measure of whether the
sites differ, and if so, what do the estimates for both surveys indicate?
Is that the mean for both surveys, each tested against zero? If so, I want
to test them against each other and I don't know exactly how to do that.
Here is the output:

                              Estimate Std. Error z value Pr(>|z|)
  (Intercept)                  1.97418    0.06570  30.048   <2e-16 ***
  SurveyCurrent:Person.Hours   0.04192    0.07597   0.552    0.581
  SurveyHistoric:Person.Hours  0.40221    0.01540  26.110   <2e-16 ***

As for the "Zero-inflation model coefficients (binomial with logit link)":
I read that this is a measure of 1) suitability or 2) whether the predictor
of excess zeros was significant. Which one of these (or is it something
else) is correct, and how do I interpret this?

Here is a sample of the output:

  Zero-inflation model coefficients (binomial with logit link):
                              Estimate Std. Error z value Pr(>|z|)
  (Intercept)                  -1.1625     0.9833  -1.182    0.237
  SurveyCurrent:Person.Hours   -1.1787     1.1304  -1.043    0.297
  SurveyHistoric:Person.Hours  -0.5050     0.3440  -1.468    0.142

Thanks for any suggestions/help!!

-- 
Nicholas M Caruso
Graduate Student
CLFS-Biology
4219 Biology-Psychology Building
University of Maryland, College Park, MD 20742-5815
phone: 301-405-6884

------------------------------------------------------------------
I learned something of myself in the woods today,
and walked out pleased for having made the acquaintance.

On Wed, 24 Feb 2010, Nicholas M. Caruso wrote:

> I have some questions regarding Zero Inflation Poisson models.
>
> I am using count data to analyze abundance trends of salamanders. However,
> I have surveys which differ in the amount of effort (i.e. the number of
> people searching and amount of time - I am using a museum database so not
> all surveys were conducted by me). Therefore I need to account for the
> effort. If I change the count (response variable) then it will have decimals
> and not be usable in this model. So I decided to put this term into the
> independent variable.

The usual approach would be the following: If you think that some link
function (here the log) of y/n (response per effort) is linear in a set of
covariates x with coefficients b, you would typically write

  log(y/n) = x'b

which can be transformed to

  log(y) - log(n) = x'b
  log(y) = x'b + log(n)

i.e., the log-effort would be an additional regressor with its coefficient
fixed to 1. This is called an offset, so the R formula would be

  y ~ x + offset(log(n))

Alternatively, instead of assuming that the coefficient is exactly 1, you
can estimate and test it, i.e.

  y ~ x + log(n)

> I am analyzing Historic vs. Current surveys.
>
> Here is an example of my code:
> require(pscl)
> model <- zeroinfl(Sallys~Survey:Person.Hours, dist="poisson", EM=TRUE)
> summary(model)

I think I would allow different intercepts as well, i.e.,

  zeroinfl(Sallys ~ Survey * log(Person.Hours))

> I have received some very significant results on most of them and on some
> that I thought wouldn't be significant turned out to be. So I am concerned
> with the model being appropriate. I created a simulated database and ran a
> simple glm to see if y/b ~ x is the same as y~x:b and it is not (not
> surprisingly). Does anyone have suggestions for how to adjust my model to
> allow for these comparisons? I cannot use a glm with Poisson error because
> of overdispersion and a lot of zeroes.
> I thought about either rounding up my ratios or multiplying everything
> by 100 to eliminate the decimals but to keep the variation (I am not
> pleased with either of those options)
>
> On another note, I am having a little trouble interpreting the results (I
> think). Which this may not matter if I cannot use the ZIP model. Is the
> Count model coefficients (poisson with log link) the measure of if the sites
> differ and if so what do the estimates for both surveys indicate? Is that
> the mean for both surveys and it is testing them against zero? If so I want
> to test them against each other and I don't know exactly how to do that.
> Here is the output:
>                             Estimate Std. Error z value Pr(>|z|)
> (Intercept)                  1.97418    0.06570  30.048   <2e-16 ***
> SurveyCurrent:Person.Hours   0.04192    0.07597   0.552    0.581
> SurveyHistoric:Person.Hours  0.40221    0.01540  26.110   <2e-16 ***

This model forces the intercept to be the same for both the current and
the historic sites, which is not so intuitive. The two slopes mean that for
the historic sites the counts increased clearly with effort, whereas for
the current sites they increased only slightly (not significantly).

> As for the "Zero-inflation model coefficients( binomial with logit link). I
> read that this is a measure of 1) suitability or 2) if the predictor of
> excess zeros was significant. Which one of these (or is it something else)
> is correct and how do I interpret this?
>
> Here is a sample of a read out:
>
> Zero-inflation model coefficients (binomial with logit link):
>                             Estimate Std. Error z value Pr(>|z|)
> (Intercept)                  -1.1625     0.9833  -1.182    0.237
> SurveyCurrent:Person.Hours   -1.1787     1.1304  -1.043    0.297
> SurveyHistoric:Person.Hours  -0.5050     0.3440  -1.468    0.142

This part models the probability of additional (excess) zeros, which does
not seem to depend on either site or effort.
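To make the offset idea from earlier in this reply concrete, here is a
minimal sketch on simulated data. Everything in it is an assumption made
for illustration: the variable names (sallys, survey, person_hours), the
sample size and the true rates are made up, and a plain Poisson glm()
stands in for zeroinfl() so that the example runs in base R without
excess zeros getting in the way.

```r
# Simulated data; all names and parameter values are hypothetical.
set.seed(1)
n_surveys <- 400
survey <- factor(rep(c("Historic", "Current"), each = n_surveys / 2),
                 levels = c("Historic", "Current"))
person_hours <- runif(n_surveys, min = 0.5, max = 8)

# Assumed true catch rate per person-hour: 2.0 (Historic), 1.0 (Current).
rate <- ifelse(survey == "Historic", 2.0, 1.0)
sallys <- rpois(n_surveys, lambda = rate * person_hours)
d <- data.frame(sallys, survey, person_hours)

# Effort as an offset: the coefficient of log(person_hours) is fixed at 1,
# so the remaining coefficients describe the count per person-hour.
fit_offset <- glm(sallys ~ survey + offset(log(person_hours)),
                  family = poisson, data = d)

# Effort as an ordinary regressor: its coefficient is estimated instead.
fit_free <- glm(sallys ~ survey + log(person_hours),
                family = poisson, data = d)

# exp(coef) of the offset model: the Historic rate per person-hour and the
# Current/Historic rate ratio.
rates <- exp(coef(fit_offset))
print(rates)

# Likelihood-ratio test of "coefficient of log(effort) equals 1".
print(anova(fit_offset, fit_free, test = "Chisq"))
```

In the offset fit, the surveyCurrent coefficient is the direct comparison
of the two survey periods on the per-effort scale, so its z test is a test
of the surveys against each other rather than against zero. The same
formula terms should carry over to the count part of zeroinfl() or
hurdle(), but check the pscl documentation for how offsets are handled
there.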
For an introduction to the zero-inflation model and its implementation in
R see

  vignette("countreg", package = "pscl")

Also, I would recommend considering hurdle() models as well. They often
give similar fits and are slightly easier to interpret (IMO).

hth,
Z

> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.