thr3ads.net - R help - [R] Zero-inflated regression models: predicting no 0s [Jun 2011]

If this information is useful, please help other people find it:
Share via:

Jean-Simon Michaud

2011-Jun-01 00:39 UTC

[R] Zero-inflated regression models: predicting no 0s

Hi all, 

 

First post for me here, but I have been reading on the forum for almost two
years now. Thanks to everyone who contributed btw!

 

I have a dataset of 4000 observations of count of a mammal and I am trying
to predict abundance from a inflated-zero model as there is quite a bit of
zeros in the response variable. 

I have tried multiple options, but I might do something wrong as every time
I look at the fitted values it do not comprise any 0. 

 

Here is what I tried so far: 

 

"

## - hurdle from the package (lpsc) - ##

 
> hurdle1 = hurdle(formula = mydata_purge2$TOT ~ mydata_purge2$LC80 +mydata_purge2$LC231 + mydata_purge2$DEM, data = food, dist = "negbin",
zero.dist = "binomial")
> summary(hurdle1)
 

Call:

hurdle(formula = mydata_purge2$TOT ~ mydata_purge2$LC80 +
mydata_purge2$LC231 + mydata_purge2$DEM, data = food, 

    dist = "negbin", zero.dist = "binomial")

 

Pearson residuals:

    Min      1Q  Median      3Q     Max 

-1.0833 -0.7448 -0.2801  0.4296  6.7242 

 

Count model coefficients (truncated negbin with log link):

                      Estimate Std. Error z value Pr(>|z|)    

(Intercept)          1.7841678  0.0923781  19.314  < 2e-16 ***

mydata_purge2$LC80  -2.5929984  0.4184956  -6.196 5.79e-10 ***

mydata_purge2$LC231  0.2154269  0.1171259   1.839 0.065875 .  

mydata_purge2$DEM    0.0007708  0.0002064   3.735 0.000188 ***

Log(theta)           0.3742602  0.0390319   9.589  < 2e-16 ***

Zero hurdle model coefficients (binomial with logit link):

                      Estimate Std. Error z value Pr(>|z|)    

(Intercept)          0.0602347  0.2302370   0.262 0.793614    

mydata_purge2$LC80  -3.0590108  0.8360020  -3.659 0.000253 ***

mydata_purge2$LC231  1.7754441  0.3226731   5.502 3.75e-08 ***

mydata_purge2$DEM    0.0031943  0.0005307   6.020 1.75e-09 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1

 

Theta: count = 1.4539

Number of iterations in BFGS optimization: 12 

Log-likelihood: -1.251e+04 on 9 Df

 

 

## - zeroinfl from the package (lpsc) - ##

 
> zip1A = zeroinfl(mydata_purge2$TOT ~ mydata_purge2$LC80 +mydata_purge2$LC231 + mydata_purge2$DEM, data = food)

 
> summary(zip1A)
 

Call:

zeroinfl(formula = mydata_purge2$TOT ~ mydata_purge2$LC80 +
mydata_purge2$LC231 + mydata_purge2$DEM, data = food)

 

Pearson residuals:

    Min      1Q  Median      3Q     Max 

-2.2128 -1.2886 -0.5010  0.7594 11.8458 

 

Count model coefficients (poisson with log link):

                      Estimate Std. Error z value Pr(>|z|)    

(Intercept)          1.894e+00  3.547e-02  53.401  < 2e-16 ***

mydata_purge2$LC80  -2.249e+00  1.768e-01 -12.725  < 2e-16 ***

mydata_purge2$LC231  1.799e-01  4.492e-02   4.005 6.21e-05 ***

mydata_purge2$DEM    6.670e-04  7.687e-05   8.678  < 2e-16 ***

 

Zero-inflation model coefficients (binomial with logit link):

                      Estimate Std. Error z value Pr(>|z|)    

(Intercept)         -0.0593751  0.2308068  -0.257 0.796986    

mydata_purge2$LC80   2.9428092  0.8523669   3.453 0.000555 ***

mydata_purge2$LC231 -1.7772101  0.3233166  -5.497 3.87e-08 ***

mydata_purge2$DEM   -0.0031901  0.0005319  -5.997 2.01e-09 ***

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1

 

Number of iterations in BFGS optimization: 13 

Log-likelihood: -1.727e+04 on 8 Df

 
> a1 = predict(zip1A)
> b1 = mydata_purge2$TOT
> plot(a1,b1)
 

"

 

Please find attached the plot of zip1A (which look quite similar to the
hurdle1).

 

Your help would be much appreciated, 

 

Thanks, 

 

JM.

Achim Zeileis

2011-Jun-01 07:22 UTC

head link

[R] Zero-inflated regression models: predicting no 0s

On Tue, 31 May 2011, Jean-Simon Michaud wrote:
> Hi all,
>
> First post for me here, but I have been reading on the forum for almost two
> years now. Thanks to everyone who contributed btw!
>
> I have a dataset of 4000 observations of count of a mammal and I am trying
> to predict abundance from a inflated-zero model as there is quite a bit of
> zeros in the response variable.
>
> I have tried multiple options, but I might do something wrong as every 
> time I look at the fitted values it do not comprise any 0.
Yes, the fitted() method and also the default of the predict() method is 
to compute fitted _means_. And the mean of a count distribution will 
always be non-zero.

Moreover, even when you round the fitted values to integers, this does 
_not_ lead to the most likely count. Consider the following simple 
examples:

Probability density for a Poisson distribution with mean 0.8

R> dpois(0:5, lambda = 0.8)
[1] 0.449329 0.359463 0.143785 0.038343 0.007669 0.001227

i.e., zero is still the most likely outcome with a probability of 45% even 
though the mean is 0.8. And for negative binomial distributions, this can 
be even more extreme. The probability density for a geometric distribution 
(negative binomial with size = 1) and mean 2:

R> dnbinom(0:5, mu = 2, size = 1)
[1] 0.33333 0.22222 0.14815 0.09877 0.06584 0.04390

i.e., despite the mean of 2, zero is still the most likely outcome.

You can get the predicted probabilities for all observations via
predict(hurdle1, type = "prob") and predict(zip1A, type =
"prob"),
respectively. Given the results for your negative binomial hurdle model, I 
suspect that the zero-inflated Poisson fit will have a nonsatisfactory fit 
for the zeros, even though the predicted means of the two models are 
similar.

See also the paper accompanying the count regression functions in
"pscl"
(not "lpsc"):

   http://www.jstatsoft.org/v27/i08/

Finally, a short comment on your model formula:

   hurdle(TOT ~ LC80 + LC231 + DEM, data = mydata_purge, ...)

will be easier to read and less confusing (after all "data = food"
does
not appear to be used at all).

hth,
Z
> Here is what I tried so far:
>
> "
> ## - hurdle from the package (lpsc) - ##
>
>> hurdle1 = hurdle(formula = mydata_purge2$TOT ~ mydata_purge2$LC80 +
> mydata_purge2$LC231 + mydata_purge2$DEM, data = food, dist =
"negbin",
> zero.dist = "binomial")
>
>> summary(hurdle1)
>
> Call:
> hurdle(formula = mydata_purge2$TOT ~ mydata_purge2$LC80 +
> mydata_purge2$LC231 + mydata_purge2$DEM, data = food,
>    dist = "negbin", zero.dist = "binomial")
>
> Pearson residuals:
>    Min      1Q  Median      3Q     Max
> -1.0833 -0.7448 -0.2801  0.4296  6.7242
>
> Count model coefficients (truncated negbin with log link):
>                      Estimate Std. Error z value Pr(>|z|)
> (Intercept)          1.7841678  0.0923781  19.314  < 2e-16 ***
> mydata_purge2$LC80  -2.5929984  0.4184956  -6.196 5.79e-10 ***
> mydata_purge2$LC231  0.2154269  0.1171259   1.839 0.065875 .
> mydata_purge2$DEM    0.0007708  0.0002064   3.735 0.000188 ***
> Log(theta)           0.3742602  0.0390319   9.589  < 2e-16 ***
>
> Zero hurdle model coefficients (binomial with logit link):
>                      Estimate Std. Error z value Pr(>|z|)
> (Intercept)          0.0602347  0.2302370   0.262 0.793614
> mydata_purge2$LC80  -3.0590108  0.8360020  -3.659 0.000253 ***
> mydata_purge2$LC231  1.7754441  0.3226731   5.502 3.75e-08 ***
> mydata_purge2$DEM    0.0031943  0.0005307   6.020 1.75e-09 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1
>
> Theta: count = 1.4539
> Number of iterations in BFGS optimization: 12
> Log-likelihood: -1.251e+04 on 9 Df
>
>
> ## - zeroinfl from the package (lpsc) - ##
>
>> zip1A = zeroinfl(mydata_purge2$TOT ~ mydata_purge2$LC80 +
> mydata_purge2$LC231 + mydata_purge2$DEM, data = food)
>
>> summary(zip1A)
>
> Call:
> zeroinfl(formula = mydata_purge2$TOT ~ mydata_purge2$LC80 +
> mydata_purge2$LC231 + mydata_purge2$DEM, data = food)
>
> Pearson residuals:
>    Min      1Q  Median      3Q     Max
> -2.2128 -1.2886 -0.5010  0.7594 11.8458
>
> Count model coefficients (poisson with log link):
>                      Estimate Std. Error z value Pr(>|z|)
> (Intercept)          1.894e+00  3.547e-02  53.401  < 2e-16 ***
> mydata_purge2$LC80  -2.249e+00  1.768e-01 -12.725  < 2e-16 ***
> mydata_purge2$LC231  1.799e-01  4.492e-02   4.005 6.21e-05 ***
> mydata_purge2$DEM    6.670e-04  7.687e-05   8.678  < 2e-16 ***
>
> Zero-inflation model coefficients (binomial with logit link):
>                      Estimate Std. Error z value Pr(>|z|)
> (Intercept)         -0.0593751  0.2308068  -0.257 0.796986
> mydata_purge2$LC80   2.9428092  0.8523669   3.453 0.000555 ***
> mydata_purge2$LC231 -1.7772101  0.3233166  -5.497 3.87e-08 ***
> mydata_purge2$DEM   -0.0031901  0.0005319  -5.997 2.01e-09 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05
'.' 0.1 ' ' 1
>
> Number of iterations in BFGS optimization: 13
> Log-likelihood: -1.727e+04 on 8 Df
>
>> a1 = predict(zip1A)
>> b1 = mydata_purge2$TOT
>> plot(a1,b1)
>
> "
>
> Please find attached the plot of zip1A (which look quite similar to the
> hurdle1).
>
> Your help would be much appreciated,
>
> Thanks,
>
> JM.

geojs

2011-Jun-02 17:10 UTC

head link

[R] Zero-inflated regression models: predicting no 0s

Thanks for the quick reply, 

I understand that the predict(zip1A, type = "response") command is
computing
the fitted_means and these are different than the probabilities
predict(zip1A, type = "prob"). Although, according to Martin (2005),
the
highest probabilities do not simply lead to the true count estimates: "to
get the true estimate of relative mean abundance from the ZIP one must
multiply the estimated relative mean number of individuals at a site by the
probability that the relative mean number of individuals at a site is
generated through a Poisson distribution."

I initially thought that the predicted mean and the observed count could be
compared to estimate the fit of the model, but now I am not sure what to
think with Martin (2005) statement. 

Thank you for your help, 

JM

Martin, T.G. et al. (2005) Zero tolerance ecology: improving ecological
inference by modelling the source of zero observations, Ecology Letters,
Volume 8, Issue 11, pages 1235?1246.


--
View this message in context:
http://r.789695.n4.nabble.com/Zero-inflated-regression-models-predicting-no-0s-tp3564807p3568865.html
Sent from the R help mailing list archive at Nabble.com.

Michael Dewey

2011-Jun-03 12:47 UTC

head link

[R] Zero-inflated regression models: predicting no 0s

At 18:10 02/06/2011, geojs wrote:>Thanks for the quick reply,
>
>I understand that the predict(zip1A, type = "response") command is
computing
>the fitted_means and these are different than the probabilities
>predict(zip1A, type = "prob"). Although, according to Martin
(2005), the
>highest probabilities do not simply lead to the true count estimates:
"to
>get the true estimate of relative mean abundance from the ZIP one must
>multiply the estimated relative mean number of individuals at a site by the
>probability that the relative mean number of individuals at a site is
>generated through a Poisson distribution."
>
>I initially thought that the predicted mean and the observed count could be
>compared to estimate the fit of the model, but now I am not sure what to
>think with Martin (2005) statement.
When I started using zero-inflated models I found 
the vignette enormously helpful not only in 
understanding the models but also in teaching a 
few R programming ideas which I have used 
repeatedly since. I am sure Z is too modest to 
say this but it really will repay serious study.

>Thank you for your help,
>
>JM
>
>Martin, T.G. et al. (2005) Zero tolerance ecology: improving ecological
>inference by modelling the source of zero observations, Ecology Letters,
>Volume 8, Issue 11, pages 1235?1246.
>
>--
>View this message in context: 
>http://r.789695.n4.nabble.com/Zero-inflated-regression-models-predicting-no-0s-tp3564807p3568865.html
>Sent from the R help mailing list archive at Nabble.com.
Michael Dewey
info at aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html

Maybe Matching Threads

Search for more seemingly similar threads

R help - Jun 2011 - Zero-inflated regression models: predicting no 0s

[R] Zero-inflated regression models: predicting no 0s

[R] Zero-inflated regression models: predicting no 0s

[R] Zero-inflated regression models: predicting no 0s

[R] Zero-inflated regression models: predicting no 0s

Maybe Matching Threads