Dear all,

A question related to the following has been asked on R-help before, but I could not find any answer to it. Input will be much appreciated.

I got an unexpected sign of the "slope" parameter associated with a covariate (diam) using zeroinfl(). This led me to compare the estimates given by zeroinfl() and hurdle(). The (significant) negative estimate here is surprising, given the biology of the species:

> summary(zeroinfl(bnl ~ 1 | diam, dist = "poisson", data = valdaekar, EM = TRUE))

Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.74604    0.02635   142.2   <2e-16 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  21.7510     7.6525   2.842  0.00448 **
diam         -1.1437     0.3941  -2.902  0.00371 **

Number of iterations in BFGS optimization: 1
Log-likelihood: -582.8 on 3 Df

The hurdle model gives the same estimates, but the zero-part coefficients have opposite (and expected) signs:

> summary(hurdle(bnl ~ 1 | diam, dist = "poisson", data = valdaekar))

Count model coefficients (truncated poisson with log link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  3.74604    0.02635   142.2   <2e-16 ***

Zero hurdle model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -21.7510     7.6525  -2.842  0.00448 **
diam          1.1437     0.3941   2.902  0.00371 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Number of iterations in BFGS optimization: 8
Log-likelihood: -582.8 on 3 Df

Why is this so?

Thanks,
Tord

Windows NT, R 2.8.1, pscl 1.03
--
Tord Snäll
Department of Ecology / Swedish Species Information Centre
Swedish University of Agricultural Sciences (SLU)
P.O. 7044, SE-750 07 Uppsala, Sweden
Office/Mobile/Fax +46-18-672612/+46-76-7662612/+46-18-673537
www.ekol.slu.se/staff_tordsnall
www.artdata.slu.se/personal/fototsn.asp
Tord:

The logistic zero-inflation part of the zeroinfl() implementation of ZIP and ZINB models predicts the probability of a zero (more precisely, of an excess or structural zero) rather than the probability of a positive count (> 0), so the signs of the coefficients are often reversed from what you would expect if you had simply fitted a logistic regression. I'm guessing that the hurdle model, being a two-stage model, uses a logistic regression that predicts the probability of a positive count, hence the reversed signs of the estimates in the logistic regression part of the model.

Brian

Brian S. Cade, PhD
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: brian_cade@usgs.gov
tel: 970 226-9326

From: Tord Snäll <tord.snall@ekol.slu.se>
To: r-help@r-project.org
Date: 10/23/2009 07:40 AM
Subject: [R] opposite estimates from zeroinfl() and hurdle()
Sent by: r-help-bounces@r-project.org
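A minimal base-R sketch of Brian's point, plugging in the coefficients printed in Tord's post. It assumes, as the thread suggests, that zeroinfl()'s logit models the probability of a structural zero while hurdle()'s logit models the probability of a positive count. The diam values are hypothetical, since the valdaekar data are not shown. With the fitted count intercept of 3.746 the Poisson mean is about 42.3, so a Poisson zero has probability of roughly 4e-19 and the structural-zero probability is effectively the whole probability of a zero. That would explain why the two fits give coefficients of identical magnitude, not merely opposite sign.

```r
# Sketch: compare zeroinfl()'s zero-inflation logit with hurdle()'s zero-hurdle
# logit, using the coefficients printed in Tord's post.
# ASSUMPTION: the diam values below are hypothetical; the real data are not shown.

zi_int  <- 21.7510    # zeroinfl(): logit P(structural zero), intercept
zi_diam <- -1.1437    # zeroinfl(): logit P(structural zero), diam slope
hu_int  <- -21.7510   # hurdle(): logit P(count > 0), intercept
hu_diam <- 1.1437     # hurdle(): logit P(count > 0), diam slope
lambda  <- exp(3.74604)   # fitted Poisson mean, about 42.3

diam <- c(15, 19, 23)     # hypothetical covariate values

p_struct <- plogis(zi_int + zi_diam * diam)   # ZIP structural-zero probability
p_pos_hu <- plogis(hu_int + hu_diam * diam)   # hurdle probability of a positive count

# Under ZIP a zero is either structural or an ordinary Poisson zero:
p_zero_zip <- p_struct + (1 - p_struct) * dpois(0, lambda)

# dpois(0, lambda) is about 4e-19, so P(Y = 0) under ZIP is essentially p_struct,
# and the hurdle's P(Y > 0) is its complement: the two columns sum to 1.
print(cbind(diam, p_zero_zip, p_pos_hu, total = p_zero_zip + p_pos_hu))
```

When the count mean is small, Poisson zeros are no longer negligible and the two zero-part fits would not be exact mirror images.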
Tord Snäll-4 wrote:
>
> Dear all,
> A question related to the following has been asked on R-help before, but
> I could not find any answer to it. Input will be much appreciated.
>
> I got an unexpected sign of the "slope" parameter associated with a
> covariate (diam) using zeroinfl(). It led me to compare the estimates
> given by zeroinfl() and hurdle():
>
[snip]
>

The right thing to do in this case is to poke through the code of hurdle() and zeroinfl(), but a simple (?) demonstration shows that hurdle() and zeroinfl() are indeed reporting opposite values. hurdle() reports -log(p/(1-p)) = -qlogis(p), where p is the probability of a zero count:

library(pscl)  # provides hurdle() and zeroinfl()

## 90 strictly positive Poisson(3) counts plus 10 zeros, i.e. exactly 10% zeros
z <- rpois(500, lambda = 3)
z <- (z[z > 0])[1:90]
z <- c(z, rep(0, 10))
hurdle(z ~ 1)
-qlogis(0.1)  ## zero-hurdle intercept always equals -qlogis(0.1)

zeroinfl() reports log(p/(1-p)), where p is the zero-inflation probability (the probability of a structural zero):

## 90 Poisson(3) counts (which may themselves contain zeros) plus 10 extra zeros
z <- rpois(90, lambda = 3)
z <- c(z, rep(0, 10))
zeroinfl(z ~ 1)
qlogis(0.1)

## repeat the simulation and compare the distribution of the
## zero-inflation intercept with qlogis(0.1)
tmpf <- function() {
  z <- rpois(90, lambda = 3)
  z <- c(z, rep(0, 10))
  coef(zeroinfl(z ~ 1))[2]  # second coefficient is the zero-model intercept
}
rr <- replicate(1000, tmpf())
hist(rr, breaks = 1000)
summary(rr)
qlogis(0.1)

Perhaps it would be worth sending an e-mail to the package maintainers to request a note to this effect in the documentation, particularly if this is a FAQ ...
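A base-R sketch (no pscl needed) of the population-level relationship behind the demonstration above. It assumes the data follow a ZIP model with structural-zero probability p_struct and Poisson mean lambda. Under that assumption the zeroinfl() zero-part intercept targets qlogis(p_struct), while the hurdle() intercept targets the logit of P(Y > 0) = 1 - [p_struct + (1 - p_struct) exp(-lambda)]. The function name zero_logits and the lambda values are illustrative choices, not part of pscl.

```r
# Sketch (base R only): zero-part logits targeted by zeroinfl() and hurdle()
# when the data come from a ZIP model.
# ASSUMPTION: zero_logits() is a hypothetical helper written for illustration.

zero_logits <- function(p_struct, lambda) {
  p0 <- p_struct + (1 - p_struct) * dpois(0, lambda)  # total P(Y = 0) under ZIP
  zi <- qlogis(p_struct)   # zeroinfl(): logit of structural-zero probability
  hu <- qlogis(1 - p0)     # hurdle(): logit of P(Y > 0)
  c(lambda = lambda, zeroinfl = zi, hurdle = hu,
    sum = zi + hu)         # sum is 0 only when the two are exact negatives
}

# 10% structural zeros; the last lambda is exp(3.74604) from Tord's fit
tab <- t(sapply(c(1, 3, 10, exp(3.74604)), zero_logits, p_struct = 0.1))
print(round(tab, 4))
```

With lambda = 3, as in the simulation, the hurdle intercept is about 1.78 rather than 2.20, because about 5% of the Poisson draws are themselves zeros. With lambda near 42, as in Tord's fit, the sum column is essentially 0.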