Nick Livingston
2014-Aug-29 03:57 UTC
[R] Question regarding the discrepancy between count model parameter estimates between "pscl" and "MASS"
I have sought consultation online and in person, to no avail. I hope someone on here might have some insight. Any feedback would be most welcome. I am attempting to plot predicted values from a two-component hurdle model (logistic [suicide attempt yes/no] and negative binomial count [number of attempts thereafter]). To do so, I estimated each component separately using glm (MASS). While I am able to reproduce hurdle results for the logit portion in glm, estimates for the negative binomial count component are different. Call: hurdle(formula = Suicide. ~ Age + gender + Victimization * FamilySupport | Age + gender + Victimization * FamilySupport, dist = "negbin", link "logit") Pearson residuals: ? ? Min? ? ? 1Q? Median? ? ? 3Q? ???Max -0.9816 -0.5187 -0.4094? 0.2974? 5.8820 Count model coefficients (truncated negbin with log link): ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Estimate Std. Error z value Pr(>|z|)??? (Intercept)? ? ? ? ? ? ? ? ? ? ? ? ? -0.29150? ? 0.33127? -0.880???0.3789??? Age? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0.17068? ? 0.07556???2.259???0.0239 * gender? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ???0.28273? ? 0.31614???0.894???0.3712??? Victimization? ? ? ? ? ? ? ? ? ? ? ???1.08405? ? 0.18157???5.971 2.36e-09 *** FamilySupport? ? ? ? ? ? ? ? ? ? ? 0.33629? ? 0.29302???1.148???0.2511??? Victimization:FamilySupport -0.96831? ? 0.46841? -2.067???0.0387 * Log(theta)? ? ? ? ? ? ? ? ? ? ? ? ? ? 0.12245? ? 0.54102???0.226???0.8209??? Zero hurdle model coefficients (binomial with logit link): ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ???Estimate Std. Error z value Pr(>|z|)??? (Intercept)? ? ? ? ? ? ? ? ? ? ? ? ???-0.547051???0.215981? -2.533? 0.01131 * Age? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ???-0.154493???0.063994? -2.414 0.01577 * gender? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ???-0.030942???0.284868? -0.109? 0.91350? ??? Victimization? ? ? ? ? ? ? ? ? ? ? ? ? 1.073956???0.338015???3.177? 0.00149 ** FamilySupport? ? ? ? ? ? ? ? ? ? ???-0.380360???0.247530? -1.537? 0.12439??? Victimization\:FamilySupport? -0.813329???0.399905? -2.034? 0.04197 * --- Signif. codes:? 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Theta: count = 1.1303 Number of iterations in BFGS optimization: 23 Log-likelihood: -374.3 on 25 Df> summary(logistic)Call: glm(formula = SuicideBinary ~ Age + gender = Victimization * FamilySupport, family = "binomial") Deviance Residuals: ? ? Min? ? ???1Q???Median? ? ???3Q? ? ? Max -1.9948? -0.8470? -0.6686???1.1160???2.0805 Coefficients: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ???Estimate Std. Error z value Pr(>|z|)??? (Intercept)? ? ? ? ? ? ? ? ? ? ? ? ? -0.547051???0.215981? -2.533? 0.01131 * Age? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -0.154493???0.063994? -2.414? 0.01577 * gender? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? -0.030942???0.284868? -0.109? 0.91350??? Victimization? ? ? ? ? ? ? ? ? ? ? ???1.073956???0.338014???3.177? 0.00149 ** FamilySupport? ? ? ? ? ? ? ? ? ? ? -0.380360???0.247530? -1.537? 0.12439??? Victimization:FamilySupport? -0.813329???0.399904? -2.034? 0.04197 * --- Signif. codes:? 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 (Dispersion parameter for binomial family taken to be 1) ? ? Null deviance: 452.54? on 359? degrees of freedom Residual deviance: 408.24? on 348? degrees of freedom ? (52 observations deleted due to missingness) AIC: 432.24 Number of Fisher Scoring iterations: 4> summary(Count1)Call: glm(formula = NegBinSuicide ~ Age + gender + Victimization * FamilySupport, family = negative.binomial(theta = 1.1303)) Deviance Residuals: ? ? Min? ? ???1Q???Median? ? ???3Q? ? ? Max -1.6393? -0.4504? -0.1679???0.2350???2.1676 Coefficients: ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Estimate Std. Error t value Pr(>|t|)??? (Intercept)? ? ? ? ? ? ? ? ? ? ? ? ? ? 0.60820? ? 0.13779???4.414 2.49e-05 *** Age? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0.08836? ? 0.04189???2.109???0.0373 * gender? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 0.10983? ? 0.17873???0.615???0.5402??? Victimization? ? ? ? ? ? ? ? ? ? ? ? ? 0.73270? ? 0.10776???6.799 6.82e-10 *** FamilySupport? ? ? ? ? ? ? ? ? ? ? ? 0.10213? ? 0.15979???0.639???0.5241??? Victimization:FamilySupport???-0.60146? ? 0.24532? -2.452???0.0159 * --- Signif. codes:? 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 (Dispersion parameter for Negative Binomial(1.1303) family taken to be 0.4549082) ? ? Null deviance: 76.159? on 115? degrees of freedom Residual deviance: 35.101? on 104? degrees of freedom ? (296 observations deleted due to missingness) AIC: 480.6 Number of Fisher Scoring iterations: 15 Alternatively, if there is a simpler way to plot hurdle regression output, or if anyone is award of another means of estimating NB models (I haven't had much luck with vglm from VGAM either), I would be happy to hear about that as well. I'm currently using the "visreg" package for plotting. Thanks! ??? ???
peter dalgaard
2014-Aug-29 11:13 UTC
[R] Question regarding the discrepancy between count model parameter estimates between "pscl" and "MASS"
I'm no expert on hurdle models, but it seems that you are unaware that the negative binomial and the truncated negative binomial are quite different things. -pd On 29 Aug 2014, at 05:57 , Nick Livingston <nlivingston at ymail.com> wrote:> I have sought consultation online and in person, to no avail. I hope someone > on here might have some insight. Any feedback would be most welcome. > > I am attempting to plot predicted values from a two-component hurdle model > (logistic [suicide attempt yes/no] and negative binomial count [number of > attempts thereafter]). To do so, I estimated each component separately using > glm (MASS). While I am able to reproduce hurdle results for the logit > portion in glm, estimates for the negative binomial count component are > different. > > Call: > hurdle(formula = Suicide. ~ Age + gender + Victimization * FamilySupport | > Age + gender + Victimization * FamilySupport, dist = "negbin", link > "logit") > > Pearson residuals: > Min 1Q Median 3Q Max > -0.9816 -0.5187 -0.4094 0.2974 5.8820 > > Count model coefficients (truncated negbin with log link): > Estimate Std. Error z value > Pr(>|z|) > (Intercept) -0.29150 0.33127 -0.880 0.3789 > Age 0.17068 0.07556 2.259 0.0239 > * > gender 0.28273 0.31614 0.894 0.3712 > Victimization 1.08405 0.18157 5.971 2.36e-09 > *** > FamilySupport 0.33629 0.29302 1.148 0.2511 > Victimization:FamilySupport -0.96831 0.46841 -2.067 0.0387 * > Log(theta) 0.12245 0.54102 0.226 0.8209 > Zero hurdle model coefficients (binomial with logit link): > Estimate Std. Error z value > Pr(>|z|) > (Intercept) -0.547051 0.215981 -2.533 0.01131 > * > Age -0.154493 0.063994 -2.414 > 0.01577 * > gender -0.030942 0.284868 -0.109 0.91350 > Victimization 1.073956 0.338015 3.177 0.00149 > ** > FamilySupport -0.380360 0.247530 -1.537 0.12439 > Victimization\:FamilySupport -0.813329 0.399905 -2.034 0.04197 * > --- > Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 > > Theta: count = 1.1303 > Number of iterations in BFGS optimization: 23 > Log-likelihood: -374.3 on 25 Df >> summary(logistic) > > > > > Call: > glm(formula = SuicideBinary ~ Age + gender = Victimization * FamilySupport, > family = "binomial") > > Deviance Residuals: > Min 1Q Median 3Q Max > -1.9948 -0.8470 -0.6686 1.1160 2.0805 > > Coefficients: > Estimate Std. Error z value > Pr(>|z|) > (Intercept) -0.547051 0.215981 -2.533 0.01131 * > Age -0.154493 0.063994 -2.414 0.01577 > * > gender -0.030942 0.284868 -0.109 0.91350 > Victimization 1.073956 0.338014 3.177 0.00149 > ** > FamilySupport -0.380360 0.247530 -1.537 0.12439 > Victimization:FamilySupport -0.813329 0.399904 -2.034 0.04197 * > --- > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > (Dispersion parameter for binomial family taken to be 1) > > Null deviance: 452.54 on 359 degrees of freedom > Residual deviance: 408.24 on 348 degrees of freedom > (52 observations deleted due to missingness) > AIC: 432.24 > > Number of Fisher Scoring iterations: 4 > >> summary(Count1) > > > > > > > Call: > glm(formula = NegBinSuicide ~ Age + gender + Victimization * FamilySupport, > family = negative.binomial(theta = 1.1303)) > > Deviance Residuals: > Min 1Q Median 3Q Max > -1.6393 -0.4504 -0.1679 0.2350 2.1676 > > Coefficients: > Estimate Std. Error t value > Pr(>|t|) > (Intercept) 0.60820 0.13779 4.414 2.49e-05 > *** > Age 0.08836 0.04189 2.109 0.0373 > * > gender 0.10983 0.17873 0.615 0.5402 > Victimization 0.73270 0.10776 6.799 6.82e-10 > *** > FamilySupport 0.10213 0.15979 0.639 0.5241 > Victimization:FamilySupport -0.60146 0.24532 -2.452 0.0159 * > --- > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > (Dispersion parameter for Negative Binomial(1.1303) family taken to be > 0.4549082) > > Null deviance: 76.159 on 115 degrees of freedom > Residual deviance: 35.101 on 104 degrees of freedom > (296 observations deleted due to missingness) > AIC: 480.6 > > Number of Fisher Scoring iterations: 15 > > > Alternatively, if there is a simpler way to plot hurdle regression output, or if anyone is award of another means of estimating NB models (I haven't had much luck with vglm from VGAM either), I would be happy to hear about that as well. I'm currently using the "visreg" > package for plotting. > > Thanks! > > > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com