I answered the question on SO. In short, the differences come from
truncated vs. untruncated models and conditional vs. unconditional
expectations. Feel free to follow up on SO or here on the list.
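
To make the distinction concrete, here is a minimal sketch in base R. The values of mu, theta and p_pos are made up for illustration and are not estimates from the model in the question. It shows how the mean of the untruncated negative binomial, the mean conditional on a positive count, and the unconditional hurdle mean relate to each other.

```r
## Hypothetical parameter values, chosen only for illustration
mu    <- 2    # mean of the untruncated negative binomial
theta <- 0.5  # negative binomial size (dispersion) parameter
p_pos <- 0.6  # probability of crossing the hurdle, P(y > 0)

## Probability of a zero under the untruncated count distribution
p0_count <- dnbinom(0, size = theta, mu = mu)

## Conditional expectation E[y | y > 0] of the zero-truncated distribution
mean_truncated <- mu / (1 - p0_count)

## Unconditional expectation of the hurdle model: P(y > 0) * E[y | y > 0]
mean_response <- p_pos * mean_truncated

c(untruncated = mu, truncated = mean_truncated, response = mean_response)
```

As far as I read the pscl documentation, mu plays the role of predict(..., type = "count") and the last quantity that of type = "response", so "count" can be smaller than "response" whenever the untruncated distribution puts a lot of mass on zero. A glm.nb fit to the positive counts only is an untruncated model applied to truncated data, so its fitted means need not agree with either quantity.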
On Fri, 16 Feb 2018, John Wilson wrote:
> Hello,
>
> I'm using pscl to run a hurdle model. Everything works great until I get to
> the point of making predictions. All of my "count" predictions are lower
> than my actual data, and lower than the "response" predictions, similar to
> the issue described here
> (https://stat.ethz.ch/pipermail/r-help/2012-August/320426.html) and here
> (https://stackoverflow.com/questions/48794622/hurdle-model-prediction-count-vs-response).
>
> Since the issue is the same (and not resolved), I'll just use the example
> from the second link:
>
> library("pscl")
> data("RecreationDemand", package = "AER")
>
> ## model
> m <- hurdle(trips ~ quality | ski, data = RecreationDemand, dist = "negbin")
> nd <- data.frame(quality = 0:5, ski = "no")
> predict(m, newdata = nd, type = "count")
> predict(m, newdata = nd, type = "response")
>
> The presence/absence part of the model gives identical estimates to a
> logistic model run on the data. However, I thought that the negbin part of
> the hurdle should give identical estimates to a separate, glm.nb model of
> the positive data. But I get completely different values...
>
> library(MASS)
> m.nb <- glm.nb(trips ~ quality,
>                data = RecreationDemand[RecreationDemand$trips > 0, ])
> predict(m, newdata = nd, type = "count")       ## hurdle
> predict(m.nb, newdata = nd, type = "response") ## positive counts only
>
> Any help would be appreciated.