Gavin Rudge
2013-Jul-05 15:06 UTC
[R] Obtaining predicted values from a zero-inflated poisson regression model
I've got a data set of about 5,000 observations with a zero-inflated count
response variable, two predictor variables and a variable which is effectively
an area of opportunity, which I want to use as an offset term (all continuous).
I want to explore the association between these variables, in particular I'm
interested in the association of one particular variable net of the others and
it would be useful when presenting the results of the analysis to hold my other
variables constant at say their means or medians, and show what my model
predicts would happen to my RV as it varied.
In reality there are a bunch of methods issues with this that I don't want
to explore in this post, (like would a hurdle model be better, and as it is a
model with spatial variables, should I use something optimised for this,
bootstrapping the CIs etc) as I want to get to an adequate level of
understanding of how this particular model behaves before I go any further with
this.
I have incorporated the code to synthesise some data that look a bit like mine,
I've then gone about making some predictions the best way I can given my
current level of understanding of the package I've used (pscl), with some
brief annotation.
#packages required
require(pscl)
require(emdbook)
#get some data
set.seed(33)
ds<data.frame(a=rzinbinom(500,mu=2,zprob=0.6,size=0.8),b=runif(500,3,70),c=runif(500,0.3,10),d=runif(500,0.03,0.8))
#run the model
mymodel<-zeroinfl(a~b+c,offset=d,data=ds,dist="negbin",EM=TRUE)
summary(mymodel)
#make a prediction data set, holding c and d at their means and varying b over
the range of original values
mypred<-data.frame(b=3:70,c=5.18,d=0.39)
#generate predicted values of response variable, add to the data frame
pred_rv<-predict.zeroinfl(mymodel,newdata=mypred,type="response",model="count",se=TRUE)
#assemble the prediction data.frame
predfinal<-data.frame(mypred,pred_rv)
colnames(predfinal)<-c("b","c","d","predicted","lower","upper","se")
predfinal
My questions are: What exactly does this give me? I am trying to get counts in
their original scale that my model predicts. Is what I have here the modelled
counts from just the count component of the model? If I change the model term to
"zero" from "count" I get the same result, why is this? Or
indeed have i simply completely gone the wrong way about getting these
estimates?
Also, I can't seem to manually replicate any of these predicted values using
the model results.
I've tried looking at the help page of the predict.zeroinfl and predict
commands, but still cannot get much further with this.
Any help is much appreciated.
Regards
GavinR
[[alternative HTML version deleted]]