Gavin Rudge
2013-Jul-05 15:06 UTC
[R] Obtaining predicted values from a zero-inflated poisson regression model
I've got a data set of about 5,000 observations with a zero-inflated count response variable, two predictor variables and a variable which is effectively an area of opportunity, which I want to use as an offset term (all continuous). I want to explore the association between these variables, in particular I'm interested in the association of one particular variable net of the others and it would be useful when presenting the results of the analysis to hold my other variables constant at say their means or medians, and show what my model predicts would happen to my RV as it varied. In reality there are a bunch of methods issues with this that I don't want to explore in this post, (like would a hurdle model be better, and as it is a model with spatial variables, should I use something optimised for this, bootstrapping the CIs etc) as I want to get to an adequate level of understanding of how this particular model behaves before I go any further with this. I have incorporated the code to synthesise some data that look a bit like mine, I've then gone about making some predictions the best way I can given my current level of understanding of the package I've used (pscl), with some brief annotation. #packages required require(pscl) require(emdbook) #get some data set.seed(33) ds<data.frame(a=rzinbinom(500,mu=2,zprob=0.6,size=0.8),b=runif(500,3,70),c=runif(500,0.3,10),d=runif(500,0.03,0.8)) #run the model mymodel<-zeroinfl(a~b+c,offset=d,data=ds,dist="negbin",EM=TRUE) summary(mymodel) #make a prediction data set, holding c and d at their means and varying b over the range of original values mypred<-data.frame(b=3:70,c=5.18,d=0.39) #generate predicted values of response variable, add to the data frame pred_rv<-predict.zeroinfl(mymodel,newdata=mypred,type="response",model="count",se=TRUE) #assemble the prediction data.frame predfinal<-data.frame(mypred,pred_rv) colnames(predfinal)<-c("b","c","d","predicted","lower","upper","se") predfinal My questions are: What exactly does this give me? I am trying to get counts in their original scale that my model predicts. Is what I have here the modelled counts from just the count component of the model? If I change the model term to "zero" from "count" I get the same result, why is this? Or indeed have i simply completely gone the wrong way about getting these estimates? Also, I can't seem to manually replicate any of these predicted values using the model results. I've tried looking at the help page of the predict.zeroinfl and predict commands, but still cannot get much further with this. Any help is much appreciated. Regards GavinR [[alternative HTML version deleted]]