Hello all:

I have a question regarding the fitted.values returned from the zeroinfl() function. The values seem to be nearly identical to the fitted.values returned by the ordinary glm(). Why is this? Shouldn't they be more "zero-inflated"?

I construct a zero-inflated series of counts, called Y, like so:

library(pscl)      # provides zeroinfl()

b <- c(1.5, -2)    # coefficients of the Poisson (count) part
g <- c(-3, 1)      # coefficients of the logit (zero-inflation) part
x <- runif(100)    # x is the covariate
X <- cbind(1, x)
p <- exp(X %*% g) / (1 + exp(X %*% g))   # probability of a structural zero
m <- exp(X %*% b)  # log-link for the mean process of the Poisson
Y <- rep(0, 100)
u <- runif(100)
for (i in 1:100) {
  if (u[i] < p[i]) {
    Y[i] <- 0
  } else {
    Y[i] <- rpois(1, m[i])
  }
}

# now let's compare the fitted.values from zeroinfl()
# and from glm()
z1 <- glm(Y ~ x, family = poisson)
z2 <- zeroinfl(Y ~ x | x)   # poisson is the default

z1$fitted.values[1:20]
#1.3254209 0.7458029 2.0300505 1.1292954 1.4512862
#0.6513798 1.8980126 0.6558228 1.5302057
#0.6993626 2.6875736 0.7586985 2.0622238 2.1009979
#1.4254607 1.8130159 3.6603137 2.1330030
#2.9409379 3.3203350

z2$fitted.values[1:20]
#1.3587457 0.7254296 2.0730982 1.1497492 1.4902778
#0.6178648 1.9429778 0.6229478 1.5717923
#0.6726527 2.7010395 0.7400369 2.1045779 2.1424025
#1.4634459 1.8583877 3.5830697 2.1735319
#2.9354839 3.2800839

You can see that they are almost identical... and the fitted.values from zeroinfl don't seem to be zero-inflated at all! What is going on?

Ultimately I want these fitted.values for a goodness-of-fit test to see whether the zeroinfl model is needed for a given data series. With these fitted.values as they are, I am rejecting the assumption of a zero-inflated model even when the data really are zero-inflated.

Many thanks,
Sarah Thomas

-- 
Sarah J. Thomas
Research Assistant, Department of Statistics
Rice University, Houston, TX
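For reference, the loop above can be written in vectorized base R, which also makes it easy to see how much zero inflation g = c(-3, 1) actually implies. This is only a sketch, not the poster's code; the seed and the names `structural`, `counts` and `share_population` are hypothetical. With x ~ U(0, 1) the structural-zero probability plogis(-3 + x) only ranges from about 0.047 to 0.119, and its average is log(1 + exp(-2)) - log(1 + exp(-3)), roughly 0.078, so only about 8 of 100 observations are expected to be structural zeros.

```r
# Vectorized sketch of the simulation in the question (hypothetical names).
# plogis() is base R's inverse logit, i.e. exp(e) / (1 + exp(e)).
set.seed(123)   # with this seed x and the uniforms match the loop version,
                # but Y will generally differ because rpois() is called for
                # every observation here, not only the non-structural ones
n <- 100
b <- c(1.5, -2)                 # count (Poisson, log link) coefficients
g <- c(-3, 1)                   # zero-inflation (logit link) coefficients
x <- runif(n)
X <- cbind(1, x)
p <- plogis(drop(X %*% g))      # P(structural zero) for each observation
m <- exp(drop(X %*% b))         # Poisson mean for each observation
structural <- runif(n) < p      # TRUE where the observation is forced to 0
counts <- rpois(n, m)           # Poisson draws for all observations
Y <- ifelse(structural, 0, counts)

# Expected number of structural zeros for these x values
expected_structural <- sum(p)

# Population share of structural zeros for x ~ U(0, 1):
# integral of plogis(-3 + t) over [0, 1], with closed form
# log(1 + exp(-2)) - log(1 + exp(-3))
share_population <- integrate(function(t) plogis(g[1] + g[2] * t), 0, 1)$value
closed_form <- log1p(exp(-2)) - log1p(exp(-3))

# Zeros the Poisson part alone produces on average (no inflation needed)
poisson_zero_share <- mean(dpois(0, m))

c(structural_drawn    = sum(structural),
  structural_expected = expected_structural,
  share_population    = share_population,
  poisson_zero_share  = poisson_zero_share)
```

Comparing `share_population` with `poisson_zero_share` shows that, for these parameters, the Poisson part already generates far more zeros than the inflation component adds.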

On Mon, 18 Feb 2008, Sarah J Thomas wrote:

> Hello all:
>
> I have a question regarding the fitted.values returned from the
> zeroinfl() function. The values seem to be nearly identical to those
> fitted.values returned by the ordinary glm(). Why is this, shouldn't
> they be more "zero-inflated"?
>
> I construct a zero-inflated series of counts, called Y, like so:

To make this reproducible, I set the random seed with set.seed(123) in advance and then ran your source code:

library(pscl)      # provides zeroinfl()
set.seed(123)
b <- c(1.5, -2)
g <- c(-3, 1)
x <- runif(100)    # x is the covariate
X <- cbind(1, x)
p <- exp(X %*% g) / (1 + exp(X %*% g))
m <- exp(X %*% b)  # log-link for the mean process of the Poisson
Y <- rep(0, 100)
u <- runif(100)
for (i in 1:100) {
  if (u[i] < p[i]) {
    Y[i] <- 0
  } else {
    Y[i] <- rpois(1, m[i])
  }
}
# now let's compare the fitted.values from zeroinfl()
# and from glm()
z1 <- glm(Y ~ x, family = poisson)
z2 <- zeroinfl(Y ~ x | x)   # poisson is the default

[snip]

> You can see that they are almost identical... and the fitted.values from
> zeroinfl don't seem to be zero-inflated at all! What is going on?

Well, let's see how zero-inflated your observations are:

R> sum(u < p)
[1] 2

Wow, two (!) observations that have been zero-inflated. Let's see what the probability of observing a zero would have been for them under the Poisson part alone:

R> dpois(0, m[u < p])
[1] 0.3147816 0.1409670

which is not so low, in particular for the first one. Overall, you've got

R> sum(Y < 1)
[1] 23

zeros in that data set, and the expected number of zeros in a Poisson GLM is

R> sum(dpois(0, fitted(z1)))
[1] 23.35615

So you have observed *fewer* zeros than expected by a Poisson GLM. Surely, this is not the kind of data that zero-inflated models have been developed for.

> Ultimately I want these fitted.values for a goodness of fit type of test
> to see if the zeroinfl model is needed or not for a given data series.
> With these fitted.values as they are, I am rejecting assumption of a
> zero-inflated model even when the data really are zero-inflated.

Maybe you ought to think about useful data-generating processes first before designing tests or criticizing software...

Z
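The points in this thread can be checked with a data-generating process that really is zero-inflated. The sketch below assumes the pscl package is installed; the helper simulate_zip(), the seed, n = 200 and the zero-part coefficients g = c(0, 1) (structural-zero probability between 0.5 and about 0.73) are all hypothetical choices. It also illustrates why the zeroinfl() fitted values look so much like the glm() ones: they are expected counts, (1 - p) * mu, so the zero inflation shows up in the predicted probability of a zero rather than in the fitted means.

```r
# Sketch assuming the pscl package (provides zeroinfl()) is installed.
library(pscl)

# Hypothetical helper: simulate zero-inflated Poisson data
simulate_zip <- function(n, b, g) {
  x <- runif(n)
  X <- cbind(1, x)
  p <- plogis(drop(X %*% g))   # P(structural zero)
  m <- exp(drop(X %*% b))      # Poisson mean
  counts <- rpois(n, m)
  data.frame(x = x, Y = ifelse(runif(n) < p, 0, counts))
}

set.seed(1)
# Hypothetical parameters with much stronger zero inflation than g = c(-3, 1)
d <- simulate_zip(200, b = c(1.5, -2), g = c(0, 1))

z1 <- glm(Y ~ x, family = poisson, data = d)
z2 <- zeroinfl(Y ~ x | x, data = d)

# zeroinfl() fitted values are means: (1 - P(structural zero)) * count mean
decomposed <- (1 - predict(z2, type = "zero")) * predict(z2, type = "count")

# Observed zeros vs. zeros expected under each model;
# column 1 of the type = "prob" matrix is P(Y = 0)
zeros <- c(observed     = sum(d$Y == 0),
           glm_expected = sum(dpois(0, fitted(z1))),
           zip_expected = sum(predict(z2, type = "prob")[, 1]))
zeros
```

With data like these, the observed number of zeros should sit well above the Poisson GLM expectation and close to the zeroinfl() expectation; the exact figures depend on the seed.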