This request is related to the following post from last year: https://stat.ethz.ch/pipermail/r-help/2011-June/279752.html After reading the thread, the idea is still not clear. I have fitted a model using HURDLE from the PSCL package. I am trying to get marginal effects / slopes by multiplying the coefficients by the mean of the marginal effects (I think this is right). To my understanding, this will require a mean for the binary probability model and a mean for the truncated Poisson count model. My guess is that I would use mean(predict( MODELNAME, type = "XXX")) where MODELNAME is the hurdle model and XXX is either RESPONSE, COUNT, or ZERO. Assuming the above is right (correct me if it isn't), my questions are: 1. What XXX gives me the mean of the marginal effects for the binomial probability model? 2. What XXX gives me the mean of the marginal effects for the count model? Judging from my results, I would guess the answer to question 1 is COUNT, except max(predict(MODELNAME, type= "count")) returns 4.5 and I expected it to be less than 1. I would also have expected COUNT to match up with the truncated Poisson count model. What is the intuition here? Also, when I try XXX = PROB, I get the following error: Error in matrix(NA, nrow = length(mu), ncol = nUnique) : too many elements specified So maybe there are other problems.

On Tue, 17 Jan 2012, twarzin wrote:> This request is related to the following post from last year: > > https://stat.ethz.ch/pipermail/r-help/2011-June/279752.htmlNot very closely. The post linked above was asking about getting predicted means vs. predicted probabilities. It is not related to marginal effects.> After reading the thread, the idea is still not clear. I have fitted a > model using HURDLE from the PSCL package. I am trying to get marginal > effects / slopes by multiplying the coefficients by the mean of the > marginal effects (I think this is right).It is not clear to me that this definition is right. It also seems to be recursive. Also, I assume you mean marginal effects on the overall mean. (Alternatives would be: For the probability of crossing the hurdle, for the truncated count mean, for the untruncated count mean.)> To my understanding, this will require a mean for the binary probability > model and a mean for the truncated Poisson count model. My guess is that > I would use > > mean(predict( MODELNAME, type = "XXX")) > > where MODELNAME is the hurdle model and XXX is either RESPONSE, COUNT, > or ZERO. Assuming the above is right (correct me if it isn't), my > questions are: > > 1. What XXX gives me the mean of the marginal effects for the binomial probability model? > 2. What XXX gives me the mean of the marginal effects for the count model?Neither of the above gives you _marginal_ effects. "response" gives you the predicted mean for each observation (sometimes also called effects). "prob" gives you probabilities for each possible outcome 0, 1, 2, ... "count" gives you the predicted mean of the count component of the model and "zero" the ratio of probabilities for non-zero counts. The latter is probably named somewhat confusingly. It was chosen to be as close as possible to the corresponding quantity from zeroinfl(). See also the the package vignette for more details, especially in the appendix.> Judging from my results, I would guess the answer to question 1 is > COUNT, except max(predict(MODELNAME, type= "count")) returns 4.5 and I > expected it to be less than 1. I would also have expected COUNT to match > up with the truncated Poisson count model. What is the intuition here? > > Also, when I try XXX = PROB, I get the following error: > > Error in matrix(NA, nrow = length(mu), ncol = nUnique) : > too many elements specifiedMaybe you have a lot of observations with an extremely large maximal observed response? If so, try setting the argument 'at' to a suitable value, e.g., at = 0:10 or so. hth, Z> So maybe there are other problems. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >

Travis, thanks for the follow-up. As the reply got somewhat lengthier and includes a worked example, I include the R-help mailing list again. Maybe the reply is useful to someone else in the future.> Thanks for the reply. I think you cleared it up, but I would like to > follow up to be certain. I also may have mixed some terminology in my > original question. I have been following Greene's econometrics text, but > I also have your Applied Econometrics with R that I'll reference for > this question.Yes, good idea. Let's take the RecreationDemand data, that's simple. And before using confusing terminology, let's look at a practical application, that's often more illuminating. Let's use the hurdle model for the number of recreational boat trips: ## packages and data library("pscl") data("RecreationDemand", package = "AER") ## model m <- hurdle(trips ~ . | quality + income, data = RecreationDemand, dist = "negbin") As you say below, we get a coefficient for "quality" of 0.17 and 1.5 respectively. The first one means that the _odds_ (not the probabilities) of observing a non-zero number of trips increase strongly with quality: about 350% per unit of quality = exp(1.5) - 1. This is still not too intuitive due to the usage of odds, but we'll get to probabilities later. And the 0.17 mean that the expected number of trips in the count component increase by 18% = exp(0.17) - 1 per unity of quality. But we can also make this even easier to interpret by looking at the _effects_ (not the marginal effects) of quality for a "typical" individual with average income, average costs, etc. ## new data with quality = 0, ..., 5 and "typical" values ## for all other variables nd <- data.frame(quality = 0:5, ski = "no", userfee = "no", income = mean(RecreationDemand$income), costC = mean(RecreationDemand$costC), costS = mean(RecreationDemand$costS), costH = mean(RecreationDemand$costH)) ## predicted probability for non-zero trips nonzero <- 1 - predict(m, newdata = nd, type = "prob")[, 1] ## visualize plot(0:5, nonzero, type = "b") ## slopes diff(nonzero) So we see that if this "typical" individual had a quality rating of zero, the probability of taking any trips at all would only be 5%. While for the same "typical" individual with quality rating of 5, the probability of taking at least one trip would be 99%. Also we see that the influence is clearly non-linear and S-shaped. The slope varies between only 3.4 percentage points to up to 32 percentage points! Similarly, we can compute the predicted mean from the count component of the model (untruncated): ## predicted mean from (untruncated) count component countmean <- predict(m, newdata = nd, type = "count") plot(0:5, countmean, type = "b") This is also not linear, but not as strongly as for the binary part. Finally, we can also combine both model parts to the overall response: ## predicted overall mean (combining both model parts) allmean <- predict(m, newdata = nd, type = "response") plot(0:5, allmean, type = "b") which is still rather S-shaped. These curves are sometimes also referred to as "effects" and I find them rather easy to interpret. For each of these curves you can also look at the "marginal effects", i.e., the average slope. So, if we want to look at the marginal effect of "quality" in the binary hurdle part, we can use Equation 5.2 from our AER book or equivalently Equation 21-10 from Greene's book (5th ed). ## average regressor x <- c(1, mean(RecreationDemand$quality), mean(RecreationDemand$income)) ## coefficients b <- coef(m, model = "zero") ## linear predictor x'b xb <- sum(x * b) ## marginal effect dlogis(xb) * b["quality"] As the average quality is 1.42, it is not surprising that this slope is similar to the diff(nonzero)[2] (i.e., going from quality=1 to quality=2): an increase of about 32 percentage points. But we have seen above that the slope at quality=4 would be very different from this! With similar arguments you could compute average slopes for the count mean or the overall mean. I don't do it here because I find the _effects_ computed above much more useful than the _marginal effects_. However, as econometricians prefer numbers to graphics, you don't see many effects plots in econometric works. Also, Stata computes marginal effects for just about everything - no matter whether it is useful or not. For standard lm and glm as well as multinom and polr objects, John Fox's "effects" package can also produce very nice effect displays automatically. It does not work for hurdle objects, though.> On pages 140 - 141 of your text you run a hurdle model for recreational > trips. In the count model the coefficient on QUALITY is 0.1717, and in > the zero hurdle model the coefficient on quality is 1.5029. Let us > pretend that QUALITY is the only independent variable. It is my > understanding that neither 0.1717 or 1.5029 are marginal effects, i.e., > we cannot say a one percent increase in quality would cause a 0.1717 > increase in trips. Because the model is nonlinear, the marginal effects > will vary with the value of the covariates. Greene suggests evaluating > the expression at 1) the sample means of the data or 2) "evaluate the > marginal effects at every observation and use the sample average of the > individual marginal effects" (pg 668 in 5th edition).Yes, but this is only for binary regressions. There it is rather clear that you want to track changes in the mean i.e. in the probability. However for hurdle models, there are various quantities you could be interested in: probability for non-zero count, untruncated count mean, truncated count mean, overall mean, etc. That's why there is no single marginal effect of interest. Hope this clarifies some of the previous issues! Best regards, Achim