Paul Johnson
2008-Apr-09 22:35 UTC
[R] Endogenous variables in ordinal logistic (or probit) regression
A student brought this question to me, and I can't find any articles or examples that are directly on point.

Suppose there are two ordinal logistic regression models, and one wants to set them into a simultaneous equation framework. Y1 might be a 4-category scale measuring how much the respondent likes the American flag, and Y2 might be how much the respondent likes the Republican Party in America. By the usual simultaneous equation argument, one should not simply run the two polr models

    polr(Y1 ~ Y2 + X1 + X2)
    polr(Y2 ~ Y1 + X1 + X2)

because Y1 and Y2 are endogenous.

Where does the problem arise? Thinking back to the theoretical model, there are unmeasured (latent) scale variables y1* and y2* that are determined by

    y1* = b0 + b1 * Y2 + b2 * X1 + b3 * X2 + e1
    y2* = c0 + c1 * Y1 + c2 * X1 + c3 * X2 + e2

y1* and y2* are not observed. We see only the categorical outcomes Y1 and Y2, where

    Y1 = 0  if         y1* <  pi1
    Y1 = 1  if  pi1 <= y1* <  pi2
    Y1 = 2  if  pi2 <= y1* <  pi3
    Y1 = 3  if  pi3 <= y1*

and similarly for Y2.

Since e1 "goes into" y1*, y1* determines Y1, and Y1 "goes into" y2*, there is a good chance that the error term e1 is correlated with y2*, and therefore with the observed Y2 that appears on the right-hand side of the first equation. Running polr(Y1 ~ Y2 + X1 + X2) in isolation might give badly biased estimates.

I have found a well-developed literature that deals with this question when one of the Y's is dichotomous:

Rivers, Douglas and Quang H. Vuong. 1988. Limited Information Estimators and Exogeneity Tests for Simultaneous Probit Models. Journal of Econometrics 39: 347-366.

Alvarez, R. Michael and Garrett Glasgow. 1999. Two-Stage Estimation of Nonrecursive Choice Models. Political Analysis. 8: 11:24.

I have not found anybody who has estimated one of these models with R, however, and was hoping to get an example from someone. I would also like to know if there is likely to be a problem extending the estimation framework to two multi-category dependent variables.
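To make the endogeneity concern concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration and not part of the substantive problem: the simplified recursive structure (Y2 affects y1* but not the reverse), normal errors with correlation rho (hence method = "probit"), the cutpoints, the coefficient values, and the extra variable Z1 that shifts Y2 only. It shows only that a naive polr fit can overstate the effect of Y2 when e1 and e2 are correlated.

```r
library(MASS)  # provides polr()

set.seed(1)
n  <- 2000
X1 <- rnorm(n)
X2 <- rnorm(n)
Z1 <- rnorm(n)                     # hypothetical variable that affects Y2 only

# Correlated latent errors; rho = 0.6 is an assumed value
rho <- 0.6
e2  <- rnorm(n)
e1  <- rho * e2 + sqrt(1 - rho^2) * rnorm(n)

# Latent y2* and its observed 4-category version, coded 0..3
cuts   <- c(-Inf, -1, 0, 1, Inf)   # assumed cutpoints
y2star <- 0.5 * X1 - 0.5 * X2 + 1.0 * Z1 + e2
Y2num  <- as.numeric(cut(y2star, cuts, right = FALSE)) - 1

# Latent y1* depends on the observed Y2 with true coefficient b1
b1     <- 0.5
y1star <- b1 * Y2num + 0.5 * X1 + 0.5 * X2 + e1
Y1     <- cut(y1star, c(-Inf, 0, 1, 2, Inf), labels = 0:3,
              right = FALSE, ordered_result = TRUE)

# Naive single-equation fit that ignores the correlation between e1 and Y2
naive    <- polr(Y1 ~ Y2num + X1 + X2, method = "probit")
b1_naive <- unname(coef(naive)["Y2num"])

c(true = b1, naive = b1_naive)
```

Because e1 is built to be positively correlated with e2, the naive coefficient on Y2num should come out above the true value of b1 in this setup. Setting rho to 0 should remove most of the gap.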
In particular, I'm curious to know the following. If one estimates a first-stage model of Y1, as in

    polr(Y1 ~ X1 + X2 + Z1)

to obtain predicted values of y1* (y1*-hat, which I believe is the estimated value of the linear predictor), what would be the properties of the second-stage parameter estimates from the regression that uses that prediction as an instrumental variable?

    polr(Y2 ~ y1*-hat + X1 + X2)

As far as I can tell, this instrumental variables approach is the only realistic way to do this.

I am aware of some articles that claim that a multi-category logistic regression essentially boils down to a series of dichotomous logits, in the sense that the dependent variable can be thought of as a sequence of questions: "are you in group 0 or group 1?", "are you in group 1 or group 2?", and so forth.

Cole, Stephen R., Paul D. Allison, and Cande V. Ananth. 2004. Estimation of Cumulative Odds Ratios. Annals of Epidemiology 14(3): 172-178.

Following that approach, one could convert the data into the cumulative logistic format and then proceed with the methods proposed for binary dependent variables. I'm cautious about that approach because the results are not equivalent to the maximum likelihood estimates that would be obtained from polr, for example, and I don't quite see the advantage of building on it.

Thanks in advance,

PJ

--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
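
For concreteness, the mechanics of the two-stage procedure described above can be sketched as follows. This is a sketch only: the data are simulated, and the data-generating process (latent y1* driving y2*, normal errors, the cutpoints, and Z1 entering only the Y1 equation) is assumed purely for illustration. The first-stage linear predictor is taken from the lp component of the polr fit. The sketch says nothing about the statistical properties of the second-stage estimates or their standard errors, which is exactly the open question.

```r
library(MASS)  # provides polr()

set.seed(2)
n  <- 2000
X1 <- rnorm(n)
X2 <- rnorm(n)
Z1 <- rnorm(n)                     # assumed instrument: enters y1* only

e1 <- rnorm(n)
e2 <- 0.6 * e1 + 0.8 * rnorm(n)    # errors correlated by construction

# Assumed latent model for this illustration
y1star <- 0.5 * X1 - 0.5 * X2 + 1.0 * Z1 + e1
y2star <- 0.5 * y1star + 0.5 * X1 + 0.5 * X2 + e2

# Cut the latent variables into 4 ordered categories (assumed cutpoints)
cuts   <- c(-Inf, -1, 0, 1, Inf)
to_ord <- function(x) cut(x, cuts, labels = 0:3, right = FALSE,
                          ordered_result = TRUE)
dat <- data.frame(Y1 = to_ord(y1star), Y2 = to_ord(y2star), X1, X2, Z1)

# Stage 1: fit Y1 on the exogenous variables plus Z1,
# and keep the estimated linear predictor as y1*-hat
stage1    <- polr(Y1 ~ X1 + X2 + Z1, data = dat, method = "probit")
dat$y1hat <- stage1$lp

# Stage 2: replace the endogenous Y1 with y1*-hat
stage2 <- polr(Y2 ~ y1hat + X1 + X2, data = dat, method = "probit")
coef(stage2)
```

The second-stage fit treats y1hat as if it were observed data, so any standard errors computed from it do not reflect the first-stage estimation uncertainty.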