I have a student that I'm encouraging to use R rather than SAS or Stata
and within just 2 weeks he has come up with a question that stumps me.

What does a person do about endogeneity in generalized linear models?

Suppose Y1 and Y2 are 5 category ordinal dependent variables. I see that
MASS has polr for estimation of models like that, as long as they are
independent. But what if the models were to be written:

    Y1.plr <- polr(Y1 ~ Y2 + X1 + X2)

    Y2.plr <- polr(Y2 ~ Y1 + X3 + X4)

Are estimates of the coefficients for Y1 and Y2 biased, as they would be
in a linear model? I think yes. Do I need some equivalent of 2SLS or
FIML?

It is not entirely clear to me if, in this example, the input Y1 or Y2 is
conceptualized as the 5 point scale or rather if it is thought of as a
continuous variable which is observed with error.

Is there an email list besides r-help where I should be asking questions
like this? I understand it is not strictly R related and would gladly go
bother other people than you if you tell me where.

--
Paul E. Johnson                 email: pauljohn at ukans.edu
Dept. of Political Science      http://lark.cc.ku.edu/~pauljohn
University of Kansas            Office: (785) 864-9086
Lawrence, Kansas 66045          FAX: (785) 864-5700

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

On Thursday, May 30, 2002, Paul Johnson wrote:

> I have a student that I'm encouraging to use R rather than SAS or Stata
> and within just 2 weeks he has come up with a question that stumps me.
>
> What does a person do about endogeneity in generalized linear models?
>
> Suppose Y1 and Y2 are 5 category ordinal dependent variables. I see that
> MASS has polr for estimation of models like that, as long as they are
> independent. But what if the models were to be written:
>
>     Y1.plr <- polr(Y1 ~ Y2 + X1 + X2)
>
>     Y2.plr <- polr(Y2 ~ Y1 + X3 + X4)
>
> Are estimates of the coefficients for Y1 and Y2 biased, as they would be
> in a linear model? I think yes. Do I need some equivalent of 2SLS or
> FIML?

yes and yes, I believe. I presume that you *really* have in mind that Y1
and Y2 are imperfect (ie, categorized) observations of underlying
continuous variables (Z1 and Z2, say)? And that the equations whose
coefficients you'd really like to estimate are (in your R style)

    lm(Z1 ~ Z2 + X1 + X2)
    lm(Z2 ~ Z1 + X1 + X2)

-- in which case the likelihood, assuming bivariate normality of (Z1,Z2)
given (X1,X2), involves bivariate normal integrals evaluated over
rectangles with boundaries determined by category threshold parameters.

I don't think this (ie, maximization of that likelihood) is programmed at
present in R. From what you say, I infer that it's not in Stata or SAS
either?

A sensible first analysis might be simply to forget that Y1 and Y2 are
multinomial, and fit the linear system using some suitable set(s) of
numeric scores for the categories. Depending on the results, that might
also be a sensible last analysis...

Regards,
David

> It is not entirely clear to me if, in this example, the input Y1 or Y2 is
> conceptualized as the 5 point scale or rather if it is thought of as a
> continuous variable which is observed with error.
>
> Is there an email list besides r-help where I should be asking questions
> like this? I understand it is not strictly R related and would gladly go
> bother other people than you if you tell me where.
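
A rough sketch of the "first analysis" suggested above, using base R only: score the five categories 1 to 5 and fit one equation of the system by two-stage least squares done by hand. Everything here is an assumption made for illustration: the data are simulated, the coefficient values are arbitrary, the helper `to5` is hypothetical, and the exclusion of X3 and X4 from the Y1 equation follows the specification in the original question. Note that the standard errors printed by the second-stage `lm` are not valid 2SLS standard errors; only the point estimates are of interest here.

```r
# Simulated data, for illustration only (all names and values are hypothetical).
set.seed(1)
n  <- 500
X1 <- rnorm(n); X2 <- rnorm(n); X3 <- rnorm(n); X4 <- rnorm(n)
e1 <- rnorm(n)
e2 <- 0.5 * e1 + rnorm(n)          # correlated errors across equations

# Latent simultaneous system (assumed):
#   Z1 = b12 * Z2 + X1 + X2 + e1
#   Z2 = b21 * Z1 + X3 + X4 + e2
b12 <- 0.5; b21 <- -0.3
u1  <- X1 + X2 + e1
u2  <- X3 + X4 + e2
d   <- 1 - b12 * b21
Z1  <- (u1 + b12 * u2) / d         # reduced-form solution for Z1
Z2  <- (u2 + b21 * u1) / d         # reduced-form solution for Z2

# Observe only 5 ordered categories, scored 1..5 (quintile cut points).
to5 <- function(z) {
  cut(z, breaks = quantile(z, probs = seq(0, 1, by = 0.2)),
      include.lowest = TRUE, labels = FALSE)
}
Y1 <- to5(Z1)
Y2 <- to5(Z2)

# Naive OLS on the scores, ignoring endogeneity of Y2.
ols <- lm(Y1 ~ Y2 + X1 + X2)

# Two-stage least squares by hand for the Y1 equation:
# stage 1 regresses the endogenous Y2 on all exogenous variables,
# stage 2 replaces Y2 by its fitted values.
stage1 <- lm(Y2 ~ X1 + X2 + X3 + X4)
Y2.hat <- fitted(stage1)
stage2 <- lm(Y1 ~ Y2.hat + X1 + X2)

print(coef(ols))
print(coef(stage2))
```

The Y2 equation could be handled the same way, with X1 and X2 serving as the excluded instruments. Whether equally spaced scores are "suitable" for a given scale is a judgment the sketch does not address.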

Dear Paul and David,

As far as I'm aware, the closest that you'll come to this model currently
in R is the sem or tsls function in the sem package, both of which assume
quantitative endogenous variables, however. Several specialized
structural-equations programs (e.g., LISREL) will fit models of this form,
with an approach very close to David's suggestion, based on polychoric and
point-polyserial correlations. The calis procedure in SAS won't do this;
I'm not sure about Stata.

I've been thinking about adding this kind of capability to the sem
package, but don't know if I'll do it.

I hope that this helps,
John

At 06:02 PM 5/30/2002 +0100, David Firth wrote:
>On Thursday, May 30, 2002, Paul Johnson wrote:
>
>>I have a student that I'm encouraging to use R rather than SAS or Stata
>>and within just 2 weeks he has come up with a question that stumps me.
>>
>>What does a person do about endogeneity in generalized linear models?
>>
>>Suppose Y1 and Y2 are 5 category ordinal dependent variables. I see that
>>MASS has polr for estimation of models like that, as long as they are
>>independent. But what if the models were to be written:
>>
>>    Y1.plr <- polr(Y1 ~ Y2 + X1 + X2)
>>
>>    Y2.plr <- polr(Y2 ~ Y1 + X3 + X4)
>>
>>Are estimates of the coefficients for Y1 and Y2 biased, as they would be
>>in a linear model? I think yes. Do I need some equivalent of 2SLS or FIML?
>
>yes and yes, I believe. I presume that you *really* have in mind that Y1
>and Y2 are imperfect (ie, categorized) observations of underlying
>continuous variables (Z1 and Z2, say)? And that the equations whose
>coefficients you'd really like to estimate are (in your R style)
>
>    lm(Z1 ~ Z2 + X1 + X2)
>    lm(Z2 ~ Z1 + X1 + X2)
>
>-- in which case the likelihood, assuming bivariate normality of (Z1,Z2)
>given (X1,X2), involves bivariate normal integrals evaluated over
>rectangles with boundaries determined by category threshold parameters.
>
>I don't think this (ie, maximization of that likelihood) is programmed at
>present in R. From what you say, I infer that it's not in Stata or SAS either?
>
>A sensible first analysis might be simply to forget that Y1 and Y2 are
>multinomial, and fit the linear system using some suitable set(s) of
>numeric scores for the categories. Depending on the results, that might
>also be a sensible last analysis...
>
>Regards,
>David
>
>>It is not entirely clear to me if, in this example, the input Y1 or Y2 is
>>conceptualized as the 5 point scale or rather if it is thought of as a
>>continuous variable which is observed with error.
>>
>>Is there an email list besides r-help where I should be asking questions
>>like this? I understand it is not strictly R related and would gladly go
>>bother other people than you if you tell me where.

-----------------------------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
-----------------------------------------------------
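
For readers unfamiliar with the polychoric correlation mentioned in this thread, the following is a sketch of the likelihood in the simplest case, with no covariates. The notation is an assumption introduced here and does not come from the thread: $\tau_j$ and $\kappa_k$ are the category thresholds for Y1 and Y2, $\rho$ is the correlation of the latent standard bivariate normal pair, $\Phi_2$ is its distribution function, and $n_{jk}$ is the observed count in cell $(j,k)$ of the 5 x 5 table.

```latex
% Rectangle probability for cell (j,k), with
% \tau_0 = \kappa_0 = -\infty and \tau_5 = \kappa_5 = +\infty:
\pi_{jk} = P(Y_1 = j,\ Y_2 = k)
  = \Phi_2(\tau_j, \kappa_k; \rho)
  - \Phi_2(\tau_{j-1}, \kappa_k; \rho)
  - \Phi_2(\tau_j, \kappa_{k-1}; \rho)
  + \Phi_2(\tau_{j-1}, \kappa_{k-1}; \rho)

% Multinomial log-likelihood, maximized over the thresholds and \rho:
\log L(\tau, \kappa, \rho) = \sum_{j=1}^{5} \sum_{k=1}^{5} n_{jk} \log \pi_{jk}
```

Each $\pi_{jk}$ is a bivariate normal integral over a rectangle bounded by threshold parameters. With covariates, the rectangle limits would presumably shift with each observation's linear predictor, so the sum would run over observations rather than table cells.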