John Fieberg
2003-May-28 21:26 UTC
[R] Ordinal data - Regression Trees & Proportional Odds
I have a data set w/ an ordinal response taking on one of 10 categories. I am considering using polr to fit a cumulative logits model. I previously fit the model in SAS (using proc logistic) which provides a test for the proportional odds assumption (p < 0.001 for the test). Are there simple diagnostic plots that can be used to look at the validity of this assumption and possibly help w/ modifying the model as appropriate? Any references or examples of useful R code for addressing the proportional odds assumption would be much appreciated!

I also used a regression tree approach to explore this data set. In doing so, I treated the response as numeric, using the rpart library. I am rather new to regression trees - and wondered about the validity of this approach. I used cross-validation to prune the tree - but plots of the response clearly indicate that the data are non-normal and don't have equal variance (the data are highly skewed towards larger response categories - values of 8-10). I have seen some people suggest that the tree approach is essentially non-parametric - but then I have seen other references suggesting examination of residual plots and potential transformations of the response to ensure homogeneity of variance. For this data set, it will be difficult to find an appropriate transformation, given the large number of responses near 10 (i.e., the fact that the data are constrained to be less than or equal to 10 results in strange residual plots).

Any help is much appreciated!

John Fieberg, Ph.D.
Wildlife Biometrician, Minnesota DNR
5463-C W. Broadway
Forest Lake, MN 55434
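For readers following the thread, here is one rough way to look at the proportional odds assumption by hand. It is only a sketch in plain Python (not R code and not what SAS's test computes), and the function names and the two groups of responses are made up for illustration. The idea: in a cumulative logit model with proportional odds, the intercept changes with the cutpoint j but the covariate effect does not, so the difference between two groups in the empirical logit of P(Y <= j) should be roughly the same at every cutpoint. Differences that drift systematically across cutpoints suggest the assumption is in trouble.

```python
import math
from collections import Counter


def cumulative_logits(responses, levels):
    """Empirical logit of P(Y <= j) at every cutpoint j except the top level."""
    counts = Counter(responses)
    n = len(responses)
    logits = []
    running = 0
    for level in levels[:-1]:
        running += counts.get(level, 0)
        # Add 0.5 to both cells so sparse categories do not give log(0).
        logits.append(math.log((running + 0.5) / (n - running + 0.5)))
    return logits


def logit_differences(group_a, group_b, levels):
    """Difference in empirical cumulative logits between two groups, per cutpoint."""
    la = cumulative_logits(group_a, levels)
    lb = cumulative_logits(group_b, levels)
    return [a - b for a, b in zip(la, lb)]


# Hypothetical data: two groups on a 1-10 ordinal scale, skewed towards 8-10.
levels = list(range(1, 11))
group_a = [10] * 40 + [9] * 25 + [8] * 15 + [7] * 8 + [5] * 6 + [3] * 4 + [1] * 2
group_b = [10] * 20 + [9] * 20 + [8] * 20 + [7] * 15 + [5] * 12 + [3] * 8 + [1] * 5

diffs = logit_differences(group_a, group_b, levels)
for j, d in zip(levels[:-1], diffs):
    # Under proportional odds these values should be roughly constant in j.
    print(f"cutpoint Y <= {j}: logit difference = {d:.2f}")
```

Plotting the differences against the cutpoint gives a simple diagnostic picture. With a continuous covariate, the same check can be run after cutting the covariate into a few groups, at the cost of some arbitrariness in where the cuts fall.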
> From: John Fieberg [mailto:John.Fieberg at dnr.state.mn.us]
>
> [...]

I can't say anything about logistic models, but would like to say a few things about trees. AFAIK there's no implementation (or description) of a tree algorithm that handles an ordinal response. We discussed this with Prof. Breiman some time last year, and it is not straightforward at all (to us, at least).
Regression trees are non-parametric models in the sense that the regression functions they estimate can have arbitrary form. However, the least squares (or even least absolute value) splitting criterion implicitly assumes homoscedasticity. As a matter of fact, the CART book (Breiman, Friedman, Olshen & Stone, 1984) has a discussion of the effect of heteroscedasticity on regression trees.

HTH,
Andy
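To make the point about the least squares criterion concrete, here is a minimal sketch in plain Python of how a single regression split can be chosen. It is an illustration under simplifying assumptions (one numeric predictor, no pruning, hypothetical function names), not rpart's actual implementation. Every squared deviation enters the sum with equal weight, which is where the implicit equal-variance assumption comes in: a noisy region of the response can dominate the choice of split.

```python
def sse(values):
    """Sum of squared deviations of the values from their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total SSE) for the split on x that minimises
    the combined within-node sum of squares, or None if x is constant."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        # Each observation's squared error counts equally, whatever its variance.
        total = sse(left) + sse(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Toy example: the response jumps between x = 2 and x = 3.
print(best_split([1, 2, 3, 4], [1, 1, 9, 9]))
```

When the response is treated as numeric on a bounded 1-10 scale, nodes near the upper bound necessarily have small within-node variance, so this criterion will tend to spend its splits on the more variable lower part of the scale.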
Andreas Christmann
2003-Jun-02 11:26 UTC
[R] RE: Ordinal data - Regression Trees & Proportional Odds
> 1. RE: Ordinal data - Regression Trees & Proportional Odds (Liaw, Andy)
>
> AFAIK there's no implementation (or description) of tree algorithm
> that handles ordinal response.

Regression trees with an ordinal response variable can be computed with SPSS Answer Tree 3.0.

Andreas Christmann

--
Andreas Christmann
University of Dortmund
Department of Statistics
44221 Dortmund
Germany
Phone: +231 / 755 3180
Email: christmann at statistik.uni-dortmund.de
WWW: http://www.statistik.uni-dortmund.de/de/content/einrichtungen/lehrstuehle/datenanalyse.html