Dear List,
I have two questions about how to do predictions using lrm, specifically
how to predict the ordinal response for each observation *individually*.
I'm very new to cumulative odds models, so my apologies if my questions are
too basic.
I have a dataset with 4000 observations. Each observation consists of
an ordinal outcome y (i.e., rating of a stimulus with four possible ratings,
1 through 4), and the values of two predictor variables x1 and x2 associated
with each stimulus:
---------------------------------------
Obs# y x1 x2
---------------------------------------
1 3 2.35 -1.07
2 2 1.78 -0.66
3 4 5.19 -3.51
...
4000 1 0.63 -0.23
---------------------------------------
I get excellent fits using
library(Design)
fit1 <- lrm(y ~ x1 + x2, data = my.dataframe1)
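For reference, here is the cumulative (proportional) odds model that lrm fits to a four-level y. There is one intercept per cutoff, and the slopes are shared across cutoffs. This follows lrm's documented parameterization in terms of Pr(Y >= j):

```latex
\Pr(Y \ge j \mid x_1, x_2) = \frac{1}{1 + \exp\{-(\alpha_j + \beta_1 x_1 + \beta_2 x_2)\}}, \qquad j = 2, 3, 4
```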
Now I want to see how well my model can predict y for a new set of 4000
observations. I need to predict y for each new observation *individually*.
I know an expression like
predicted1 <- predict(fit1, newdata = my.dataframe2,
                      type = "fitted.ind")
can give the *probability* of each of the 4 possible responses for each
observation. So my questions are:
(1) How do I pick the likeliest y (i.e., likeliest of the 4 possible
ratings) for each given new observation?
(2) Are there good references that explain the theory behind this type of
prediction to a beginner like me?
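To see where the type = "fitted.ind" probabilities come from, here is a base-R sketch that computes them by hand. It assumes lrm's parameterization Pr(Y >= j) = 1/(1 + exp(-(alpha_j + beta1*x1 + beta2*x2))) for j = 2, 3, 4. The coefficients are made up (hypothetical) and do not come from an actual fit. Each individual-category probability is the difference of two adjacent cumulative probabilities.

```r
# Hypothetical coefficients for illustration only (not from a real fit):
# intercepts alpha_2, alpha_3, alpha_4 for Pr(Y >= j), and one slope per predictor
alpha <- c(1.5, 0, -1.5)
beta  <- c(x1 = 0.8, x2 = 0.5)

# Three new observations, using the predictor values from the example table
newx <- data.frame(x1 = c(2.35, 1.78, 5.19), x2 = c(-1.07, -0.66, -3.51))

# Linear predictor without any intercept: X %*% beta
xb <- drop(as.matrix(newx) %*% beta)

# Cumulative probabilities Pr(Y >= j), j = 2, 3, 4 (one column per intercept)
cum <- sapply(alpha, function(a) plogis(a + xb))
cum <- matrix(cum, ncol = length(alpha))  # keep matrix shape even for one row

# Pad with Pr(Y >= 1) = 1 and Pr(Y >= 5) = 0, then difference adjacent columns
cum_full <- cbind(1, cum, 0)
probs <- cum_full[, 1:4, drop = FALSE] - cum_full[, 2:5, drop = FALSE]
colnames(probs) <- paste0("y=", 1:4)

# One row per observation, one column per rating; each row sums to 1
print(round(probs, 3))
```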
Thank you very much,
Jay Hegdé
University of Minnesota
--
Sent from the R help mailing list archive at Nabble.com.
Frank E Harrell Jr
2008-Apr-15 22:49 UTC
[R] Predicting ordinal outcomes using lrm{Design}
jayhegde wrote:
> (1) How do I pick the likeliest y (i.e., likeliest of the 4 possible
> ratings) for each given new observation?

You can easily pick the highest-probability category after running
predict(fit, newdataset, type='fitted.ind'), but this will result in an
improper scoring rule (i.e., an accuracy score that is optimized by the
wrong model). I suggest instead computing the Somers' Dxy rank correlation
between the predicted log odds (for any one intercept; it doesn't matter
which one) and the observed ordinal category.

Frank
--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
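The following is a minimal base-R sketch of the two approaches discussed above. The probability matrix and the simulated linear predictors are made up. With a real fit, the probabilities would come from predict(fit1, newdata = my.dataframe2, type = "fitted.ind"). The log odds would presumably come from predict(..., type = "lp"), which, as I understand it, uses one of the intercepts. somers_dxy() is a hypothetical helper written out so that the definition is visible. If Hmisc is available, its rcorr.cens() should report the same Dxy. The final check shows why the choice of intercept does not matter: changing intercepts shifts every log odds by the same constant. That shift leaves all pairwise orderings, and hence Dxy, unchanged.

```r
# Made-up per-category probabilities: one row per observation, one column per
# rating 1..4, shaped like the matrix that type = "fitted.ind" returns
probs <- rbind(c(0.10, 0.20, 0.30, 0.40),
               c(0.50, 0.30, 0.10, 0.10),
               c(0.20, 0.50, 0.20, 0.10))

# Question (1): the likeliest rating is the column with the largest probability
likeliest <- max.col(probs, ties.method = "first")
print(likeliest)  # 4 1 2

# Hypothetical helper: Somers' Dxy = (concordant - discordant) / (pairs with
# different observed y); pairs tied on the predictor count as neither
somers_dxy <- function(pred, y) {
  conc <- 0; disc <- 0; usable <- 0
  n <- length(y)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (y[i] == y[j]) next  # pairs with equal outcomes are not usable
      usable <- usable + 1
      s <- sign(pred[i] - pred[j]) * sign(y[i] - y[j])
      if (s > 0) conc <- conc + 1 else if (s < 0) disc <- disc + 1
    }
  }
  (conc - disc) / usable
}

# Simulated slope part of the log odds and ordinal ratings 1..4
set.seed(1)
xb <- rnorm(50)
y  <- cut(xb + rlogis(50), breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)

# Adding either of two hypothetical intercepts gives the same Dxy
dxy_first <- somers_dxy(1.5 + xb, y)
dxy_last  <- somers_dxy(-1.5 + xb, y)
print(c(dxy_first, dxy_last))
```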