thr3ads.net - R help - [R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help [Oct 2010]


John Haart

2010-Oct-01 10:12 UTC

[R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

Dear list,

I am relatively new to ordinal models and have been working through the example
given by Frank Harrell in the predict.lrm {Design} help.

All of this makes sense to me except for the results, i.e. how do I interpret
them? I would be extremely grateful if someone could explain them.

First I set up the data and model:

> library(Design)   # provides lrm() and rcs(); the rms package later replaced Design
> y <- factor(sample(1:3, 400, TRUE), 1:3, c('good','better','best'))
> x1 <- runif(400)
> x2 <- runif(400)
> f <- lrm(y ~ rcs(x1,4)*x2, x=TRUE)

Get 0.95 confidence limits for Prob[better or best]:

# How do I interpret this on the y scale, i.e. good, better, best?
> L <- predict(f, se.fit=TRUE)           # omitted kint= so use 1st intercept
> plogis(with(L, linear.predictors + 1.96*cbind(-se.fit,se.fit)))
                 se.fit
1   0.6430994 0.8305201
2   0.5812662 0.7919122
3   0.5692593 0.7976906
4   0.5600308 0.7278637
5   0.6845250 0.8819143
6   0.5518848 0.7228657
7   0.5876031 0.7717215
8   0.6291766 0.8354423
9   0.5839353 0.8333790
10  0.5631326 0.8314051
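
The limits above are for a cumulative probability, not for a single category. With kint= omitted the linear predictor uses the first intercept, so plogis() of it estimates Prob(y >= better), i.e. Prob[better or best]. A minimal base-R sketch, assuming only the printed limits for observation 1 (the names `lower`, `upper`, `lp` and `se` are illustrative and not part of Design), recovers the point estimate and standard error from the interval:

```r
# Printed 0.95 limits for observation 1 (copied from the output above)
lower <- 0.6430994
upper <- 0.8305201

# The interval was built on the logit scale as lp -/+ 1.96 * se,
# so map the limits back to that scale with qlogis()
lp <- (qlogis(lower) + qlogis(upper)) / 2           # linear predictor (1st intercept)
se <- (qlogis(upper) - qlogis(lower)) / (2 * 1.96)  # its standard error

# Back on the probability scale this is the estimate of Prob(y >= better)
p_better_or_best <- plogis(lp)
print(round(c(lp = lp, se = se, prob = p_better_or_best), 4))
```

The point estimate is about 0.748, which agrees with 1 - Prob(y = good) = 1 - 0.2517915 for the same observation in the fitted.ind output further down.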



Get Prob(better) and all others, i.e. Prob(Y=j) for each category:

# Does this mean that for data point 1, y = best, as it has the highest probability?
> predict(f, type="fitted.ind")[1:10,]
      y=good  y=better    y=best
1  0.2517915 0.3469692 0.4012392
2  0.3031733 0.3554471 0.3413796
3  0.3046236 0.3555365 0.3398398
4  0.3514780 0.3546880 0.2938340
5  0.1989827 0.3251784 0.4758390
6  0.3581265 0.3540297 0.2878438
7  0.3130150 0.3559091 0.3310759
8  0.2541324 0.3476007 0.3982669
9  0.2740127 0.3519713 0.3740160
10 0.2839907 0.3535331 0.3624763

Establish a data frame to use as newdata:
> d <- data.frame(x1=c(.1,.5),x2=c(.5,.15))

Predict newdata - Prob(Y>=j) for new observations:

# Does this mean that for data point 1, y = better, as it has the higher probability?
> predict(f, d, type="fitted")
  y>=better   y>=best
1 0.6800290 0.3239935
2 0.5846743 0.2409657

# Prob(Y=j)

# Again, does this mean that for data point 1, y = better, as it has the highest probability?
> predict(f, d, type="fitted.ind")
     y=good  y=better    y=best
1 0.3199710 0.3560355 0.3239935
2 0.4153257 0.3437086 0.2409657

Predict mean(y) using codes 1,2,3:

# How do I interpret this on the y scale, i.e. good, better, best?
> predict(f, d, type='mean', codes=TRUE)
       1        2 
2.004022 1.825640 
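
With codes=TRUE the predicted mean is the probability-weighted average of the integer codes 1, 2, 3 (good, better, best). A small base-R check, assuming the fitted.ind probabilities printed above for the two new observations (the object names `p` and `codes` are illustrative):

```r
# Prob(Y = j) for the two new observations, copied from the fitted.ind output
p <- rbind(c(good = 0.3199710, better = 0.3560355, best = 0.3239935),
           c(good = 0.4153257, better = 0.3437086, best = 0.2409657))
codes <- 1:3   # integer codes for good, better, best

# Expected value of the code: sum over j of j * Prob(Y = j)
mean_y <- as.vector(p %*% codes)
print(mean_y)
```

The results agree with the printed 2.004022 and 1.825640 up to rounding. A mean near 2 says the observation sits, on average, around "better"; the number is only meaningful if treating the categories as equally spaced is a sensible assumption.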

Thanks for any advice; it is greatly appreciated.

John


Frank Harrell

2010-Oct-01 11:14 UTC


[R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

John,

Don't conclude that one category is the most probable when its probability
of being equaled or exceeded is a maximum.  The first category would always
be the winner if that were the case.
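
To see why, note that Prob(Y>=j) can only shrink as j increases, so the lowest cutoff always has the largest value. A minimal base-R sketch, using the type="fitted" numbers from the original post (the names `cum` and `ind` are illustrative), differences the cumulative probabilities to obtain the individual ones:

```r
# Prob(Y >= j) for the two new observations, from predict(f, d, type="fitted")
cum <- rbind(c(0.6800290, 0.3239935),
             c(0.5846743, 0.2409657))
colnames(cum) <- c("y>=better", "y>=best")

# Individual probabilities are successive differences of the sequence
# 1, Prob(Y>=better), Prob(Y>=best), 0
ind <- cbind(1, cum) - cbind(cum, 0)
colnames(ind) <- c("y=good", "y=better", "y=best")
print(round(ind, 7))

# Largest individual probability per observation
print(apply(ind, 1, max))
```

This reproduces the fitted.ind rows from the post. For observation 1 the three probabilities are all close to 1/3, which is why naming a single "winner" says very little.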

When you say y=best, remember that you are dealing with a probability model.
Nothing is forcing you to classify an observation, and unless the category's
probability is high, this may be dangerous.  You might do well to consider a
smoother approach such as using the generalized ROC area (C-index) or its
related rank correlation measure Dxy.  Also there are odds ratios.
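
As a rough illustration of these rank-based summaries, the sketch below computes a concordance index and Dxy = 2(C - 0.5) by brute force in base R. The function name `c_index` and the toy data are assumptions made for this example only; lrm itself prints C and Dxy among its rank discrimination indexes.

```r
# Brute-force concordance between a predictor (e.g. the linear predictor)
# and an ordinal response coded as integers. Hypothetical helper, base R only.
c_index <- function(pred, y) {
  conc <- 0; ties <- 0; npairs <- 0
  n <- length(y)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      if (y[i] == y[j]) next              # pairs tied on y are not usable
      npairs <- npairs + 1
      s <- sign(pred[i] - pred[j]) * sign(y[i] - y[j])
      if (s > 0) conc <- conc + 1         # predictor orders the pair correctly
      else if (s == 0) ties <- ties + 1   # predictor tied: counts one half
    }
  }
  C <- (conc + 0.5 * ties) / npairs
  c(C = C, Dxy = 2 * (C - 0.5))
}

# Toy data: response codes 1 = good, 2 = better, 3 = best
y    <- c(1, 1, 2, 2, 3, 3)
pred <- c(0.2, 0.5, 0.4, 0.7, 0.6, 0.9)
res <- c_index(pred, y)
print(res)
```

In this toy data 10 of the 12 usable pairs are ordered correctly, so C = 10/12 and Dxy = 2/3. Both summaries use the whole ranking and never force an observation into one category.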

Frank

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University

peterfrancis at me.com

2010-Oct-01 14:23 UTC


[R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

The reason I am trying to assign them is that I have a data set where I have
arrived at the most likely model that describes the data, and now I have another
data set where I know the factors but not the response.

Therefore, surely I need to assign the predicted values to a response in order
to say something like:

Based on the model I believe unknown 1 is good, whereas unknown 2 is very good,
etc.?

Maybe I am missing something or using the wrong approach, but I thought the main
purpose of using the predict function on new data was to "predict" the
response?

Peter

On 1 Oct 2010, at 14:51, Frank Harrell <f.harrell at vanderbilt.edu> wrote:
>
> Why assign them at all?  Is this a "forced choice at gunpoint" problem?
> Remember what probabilities mean.
>
> Frank


Jorge W. Cardoso

2010-Oct-01 19:44 UTC


[R] SQL Windows server 2008 R2 x32 / x64

Dear R users,

Sorry for my mistake.
The message was meant for the <freetds at lists.ibiblio.org> list.

Jorge


On Friday 01/10/10, 16:31:08, Jorge W. Cardoso wrote:
> Hello list
>
>  Can FreeTDS connect to a "SQL Windows server 2008 x64"?
>
>  If it does, do I need a 64-bit Linux kernel, or would a 32-bit
>  kernel on an x86 architecture be OK?
>
>  Regards
>
>  Jorge

John Haart

2010-Oct-04 13:03 UTC


[R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

Dear List and Frank,

I have calculated the log-odds for my models, but maybe I am missing
something: I do not understand how this helps for a categorical factor. In
all the examples I have seen it relates to continuous factors, where moving from
one number to another shows either an increase or a decrease, not, as in my case,
a change of category.

Furthermore, this gives the values for each factor independently of the others;
how do I get the log-odds for the entire model? I appreciate I may be trying to
put things in boxes again, but I am not: I am happy to report the log odds of
moving from one response level to the next, but would like them for all the
factors together, not independently.

John

                                         Low High Diff. Effect  S.E.  Lower  Upper
WO  Woody:Non_woody                        1    2    NA   0.28  0.16  -0.04    0.6
 Odds Ratio                                1    2    NA   1.32    NA   0.96   1.82
PD  Abiotic:Biotic                         2    1    NA  -1.21  0.13  -1.47  -0.96
 Odds Ratio                                2    1    NA    0.3    NA   0.23   0.38
ALT All:Low                                3    1    NA   0.47  0.19   0.11   0.84
 Odds Ratio                                3    1    NA    1.6    NA   1.11   2.31
ALT High:Low                               3    2    NA  -0.07  0.14  -0.35   0.21
 Odds Ratio                                3    2    NA   0.93    NA    0.7   1.24
ALT Mid:Low                                3    4    NA   0.39  0.15    0.1   0.67
 Odds Ratio                                3    4    NA   1.48    NA   1.11   1.96
REG Two_plus:One                           1    2    NA  -0.59  0.13  -0.84  -0.34
 Odds Ratio                                1    2    NA   0.55    NA   0.43   0.72
BIO Arctic:Subtropical/Tropical            4    1    NA  -1.02  0.81  -2.61   0.58
 Odds Ratio                                4    1    NA   0.36    NA   0.07   1.78
BIO Boreal:Subtropical/Tropical            4    2    NA  -1.21  0.81  -2.79   0.37
 Odds Ratio                                4    2    NA    0.3    NA   0.06   1.44
BIO Mediterranean:Subtropical/Tropical     4    3    NA  -1.89  0.48  -2.83  -0.95
 Odds Ratio                                4    3    NA   0.15    NA   0.06   0.39
BIO Temperate:Subtropical/Tropical         4    5    NA  -0.09  0.16  -0.41   0.23
 Odds Ratio                                4    5    NA   0.91    NA   0.66   1.26

On 3 Oct 2010, at 15:29, Frank Harrell wrote:


You still seem to be hung up on making arbitrary classifications.  Instead,
look at tendencies using odds ratios or rank correlation measures.  My book
Regression Modeling Strategies covers this.

Frank


Frank Harrell

2010-Oct-04 13:10 UTC


[R] Interpreting the example given by Frank Harrell in the predict.lrm {Design} help

I may be missing a point, but the proportional odds model easily gives you
odds ratios for Y>=j (independent of j by PO assumption).  Other options
include examining a rank correlation between the linear predictor and Y, or
(if Y is numeric and spacings between categories are meaningful) you can get
predicted mean Y (see the Mean.lrm in the R rms package, a replacement for
the Design package).
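
The odds-ratio statement can be checked numerically. The sketch below assumes the Effect column of the summary table posted earlier in the thread is on the log-odds scale, and borrows baseline probabilities from the help example purely as illustrative starting values (the names `effect`, `base_cum` and `odds` are hypothetical). Exponentiating an effect gives the odds ratio, and under proportional odds that same ratio applies to the odds of Y>=j at every cutoff j:

```r
# Log-odds effect and 0.95 limits for Woody:Non_woody, from the posted table
effect <- c(est = 0.28, lower = -0.04, upper = 0.60)
or <- exp(effect)            # odds ratio and its confidence limits
print(round(or, 2))

# Baseline cumulative probabilities Prob(Y >= j), illustrative values only
base_cum <- c("y>=better" = 0.6800290, "y>=best" = 0.3239935)

# Under proportional odds the same shift applies to every cumulative logit
new_cum <- plogis(qlogis(base_cum) + effect[["est"]])

# Ratio of odds after vs. before the shift, one value per cutoff
odds <- function(p) p / (1 - p)
ratio <- odds(new_cum) / odds(base_cum)
print(round(ratio, 4))
```

Both ratios equal exp(0.28), about 1.32, matching the Odds Ratio row in the table. This is one way to read a categorical factor: changing its level multiplies the odds of being in a higher response category by a constant factor, whichever cutoff is used.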

Frank 
