I am using glm to fit logit models to cross-sectional data. I am now down to the hard work of making the results intelligible to very average readers. Is there any way to calculate a pseudo analogue to the R^2 of standard linear regression, for use as a purely descriptive statistic of goodness of fit? Most of the readers of my report will be vaguely familiar, and more comfortable, with R^2 than with any other regression diagnostic.

Paul M. Jacobson
Jacobson Consulting Inc.
80 Front Street East, Suite 720
Toronto, ON, M5E 1T4
Voice: +1(416)868-1141
Farm: +1(519)463-6061/6224
Fax: +1(416)868-1131
E-mail: pmj at jciconsult.com
Web: http://www.jciconsult.com/
On Sun, 4 Aug 2002 09:08:46 -0400, "Paul M. Jacobson" <pmj at jciconsult.com> wrote:

> Is there any way to calculate a pseudo analogue to the R^2 of standard
> linear regression, for use as a purely descriptive statistic of goodness
> of fit? [snip]

The Nagelkerke R^2 is commonly used. The lrm function in the Design library computes it for logistic regression. The numerator is 1 - exp(-LR/n), where LR is the likelihood ratio chi-square statistic and n is the total sample size. Divide it by the maximum attainable value, reached if the model is perfect (a simple function of the -2 log likelihood of the intercept-only model), to get Nagelkerke's R^2. The numerator is exactly the ordinary R^2 in OLS, since LR = -n log(1 - R^2) there.

For a more interpretable index, and one that measures pure discrimination ability, the ROC area or "C index", essentially a Mann-Whitney statistic based on the probability of concordance, is recommended. The lrm function also outputs this, or you can get it from the somers2 or rcorr.cens functions in the Hmisc library.

Frank Harrell

--
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
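[To make the arithmetic above concrete, here is a minimal sketch, not from the thread itself: it assumes an illustrative binomial glm fit on ungrouped binary data, with made-up names fit, d, y, x1, x2.]

  fit <- glm(y ~ x1 + x2, family = binomial, data = d)  # illustrative model

  LR <- fit$null.deviance - fit$deviance      # likelihood ratio chi-square statistic
  n  <- length(fit$fitted.values)             # total sample size
  R2.num <- 1 - exp(-LR / n)                  # the numerator (Cox-Snell form)
  R2.max <- 1 - exp(-fit$null.deviance / n)   # maximum attainable value; for ungrouped
                                              # binary data the null deviance is -2 log L0
  R2.nag <- R2.num / R2.max                   # Nagelkerke's R^2

  library(Hmisc)
  somers2(fitted(fit), fit$y)["C"]            # C index (ROC area)

[lrm in the Design library reports both quantities directly; the sketch only shows where they come from.]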
On Aug 04, Paul M. Jacobson wrote:

> Is there any way to calculate a pseudo analogue to the R^2 of standard
> linear regression, for use as a purely descriptive statistic of goodness
> of fit? [snip]

In fact, there are several "R^2-like" measures for logit and probit models (not surprisingly, called "pseudo-R^2"). An overview is in "Pseudo-R^2 Measures for Some Common Limited Dependent Variable Models": http://citeseer.nj.nec.com/veall96pseudor.html

The Aldrich-Nelson measure appears to be the most widely used. You may also want to consider Herron's (1999) "Expected Percent Correctly Predicted" and related measures, described in Political Analysis 8(1): http://web.polmeth.ufl.edu/pa/herron.pdf. Even the traditional PCP/PRE measures tend to be quite informative (perhaps even more useful than a pseudo-R^2).

Chris

--
Chris Lawrence <cnlawren at olemiss.edu> - http://www.lordsutch.com/chris/
Instructor and Ph.D. Candidate, Political Science, Univ. of Mississippi
208 Deupree Hall - 662-915-5765
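[A hedged sketch of two of these measures, reusing the illustrative binomial glm fit from the sketch above; the 0.5 classification cutoff for PCP is the conventional but arbitrary choice.]

  LR <- fit$null.deviance - fit$deviance      # likelihood ratio chi-square
  n  <- length(fit$fitted.values)
  LR / (LR + n)                               # Aldrich-Nelson pseudo-R^2
  mean((fitted(fit) > 0.5) == fit$y)          # PCP: percent correctly predicted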
Dear list,

My data frame has a serious collinearity problem. I want to try incomplete principal component regression, introduced in Regression Analysis by Rudolf J. Freund and William J. Wilson (1998). I wonder if I can find a function in R or S-PLUS to do it.

Thanks a lot!

Huan
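[No ready-made function was named in the thread. As a hedged sketch of the idea, incomplete principal component regression can be assembled from princomp and lm, regressing on only the large-variance components; the predictor matrix X, response y, and the variance threshold below are illustrative assumptions, not from the cited text.]

  pc <- princomp(X, cor = TRUE)               # components of the standardized predictors
  keep <- pc$sdev^2 > 0.10                    # discard near-degenerate components
                                              # (the 0.10 threshold is arbitrary)
  fit <- lm(y ~ pc$scores[, keep])            # regress on the retained component scores
  gamma <- coef(fit)[-1]                      # coefficients on the components
  beta <- pc$loadings[, keep] %*% gamma       # back to the standardized-predictor scale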
Dear Prof. Harrell and R list,

I have done the variable clustering and summary scores. Thanks a lot for your kind help.

But it hasn't solved the collinearity problem in my dataset. After the clustering and transcan, there is still very strong collinearity between the summary scores. The objective of my project is to find the influential variables, and I believe variable reduction is not appropriate while the collinearity persists. I am thinking about principal component regression and variable reduction based on it (Rudolf J. Freund and William J. Wilson (1998), p. 215).

Does anybody have a suggestion on variable reduction under this condition? I will appreciate any kind of information.

Best,
Huan

----- Original Message -----
From: "Frank E Harrell Jr" <fharrell at virginia.edu>
To: "Huan Huang" <huang at stats.ox.ac.uk>
Sent: Sunday, August 04, 2002 7:56 PM
Subject: Re: cluster summary score

> On Sun, 4 Aug 2002 19:48:22 +0100, Huan Huang <huang at stats.ox.ac.uk> wrote:
>
> > > This was just done by
> > >
> > > f <- lrm(y ~ all cluster summary scores)
> > > fastbw(f, suitable stopping criteria)
> >
> > Thank you very much for your kind reply. But I don't know how to get the
> > cluster summary score. I did:
> >
> > t <- transcan(x, transform = T)
> > t$transform
> >
> > I got a new matrix, with the transformed value for each variable. How can
> > I get the cluster summary scores?
>
> You see the little pc1 function I defined in Hmisc? I just do things like
> p1 <- pc1(t$transform) or pc1(t$transform[, c(3, 5, 7)]) to use variables
> 3, 5, 7.
>
> Frank
>
> > > Doing the fast backward stepdown is safer with cluster scores than with
> > > raw variables, especially if you use conservative stopping criteria
> > > (e.g., large alpha). I allowed "highly insignificant" cluster scores to
> > > be dropped, and did not ever look at their component variables again.
> > >
> > > Frank
> > >
> > > > Actually I am doing my thesis project. My explanatory variables have
> > > > serious collinearity. I have used the functions transcan and varclus
> > > > on the variables and found some clusters. I am trying to use the
> > > > method introduced in this section to drop some variables. I want to
> > > > know how you carry out the cluster summary scores.
> > > >
> > > > Huan
>
> [rest of the quoted pseudo-R^2 thread snipped]
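[For concreteness, a hedged sketch of the workflow in the quoted exchange: the cluster column indices and the stopping rule are illustrative assumptions; transcan and pc1 come from the Hmisc library, lrm and fastbw from Design.]

  library(Hmisc)                              # transcan, pc1
  library(Design)                             # lrm, fastbw
  t <- transcan(x, transform = TRUE)          # x: illustrative predictor data frame
  score1 <- pc1(t$transform[, c(1, 2)])       # first PC of one variable cluster
  score2 <- pc1(t$transform[, c(3, 5, 7)])    # first PC of another (columns illustrative)
  f <- lrm(y ~ score1 + score2)               # logistic model on the cluster scores
  fastbw(f, rule = "p", sls = 0.5)            # backward stepdown; a large alpha drops
                                              # only "highly insignificant" scores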
On 08/08/02 13:23, Huan Huang wrote:

> But it hasn't solved the collinearity problem in my dataset. After the
> clustering and transcan, there is still very strong collinearity between
> the summary scores. The objective of my project is to find the
> influential variables. [snip]

I'm not sure exactly what you mean by variable reduction here, but when I and many other psychologists face this kind of problem (reducing a set of variables), we often use factor analysis. A good program is factanal in the mva library. Varimax rotation (the default) usually picks out a sensible set of factors, although of course other rotations may be more informative in a given case. You can sort the loadings if you want (look at the various options for loadings() and print()).

There are no fixed rules for this sort of thing. Sometimes one variable winds up in the wrong place by chance. The strategy I use is to figure out a sensible grouping of variables before I use them to predict anything, so that I am not biased by knowing the results. That way I feel free to move or remove variables that don't make sense. Some people may prefer a more rigid approach, which further reduces the temptation to cheat.

Having found the grouping of variables, you can do three different things (a sketch of the first two appears after this message):

1. Define "scores" by simply adding up the (standardized?) scores of the variables in each group (those with high loadings on the same factor, perhaps).

2. Use the factor scores themselves as variables.

3. Use a single representative variable from each group. This seems to be what you were suggesting, but I'm having trouble thinking of a situation where this would be better than #1 or #2.

Whatever you do, you need to figure out how many groups there are, and prcomp() or princomp() is often helpful here. (And take a look at biplot(), a really nice tool for looking at the first two principal components.) The factanal() program also reports a chi-square fit statistic, so in principle you could use that to decide how many factors there are. However, that method usually gives more factors than are meaningful, especially when you have a large data set.

--
Jonathan Baron, Professor of Psychology, University of Pennsylvania
Home page: http://www.sas.upenn.edu/~baron
R page: http://finzi.psych.upenn.edu/
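[A minimal sketch of options 1 and 2 above: the predictor matrix X, the binary response y, the choice of three factors, the loading cutoff, and the column indices are all illustrative assumptions.]

  library(mva)                                # factanal, princomp, biplot in R 1.x
  fa <- factanal(X, factors = 3,              # X and the 3 factors are illustrative
                 rotation = "varimax", scores = "regression")
  print(fa$loadings, cutoff = 0.4, sort = TRUE)  # sorted loadings, small ones suppressed

  # Option 1: sum the standardized variables in one group (columns illustrative).
  score1 <- rowSums(scale(X[, c(2, 4)]))

  # Option 2: use the factor scores themselves as predictors.
  f <- glm(y ~ fa$scores, family = binomial)  # y: illustrative binary response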