Hi All,

When I tried to do logistic regression (with a high maximum number of iterations) I got the following warning message:

Warning message:
fitted probabilities numerically 0 or 1 occurred in: (if
(is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,

As I checked in the R-help mail archives, it seems that this happens when the dataset exhibits complete separation. However, the p-values tend to 1, and the residual deviance tends to 0.

My questions then are:
-Is the converged model correct? or
-Can I limit the number of iterations in order to avoid this warning?

If I do so, I've checked that the model selected by step() can differ in some cases (I use 10 different presence-absence datasets and 18 explanatory variables), and when I validate the model with independent data, the new model is slightly more powerful in most cases.

Thanks in advance,

Guillem Chust

------------------------------------------------------------------
Guillem Chust                                    chust at cict.fr
Laboratoire Evolution et Diversité Biologique, UMR 5174 CNRS/UPS
UPS Toulouse III, batiment IVR3
118, route de Narbonne - 31062 Toulouse Cedex 4, France
Tel 33 (0)5 61 55 67 58    Fax 33 (0)5 61 55 73 27
http://www.edb.ups-tlse.fr/
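For reference, a minimal sketch (with made-up data, not the poster's presence-absence datasets) that reproduces the same warning in R:

x <- c(-2, -1, 0, 1, 2, 3)   # a single made-up predictor
y <- c( 0,  0, 0, 1, 1, 1)   # y is 0 whenever x <= 0 and 1 whenever x >= 1: complete separation
fit <- glm(y ~ x, family = binomial)
## Warning message:
## fitted probabilities numerically 0 or 1 occurred
summary(fit)                 # enormous slope and standard error, Wald p-value close to 1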
On Sunday, Jan 25, 2004, at 13:59 Europe/London, Guillem Chust wrote:

> Hi All,
>
> When I tried to do logistic regression (with a high maximum number of
> iterations) I got the following warning message:
>
> Warning message:
> fitted probabilities numerically 0 or 1 occurred in: (if
> (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
>
> As I checked in the R-help mail archives, it seems that this happens
> when the dataset exhibits complete separation.

Yes, correct.

> However, the p-values tend to 1

The reported p-values cannot be trusted: the asymptotic theory on which they are based is not valid in such circumstances.

> , and the residual deviance tends to 0.

Yes, this happens under complete separation: the model fits the observed 0/1 data perfectly.

> My questions then are:
> -Is the converged model correct?

Well, "converged" is not really the right word to use -- the iterative algorithm has diverged. At least one of the coefficients has its MLE at infinity (or minus infinity). In that sense what you see reported (i.e. large values of estimated log odds-ratios, which approximate infinity) is correct. Still more correct would be estimates reported as Inf or -Inf: but the algorithm is not programmed to detect such divergence.

> or
> -Can I limit the number of iterations in order to avoid this warning?

Yes, probably, but this is not a sensible course of action. The iterations are iterations of an algorithm to compute the MLE. The MLE is not finite-valued, and the warning is a clue to that. If you *really* want finite parameter estimates, the answer is not to use maximum likelihood as the method of estimation. Various alternatives exist, mostly based on penalizing the likelihood [one such is in the brlr package, but there are others]. As a general principle, surely it is better to maximize a different criterion (e.g. a penalized likelihood, with a purposefully chosen penalty function) than to stop the MLE algorithm prematurely and arbitrarily?

I hope this helps!
David

Professor David Firth
Dept of Statistics
University of Warwick
Coventry CV4 7AL
United Kingdom

Email: d.firth at warwick.ac.uk
Voice: +44 (0)247 657 2581
Fax:   +44 (0)247 652 4532
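To see what an MLE "at infinity" looks like in practice, a hedged sketch with the same made-up separated data as above: the slope estimate keeps growing as the algorithm is allowed more iterations, until the fitted probabilities are numerically 0 or 1.

x <- c(-2, -1, 0, 1, 2, 3)
y <- c( 0,  0, 0, 1, 1, 1)
for (m in c(3, 6, 12, 25)) {
  fit <- suppressWarnings(glm(y ~ x, family = binomial,
                              control = glm.control(maxit = m)))
  cat("maxit =", m, " slope =", round(coef(fit)[["x"]], 1), "\n")
}
## The reported "estimate" reflects where the iterations stopped,
## not a finite maximum of the likelihood.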
On 25-Jan-04 Guillem Chust wrote:

> Hi All,
>
> When I tried to do logistic regression (with a high maximum number of
> iterations) I got the following warning message:
>
> Warning message:
> fitted probabilities numerically 0 or 1 occurred in: (if
> (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
>
> As I checked in the R-help mail archives, it seems that this happens
> when the dataset exhibits complete separation.

This is so. Indeed, there is a sense in which you are experiencing unusually good fortune, since for values of your predictors in one region you are perfectly predicting the 0s in your response, and for values in another region you are perfectly predicting the 1s. What better could you hope for?

However, you would respond that this is not realistic: your variables are not (in real life) such that P(Y=1|X=x) is ever exactly 1 or exactly 0, so this perfect prediction is not realistic.

In that case, you are somewhat stuck. The plain fact is that your data (in particular the way the values of the X variables are distributed) are not adequate to tell you what is happening.

There may be manipulative tricks (like penalised regression) which would inhibit the logistic regression from going all the way to a perfect fit; but, then, how would you know how far to let it go (because it will certainly go as far in that direction as you allow it to)?

The key parameter in this situation is the dispersion parameter (sigma in the usual notation). When you get a perfect fit in a "completely separated" situation, this corresponds to sigma=0. If you don't like this, then there must be reasons why you want sigma>0, and this may imply that you have reasons for wanting sigma to be at least s0 (say); or, if you are prepared to be Bayesian about it, you may be satisfied that there is a prior distribution for sigma which would not allow sigma=0, and would attach high probability to a range of sigma values which you consider to be realistic.

Unless you have a fairly firm idea of what sort of values sigma is likely to have, then you are indeed stuck, because you have no reason to prefer one positive value of sigma to a different positive value of sigma. In that case you cannot really object if the logistic regression tries to make it as small as possible!

In the absence of such reasons, you may consider exploring the effect of fixing sigma at some positive value, and then varying this value. For each such value, look at the estimates of the coefficients of the X variables, the goodness of fit, and so on. This may help you to form an idea of what sort of estimate you should hope for, and would enable you to design a better dataset (i.e. placement of X values) which would be capable of supporting a fit that was both realistic and estimated with adequate precision.

Another approach you should consider, if you have several X variables, is to look at subsets of these variables, retaining in the first instance only those few (the fewer the better) which on substantive grounds you consider to be the most important in the application to which the data refer. Also look at the multivariate distribution of the X values, and in particular carry out a linear discriminant analysis on them.

If, however, you have only 1 X variable, then you have a situation equivalent to the following (pairs of (x,y)):

  (-2,0), (-1,0), (0,0), (1,1), (2,1), (3,1).
Clearly you are not going to get anything out of this unless you either repeat the experiment many times (so that you have several Y responses at each value of X, and probabilities between 0 and 1 at each X then have a better chance to express themselves, as so many 0s and also so many 1s at each X), or you fill in the range over which P(Y=1|X=x) increases from low to high, e.g. by observing Y for X = -1.0, -0.9, -0.8, ..., 0.0, 0.1, ..., 1.9, 2.0 (say).

I hope these suggestions help.
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 25-Jan-04                                     Time: 18:06:16
------------------------------ XFMail ------------------------------
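A small sketch of that second suggestion (the design and the "true" curve here are invented purely for illustration, not taken from the thread): with replicate trials spread across the range where P(Y=1|X=x) rises, the 0s and 1s overlap and the maximum-likelihood fit is finite.

set.seed(1)
x <- seq(-1, 2, by = 0.1)                  # fill in the range, as suggested above
n <- rep(10, length(x))                    # 10 replicate trials at each x
p <- plogis(-1 + 2 * x)                    # an invented "true" P(Y=1|X=x)
r <- rbinom(length(x), size = n, prob = p) # simulated numbers of 1s at each x
fit <- glm(cbind(r, n - r) ~ x, family = binomial)
coef(summary(fit))                         # finite estimates, usable standard errors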
Hi All:

I am really fascinated by the content and the depth of discussion of this thread. This really exemplifies what I have come to love and enjoy about the R user group - that it is not JUST an answering service for getting help on programming issues, but also a forum for some critical and deep thinking on fundamental statistical issues. Kudos to the group!

Best,
Ravi.

----- Original Message -----
From: David Firth <d.firth at warwick.ac.uk>
Date: Monday, January 26, 2004 5:28 am
Subject: Re: [R] warning associated with Logistic Regression

> On Sunday, Jan 25, 2004, at 18:06 Europe/London, (Ted Harding) wrote:
>
> > On 25-Jan-04 Guillem Chust wrote:
> > > When I tried to do logistic regression (with a high maximum number
> > > of iterations) I got the following warning message [...]
> >
> > [...]
> >
> > The key parameter in this situation is the dispersion parameter
> > (sigma in the usual notation). When you get a perfect fit in a
> > "completely separated" situation, this corresponds to sigma=0. If
> > you don't like this, then there must be reasons why you want
> > sigma>0, and this may imply that you have reasons for wanting sigma
> > to be at least s0 (say); or, if you are prepared to be Bayesian
> > about it, you may be satisfied that there is a prior distribution
> > for sigma which would not allow sigma=0, and would attach high
> > probability to a range of sigma values which you consider to be
> > realistic.
> >
> > Unless you have a fairly firm idea of what sort of values sigma is
> > likely to have, then you are indeed stuck, because you have no
> > reason to prefer one positive value of sigma to a different positive
> > value of sigma. In that case you cannot really object if the
> > logistic regression tries to make it as small as possible!
>
> This seems arguable. Accepting that we are talking about point
> estimation (the desirability of which is of course open to
> question!!), then old-fashioned criteria like bias, variance and mean
> squared error can be used as a guide. For example, we might desire to
> use an estimation method for which the MSE of the estimated logistic
> regression coefficients (suitably standardized) is as small as
> possible; or some other such thing.
>
> The simplest case is estimation of log(pi/(1-pi)) given an observation
> r from binomial(n,pi). Suppose we find that r=n -- what then can we say
> about pi? Clearly not much if n is small, rather more if n is large.
> Better in terms of MSE than the MLE (whose MSE is infinite) is to use
> log(p/(1-p)), with p = (r+0.5)/(n+1). See for example Cox & Snell's
> book on binary data. This corresponds to penalizing the likelihood by
> the Jeffreys prior, a penalty function which has good frequentist
> properties also in the more general logistic regression context.
> References given in the brlr package give the theory and some
> empirical evidence. The logistf package, also on CRAN, is another
> implementation.
>
> I do not mean to imply that the Jeffreys-prior penalty will be the
> right thing for all applications -- it will not. (e.g. if you really
> do have prior information, it would be better to use it.)
>
> In general I agree wholeheartedly that it is best to get more/better
> data!
>
> > In the absence of such reasons,
> (cut)
>
> All good wishes,
> David
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
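As a rough illustration of the two alternatives David describes (the numbers and the toy dataset below are invented, and the logistf call assumes that CRAN package is installed):

r <- 10; n <- 10                     # all n observations are successes, r = n
log((r/n) / (1 - r/n))               # MLE of log(pi/(1-pi)): Inf
p <- (r + 0.5) / (n + 1)
log(p / (1 - p))                     # adjusted estimate: log(21), roughly 3.04

library(logistf)                     # Jeffreys-prior (Firth-type) penalized likelihood
d <- data.frame(x = c(-2, -1, 0, 1, 2, 3),
                y = c( 0,  0, 0, 1, 1, 1))  # the separated toy data used earlier
summary(logistf(y ~ x, data = d))    # finite coefficients with penalized-likelihood CIs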