Hi All,

When I tried to do logistic regression (with a high maximum number of iterations) I got the following warning message:

Warning message:
fitted probabilities numerically 0 or 1 occurred in: (if
(is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,

As I checked in the R-help mail archives, it seems that this happens when the dataset exhibits complete separation. However, the p-values tend to 1, and the residual deviance tends to 0.

My questions then are:
-Is the converged model correct? or
-Can I limit the number of iterations in order to avoid this warning?

If I do so, I've checked that the model selected by step() can differ in some cases (I use 10 different presence-absence datasets and 18 explanatory variables), and when I validate the model with independent data, the new model is slightly more powerful in most cases.

Thanks in advance,

Guillem Chust

------------------------------------------------------------------
Guillem Chust                                    chust at cict.fr
Laboratoire Evolution et Diversité Biologique, UMR 5174 CNRS/UPS
UPS Toulouse III, batiment IVR3
118, route de Narbonne - 31062 Toulouse Cedex 4, France
Tel 33 (0)5 61 55 67 58    Fax 33 (0)5 61 55 73 27
http://www.edb.ups-tlse.fr/
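For reference, a minimal sketch (with made-up data, not the poster's presence-absence datasets) that reproduces the same warning in R:

x <- c(-2, -1, 0, 1, 2, 3)   # a single made-up predictor
y <- c( 0,  0, 0, 1, 1, 1)   # y is 0 whenever x <= 0 and 1 whenever x >= 1: complete separation
fit <- glm(y ~ x, family = binomial)
## Warning message:
## fitted probabilities numerically 0 or 1 occurred
summary(fit)                 # enormous slope and standard error, Wald p-value close to 1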
On Sunday, Jan 25, 2004, at 13:59 Europe/London, Guillem Chust wrote:

> Hi All,
>
> When I tried to do logistic regression (with a high maximum number of
> iterations) I got the following warning message:
>
> Warning message:
> fitted probabilities numerically 0 or 1 occurred in: (if
> (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
>
> As I checked in the R-help mail archives, it seems that this happens
> when the dataset exhibits complete separation.

Yes, correct.

> However, the p-values tend to 1

The reported p-values cannot be trusted: the asymptotic theory on which they are based is not valid in such circumstances.

> , and the residual deviance tends to 0.

Yes, this happens under complete separation: the model fits the observed 0/1 data perfectly.

> My questions then are:
> -Is the converged model correct?

Well, "converged" is not really the right word to use -- the iterative algorithm has diverged. At least one of the coefficients has its MLE at infinity (or minus infinity). In that sense what you see reported (i.e. large values of estimated log odds-ratios, which approximate infinity) is correct. Still more correct would be estimates reported as Inf or -Inf: but the algorithm is not programmed to detect such divergence.

> or
> -Can I limit the number of iterations in order to avoid this warning?

Yes, probably, but this is not a sensible course of action. The iterations are iterations of an algorithm to compute the MLE. The MLE is not finite-valued, and the warning is a clue to that. If you *really* want finite parameter estimates, the answer is not to use maximum likelihood as the method of estimation. Various alternatives exist, mostly based on penalizing the likelihood [one such is in the brlr package, but there are others]. As a general principle, surely it is better to maximize a different criterion (e.g. a penalized likelihood, with a purposefully chosen penalty function) than to stop the MLE algorithm prematurely and arbitrarily?

I hope this helps!
David

Professor David Firth
Dept of Statistics
University of Warwick
Coventry CV4 7AL
United Kingdom

Email: d.firth at warwick.ac.uk
Voice: +44 (0)247 657 2581
Fax:   +44 (0)247 652 4532
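To see what an MLE "at infinity" looks like in practice, a hedged sketch with the same made-up separated data as above: the slope estimate keeps growing as the algorithm is allowed more iterations, until the fitted probabilities are numerically 0 or 1.

x <- c(-2, -1, 0, 1, 2, 3)
y <- c( 0,  0, 0, 1, 1, 1)
for (m in c(3, 6, 12, 25)) {
  fit <- suppressWarnings(glm(y ~ x, family = binomial,
                              control = glm.control(maxit = m)))
  cat("maxit =", m, " slope =", round(coef(fit)[["x"]], 1), "\n")
}
## The reported "estimate" reflects where the iterations stopped,
## not a finite maximum of the likelihood.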
On 25-Jan-04 Guillem Chust wrote:

> Hi All,
>
> When I tried to do logistic regression (with a high maximum number of
> iterations) I got the following warning message:
>
> Warning message:
> fitted probabilities numerically 0 or 1 occurred in: (if
> (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
>
> As I checked in the R-help mail archives, it seems that this happens
> when the dataset exhibits complete separation.

This is so. Indeed, there is a sense in which you are experiencing unusually good fortune, since for values of your predictors in one region you are perfectly predicting the 0s in your response, and for values in another region you are perfectly predicting the 1s. What better could you hope for?

However, you would respond that this is not realistic: your variables are not (in real life) such that P(Y=1|X=x) is ever exactly 1 or exactly 0, so this perfect prediction is not realistic.

In that case, you are somewhat stuck. The plain fact is that your data (in particular the way the values of the X variables are distributed) are not adequate to tell you what is happening.

There may be manipulative tricks (like penalised regression) which would inhibit the logistic regression from going all the way to a perfect fit; but, then, how would you know how far to let it go (because it will certainly go as far in that direction as you allow it to)?

The key parameter in this situation is the dispersion parameter (sigma in the usual notation). When you get a perfect fit in a "completely separated" situation, this corresponds to sigma=0. If you don't like this, then there must be reasons why you want sigma>0, and this may imply that you have reasons for wanting sigma to be at least s0 (say); or, if you are prepared to be Bayesian about it, you may be satisfied that there is a prior distribution for sigma which would not allow sigma=0, and would attach high probability to a range of sigma values which you consider to be realistic.

Unless you have a fairly firm idea of what sort of values sigma is likely to have, then you are indeed stuck, because you have no reason to prefer one positive value of sigma to a different positive value of sigma. In that case you cannot really object if the logistic regression tries to make it as small as possible!

In the absence of such reasons, you may consider exploring the effect of fixing sigma at some positive value, and then varying this value. For each such value, look at the estimates of the coefficients of the X variables, the goodness of fit, and so on. This may help you to form an idea of what sort of estimate you should hope for, and would enable you to design a better dataset (i.e. placement of X values) which would be capable of supporting a fit that was both realistic and estimated with adequate precision.

Another approach you should consider, if you have several X variables, is to look at subsets of these variables, retaining in the first instance only those few (the fewer the better) which on substantive grounds you consider to be the most important in the application to which the data refer. Also look at the multivariate distribution of the X values, and in particular carry out a linear discriminant analysis on them.

If, however, you have only 1 X variable, then you have a situation equivalent to the following (pairs of (x,y)):

  (-2,0), (-1,0), (0,0), (1,1), (2,1), (3,1).
Clearly you are not going to get anything out of this unless you either repeat the experiment many times (so that you have several Y responses at each value of X, and probabilities between 0 and 1 at each X then have a better chance to express themselves, as so many 0s and also so many 1s at each X), or you fill in the range over which P(Y=1|X=x) increases from low to high, e.g. by observing Y for X = -1.0, -0.9, -0.8, ..., 0.0, 0.1, ..., 1.9, 2.0 (say).

I hope these suggestions help.
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 25-Jan-04                                     Time: 18:06:16
------------------------------ XFMail ------------------------------
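A small sketch of that second suggestion (the design and the "true" curve here are invented purely for illustration, not taken from the thread): with replicate trials spread across the range where P(Y=1|X=x) rises, the 0s and 1s overlap and the maximum-likelihood fit is finite.

set.seed(1)
x <- seq(-1, 2, by = 0.1)                  # fill in the range, as suggested above
n <- rep(10, length(x))                    # 10 replicate trials at each x
p <- plogis(-1 + 2 * x)                    # an invented "true" P(Y=1|X=x)
r <- rbinom(length(x), size = n, prob = p) # simulated numbers of 1s at each x
fit <- glm(cbind(r, n - r) ~ x, family = binomial)
coef(summary(fit))                         # finite estimates, usable standard errors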
Hi All:

I am really fascinated by the content and the depth of discussion of this thread. This really exemplifies what I have come to love and enjoy about the R user group - that it is not JUST an answering service for getting help on programming issues, but also a forum for some critical and deep thinking on fundamental statistical issues. Kudos to the group!

Best,
Ravi.

----- Original Message -----
From: David Firth <d.firth at warwick.ac.uk>
Date: Monday, January 26, 2004 5:28 am
Subject: Re: [R] warning associated with Logistic Regression

> On Sunday, Jan 25, 2004, at 18:06 Europe/London, (Ted Harding) wrote:
>
> > On 25-Jan-04 Guillem Chust wrote:
> > > When I tried to do logistic regression (with a high maximum number
> > > of iterations) I got the following warning message [...]
> >
> > [...]
> >
> > The key parameter in this situation is the dispersion parameter
> > (sigma in the usual notation). When you get a perfect fit in a
> > "completely separated" situation, this corresponds to sigma=0. If
> > you don't like this, then there must be reasons why you want
> > sigma>0, and this may imply that you have reasons for wanting sigma
> > to be at least s0 (say); or, if you are prepared to be Bayesian
> > about it, you may be satisfied that there is a prior distribution
> > for sigma which would not allow sigma=0, and would attach high
> > probability to a range of sigma values which you consider to be
> > realistic.
> >
> > Unless you have a fairly firm idea of what sort of values sigma is
> > likely to have, then you are indeed stuck, because you have no
> > reason to prefer one positive value of sigma to a different positive
> > value of sigma. In that case you cannot really object if the
> > logistic regression tries to make it as small as possible!
>
> This seems arguable. Accepting that we are talking about point
> estimation (the desirability of which is of course open to
> question!!), then old-fashioned criteria like bias, variance and mean
> squared error can be used as a guide. For example, we might desire to
> use an estimation method for which the MSE of the estimated logistic
> regression coefficients (suitably standardized) is as small as
> possible; or some other such thing.
>
> The simplest case is estimation of log(pi/(1-pi)) given an observation
> r from binomial(n,pi). Suppose we find that r=n -- what then can we say
> about pi? Clearly not much if n is small, rather more if n is large.
> Better in terms of MSE than the MLE (whose MSE is infinite) is to use
> log(p/(1-p)), with p = (r+0.5)/(n+1). See for example Cox & Snell's
> book on binary data. This corresponds to penalizing the likelihood by
> the Jeffreys prior, a penalty function which has good frequentist
> properties also in the more general logistic regression context.
> References given in the brlr package give the theory and some
> empirical evidence. The logistf package, also on CRAN, is another
> implementation.
>
> I do not mean to imply that the Jeffreys-prior penalty will be the
> right thing for all applications -- it will not. (e.g. if you really
> do have prior information, it would be better to use it.)
>
> In general I agree wholeheartedly that it is best to get more/better
> data!
>
> > In the absence of such reasons,
> (cut)
>
> All good wishes,
> David
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
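As a rough illustration of the two alternatives David describes (the numbers and the toy dataset below are invented, and the logistf call assumes that CRAN package is installed):

r <- 10; n <- 10                     # all n observations are successes, r = n
log((r/n) / (1 - r/n))               # MLE of log(pi/(1-pi)): Inf
p <- (r + 0.5) / (n + 1)
log(p / (1 - p))                     # adjusted estimate: log(21), roughly 3.04

library(logistf)                     # Jeffreys-prior (Firth-type) penalized likelihood
d <- data.frame(x = c(-2, -1, 0, 1, 2, 3),
                y = c( 0,  0, 0, 1, 1, 1))  # the separated toy data used earlier
summary(logistf(y ~ x, data = d))    # finite coefficients with penalized-likelihood CIs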