thr3ads.net - R help - [R] warnings associated with logistic regression [Jul 2000]

Allan Strand

2000-Jul-11 16:20 UTC

[R] warnings associated with logistic regression

Hi all,

This is as much a statistical/estimation question as an R-specific
one, but here goes.

I am trying to use logistic regression to predict suitability of
habitats for certain plant species.  The response variable is a binary 
one that indicates whether a particular species is found at a site on
the landscape.  The independent variables represent physical
characteristics of the landscape derived from a GIS.  A significant
proportion of the time I get the following warning messages from
glm():

> lr <- glm(known.v1 ~ elevation + aspect + slope + energy15 + energy166 +
+           aspect + accum + streams.buffered,
+           family = binomial, data = siteframe)
Warning messages: 
1: Algorithm did not converge in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
2: fitted probabilities numerically 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,

Now I can get the algorithm to converge (or at least not produce the
warning) by increasing the number of iterations, but that does not
affect the second warning.  A read of Hosmer and Lemeshow (1989) does not
provide much insight, so I thought that I would post the question  to
the list.

Any comments? Also, I'd be happy to email a dataset that exhibits this
behavior if anyone is curious enough.

Cheers,
A.

-- 
Allan Strand,   Biology    linum.cofc.edu
College of Charleston      Ph. (843) 953-8085
Charleston, SC 29424       Fax (843) 953-5453

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Prof Brian D Ripley

2000-Jul-11 16:36 UTC

[R] warnings associated with logistic regression

On 11 Jul 2000, Allan Strand wrote:
> Hi all,
> 
> This is as much a statistical/estimation question as an R-specific
> one, but here goes.
> 
> I am trying to use logistic regression to predict suitability of
> habitats for certain plant species.  The response variable is a binary 
> one that indicates whether a particular species is found at a site on
> the landscape.  The independent variables represent physical
> characteristics of the landscape derived from a GIS.  A significant
> proportion of the time I get the following warning messages from
> glm():
> 
> > lr <- glm(known.v1 ~ elevation + aspect + slope + energy15 + energy166 +
> +           aspect + accum + streams.buffered,
> +           family = binomial, data = siteframe)
> Warning messages: 
> 1: Algorithm did not converge in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
> 2: fitted probabilities numerically 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
> 
> Now I can get the algorithm to converge (or at least not produce the
> warning) by increasing the number of iterations, but that does not
> affect the second warning.  A read of Hosmer and Lemeshow (1989) does not
> provide much insight, so I thought that I would post the question  to
> the list.
> 
> Any comments? Also, I'd be happy to email a dataset that exhibits this
> behavior if anyone is curious enough.

It usually means that your dataset exhibits complete separation, and so a
logistic regression can fit perfectly. All the diagnostics (p-values etc)
are then (very) unreliable. There are also concepts of partial separation,
where only some of the cases are fitted perfectly, but similar comments
apply.
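
As a rough illustration of complete separation, here is a minimal sketch on made-up data (the vectors `x` and `y` are hypothetical, not taken from the poster's dataset). Every response below x = 5.5 is 0 and every response above it is 1, so a logistic curve can be made arbitrarily steep and still fit perfectly. The warning text checked for is the wording used by current versions of R.

```r
# Toy data (hypothetical): x separates the two classes completely.
x <- 1:10
y <- as.integer(x > 5)

# Collect the warnings raised by glm() instead of printing them.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)                   # the warnings discussed in this thread
print(round(fitted(fit), 3))  # fitted probabilities are essentially 0 or 1
```

The fitted probabilities reproduce `y` almost exactly, which is the "perfect fit" described above.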

This is shamefully missed in most statistics books, but is well known in
the AI community, which used to seek such fits (as `perceptrons') and
do again (as `support vector machines'). Santer & Duffy is the only
contingency-tables book I know of that covers this; my (1996) Pattern
Recognition and Neural Networks book covers it as well.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  stats.ox.ac.uk/~ripley
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Thomas Lumley

2000-Jul-11 16:41 UTC

[R] warnings associated with logistic regression

On 11 Jul 2000, Allan Strand wrote:
> Hi all,
> 
> This is as much a statistical/estimation question as an R-specific
> one, but here goes.
> 
> I am trying to use logistic regression to predict suitability of
> habitats for certain plant species.  The response variable is a binary 
> one that indicates whether a particular species is found at a site on
> the landscape.  The independent variables represent physical
> characteristics of the landscape derived from a GIS.  A significant
> proportion of the time I get the following warning messages from
> glm():
> 
> > lr <- glm(known.v1 ~ elevation + aspect + slope + energy15 + energy166 +
> +           aspect + accum + streams.buffered,
> +           family = binomial, data = siteframe)
> Warning messages: 
> 1: Algorithm did not converge in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
> 2: fitted probabilities numerically 0 or 1 occurred in: (if (is.empty.model(mt)) glm.fit.null else glm.fit)(x = X, y = Y,
> 
> Now I can get the algorithm to converge (or at least not produce the
> warning) by increasing the number of iterations, but that does not
> affect the second warning. 

Well, that's what you'd expect.  The warning says that for certain
combinations of predictors the fitted response is equal to 0 or 1.  This
also means that the maximum of the likelihood is at infinity for some
coefficients. 

This potentially causes numerical problems, at least in that R won't
report infinite coefficients.  It also causes statistical problems,
because the Wald p-values reported are not useful for very large
coefficients. 
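
The two points above (a likelihood maximised at infinity, and uninformative Wald p-values) can be seen in a small sketch. The data and the helper `slope_after()` are hypothetical and exist only for illustration. The deviance-based p-value is shown purely for contrast with the Wald one; with separated data no such diagnostic should be taken at face value.

```r
# Toy data (hypothetical): completely separated at x = 5.5.
x <- 1:10
y <- as.integer(x > 5)

# Hypothetical helper: slope estimate after at most `maxit` IRLS iterations.
slope_after <- function(maxit) {
  fit <- suppressWarnings(
    glm(y ~ x, family = binomial, control = glm.control(maxit = maxit))
  )
  unname(coef(fit)["x"])
}

# The estimate keeps growing as more iterations are allowed,
# because the likelihood has no finite maximum.
slopes <- sapply(c(5, 10, 25), slope_after)
print(slopes)

fit <- suppressWarnings(glm(y ~ x, family = binomial))

# Wald p-value for x: the standard error is enormous, so z is near 0.
wald_p <- summary(fit)$coefficients["x", "Pr(>|z|)"]

# Deviance-based (likelihood ratio) comparison with the intercept-only model.
lr_p <- pchisq(fit$null.deviance - deviance(fit), df = 1, lower.tail = FALSE)

print(c(wald = wald_p, lr = lr_p))
```

Raising `maxit` may silence the convergence warning, but it only lets the coefficients drift further towards infinity.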

Sometimes this happens when you try to fit too many parameters, in which
case you may be able to fix it.  It can also happen when the coefficient
in question really is large and happens by chance to give perfect
predictions. A third possibility is that the probability really is zero
(e.g. above the treeline you really don't have any trees), in which case you
don't want a logistic regression model.
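
One crude way to see which of these situations might apply is to check whether any single predictor already splits the sites cleanly. The sketch below is an assumption-laden illustration: the helper `separates()` and the data frame `site` (with made-up `present`, `elevation` and `slope` columns) are hypothetical, and the check only looks at one predictor at a time, so it will not detect separation by a combination of predictors.

```r
# Hypothetical helper: TRUE if the values of x for y == 0 and for y == 1
# do not overlap at all, i.e. x alone separates the two classes.
separates <- function(x, y) {
  r0 <- range(x[y == 0])
  r1 <- range(x[y == 1])
  r0[2] < r1[1] || r1[2] < r0[1]
}

# Made-up sites: the species is absent from every high-elevation site.
site <- data.frame(
  present   = c(0, 0, 0, 0, 1, 1, 1, 1),
  elevation = c(900, 950, 1000, 1100, 200, 350, 500, 700),
  slope     = c(5, 12, 20, 8, 10, 3, 15, 25)
)

flags <- sapply(site[c("elevation", "slope")], separates, y = site$present)
print(flags)
```

Here `elevation` is flagged and `slope` is not. Whether a flagged predictor reflects a real threshold (the treeline case) or a chance pattern in a small sample is a subject-matter judgement that the check cannot make.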


	-thomas
