thr3ads.net - R help - [R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred [Dec 2011]

Ben quant

2011-Dec-01 17:54 UTC

[R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

Sorry if this is a duplicate: This is a re-post because the pdf's mentioned
below did not go through.

Hello,

I'm new'ish to R, and very new to glm. I've read a lot about my issue:
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

...including:

tolstoy.newcastle.edu.au/R/help/05/07/7759.html
r.789695.n4.nabble.com/glm-fit-quot-fitted-probabilities-numerically-0-or-1-occurred-quot-td849242.html
(note that I never found: "MASS4 pp.197-8". However, Ted's post was quite
helpful.)

This is a common question, sorry. Because it is a common issue I am posting
everything I know about the issue and how I think I am not falling into the
same trap as the others (but I must be, for some reason I am not yet
aware of).

From the two links above I gather that my warning "glm.fit: fitted
probabilities numerically 0 or 1 occurred" arises from a "perfect fit"
situation (i.e. the issue where all the high value x's (predictor
variables) are Y=1 (response=1) or the other way around). I don't feel my
data has this issue. Please point out how it does!

The list post instructions state that I can attach pdf's, so I attached
plots of my data right before I do the call to glm.

The attachments are plots of my data stored in variable l_yx (as can be
seen in the axis names):
My response (vertical axis) by row index (horizontal axis):
 plot(l_yx[,1],type='h')
My predictor variable (vertical axis) by row index (horizontal axis):
 plot(l_yx[,2],type='h')

So here is more info on my data frame/data (in case you can't see my pdf
attachments):
> unique(l_yx[,1])
[1] 0 1
> mean(l_yx[,2])
[1] 0.01123699
> max(l_yx[,2])
[1] 14.66518
> min(l_yx[,2])
[1] 0
> attributes(l_yx)
$dim
[1] 690303      2

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
[1] "y" "x"


With the above data I do:
> l_logit = glm(y~x, data=as.data.frame(l_yx), family=binomial(link="logit"))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

Why am I getting this warning when I have data points of varying values for
y=1 and y=0?  In other words, I don't think I have the linear separation
issue discussed in one of the links I provided.

PS - Then I do this and I get an odds ratio of a crazy size:
> l_sm = summary(l_logit) # coef pval is $coefficients[8], log odds $coefficients[2]
> l_exp_coef = exp(l_logit$coefficients)[2] # exponentiate the coefficients
> l_exp_coef
       x
3161.781

So for one unit increase in the predictor variable I get 3160.781%
(3161.781 - 1 = 3160.781) increase in odds? That can't be correct either.
How do I correct for this issue? (I tried multiplying the predictor
variables by a constant and the odds ratio goes down, but the warning above
still persists and shouldn't the odds ratio be predictor variable size
independent?)

Thank you for your help!

Ben

peter dalgaard

2011-Dec-01 18:55 UTC

[R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred

On Dec 1, 2011, at 18:54 , Ben quant wrote:
> Sorry if this is a duplicate: This is a re-post because the pdf's mentioned
> below did not go through.

Still not there. Sometimes it's because your mailer doesn't label them
with the appropriate mime-type (e.g. as application/octet-stream, which is
"arbitrary binary"). Anyways, see below.

[snip]

> With the above data I do:
> > l_logit = glm(y~x, data=as.data.frame(l_yx), family=binomial(link="logit"))
> Warning message:
> glm.fit: fitted probabilities numerically 0 or 1 occurred
>
> Why am I getting this warning when I have data points of varying values for
> y=1 and y=0?  In other words, I don't think I have the linear separation
> issue discussed in one of the links I provided.

I bet that you do... You can get the warning without that effect (one of my own
examples is the probability of menarche in a data set that includes infants and
old age pensioners), but not with a huge odds ratio as well. Take a look at

d <- as.data.frame(l_yx)
with(d, y[order(x)])

If it comes out as all zeros followed by all ones or vice versa, then you have
the problem.
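
As an illustration of that check, here is a minimal sketch on simulated data. Everything in it is an assumption made up for the example (the two non-overlapping groups of x, the names n_switches and msgs); it is not the poster's l_yx. It sorts y by x, counts how often the sorted response changes value, and records the warnings glm() raises when the two classes are completely separated.

```r
# Hypothetical simulated data with complete separation (not the poster's l_yx)
set.seed(1)
x <- c(runif(50, 0, 1), runif(50, 2, 3))  # low-x group and high-x group, no overlap
y <- rep(c(0, 1), each = 50)              # y = 0 for every low x, y = 1 for every high x
d <- data.frame(y = y, x = x)

# The check from the reply: the response ordered by the predictor
y_sorted <- with(d, y[order(x)])

# All zeros followed by all ones means the sorted response changes value exactly once
n_switches <- sum(diff(y_sorted) != 0)

# Fit the model and collect any warnings instead of printing them
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, data = d, family = binomial(link = "logit")),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(n_switches)
print(msgs)
print(exp(coef(fit)[["x"]]))  # odds ratio for x; expect it to be extremely large here
```

On real data the sorted response will rarely be this clean, so a small n_switches relative to the number of rows is only a hint that the data are close to separated, not a formal test.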

>
> PS - Then I do this and I get an odds ratio of a crazy size:
> > l_sm = summary(l_logit) # coef pval is $coefficients[8], log odds $coefficients[2]
> > l_exp_coef = exp(l_logit$coefficients)[2] # exponentiate the coefficients
> > l_exp_coef
>        x
> 3161.781
>
> So for one unit increase in the predictor variable I get 3160.781%
> (3161.781 - 1 = 3160.781) increase in odds? That can't be correct either.
> How do I correct for this issue? (I tried multiplying the predictor
> variables by a constant and the odds ratio goes down, but the warning above
> still persists and shouldn't the odds ratio be predictor variable size
> independent?)
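
On the scaling question, a minimal sketch under stated assumptions (simulated data with a made-up intercept of -1 and slope of 0.8; the names x10, or1 and or10 are hypothetical). The odds ratio is reported per one unit of the predictor, so it is not independent of the predictor's scale: multiplying x by a constant c divides the coefficient by c, and the odds ratio becomes the c-th root of the original. Note also that an odds ratio of 3161.781 means the odds are multiplied by about 3162 per unit of x, which is roughly a 316,078% increase rather than 3160.781%.

```r
# Hypothetical simulated data without separation (not the poster's l_yx)
set.seed(42)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x))

# Same model fitted on the original predictor and on the predictor times 10
fit1  <- glm(y ~ x, family = binomial(link = "logit"))
x10   <- 10 * x
fit10 <- glm(y ~ x10, family = binomial(link = "logit"))

or1  <- exp(coef(fit1)[["x"]])     # odds ratio per one unit of x
or10 <- exp(coef(fit10)[["x10"]])  # odds ratio per one unit of 10 * x

print(c(or1 = or1, or10 = or10))
print(or10^10)  # raising to the 10th power recovers the per-unit-of-x odds ratio
```

Rescaling therefore changes the reported number but not the fitted probabilities, which is consistent with the warning persisting after the predictor is multiplied by a constant.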

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
