Ben quant
2011-Dec-01 17:54 UTC
[R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred
Sorry if this is a duplicate. This is a re-post because the PDFs mentioned below did not go through.

Hello,

I'm new-ish to R, and very new to glm. I've read a lot about my issue:

Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

...including:

http://tolstoy.newcastle.edu.au/R/help/05/07/7759.html
http://r.789695.n4.nabble.com/glm-fit-quot-fitted-probabilities-numerically-0-or-1-occurred-quot-td849242.html

(Note that I never found "MASS4 pp.197-8". However, Ted's post was quite helpful.)

This is a common question, sorry. Because it is a common issue, I am posting everything I know about it and why I think I am not falling into the same trap as the others (though I must be, for some reason I am not yet aware of).

From the two links above I gather that my warning "glm.fit: fitted probabilities numerically 0 or 1 occurred" arises from a "perfect fit" situation (i.e. the case where all the high-value x's (predictor variable) have Y=1 (response=1), or the other way around). I don't feel my data has this issue. Please point out how it does!

The list posting instructions state that I can attach PDFs, so I attached plots of my data as it is right before I make the call to glm.
The attachments are plots of my data stored in variable l_yx (as can be seen in the axis names):

My response (vertical axis) by row index (horizontal axis):
plot(l_yx[,1],type='h')

My predictor variable (vertical axis) by row index (horizontal axis):
plot(l_yx[,2],type='h')

So here is more info on my data frame/data (in case you can't see my PDF attachments):

> unique(l_yx[,1])
[1] 0 1
> mean(l_yx[,2])
[1] 0.01123699
> max(l_yx[,2])
[1] 14.66518
> min(l_yx[,2])
[1] 0
> attributes(l_yx)
$dim
[1] 690303      2

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
[1] "y" "x"

With the above data I do:

> l_logit = glm(y~x, data=as.data.frame(l_yx), family=binomial(link="logit"))
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred

Why am I getting this warning when I have data points of varying values for y=1 and y=0? In other words, I don't think I have the linear separation issue discussed in one of the links I provided.

PS - Then I do this and I get an odds ratio of a crazy size:

> l_sm = summary(l_logit) # coef pval is $coefficients[8], log odds $coefficients[2]
> l_exp_coef = exp(l_logit$coefficients)[2] # exponentiate the coefficients
> l_exp_coef
       x
3161.781

So for one unit increase in the predictor variable I get 3160.781% (3161.781 - 1 = 3160.781) increase in odds? That can't be correct either. How do I correct for this issue? (I tried multiplying the predictor variable by a constant and the odds ratio goes down, but the warning above still persists, and shouldn't the odds ratio be independent of the size of the predictor variable?)

Thank you for your help!

Ben
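A note on the scaling question above. The odds ratio exp(beta) is per one unit of x, so it is not independent of the units of x: multiplying x by a constant c divides the fitted slope by c, while the fitted probabilities stay the same. With a mean x of 0.011 and a maximum of 14.67, one unit of x is a very large step relative to typical values. (Also, an odds ratio of 3161.781 multiplies the odds by about 3162, which is an increase of roughly 316,078%, not 3160.781%.) The sketch below illustrates the rescaling on synthetic data; the data and the names x10, fit1 and fit10 are assumptions for illustration, not the poster's l_yx.

```r
# Synthetic data (an assumption for illustration; not the poster's l_yx)
set.seed(1)
n <- 2000
x <- rexp(n)                              # non-negative predictor
y <- rbinom(n, 1, plogis(-2 + 1.5 * x))   # overlapping classes, no separation

fit1  <- glm(y ~ x, family = binomial(link = "logit"))

x10   <- 10 * x                           # same predictor, multiplied by 10
fit10 <- glm(y ~ x10, family = binomial(link = "logit"))

b1  <- unname(coef(fit1)["x"])
b10 <- unname(coef(fit10)["x10"])

# The slope is divided by the scaling constant, so the per-unit
# odds ratio exp(slope) changes with the units of the predictor ...
all.equal(b1, 10 * b10, tolerance = 1e-6)

# ... but the fitted probabilities are unchanged by the rescaling
all.equal(unname(fitted(fit1)), unname(fitted(fit10)), tolerance = 1e-6)
```

This is why rescaling lowers the reported odds ratio without removing the warning: the model and its fitted probabilities are the same, only the unit of the coefficient differs.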
peter dalgaard
2011-Dec-01 18:55 UTC
[R] logistic regression - glm.fit: fitted probabilities numerically 0 or 1 occurred
On Dec 1, 2011, at 18:54 , Ben quant wrote:

> Sorry if this is a duplicate: This is a re-post because the pdf's mentioned
> below did not go through.

Still not there. Sometimes it's because your mailer doesn't label them with the appropriate mime-type (e.g. as application/octet-stream, which is "arbitrary binary"). Anyways, see below

[snip]

>
> With the above data I do:
>> l_logit = glm(y~x, data=as.data.frame(l_yx),
> family=binomial(link="logit"))
> Warning message:
> glm.fit: fitted probabilities numerically 0 or 1 occurred
>
> Why am I getting this warning when I have data points of varying values for
> y=1 and y=0? In other words, I don't think I have the linear separation
> issue discussed in one of the links I provided.

I bet that you do... You can get the warning without that effect (one of my own examples is the probability of menarche in a data set that includes infants and old age pensioners), but not with a huge odds ratio as well. Take a look at

d <- as.data.frame(l_yx)
with(d, y[order(x)])

if it comes out as all zeros followed by all ones or vice versa, then you have the problem.

>
> PS - Then I do this and I get a odds ratio a crazy size:
>> l_sm = summary(l_logit) # coef pval is $coefficients[8], log odds
> $coefficients[2]
>> l_exp_coef = exp(l_logit$coefficients)[2] # exponentiate the
> coeffcients
>> l_exp_coef
> x
> 3161.781
>
> So for one unit increase in the predictor variable I get 3160.781%
> (3161.781 - 1 = 3160.781) increase in odds? That can't be correct either.
> How do I correct for this issue? (I tried multiplying the predictor
> variables by a constant and the odds ratio goes down, but the warning above
> still persists and shouldn't the odds ratio be predictor variable size
> independent?)

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
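With 690303 rows, the printed output of with(d, y[order(x)]) is hard to judge by eye. A minimal sketch of the same check in summarised form follows, run on synthetic data rather than l_yx. The helper n_switches and the data frames d_sep and d_ovl are hypothetical names introduced here for illustration, and the helper assumes there are no ties in x.

```r
set.seed(42)
x <- runif(200)

# Hypothetical data: y is 1 exactly when x exceeds 0.5 (complete separation)
d_sep <- data.frame(x = x, y = as.integer(x > 0.5))

# Hypothetical data: the y = 0 and y = 1 groups overlap in x
d_ovl <- data.frame(x = x, y = rbinom(200, 1, plogis(4 * (x - 0.5))))

# Sort y by x and count how often it changes value. A single change means
# "all zeros followed by all ones" (or vice versa), i.e. the problem case.
n_switches <- function(d) sum(diff(d$y[order(d$x)]) != 0)

n_switches(d_sep)   # exactly one change: separated
n_switches(d_ovl)   # many changes: the classes overlap

# Range of x within each level of y (rows are min and max);
# ranges that do not overlap also indicate separation.
sapply(split(d_sep$x, d_sep$y), range)
sapply(split(d_ovl$x, d_ovl$y), range)
```

On the real data the equivalent call would be n_switches(as.data.frame(l_yx)). A count of 1 matches the pattern described above; a larger count rules out complete separation on x alone, though it does not by itself explain the warning.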