Mike Lawrence
2007-Mar-20 15:20 UTC
[R] How does glm(family='binomial') deal with perfect success?
Hi all,

Trying to understand the logistic regression performed by glm (i.e. when
family='binomial'), and I'm curious to know how it treats perfect
success. That is, let's say I have the following summary data

x=c(1,2,3,4,5,6)
y=c(0,.04,.26,.76,.94,1)
w=c(100,100,100,100,100,100)

where y is the probability of success at each value of x, calculated
across w observations. When I use glm

my.glm.obj=glm(y~x,family='binomial',weights=w)

the regression comes out fine, but if I try what I understand to be the
equivalent lm procedure (i.e. fitting a straight line to the
logit-transformed y values):

my.lm.obj=lm(qlogis(y)~x,weights=w)

I get an error because, of course, logit(1) = log(1/0) = log(Inf) = Inf
(similarly, logit(0) = log(0/1) = log(0) = -Inf).

I'd be very interested to see how glm deals with these extremes.

Cheers,
Mike

--
Mike Lawrence
http://artsweb.uwaterloo.ca/~m4lawren

"The road to wisdom? Well, it's plain and simple to express:
Err and err and err again, but less and less and less."
- Piet Hein
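[One common workaround for making the direct lm() comparison run at all
(it is not what glm() does internally) is the empirical logit, which
shrinks the observed proportions away from 0 and 1 before transforming.
A minimal sketch; the half-count adjustment below is one conventional
choice, not the only one:

# Add half a success and half a failure to each cell so no proportion
# is exactly 0 or 1, then fit on the logit scale. This approximates,
# but does not reproduce, the glm() fit.
x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, .04, .26, .76, .94, 1)
w <- c(100, 100, 100, 100, 100, 100)
y.adj <- (y * w + 0.5) / (w + 1)   # success counts, nudged off the boundary
my.lm.obj <- lm(qlogis(y.adj) ~ x, weights = w)
]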
Douglas Bates
2007-Mar-20 21:48 UTC
[R] How does glm(family='binomial') deal with perfect success?
On 3/20/07, Mike Lawrence <M4Lawren at uwaterloo.ca> wrote:
> Hi all,
>
> Trying to understand the logistic regression performed by glm (i.e. when
> family='binomial'), and I'm curious to know how it treats perfect
> success. That is, let's say I have the following summary data
>
> x=c(1,2,3,4,5,6)
> y=c(0,.04,.26,.76,.94,1)
> w=c(100,100,100,100,100,100)
>
> where y is the probability of success at each value of x, calculated
> across w observations. When I use glm
>
> my.glm.obj=glm(y~x,family='binomial',weights=w)
>
> the regression comes out fine, but if I try what I understand to be the
> equivalent lm procedure (i.e. fitting a straight line to the
> logit-transformed y values):
>
> my.lm.obj=lm(qlogis(y)~x,weights=w)
>
> I get an error because, of course, logit(1) = log(1/0) = log(Inf) = Inf
> (similarly, logit(0) = log(0/1) = log(0) = -Inf).
>
> I'd be very interested to see how glm deals with these extremes.

Well, for one thing glm doesn't fit the model that way. It uses
iteratively reweighted least squares, in which the calculation of the
next set of parameter values is performed as a weighted least squares
fit on the scale of the mean, not on the scale of the linear
predictor, for exactly the reason that you have encountered.
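[To make that concrete, here is a bare-bones sketch of the IRLS loop
for this binomial/logit fit. It is a simplification for illustration,
not the actual glm.fit() code; the starting value for mu follows the
half-count rule that R's binomial family uses to initialize the
iterations. The key point is that qlogis() is only ever applied to the
fitted means mu, which stay strictly inside (0, 1), never to the raw y
values of 0 and 1:

x <- c(1, 2, 3, 4, 5, 6)
y <- c(0, .04, .26, .76, .94, 1)
w <- c(100, 100, 100, 100, 100, 100)

mu  <- (w * y + 0.5) / (w + 1)  # starting means, pulled off the boundaries
eta <- qlogis(mu)               # linear predictor at the starting values
for (i in 1:25) {
    mu.eta <- mu * (1 - mu)          # dmu/deta for the logit link
    z <- eta + (y - mu) / mu.eta     # working response (always finite)
    W <- w * mu.eta                  # working weights
    fit <- lm(z ~ x, weights = W)    # one weighted least squares step
    eta <- fitted(fit)
    mu  <- plogis(eta)
}
coef(fit)  # agrees with coef(glm(y ~ x, family = binomial, weights = w))

A fixed 25 iterations keeps the sketch short; glm.fit instead iterates
until a deviance-based convergence tolerance is met.]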