You can fit a linear probability model with glm and a bit of arm twisting.
First, make your own copy of the binomial function:

  > dump('binomial', file='mybinom.R')

Edit it to change the function name to "mybinom" (or anything else you like), and to add 'identity' to the list of okLinks. Source the file back in, and use mybinom('identity') to fit the model.

Caveat emptor: this will work just fine if all of your data are far enough away from 0 or 1. But if the predicted value for any data point, at any time during the iteration, is <= 0 or >= 1, then the calculation of the log-likelihood will involve an NA and the glm routine will fail. NAs produced deep inside a computation tend to produce unhelpful and/or misleading error messages (sometimes the NA can propagate some way through the code before causing a failure). You can also get the counterintuitive result that models with few or poor covariates work (all the predictions are near the mean), while more useful covariates cause the model to fail.

Linear links for both the binomial and Poisson are a challenging computational problem. But they are becoming important in medical work, due to a growing appreciation that the absolute risk attached to a variable is often more relevant than the relative risk (odds ratio or risk ratio).

Terry Therneau
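The recipe above can be sketched as follows. This is only an illustration of the suggested workaround (later in the thread it turns out current R accepts the identity link directly); the names y, x, and mydata are hypothetical, and step 2 must be done by hand in an editor:

```
## Sketch of the recipe above; not runnable as-is, since step 2 is manual.
## 1. Dump R's binomial family constructor to a file you can edit:
dump("binomial", file = "mybinom.R")

## 2. In mybinom.R, rename the function to "mybinom" and add "identity"
##    to the okLinks vector, e.g.
##      okLinks <- c("logit", "probit", "cloglog", "cauchit", "log", "identity")

## 3. Source the edited copy and use it as the family argument
##    (y, x, mydata are placeholders for your own data):
source("mybinom.R")
fit <- glm(y ~ x, family = mybinom("identity"), data = mydata)
```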
On Fri, 16 Nov 2007, Terry Therneau wrote:

> You can fit a linear probability model with glm and a bit of arm twisting.
> First, make your own copy of the binomial function:
> > dump('binomial', file='mybinom.R')
> Edit it to change the function name to "mybinom" (or anything else you
> like), and to add 'identity' to the list of okLinks.

Hmm ... I think you are generalizing from another R-like system.

binomial("identity") works out of the box, and R's glm() will backtrack if it encounters a fitted value < 0 or > 1. Now, the back-tracking can get stuck, but it often does a reasonable job.

Examples:

set.seed(1)
x <- seq(0, 1, length=10)
y <- rbinom(10, 10, 0.1 + 0.8*x)
glm(cbind(y, 10-y) ~ x, binomial("identity"), start=c(0.5,0))

works. But variants, e.g.

y <- rbinom(10, 10, 0.1 + 0.9*x)

backtrack and give warnings.

What does not work is binomial(identity), unlike binomial(logit).
> Terry Therneau
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,               +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>Date: 2007/11/16 Fri AM 09:28:27 CST
>To: Terry Therneau <therneau at mayo.edu>
>Cc: markleeds at verizon.net, r-help at r-project.org
>Subject: Re: [R] alternative to logistic regression

Thanks to both of you, Terry and Brian, for your comments. I'm not sure what I am going to do yet, because I don't have enough data to explore/confirm my linear hypothesis, but your comments will help if I go that route.

I just had one other question, since I have you both thinking about GLMs at the moment. Suppose one is doing logistic, or more generally multinomial, regression with one predictor. The predictor is quantitative in the range [-1, 1], but if I scale it then the range becomes whatever it becomes. There's also the possibility of making the predictor a factor, say by deciling it and letting the deciles be the factor levels.

My question is whether one would expect roughly the same probability forecasts from the two models, one using the numerical predictor and one using the factor? I imagine that it shouldn't matter so much, but I have zero experience in logistic regression and I'm not confident in my current intuition. Thanks so much for talking about my problem; I really appreciate your insights.

Mark
Brian Ripley wrote, "Hmm ... I think you are generalizing from another R-like system."

Actually no, I was reading R source code that has no comments, and assumed incorrectly that

1. the variable "okLinks" was the list of allowed links, and
2. the error message further on in the code, "... available links are 'logit', 'probit', 'cloglog', 'cauchit', 'log'", was accurate.

Nevertheless, I should have tested my suggestion more thoroughly before I posted it. The step halving that I now see in glm.fit is a welcome addition, but it does not completely eliminate the fitting issues that can arise.

Terry T.
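Brian's observation can be reproduced directly; the following small sketch restates his examples from earlier in the thread (the data are simulated, and whether the second fit actually emits warnings may vary with the random draw and R version):

```r
## binomial("identity") works out of the box; glm.fit's step halving
## backtracks when fitted probabilities leave (0, 1) during iteration.
set.seed(1)
x <- seq(0, 1, length = 10)
y <- rbinom(10, 10, 0.1 + 0.8 * x)

## Converges cleanly given sensible starting values:
fit1 <- glm(cbind(y, 10 - y) ~ x, family = binomial("identity"),
            start = c(0.5, 0))

## A steeper true slope pushes fitted values toward the boundary;
## the step halving then backtracks and typically warns:
y2 <- rbinom(10, 10, 0.1 + 0.9 * x)
fit2 <- glm(cbind(y2, 10 - y2) ~ x, family = binomial("identity"),
            start = c(0.5, 0))
```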
>From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
>Date: 2007/11/16 Fri AM 09:44:59 CST
>To: markleeds at verizon.net
>Cc: Terry Therneau <therneau at mayo.edu>, r-help at r-project.org
>Subject: Re: Re: [R] alternative to logistic regression

Thanks, Brian. I'll look at the MASS book example for sure, but I don't think I was so clear in my last question, so let me explain again. What I meant to say was this.

Suppose Person A and Person B both have the same raw data: a categorical response (say 3 categories) and one numeric predictor. Person A fits a logistic regression with the logit link and family = binomial, so it's an S curve in probability space, with the predictor numeric, so the x axis is numeric. Person B fits a logistic regression with the logit link and family = binomial, also an S curve in probability space, but with the predictor as a factor, so the x axis is, say, deciles.

They both then predict from their respective models given a new value of the predictor (Person A's predictor is the number itself, and Person B's is the decile the number falls in). Would their forecasts of the probability given that predictor be roughly the same? I'm sorry to be a pest, but I'm not clear on that. Thanks, and I'm sorry to bother you so much.
>It's just as in linear regression. If there really is a linear
>relationship, the predictions will be the same. But if it is quadratic,
>they will be very different. Discretizing a numeric explanatory variable
>is a common way to look for non-linearity (as in the 'cpus' example
>studied in MASS).
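Brian's point can be illustrated with a small simulation; everything below is hypothetical (simulated data, a decile split via cut() on sample quantiles), just to show the comparison Mark describes:

```r
## Compare a logistic fit with a numeric predictor against one with the
## same predictor cut into deciles, when the truth is linear on the
## logit scale. Simulated data.
set.seed(42)
n <- 2000
x <- runif(n, -1, 1)
p <- plogis(-0.5 + 2 * x)   # truly linear on the logit scale
y <- rbinom(n, 1, p)

fit_num <- glm(y ~ x, family = binomial)

## Decile the predictor and fit it as a factor:
xdec <- cut(x, breaks = quantile(x, probs = seq(0, 1, 0.1)),
            include.lowest = TRUE)
fit_fac <- glm(y ~ xdec, family = binomial)

## When the relationship really is linear on the logit scale, the two
## sets of fitted probabilities track each other; plotting the factor
## fit against x is a quick visual check for non-linearity:
plot(x, fitted(fit_num), pch = ".")
points(x, fitted(fit_fac), col = 2, pch = ".")
```

If the true relationship were quadratic instead, the decile fit would trace the curve while the numeric fit forced a monotone S shape, and the two forecasts would diverge, which is exactly the diagnostic use Brian mentions.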