Emmanuel Charpentier
2001-Dec-14 15:47 UTC
[R] Logistic regression : discrepancies between glm and nls ?
Dear list,

I'm trying to learn how to use nlme to be able to fit and analyse mixed-model logistic regressions. In order to keep things simple, I started with the simplest possible model : one (fixed-effect ...) continuous variable. This problem is, of course, solved by glm, but I wanted to look at a "hand-made" nls fit, in order to be able to "generalize" to nlme fits.

> ## Let's start with the simplest model : one fixed-effect continuous variable
>
> ## Dummy data
> size <- 500
> logdata <- data.frame(x=20*runif(size)-10)   # A U([-10,10]) variable
> alpha <- 0.5
> beta <- 2
>
> ## Simulate a response
> ## y : a boolean (0|1) with probability e^(a+bx)/(1+e^(a+bx))
> logdata$y <- as.numeric(runif(size)<(exp(alpha+beta*logdata$x)/
+                         (1+exp(alpha+beta*logdata$x))))
>
> ## Reality check : are the data "reasonably random" ?
> table(logdata$y)

  0   1
251 249

> by(logdata$x, logdata$y, summary)
INDICES: 0
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -9.993  -7.243  -4.594  -4.873  -2.678   1.081
------------------------------------------------------------
INDICES: 1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -1.526   1.791   5.008   4.663   7.235   9.983

So far, no reason to wail ...

> ## Another reality check : what's the "classical" logistic regression ?
> logdata.glm <- glm(y~x, data=logdata, family=binomial(link=logit))
>
> ## nls should give the same estimates, up to convergence discrepancies
> logdata.nls <- nls(y~exp(a+b*x)/(1+exp(a+b*x)), data=logdata,
+                    start=list(a=0, b=1))
>
> ## Let's see ...
> summary(logdata.glm)

Call:
glm(formula = y ~ x, family = binomial(link = logit), data = logdata)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-2.4394149  -0.0308105  -0.0001991   0.0082031   2.0389572

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.9044     0.2849   3.174  0.00150 **
x             1.8675     0.2739   6.818  9.2e-12 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 693.14  on 499  degrees of freedom
Residual deviance: 100.76  on 498  degrees of freedom
AIC: 104.76

Number of Fisher Scoring iterations: 8

> summary(logdata.nls)

Formula: y ~ exp(a + b * x)/(1 + exp(a + b * x))

Parameters:
  Estimate Std. Error t value Pr(>|t|)
a   0.8979     0.1285    6.99 8.86e-12 ***
b   1.5806     0.1474   10.73  < 2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.1811 on 498 degrees of freedom

Correlation of Parameter Estimates:
       a
b 0.5964

Hmmm ... the alpha estimates are quite close to each other, but the beta estimates are quite different. Furthermore, the standard errors are quite different.

Further simulation work showed that :
a) the alpha estimates can differ much more than they do in the present example ;
b) the "biases" (differences between the estimates) do *not* depend on the initial values chosen for the nls estimation ;
c) they depend on the size of the sample (the larger the sample, the smaller the "biases") ;
d) the standard error estimates given by glm are larger than those given by nls.

Can someone explain to me why these two methods of fitting a (quite) simple model give such different results ?

I *think* the two methods should end up with the same estimates (at least asymptotically). Where and why am I wrong ?

Sincerely,

Emmanuel Charpentier
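P.S. : a condensed, seedable version of the simulation above, for anyone who wants to rerun it. The set.seed() call is an addition (the run above used no seed, so its exact numbers will not be reproduced), and plogis(q) is simply 1/(1+exp(-q)) :

## Condensed, reproducible sketch of the simulation above.
set.seed(1)                                        # added for reproducibility
size  <- 500
alpha <- 0.5
beta  <- 2
logdata   <- data.frame(x = 20*runif(size) - 10)   # x ~ U([-10,10])
p         <- plogis(alpha + beta*logdata$x)        # e^(a+bx)/(1+e^(a+bx))
logdata$y <- as.numeric(runif(size) < p)           # Bernoulli(p) response

logdata.glm <- glm(y ~ x, data = logdata, family = binomial(link = logit))
logdata.nls <- nls(y ~ exp(a + b*x)/(1 + exp(a + b*x)), data = logdata,
                   start = list(a = 0, b = 1))
cbind(glm = coef(logdata.glm), nls = coef(logdata.nls))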
Prof Brian Ripley
2001-Dec-14 16:13 UTC
[R] Logistic regression : discrepancies between glm and nls ?
Your call to nls fits by least squares, whereas glm fits by maximum likelihood. Not the same thing: ML gives more weight to observations whose fitted values are near zero or one.
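To see the difference concretely: for the logit link, maximum likelihood solves the score equations

  sum_i (y_i - p_i) x_i = 0

while least squares solves

  sum_i (y_i - p_i) p_i (1 - p_i) x_i = 0

(the extra factor p(1-p) is dp/deta). Since p(1-p) vanishes as p approaches 0 or 1, least squares downweights precisely the observations with extreme fitted values, so the two sets of estimates need not coincide in any finite sample. Below is a minimal Fisher-scoring (IRLS) sketch of the iteration glm performs; irls.logit is an illustrative name, not the actual glm.fit code, and it has no safeguards (no step-halving, no convergence test):

## Minimal Fisher scoring (IRLS) for logistic regression -- a sketch of
## what glm(..., family = binomial(link = logit)) iterates.
irls.logit <- function(X, y, n.iter = 25) {
  beta <- rep(0, ncol(X))                 # start from the null model
  for (i in 1:n.iter) {
    eta <- drop(X %*% beta)               # linear predictor
    p   <- plogis(eta)                    # fitted probabilities
    W   <- p * (1 - p)                    # working weights = binomial variance
    z   <- eta + (y - p)/W                # working response
    ## weighted least-squares step: solve (X'WX) beta = X'Wz
    beta <- drop(solve(crossprod(X, W * X), crossprod(X, W * z)))
  }
  beta
}
X <- cbind(1, logdata$x)
irls.logit(X, logdata$y)                  # should agree with coef(logdata.glm)

Note also that both estimators are consistent for the mean function here, which is why the differences shrink with sample size (your point (c)); but the nls standard errors are computed under a constant-variance assumption, which a 0/1 response with variance p(1-p) does not satisfy, so the two sets of standard errors need not agree either (your point (d)).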
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595