Emmanuel Charpentier
2001-Dec-14 15:47 UTC
[R] Logistic regression : discrepancies between glm and nls ?
Dear list,

I'm trying to learn how to use nlme to be able to fit and analyse mixed-model logistic regressions. In order to keep things simple, I started with the simplest possible model : one (fixed-effect ...) continuous variable. This problem is, of course, solved by glm, but I wanted to look at a "hand-made" nls fit, in order to be able to "generalize" to nlme fits.

> ## Let's start with the simplest model : one fixed-effect continuous variable
>
> ## Dummy data
> size <- 500
> logdata <- data.frame(x=20*runif(size)-10)   # A U([-10,10]) variable
> alpha <- 0.5
> beta <- 2
>
> ## Simulate a response
> ## y : a boolean (0|1) with probability e^(a+bx)/(1+e^(a+bx))
> logdata$y <- as.numeric(runif(size)<(exp(alpha+beta*logdata$x)/
+                         (1+exp(alpha+beta*logdata$x))))
>
> ## Reality check : are the data "reasonably random" ?
> table(logdata$y)

  0   1
251 249

> by(logdata$x, logdata$y, summary)
INDICES: 0
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -9.993  -7.243  -4.594  -4.873  -2.678   1.081
------------------------------------------------------------
INDICES: 1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -1.526   1.791   5.008   4.663   7.235   9.983

So far, no reason to wail ...

> ## Another reality check : what's the "classical" logistic regression ?
> logdata.glm <- glm(y~x, data=logdata, family=binomial(link=logit))
>
> ## nls should give the same estimates, up to convergence discrepancies
> logdata.nls <- nls(y~exp(a+b*x)/(1+exp(a+b*x)), data=logdata,
+                    start=list(a=0, b=1))
>
> ## Let's see ...
> summary(logdata.glm)

Call:
glm(formula = y ~ x, family = binomial(link = logit), data = logdata)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-2.4394149  -0.0308105  -0.0001991   0.0082031   2.0389572

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.9044     0.2849   3.174  0.00150 **
x             1.8675     0.2739   6.818  9.2e-12 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 693.14  on 499  degrees of freedom
Residual deviance: 100.76  on 498  degrees of freedom
AIC: 104.76

Number of Fisher Scoring iterations: 8

> summary(logdata.nls)

Formula: y ~ exp(a + b * x)/(1 + exp(a + b * x))

Parameters:
  Estimate Std. Error t value Pr(>|t|)
a   0.8979     0.1285    6.99 8.86e-12 ***
b   1.5806     0.1474   10.73  < 2e-16 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 0.1811 on 498 degrees of freedom

Correlation of Parameter Estimates:
       a
b 0.5964

Hmmm ... the alpha estimates are quite close to each other, but the beta estimates are quite different. Furthermore, the standard errors are quite different.

Further simulation work showed that :
a) the alpha estimates can differ much more than they do in the present example ;
b) the "biases" (differences between the estimates) do *not* depend on the initial values chosen for the nls estimation ;
c) they depend on the size of the sample (the larger the sample, the smaller the "biases") ;
d) the standard error estimates given by glm are larger than those given by nls.

Can someone explain to me why these two methods of fitting a (quite) simple model give such different results ?

I *think* the two methods should end up with the same estimates (at least asymptotically). Where and why am I wrong ?

Sincerely,

Emmanuel Charpentier
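P.S. : a condensed, seedable version of the simulation above, for anyone who wants to rerun it. The set.seed() call is an addition (the run above used no seed, so its exact numbers will not be reproduced), and plogis(q) is simply 1/(1+exp(-q)) :

## Condensed, reproducible sketch of the simulation above.
set.seed(1)                                        # added for reproducibility
size  <- 500
alpha <- 0.5
beta  <- 2
logdata   <- data.frame(x = 20*runif(size) - 10)   # x ~ U([-10,10])
p         <- plogis(alpha + beta*logdata$x)        # e^(a+bx)/(1+e^(a+bx))
logdata$y <- as.numeric(runif(size) < p)           # Bernoulli(p) response

logdata.glm <- glm(y ~ x, data = logdata, family = binomial(link = logit))
logdata.nls <- nls(y ~ exp(a + b*x)/(1 + exp(a + b*x)), data = logdata,
                   start = list(a = 0, b = 1))
cbind(glm = coef(logdata.glm), nls = coef(logdata.nls))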
Prof Brian Ripley
2001-Dec-14 16:13 UTC
[R] Logistic regression : discrepancies between glm and nls ?
Your call to nls fits by least squares, whereas glm fits by maximum likelihood. Not the same thing: ML gives more weight to observations whose fitted values are near zero or one.
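To see the difference concretely: for the logit link, maximum likelihood solves the score equations

  sum_i (y_i - p_i) x_i = 0

while least squares solves

  sum_i (y_i - p_i) p_i (1 - p_i) x_i = 0

(the extra factor p(1-p) is dp/deta). Since p(1-p) vanishes as p approaches 0 or 1, least squares downweights precisely the observations with extreme fitted values, so the two sets of estimates need not coincide in any finite sample. Below is a minimal Fisher-scoring (IRLS) sketch of the iteration glm performs; irls.logit is an illustrative name, not the actual glm.fit code, and it has no safeguards (no step-halving, no convergence test):

## Minimal Fisher scoring (IRLS) for logistic regression -- a sketch of
## what glm(..., family = binomial(link = logit)) iterates.
irls.logit <- function(X, y, n.iter = 25) {
  beta <- rep(0, ncol(X))                 # start from the null model
  for (i in 1:n.iter) {
    eta <- drop(X %*% beta)               # linear predictor
    p   <- plogis(eta)                    # fitted probabilities
    W   <- p * (1 - p)                    # working weights = binomial variance
    z   <- eta + (y - p)/W                # working response
    ## weighted least-squares step: solve (X'WX) beta = X'Wz
    beta <- drop(solve(crossprod(X, W * X), crossprod(X, W * z)))
  }
  beta
}
X <- cbind(1, logdata$x)
irls.logit(X, logdata$y)                  # should agree with coef(logdata.glm)

Note also that both estimators are consistent for the mean function here, which is why the differences shrink with sample size (your point (c)); but the nls standard errors are computed under a constant-variance assumption, which a 0/1 response with variance p(1-p) does not satisfy, so the two sets of standard errors need not agree either (your point (d)).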
--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595