thr3ads.net - R help - [R] glm and lrm disagree with zero table cells [Oct 2002]

If this information is useful, please help other people find it:
Share via:

Eric Rescorla

2002-Oct-24 15:52 UTC

[R] glm and lrm disagree with zero table cells

I've noticed that glm and lrm give extremely different results if you
attempt to fit a saturated model to a dataset with zero cells. Consider,
for instance the data from, Agresti's Death Penalty example [0].

The crosstab table is:

, , PENALTY = NO

       VIC
DEF     BLACK WHITE
  BLACK    97    52
  WHITE     9   132

, , PENALTY = YES

       VIC
DEF     BLACK WHITE
  BLACK     6    11
  WHITE     0    19


Regression with an unsaturated model produces essentially
the same fit parameters with both glm and lrm. However, 
if we try to fit a saturated model....

FITTING WITH GLM:> summary(glm(PENALTY~DEF*VIC,binomial))
Call:
glm(formula = PENALTY ~ DEF * VIC, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.6195  -0.5186  -0.5186  -0.3465   2.3845  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -2.7830     0.4207  -6.615 3.71e-11 ***
DEFWHITE           -4.7823     8.8981  -0.537   0.5910    
VICWHITE            1.2296     0.5358   2.295   0.0217 *  
DEFWHITE:VICWHITE   4.3973     8.9076   0.494   0.6216    
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` '
1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 226.51  on 325  degrees of freedom
Residual deviance: 218.39  on 322  degrees of freedom
AIC: 226.39

Number of Fisher Scoring iterations: 6


FITTING WITH LRM:> lrm(PENALTY~DEF*VIC)
Logistic Regression Model

lrm(formula = PENALTY ~ DEF * VIC)


Frequencies of Responses
 NO YES 
290  36 

       Obs  Max Deriv Model L.R.       d.f.          P          C        Dxy 
       326      0.002       8.13          3     0.0435      0.624      0.248 
     Gamma      Tau-a         R2      Brier 
     0.383      0.049      0.049      0.096 

                      Coef   S.E.    Wald Z P     
Intercept             -2.783  0.4207 -6.62  0.0000
DEF=WHITE             -5.490 20.8691 -0.26  0.7925
VIC=WHITE              1.230  0.5358  2.29  0.0217
DEF=WHITE * VIC=WHITE  5.105 20.8732  0.24  0.8068



If we fill in the remaining table cell with a dummy value, [1]
however, then glm and lrm produce essentially the same result.
Here's the glm result.
> summary(glm(PENALTY~DEF*VIC,binomial))
Call:
glm(formula = PENALTY ~ DEF * VIC, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.6195  -0.5186  -0.5186  -0.3465   2.3845  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        -2.7829     0.4192  -6.639 3.15e-11 ***
DEFWHITE            0.5857     1.1343   0.516   0.6056    
VICWHITE            1.2296     0.5346   2.300   0.0215 *  
DEFWHITE:VICWHITE  -0.9707     1.2070  -0.804   0.4213    
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` '
1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 230.90  on 326  degrees of freedom
Residual deviance: 224.88  on 323  degrees of freedom
AIC: 232.88

Number of Fisher Scoring iterations: 4


So, my question here is: is this normal behavior? If it
is, perhaps someone could speculate on why the results are
different.

Thanks,
-Ekr


[0] Agresti, A., "Categorical Data Analysis", Wiley 1990.
The data set can be found at http://www.rtfm.com/death.txt

[1] http://www.rtfm.com/death-filled-in.txt





-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Peter Dalgaard BSA

2002-Oct-24 17:54 UTC

head link

[R] glm and lrm disagree with zero table cells

Eric Rescorla <ekr at rtfm.com> writes:
> I've noticed that glm and lrm give extremely different results if you
> attempt to fit a saturated model to a dataset with zero cells. Consider,
...> Coefficients:
>                   Estimate Std. Error z value Pr(>|z|)    
> (Intercept)        -2.7830     0.4207  -6.615 3.71e-11 ***
> DEFWHITE           -4.7823     8.8981  -0.537   0.5910    
> VICWHITE            1.2296     0.5358   2.295   0.0217 *  
> DEFWHITE:VICWHITE   4.3973     8.9076   0.494   0.6216    
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 `
' 1
> 
> (Dispersion parameter for binomial family taken to be 1)
> 
>     Null deviance: 226.51  on 325  degrees of freedom
> Residual deviance: 218.39  on 322  degrees of freedom
...>        Obs  Max Deriv Model L.R.       d.f.          P          C       
Dxy
>        326      0.002       8.13          3     0.0435      0.624     
0.248
>      Gamma      Tau-a         R2      Brier 
>      0.383      0.049      0.049      0.096 
> 
>                       Coef   S.E.    Wald Z P     
> Intercept             -2.783  0.4207 -6.62  0.0000
> DEF=WHITE             -5.490 20.8691 -0.26  0.7925
> VIC=WHITE              1.230  0.5358  2.29  0.0217
> DEF=WHITE * VIC=WHITE  5.105 20.8732  0.24  0.8068
These are *not* extremely different! In fact, they are essentially
equivalent. The maximum likelihood estimates of the 2nd and 4th
coefficients are theoretically infinite, and it is only a matter of
when the two routines decide to stop the iterations. The 1st and 3rd
coefficients are the same, so are their SEs, and also the sum of the
2nd and 4th is -.39 in both cases. The likelihood ratio in the first
model is 226.51-218.39 = 8.12 and in the second it is given as 8.13.

[And where did lrm() come from? It's not in the standard packages, and
I'm not going to wade through 150+ CRAN packages to locate it...] 

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Thomas Lumley

2002-Oct-24 20:09 UTC

head link

[R] glm and lrm disagree with zero table cells

On Thu, 24 Oct 2002, Eric Rescorla wrote:
> I've noticed that glm and lrm give extremely different results if you
> attempt to fit a saturated model to a dataset with zero cells. Consider,
> for instance the data from, Agresti's Death Penalty example [0].
The MLEs are infinite, so there are bound to be problems.  Both methods
will stop short of infinity, when the coefficients or likelihood stop
changing, but where they stop will depend on the exact tolerance and
stopping rule.

They aren't actually that different in terms of fitted values.  For
example, the zero cell has a fitted probability of 0.00025 and 0.0005 in
the two models.


	-thomas



-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Possibly Parallel Threads

Search for more maybe matching threads

R help - Oct 2002 - glm and lrm disagree with zero table cells

[R] glm and lrm disagree with zero table cells

[R] glm and lrm disagree with zero table cells

[R] glm and lrm disagree with zero table cells

Possibly Parallel Threads