Hello,

I was trying a glm fit (as shown below) and I got a warning and a fitted
residual deviance larger than the null deviance. Is this the expected
behavior of glm? I would expect that even though the warning might be
warranted, I should not get a worse fit with an additional covariate in the
model. Could anyone tell me what I'm missing? I get the same results in
both R 2.5.1 on Windows and R 2.5.0 on amd64-pc-linux.

Thank you!

--Yiping

> x1
 [1] -4.320069e-02 -4.574697e-02 -4.091768e-02 -4.405891e-02  1.723491e-02
 [6] -4.534468e-02 -3.514106e-02 -4.115642e-02 -4.384857e-02 -4.098890e-02
[11] -4.423853e-02 -4.320503e-02 -3.949471e-02 -4.438473e-02 -4.328487e-02
[16] -4.100160e-02 -4.238224e-02 -3.961957e-02 -4.343796e-02 -4.375786e-02
[21] -4.291601e-02 -4.363903e-02 -3.944323e-02 -4.492861e-02 -4.314135e-02
[26] -4.407560e-02 -4.262769e-02 -4.446548e-02 -4.332781e-02 -4.330175e-02
[31] -4.533415e-02  3.365331e-01 -4.394758e-02 -4.291074e-02 -4.400895e-02
[36] -4.230687e-02 -4.389724e-02  3.303411e-01 -3.907463e-02  3.147079e-01
[41] -4.174071e-02  2.152082e-01 -4.189925e-02 -4.189424e-02 -3.934256e-02
[46]  3.075004e-01 -4.372070e-02 -4.253140e-02 -3.940970e-02 -4.363639e-02
[51] -4.294994e-02 -2.925661e-02 -2.721621e-02 -4.225099e-02  2.083608e-01
[56] -4.356620e-02 -4.142498e-02 -4.303415e-02 -4.297786e-02  5.919961e-05
[61] -4.367790e-02 -4.370738e-02 -4.025630e-02 -3.830009e-02  3.117086e-01
[66]  3.290152e-01 -2.538811e-02 -4.468090e-02  2.784418e-01  3.511454e-02
[71]  2.913742e-01 -4.387962e-02 -4.268680e-02 -4.171391e-02 -4.202010e-02
[76] -4.322166e-02 -3.706578e-02 -4.390152e-02 -3.587444e-02 -4.384691e-02
[81] -1.480347e-03 -4.389834e-02 -4.185581e-02 -6.265638e-03 -4.202021e-02
[86] -1.808579e-03 -4.264013e-02
> x2
 [1] -0.030178811 -0.030383785 -0.024195005 -0.015303415  0.505774406
 [6] -0.023545322 -0.026341561 -0.031700423 -0.033170361 -0.016807687
[11] -0.030765392 -0.032847977 -0.022241150 -0.030362745 -0.034153581
[16] -0.034486433 -0.026723467 -0.024891104 -0.029309960 -0.028852347
[21] -0.031251711 -0.025874702 -0.025032587 -0.035013137 -0.019088676
[26] -0.021198249 -0.029941158 -0.026721874 -0.023171718 -0.032476985
[31] -0.028476595 -0.048960041 -0.029837101 -0.017454529 -0.026821415
[36] -0.027268238 -0.023836780 -0.065271644 -0.034818157 -0.032746126
[41] -0.025742516 -0.057888806 -0.029212633 -0.022687920 -0.027572481
[46]  0.027638085 -0.030762472 -0.024144720 -0.027526008 -0.025081009
[51] -0.026338116 -0.017626651  0.001067303 -0.028009695 -0.039969065
[56] -0.025602297 -0.027279180 -0.015731185 -0.023141642  0.333521248
[61] -0.025868744 -0.021978049 -0.022180398 -0.010942728  0.021911932
[66] -0.103832477 -0.017300324 -0.024064572 -0.036209915  0.602510894
[71] -0.023094473 -0.024840084 -0.035736611 -0.027606170 -0.019964185
[76] -0.029183955 -0.023701532 -0.027828418 -0.014666741 -0.024613236
[81]  0.284105237 -0.026196040 -0.029447691  0.280181037 -0.027221380
[86]  0.184637507 -0.039031585
> g
 [1] 2 1 1 2 2 2 1 2 2 2 2 2 2 2 2 1 2 2 2 1 1 1 1 2 2 2 2 2 2 1 1 1 1 0 1 2 1 2
[39] 1 1 2 1 2 1 1 1 0 1 2 1 1 2 2 2 2 1 1 1 1 2 1 2 1 2 1 2 2 1 1 2 2 2 2 2 2 1
[77] 1 2 2 2 2 1 1 2 2 1 1
> f
 [1] 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0 1 0
[39] 1 1 0 1 0 1 1 1 1 1 0 1 1 0 0 0 0 1 1 1 1 0 1 0 1 0 1 0 0 1 1 0 0 0 0 0 0 1
[77] 1 0 0 0 0 1 1 0 0 1 1
> table(f,g)
   g
f    0  1  2
  0  0  0 44
  1  2 38  3
> glm(f~x1+x2+g, family=binomial(link="logit"), na.action=na.omit)

Call:  glm(formula = f ~ x1 + x2 + g, family = binomial(link = "logit"),
    na.action = na.omit)

Coefficients:
(Intercept)           x1           x2            g
  9.184e+15   -4.359e+15    7.889e+14   -6.190e+15

Degrees of Freedom: 86 Total (i.e. Null);  83 Residual
Null Deviance:      120.6
Residual Deviance: 216.3        AIC: 224.3
Warning message:
fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y,
weights = weights, start = start, etastart = etastart,

--
View this message in context: http://www.nabble.com/question-about-glm-behavior-tf4799848.html#a13732230
Sent from the R help mailing list archive at Nabble.com.
On Tue, 13 Nov 2007, Yip wrote:

> I was trying a glm fitting (as shown below) and I got a warning and a
> fitted residual deviance larger than the null deviance. Is this the
> expected behavior of glm?

It can be. One of two things is going on here:

A) There is complete separation between the classes: there is a linear
combination of the explanatory variables that is negative on class 0 and
positive on class 1. So there is no (finite) MLE, and the best fit has
probability one for all cases.

B) The fit is extreme, with most cases having fitted probabilities
indistinguishable from one, but with some that are not.

I think it is B) in your case.

> I would expect that even though the warning might be warranted I should
> not get worse fitting with an additional covariate in the model. Could
> anyone tell me what I'm missing?

That the fit has not converged (you were warned). Trying trace:

> glm(f~x1+x2+g, family=binomial(link="logit"), trace=TRUE)
Deviance = 33.11507 Iterations - 1
Deviance = 24.94139 Iterations - 2
Deviance = 22.43989 Iterations - 3
Deviance = 21.39709 Iterations - 4
Deviance = 20.84556 Iterations - 5
Deviance = 20.35107 Iterations - 6
Deviance = 19.58265 Iterations - 7
Deviance = 18.79010 Iterations - 8
Deviance = 18.54549 Iterations - 9
Deviance = 17.93380 Iterations - 10
Deviance = 50.95846 Iterations - 11
Deviance = 4123.945 Iterations - 12
....

so the algorithm ran into numerical difficulties after 10 steps (and was in
any case converging slowly).

What has happened is that 40 of the cases have fitted probabilities so near
one that they cannot be represented accurately. This is inevitable in the
current formulation:

> binomial()$variance
function (mu)
mu * (1 - mu)
...

and 1 - mu cannot be computed accurately, hence the warning. There are
numerically stable ways to do this calculation, but not, I think, within
the glm() framework.

> I get the same results in both R 2.5.1 on Windows and 2.5.0 on
> amd64-pc-linux.
>
> Thank you!
> --Yiping

[ data and output quoted from the original message snipped ... ]

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
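Ripley's remark that 1 - mu cannot be computed accurately is easy to see directly. The following is a minimal sketch (not part of the original thread): once the linear predictor is large enough, plogis() rounds the fitted probability to exactly 1 in double precision, and the binomial variance mu * (1 - mu), which is the IWLS case weight, underflows to exactly zero.

```r
## For a moderately large linear predictor eta, the fitted probability
## is still distinguishable from 1 ...
mu1 <- plogis(20)    # 1 / (1 + exp(-20))
1 - mu1              # about 2.06e-09: still representable

## ... but beyond roughly eta = 37, plogis(eta) rounds to exactly 1.0
## in double precision, so 1 - mu and the IWLS weight mu * (1 - mu)
## are exactly zero and those cases drop out of the working fit.
mu2 <- plogis(40)
1 - mu2              # exactly 0
mu2 * (1 - mu2)      # exactly 0: the case weight vanishes
```

This is why the fit degrades instead of improving once most cases have fitted probabilities numerically equal to one: the working weights for those cases are exactly zero and the IWLS step is determined by the few remaining cases.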
Yip wrote:

> Hello,
>
> I was trying a glm fitting (as shown below) and I got a warning and a
> fitted residual deviance larger than the null deviance. Is this the
> expected behavior of glm? I would expect that even though the warning
> might be warranted I should not get worse fitting with an additional
> covariate in the model. Could anyone tell me what I'm missing?

[ Big snip ... ]

>> table(f,g)
>    g
> f    0  1  2
>   0  0  0 44
>   1  2 38  3

Ah, ah: please note that (f==1)==(g<2) holds for 84 of the 87 cases — g
alone almost perfectly predicts f.

>> glm(f~x1+x2+g, family=binomial(link="logit"), na.action=na.omit)
>
> Call:  glm(formula = f ~ x1 + x2 + g, family = binomial(link = "logit"),
>     na.action = na.omit)
>
> Coefficients:
> (Intercept)           x1           x2            g
>   9.184e+15   -4.359e+15    7.889e+14   -6.190e+15
>
> Degrees of Freedom: 86 Total (i.e. Null);  83 Residual
> Null Deviance:      120.6
> Residual Deviance: 216.3        AIC: 224.3
> Warning message:
> fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y,
> weights = weights, start = start, etastart = etastart,

This is discussed in MASS, chap. 7, "Generalized Linear Models", in the
section named "Problems with binomial GLMs", pp. 197-9, and in the
literature cited there. I won't insult Venables and Ripley's excellent
writing by trying to paraphrase them.

However, I note that glm tells you that it fitted probabilities of 0 or 1,
i.e. infinite odds ratios. Is that representable in a computer?
Furthermore, the extremely large absolute values of the coefficients
should also ring an alarm bell...

Did you try this?

glm(f~x1+x2, subset=(g==2), family=binomial(link=logit))

Emmanuel Charpentier
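Charpentier's two clues — the zero cells in table(f,g) and the absurdly large coefficients — go together. Both symptoms can be reproduced on a small made-up data set (a toy example for illustration, not the thread's data):

```r
## Toy data exhibiting complete separation (Ripley's case A):
## y == 1 exactly when x > 4, so no finite MLE exists.
y <- c(0, 0, 0, 0, 1, 1, 1, 1)
x <- 1:8

## Zero cells in the cross-tabulation are the tell-tale sign:
table(y, x > 4)

## glm() then shows the same symptoms as in the thread: the "fitted
## probabilities numerically 0 or 1" warning, a huge (meaningless)
## slope estimate, and fitted probabilities numerically 0 and 1.
fit <- suppressWarnings(glm(y ~ x, family = binomial))
coef(fit)
range(fitted(fit))
```

When a cross-tabulation like the thread's table(f,g) has a zero cell for one response level, the coefficient of that predictor is driven towards infinity, which is why subsetting to g==2 as Charpentier suggests (or otherwise removing the separating predictor) is the sensible next step.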