Hello,

I was trying a glm fit (as shown below) and I got a warning and a fitted
residual deviance larger than the null deviance. Is this the expected
behavior of glm? I would expect that even though the warning might be
warranted, I should not get a worse fit with an additional covariate in the
model. Could anyone tell me what I'm missing? I get the same results in
both R 2.5.1 on Windows and R 2.5.0 on amd64-pc-linux.

Thank you!

--Yiping

> x1
 [1] -4.320069e-02 -4.574697e-02 -4.091768e-02 -4.405891e-02  1.723491e-02
 [6] -4.534468e-02 -3.514106e-02 -4.115642e-02 -4.384857e-02 -4.098890e-02
[11] -4.423853e-02 -4.320503e-02 -3.949471e-02 -4.438473e-02 -4.328487e-02
[16] -4.100160e-02 -4.238224e-02 -3.961957e-02 -4.343796e-02 -4.375786e-02
[21] -4.291601e-02 -4.363903e-02 -3.944323e-02 -4.492861e-02 -4.314135e-02
[26] -4.407560e-02 -4.262769e-02 -4.446548e-02 -4.332781e-02 -4.330175e-02
[31] -4.533415e-02  3.365331e-01 -4.394758e-02 -4.291074e-02 -4.400895e-02
[36] -4.230687e-02 -4.389724e-02  3.303411e-01 -3.907463e-02  3.147079e-01
[41] -4.174071e-02  2.152082e-01 -4.189925e-02 -4.189424e-02 -3.934256e-02
[46]  3.075004e-01 -4.372070e-02 -4.253140e-02 -3.940970e-02 -4.363639e-02
[51] -4.294994e-02 -2.925661e-02 -2.721621e-02 -4.225099e-02  2.083608e-01
[56] -4.356620e-02 -4.142498e-02 -4.303415e-02 -4.297786e-02  5.919961e-05
[61] -4.367790e-02 -4.370738e-02 -4.025630e-02 -3.830009e-02  3.117086e-01
[66]  3.290152e-01 -2.538811e-02 -4.468090e-02  2.784418e-01  3.511454e-02
[71]  2.913742e-01 -4.387962e-02 -4.268680e-02 -4.171391e-02 -4.202010e-02
[76] -4.322166e-02 -3.706578e-02 -4.390152e-02 -3.587444e-02 -4.384691e-02
[81] -1.480347e-03 -4.389834e-02 -4.185581e-02 -6.265638e-03 -4.202021e-02
[86] -1.808579e-03 -4.264013e-02
> x2
 [1] -0.030178811 -0.030383785 -0.024195005 -0.015303415  0.505774406
 [6] -0.023545322 -0.026341561 -0.031700423 -0.033170361 -0.016807687
[11] -0.030765392 -0.032847977 -0.022241150 -0.030362745 -0.034153581
[16] -0.034486433 -0.026723467 -0.024891104 -0.029309960 -0.028852347
[21] -0.031251711 -0.025874702 -0.025032587 -0.035013137 -0.019088676
[26] -0.021198249 -0.029941158 -0.026721874 -0.023171718 -0.032476985
[31] -0.028476595 -0.048960041 -0.029837101 -0.017454529 -0.026821415
[36] -0.027268238 -0.023836780 -0.065271644 -0.034818157 -0.032746126
[41] -0.025742516 -0.057888806 -0.029212633 -0.022687920 -0.027572481
[46]  0.027638085 -0.030762472 -0.024144720 -0.027526008 -0.025081009
[51] -0.026338116 -0.017626651  0.001067303 -0.028009695 -0.039969065
[56] -0.025602297 -0.027279180 -0.015731185 -0.023141642  0.333521248
[61] -0.025868744 -0.021978049 -0.022180398 -0.010942728  0.021911932
[66] -0.103832477 -0.017300324 -0.024064572 -0.036209915  0.602510894
[71] -0.023094473 -0.024840084 -0.035736611 -0.027606170 -0.019964185
[76] -0.029183955 -0.023701532 -0.027828418 -0.014666741 -0.024613236
[81]  0.284105237 -0.026196040 -0.029447691  0.280181037 -0.027221380
[86]  0.184637507 -0.039031585
> g
 [1] 2 1 1 2 2 2 1 2 2 2 2 2 2 2 2 1 2 2 2 1 1 1 1 2 2 2 2 2 2 1 1 1 1 0 1 2 1 2
[39] 1 1 2 1 2 1 1 1 0 1 2 1 1 2 2 2 2 1 1 1 1 2 1 2 1 2 1 2 2 1 1 2 2 2 2 2 2 1
[77] 1 2 2 2 2 1 1 2 2 1 1
> f
 [1] 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0 1 0
[39] 1 1 0 1 0 1 1 1 1 1 0 1 1 0 0 0 0 1 1 1 1 0 1 0 1 0 1 0 0 1 1 0 0 0 0 0 0 1
[77] 1 0 0 0 0 1 1 0 0 1 1
> table(f,g)
   g
f    0  1  2
  0  0  0 44
  1  2 38  3
> glm(f~x1+x2+g, family=binomial(link="logit"), na.action=na.omit)

Call:  glm(formula = f ~ x1 + x2 + g, family = binomial(link = "logit"),
    na.action = na.omit)

Coefficients:
(Intercept)           x1           x2            g
  9.184e+15   -4.359e+15    7.889e+14   -6.190e+15

Degrees of Freedom: 86 Total (i.e. Null);  83 Residual
Null Deviance:      120.6
Residual Deviance: 216.3        AIC: 224.3
Warning message:
fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y,
weights = weights, start = start, etastart = etastart,

--
View this message in context: http://www.nabble.com/question-about-glm-behavior-tf4799848.html#a13732230
Sent from the R help mailing list archive at Nabble.com.
On Tue, 13 Nov 2007, Yip wrote:

> I was trying a glm fitting (as shown below) and I got a warning and a
> fitted residual deviance larger than the null deviance. Is this the
> expected behavior of glm?

It can be. One of two things is going on here:

A) There is complete separation between the classes: there is a linear
combination of the explanatory variables that is negative on class 0 and
positive on class 1. So there is no (finite) MLE, and the best fit has
probability one for all cases.

B) The fit is extreme, with most cases having fitted probabilities
indistinguishable from one, but with some that are not.

I think it is B) in your case.

> I would expect that even though the warning might be warranted I should
> not get worse fitting with an additional covariate in the model. Could
> anyone tell me what I'm missing?

That the fit has not converged (you were warned). Trying trace:

> glm(f~x1+x2+g, family=binomial(link="logit"), trace=TRUE)
Deviance = 33.11507 Iterations - 1
Deviance = 24.94139 Iterations - 2
Deviance = 22.43989 Iterations - 3
Deviance = 21.39709 Iterations - 4
Deviance = 20.84556 Iterations - 5
Deviance = 20.35107 Iterations - 6
Deviance = 19.58265 Iterations - 7
Deviance = 18.79010 Iterations - 8
Deviance = 18.54549 Iterations - 9
Deviance = 17.93380 Iterations - 10
Deviance = 50.95846 Iterations - 11
Deviance = 4123.945 Iterations - 12
....

so the algorithm ran into numerical difficulties after 10 steps (and was in
any case converging slowly).

What has happened is that 40 of the cases have fitted probabilities so near
one that they cannot be represented accurately. This is inevitable in the
current formulation:

> binomial()$variance
function (mu)
mu * (1 - mu)
...

and 1 - mu cannot be computed accurately, hence the warning. There are
numerically stable ways to do this calculation, but not, I think, within
the glm() framework.

> I get the same results in both R 2.5.1 on Windows and 2.5.0 on
> amd64-pc-linux.
>
> Thank you!
> --Yiping

[ data and output quoted from the original message snipped ... ]

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
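Ripley's remark that 1 - mu cannot be computed accurately is easy to see directly. The following is a minimal sketch (not part of the original thread): once the linear predictor is large enough, plogis() rounds the fitted probability to exactly 1 in double precision, and the binomial variance mu * (1 - mu), which is the IWLS case weight, underflows to exactly zero.

```r
## For a moderately large linear predictor eta, the fitted probability
## is still distinguishable from 1 ...
mu1 <- plogis(20)    # 1 / (1 + exp(-20))
1 - mu1              # about 2.06e-09: still representable

## ... but beyond roughly eta = 37, plogis(eta) rounds to exactly 1.0
## in double precision, so 1 - mu and the IWLS weight mu * (1 - mu)
## are exactly zero and those cases drop out of the working fit.
mu2 <- plogis(40)
1 - mu2              # exactly 0
mu2 * (1 - mu2)      # exactly 0: the case weight vanishes
```

This is why the fit degrades instead of improving once most cases have fitted probabilities numerically equal to one: the working weights for those cases are exactly zero and the IWLS step is determined by the few remaining cases.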
Yip wrote:

> Hello,
>
> I was trying a glm fitting (as shown below) and I got a warning and a
> fitted residual deviance larger than the null deviance. Is this the
> expected behavior of glm? I would expect that even though the warning
> might be warranted I should not get worse fitting with an additional
> covariate in the model. Could anyone tell me what I'm missing?

[ Big snip ... ]

>> table(f,g)
>    g
> f    0  1  2
>   0  0  0 44
>   1  2 38  3

Ah, ah: please note that (f==1)==(g<2) holds for 84 of the 87 cases — g
alone almost perfectly predicts f.

>> glm(f~x1+x2+g, family=binomial(link="logit"), na.action=na.omit)
>
> Call:  glm(formula = f ~ x1 + x2 + g, family = binomial(link = "logit"),
>     na.action = na.omit)
>
> Coefficients:
> (Intercept)           x1           x2            g
>   9.184e+15   -4.359e+15    7.889e+14   -6.190e+15
>
> Degrees of Freedom: 86 Total (i.e. Null);  83 Residual
> Null Deviance:      120.6
> Residual Deviance: 216.3        AIC: 224.3
> Warning message:
> fitted probabilities numerically 0 or 1 occurred in: glm.fit(x = X, y = Y,
> weights = weights, start = start, etastart = etastart,

This is discussed in MASS, chap. 7, "Generalized Linear Models", in the
section named "Problems with binomial GLMs", pp. 197-9, and in the
literature cited there. I won't insult Venables and Ripley's excellent
writing by trying to paraphrase them.

However, I note that glm tells you that it fitted probabilities of 0 or 1,
i.e. infinite odds ratios. Is that representable in a computer?
Furthermore, the extremely large absolute values of the coefficients
should also ring an alarm bell...

Did you try this?

glm(f~x1+x2, subset=(g==2), family=binomial(link=logit))

Emmanuel Charpentier
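Charpentier's two clues — the zero cells in table(f,g) and the absurdly large coefficients — go together. Both symptoms can be reproduced on a small made-up data set (a toy example for illustration, not the thread's data):

```r
## Toy data exhibiting complete separation (Ripley's case A):
## y == 1 exactly when x > 4, so no finite MLE exists.
y <- c(0, 0, 0, 0, 1, 1, 1, 1)
x <- 1:8

## Zero cells in the cross-tabulation are the tell-tale sign:
table(y, x > 4)

## glm() then shows the same symptoms as in the thread: the "fitted
## probabilities numerically 0 or 1" warning, a huge (meaningless)
## slope estimate, and fitted probabilities numerically 0 and 1.
fit <- suppressWarnings(glm(y ~ x, family = binomial))
coef(fit)
range(fitted(fit))
```

When a cross-tabulation like the thread's table(f,g) has a zero cell for one response level, the coefficient of that predictor is driven towards infinity, which is why subsetting to g==2 as Charpentier suggests (or otherwise removing the separating predictor) is the sensible next step.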