thr3ads.net - R help - [R] Extreme AIC or BIC values in glm(), logistic regression [Mar 2009]


Maggie Wang

2009-Mar-18 03:52 UTC

[R] Extreme AIC or BIC values in glm(), logistic regression

Dear R-users,

I use glm() to do logistic regression and use stepAIC() to do stepwise model
selection.

The AIC usually comes out at about 100, and a good fit goes as low as around
70. But for some models the AIC reaches extreme values like 1000. When I
check the p-values, all the independent variables (about 30 of them)
included in the equation are highly significant, which is implausible,
because we expect some to be dropped.  This situation is not uncommon.

A summary output like this:

Coefficients:
                              Estimate Std. Error   z value Pr(>|z|)
(Intercept)                   4.883e+14  1.671e+07  29217415   <2e-16 ***
g761                         -5.383e+14  9.897e+07  -5438529   <2e-16 ***
g2809                        -1.945e+15  1.082e+08 -17977871   <2e-16 ***
g3106                        -2.803e+15  9.351e+07 -29976674   <2e-16 ***
g4373                        -9.272e+14  6.534e+07 -14190077   <2e-16 ***
g4583                        -2.279e+15  1.223e+08 -18640563   <2e-16 ***
g761:g2809                   -5.101e+14  4.693e+08  -1086931   <2e-16 ***
g761:g3106                   -3.399e+16  6.923e+08 -49093218   <2e-16 ***
g2809:g3106                   3.016e+15  6.860e+08   4397188   <2e-16 ***
g761:g4373                    3.180e+15  4.595e+08   6920270   <2e-16 ***
g2809:g4373                  -5.184e+15  4.436e+08 -11685382   <2e-16 ***
g3106:g4373                   1.589e+16  2.572e+08  61788148   <2e-16 ***
g761:g4583                   -1.419e+16  8.199e+08 -17303033   <2e-16 ***
g2809:g4583                  -2.540e+16  8.151e+08 -31156781   <2e-16 ***
........
(omit)
........

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

   Null deviance:  120.32  on 86  degrees of freedom
Residual deviance: 1009.22  on 55  degrees of freedom
AIC: 1073.2

Number of Fisher Scoring iterations: 25

Could anyone suggest what this means?  How can I perform a reliable
logistic regression?

Thank you so much for the help!

Best Regards,
Maggie

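The figures in the summary output above can be cross-checked with a little arithmetic. The sketch below assumes a 0/1 (ungrouped) response and a null model with only an intercept, neither of which the post confirms; under those assumptions the AIC is the residual deviance plus twice the number of coefficients. The variable names are made up for illustration.

```r
# Numbers copied from the summary output in the post above.
null_dev  <- 120.32;  null_df  <- 86
resid_dev <- 1009.22; resid_df <- 55
aic_shown <- 1073.2

# Assumption: the null model has only an intercept, so n = null_df + 1.
n_obs <- null_df + 1              # 87 observations
# Number of coefficients estimated, including the intercept.
n_coef <- n_obs - resid_df        # 32

# Assumption: 0/1 response, so AIC = residual deviance + 2 * n_coef.
aic_check <- resid_dev + 2 * n_coef

cat("observations:", n_obs, " coefficients:", n_coef, "\n")
cat("AIC recomputed:", aic_check, " AIC shown:", aic_shown, "\n")

# Two warning signs that the fit did not succeed:
# the fitted model has a larger deviance than the null model ...
cat("residual deviance > null deviance:", resid_dev > null_dev, "\n")
# ... and the iteration count in the post (25) equals glm()'s default cap.
cat("default maxit:", glm.control()$maxit, "\n")
```

A maximum-likelihood fit cannot have a larger deviance than the intercept-only model nested in it, so a residual deviance of 1009 against a null deviance of 120 suggests the iterations stopped at the cap without converging, and the reported p-values should not be trusted.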

Dieter Menne

2009-Mar-18 07:30 UTC

Maggie Wang <haitian <at> ust.hk> writes:
> I use glm() to do logistic regression and use stepAIC() to do stepwise
> model selection.
>
> The common AIC value comes out is about 100, a good fit is as low as around
> 70. But for some model, the AIC went to extreme values like 1000. When I
> check the P-values, All the independent variables (about 30 of them)
> included in the equation are very significant, which is impossible, because
> we expect some would be dropped.  This situation is not uncommon.
>
> A summary output like this:
>
> Coefficients:
>                               Estimate Std. Error   z value Pr(>|z|)
> (Intercept)                   4.883e+14  1.671e+07  29217415   <2e-16 ***
> g761                         -5.383e+14  9.897e+07  -5438529   <2e-16 ***
> g2809                        -1.945e+15  1.082e+08 -17977871   <2e-16 ***
> g3106                        -2.803e+15  9.351e+07 -29976674   <2e-16 ***

I suspect that you have specified your target (response) variable
incorrectly. Note that there are three ways to specify the response in a
binomial glm(); they are explained best in MASS, in the chapter on binomial
data, using the budworm example.

Try to extract a small subset of your data and post it here as a
self-contained, runnable example.

Dieter
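
The three ways of specifying a binomial response mentioned above can be sketched as follows. This is a minimal illustration on made-up grouped data, not the budworm data from MASS and not the poster's data; the names `grouped`, `long`, and `fit_*` are hypothetical.

```r
# Hypothetical grouped data (not from the thread): successes out of
# 20 trials at each of five doses.
dose   <- c(1, 2, 4, 8, 16)
trials <- rep(20, 5)
succ   <- c(2, 5, 9, 14, 18)
grouped <- data.frame(ldose = log2(dose), succ = succ, trials = trials)

# Form 1: two-column matrix of successes and failures.
fit_matrix <- glm(cbind(succ, trials - succ) ~ ldose,
                  family = binomial, data = grouped)

# Form 2: proportion of successes, with the trial counts as weights.
fit_prop <- glm(succ / trials ~ ldose, family = binomial,
                weights = trials, data = grouped)

# Form 3: one 0/1 row per individual trial.
long <- data.frame(
  ldose = rep(grouped$ldose, times = trials),
  y     = unlist(lapply(seq_along(succ), function(i)
            rep(c(1, 0), times = c(succ[i], trials[i] - succ[i]))))
)
fit_binary <- glm(y ~ ldose, family = binomial, data = long)

# The three rows of coefficients should agree up to numerical tolerance.
print(rbind(matrix     = coef(fit_matrix),
            proportion = coef(fit_prop),
            binary     = coef(fit_binary)))
```

When the forms are used consistently they describe the same model. A mismatch, for example proportions passed without the matching weights, is one way a binomial fit can go astray, which may be what the reply has in mind.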

Thomas Lumley

2009-Mar-18 07:38 UTC

With 30 variables and only 55 residual degrees of freedom you probably have
perfect separation due to not having enough data.  Look at the coefficients --
they are infinite, implying perfect overfitting.

       -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
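
Perfect separation, as described in the reply above, is easy to reproduce on a toy example. The sketch below uses made-up data in which a single predictor splits the two classes exactly; it is not the poster's data, and the names `fit_warnings`, `fit`, and `p` are hypothetical.

```r
# Hypothetical data: y is 0 whenever x < 0 and 1 whenever x > 0,
# so x separates the two classes perfectly.
x <- c(seq(-2, -0.5, length.out = 20), seq(0.5, 2, length.out = 20))
y <- as.integer(x > 0)

# Collect warnings instead of printing them, and let the fit finish.
fit_warnings <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    fit_warnings <<- c(fit_warnings, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(fit_warnings)
print(summary(fit)$coefficients)   # large slope with an even larger std. error
p <- fitted(fit)
cat("largest distance of a fitted probability from 0 or 1:",
    max(pmin(p, 1 - p)), "\n")
cat("residual deviance:", deviance(fit), "\n")
```

In a textbook case like this the standard errors are enormous and the residual deviance is close to zero. The output in the original post looks different (small standard errors, residual deviance above the null deviance), which may mean the iterations diverged rather than drifting slowly toward infinity. Either way the estimates are not usable.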

Gad Abraham

2009-Mar-19 00:55 UTC

I don't have an answer (and you haven't supplied the full code), but one
obvious thing is that the estimated coefficients are extremely large 
(this is the linear predictor scale, so in the response scale it's even 
worse since you exponentiate it). Perhaps this is due to very high 
collinearity of your variables (however the standard error is low 
relative to the estimate so maybe not), and/or issues of scaling (i.e., 
your variables are very small, use scale() to standardise them.)

-- 
Gad Abraham
MEng Student, Dept. CSSE and NICTA
The University of Melbourne
Parkville 3010, Victoria, Australia
email: gabraham at csse.unimelb.edu.au
web: http://www.csse.unimelb.edu.au/~gabraham
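
The scaling suggestion above can be sketched on simulated data. The example assumes a single predictor with a very small standard deviation; the data and the names `x_small`, `x_std`, `fit_raw`, and `fit_std` are hypothetical and not taken from the thread.

```r
set.seed(42)
# Hypothetical predictor on a very small scale (sd about 1e-4).
x_small <- rnorm(100, mean = 0, sd = 1e-4)
y <- rbinom(100, size = 1, prob = plogis(1e4 * x_small))

fit_raw <- glm(y ~ x_small, family = binomial)

# scale() centres and divides by the standard deviation; it returns a
# one-column matrix, so convert it back to a plain numeric vector.
x_std <- as.numeric(scale(x_small))
fit_std <- glm(y ~ x_std, family = binomial)

slope_raw <- unname(coef(fit_raw)["x_small"])
slope_std <- unname(coef(fit_std)["x_std"])
cat("raw slope:", slope_raw, "\n")
cat("standardised slope:", slope_std, "\n")
cat("raw slope * sd(x):", slope_raw * sd(x_small), "\n")
```

Standardising changes the size of the coefficient but not the fitted probabilities, so it makes the estimates easier to read. It would not by itself repair a model that suffers from separation or has too many terms for the data.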
