Constantinos Antoniou
2005-May-23 18:59 UTC
[R] comparing glm models - lower AIC but insignificant coefficients
Hello,

I am a new R user and I am trying to estimate some generalized linear models (glm). I am trying to compare a model with a gaussian distribution and an identity link function, and a poisson model with a log link function. My problem is that while the gaussian model has a noticeably lower (i.e. "better") AIC (Akaike Information Criterion), most of its coefficients are not significant. On the other hand, the poisson model has a higher (i.e. "worse") AIC, but almost all of its coefficients are extremely significant (except for one that still has p=0.07).

Summary output of the two models follows... [sorry for the large number of independent variables, but the issue is less pronounced with fewer covariates].

My question is two-fold:
- AIC supposedly can be used to compare non-nested models (although there are concerns and I have also seen a couple in this list's archives). Is this a case where AIC is not a good measure to compare the two models? If so, is there another measure (besides choosing the model with the significant coefficients)? [These are time-series data, so I am also looking at acf/pacf of the residuals].
- Could the very high significance of the coefficients in the poisson model hint at some issue?

Thanking you in advance,

Costas

+++++++++++++++++++++++
POISSON - LOG LINK
+++++++++++++++++++++++

Call:
glm(formula = TotalDeadInjured[3:48] ~ -1 + Month[3:48] + sin(pi *
    Month[3:48]/6) + cos(pi * Month[3:48]/6) + sin(pi * Month[3:48]/12) +
    cos(pi * Month[3:48]/12) + ThousandCars[3:48] + monthcycle[3:48] +
    TotalDeadInjured[1:46] + I((TotalDeadInjured[1:46])^2) +
    I((TotalDeadInjured[1:46])^3), family = poisson(link = log))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.6900  -1.1901  -0.1847   0.9477   4.3967

Coefficients:
                                Estimate Std. Error z value Pr(>|z|)
Month[3:48]                   -7.712e-02  5.530e-03 -13.947  < 2e-16 ***
sin(pi * Month[3:48]/6)       -1.419e-01  2.759e-02  -5.144 2.68e-07 ***
cos(pi * Month[3:48]/6)       -8.407e-02  1.799e-02  -4.672 2.99e-06 ***
sin(pi * Month[3:48]/12)      -2.776e-02  1.558e-02  -1.782 0.074702 .
cos(pi * Month[3:48]/12)       5.195e-02  1.608e-02   3.232 0.001231 **
ThousandCars[3:48]             2.733e-02  2.255e-03  12.118  < 2e-16 ***
monthcycle[3:48]               6.307e-02  6.546e-03   9.635  < 2e-16 ***
TotalDeadInjured[1:46]        -2.925e-02  8.460e-03  -3.457 0.000546 ***
I((TotalDeadInjured[1:46])^2)  1.218e-04  3.613e-05   3.370 0.000750 ***
I((TotalDeadInjured[1:46])^3) -1.640e-07  4.961e-08  -3.306 0.000946 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 78694.70  on 46  degrees of freedom
Residual deviance:   130.03  on 36  degrees of freedom
AIC: 476.08

Number of Fisher Scoring iterations: 4

+++++++++++++++++++++++++
GAUSSIAN
++++++++++++++++++++++++++

Call:
glm(formula = TotalDeadInjured[3:48] ~ -1 + Month[3:48] + sin(pi *
    Month[3:48]/6) + cos(pi * Month[3:48]/6) + sin(pi * Month[3:48]/12) +
    cos(pi * Month[3:48]/12) + ThousandCars[3:48] + monthcycle[3:48] +
    TotalDeadInjured[1:46] + I((TotalDeadInjured[1:46])^2) +
    I((TotalDeadInjured[1:46])^3), family = gaussian(link = identity))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-61.326  -12.012   -1.756   14.204   78.991

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)
Month[3:48]                   -8.111e+00  2.115e+00  -3.835 0.000487 ***
sin(pi * Month[3:48]/6)       -2.639e+01  1.095e+01  -2.409 0.021246 *
cos(pi * Month[3:48]/6)       -1.700e+01  7.138e+00  -2.382 0.022629 *
sin(pi * Month[3:48]/12)       2.392e-01  6.524e+00   0.037 0.970956
cos(pi * Month[3:48]/12)       8.785e+00  6.317e+00   1.391 0.172835
ThousandCars[3:48]             2.219e+00  8.604e-01   2.579 0.014146 *
monthcycle[3:48]               5.364e+00  2.494e+00   2.151 0.038301 *
TotalDeadInjured[1:46]        -4.974e+00  3.263e+00  -1.524 0.136171
I((TotalDeadInjured[1:46])^2)  2.154e-02  1.410e-02   1.527 0.135382
I((TotalDeadInjured[1:46])^3) -2.999e-05  1.959e-05  -1.530 0.134637
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 831.6357)

    Null deviance: 1927714  on 46  degrees of freedom
Residual deviance:   29939  on 36  degrees of freedom
AIC: 450.54

Number of Fisher Scoring iterations: 2
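
For anyone wanting to reproduce this kind of comparison, here is a minimal sketch. It is not the poster's code: the data are simulated (the original series is not available), the names `dat`, `ylag` and `Cars` are made up for illustration, and the formula is a shortened version that keeps an intercept. It only shows one way to put the lag-2 response into a data frame, fit both families with the same formula, and read the two AIC values side by side.

```r
# Simulated stand-in data: every value here is invented for illustration.
set.seed(1)
n <- 48
Month <- seq_len(n)
ThousandCars <- 100 + 2 * Month + rnorm(n, sd = 3)
mu <- exp(5 + 0.15 * sin(pi * Month / 6) + 0.002 * ThousandCars)
TotalDeadInjured <- rpois(n, lambda = mu)

# Response, covariates and the lag-2 response in one data frame,
# instead of subsetting each vector by hand inside the formula.
dat <- data.frame(
  y     = TotalDeadInjured[3:n],
  ylag  = TotalDeadInjured[1:(n - 2)],
  Month = Month[3:n],
  Cars  = ThousandCars[3:n]
)

# Shortened formula (assumption: intercept kept, fewer terms than the post).
form <- y ~ Month + sin(pi * Month / 6) + cos(pi * Month / 6) + Cars + ylag

pois_fit  <- glm(form, family = poisson(link = "log"), data = dat)
gauss_fit <- glm(form, family = gaussian(link = "identity"), data = dat)

# Side-by-side AIC, the same quantity printed at the bottom of summary().
aic_tab <- AIC(pois_fit, gauss_fit)
print(aic_tab)
```

Keeping the lagged response as a column of `dat` makes it harder to misalign the `[3:48]` and `[1:46]` index ranges, and lets `summary(pois_fit)` print shorter coefficient names.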
Kjetil Brinchmann Halvorsen
2005-May-23 22:18 UTC
[R] comparing glm models - lower AIC but insignificant coefficients
Constantinos Antoniou wrote:

> My question is two-fold:
> - AIC supposedly can be used to compare non-nested models (although
>   there are concerns and I have also seen a couple in this list's
>   archives). Is this a case where AIC is not a good measure to compare
>   the two models? If so, is there another measure (besides choosing the
>   model with the significant coefficients)? [These are time-series
>   data, so I am also looking at acf/pacf of the residuals].

The topic of using AIC to compare non-nested models has been discussed on the list; please search the archives. But even if AIC can be used to compare non-nested models, the AIC as calculated by R is not suited to this. The AIC includes an arbitrary additive constant, as the log-likelihood does. This additive constant usually depends on constants in the density which are inconsequential for AIC and may be omitted. And even if they were included, it seems doubtful to me that this would help for a comparison of Poisson and normal models, since the underlying measure is different! The experts can comment on that.

That said, I would tend to use Poisson if I had count data and a Poisson model looked remotely sensible. That will give a more interpretable model, which seems more important than purely data-analytic considerations.

And lastly, if the Poisson assumptions seem reasonable, there will be non-constant variance, and if you use a normal model you should use weighted least squares or transform the response (square root). If you try that, maybe you will see that the normal model gives lower p-values for the coefficients. Also make a plot of residuals versus fitted values!

> - Could the very high significance of the coefficients in the poisson
>   model hint at some issue?

Maybe that the model fits better than the normal?

Kjetil

--
Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
-- Mahdi Elmandjra
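
The suggestions in this reply can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the counts are simulated, the single covariate `x` and the object names (`sqrt_fit`, `wls_fit`, `aic_by_hand`) are invented, and using the Poisson fitted means as variance estimates for the weights is one possible choice, not something specified in the thread.

```r
# Simulated counts (made-up values; the thread's data are not available).
set.seed(42)
n <- 46
x <- seq_len(n)
y <- rpois(n, lambda = exp(4 + 0.02 * x))
dat <- data.frame(x = x, y = y)

pois_fit  <- glm(y ~ x, family = poisson, data = dat)
gauss_fit <- glm(y ~ x, family = gaussian, data = dat)

# AIC is -2 * log-likelihood + 2 * (number of estimated parameters),
# so it carries along whatever constants the log-likelihood contains.
ll <- logLik(pois_fit)
aic_by_hand <- -2 * as.numeric(ll) + 2 * attr(ll, "df")
print(all.equal(aic_by_hand, AIC(pois_fit)))

# Normal model on the square-root scale, which roughly stabilises
# the variance of Poisson-like counts.
sqrt_fit <- lm(sqrt(y) ~ x, data = dat)

# Weighted least squares: weights proportional to 1 / variance, with the
# variance approximated here by the Poisson fitted means (an assumption).
wls_fit <- lm(y ~ x, data = dat, weights = 1 / fitted(pois_fit))

# Residuals versus fitted values for the unweighted normal model;
# a funnel shape would point to non-constant variance.
plot(fitted(gauss_fit), residuals(gauss_fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Note that the AIC of `sqrt_fit` is on the scale of `sqrt(y)`, so it should not be compared directly with the AIC of models fitted to `y` itself; the p-values in `summary(sqrt_fit)` and `summary(wls_fit)` are the quantities to look at here.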