I have a question about how to compare a GLM with a GAM model using anova function. A GLM is performed for example: model1 <-glm(formula = exitus ~ age+gender+diabetes, family = "binomial", na.action = na.exclude) A second nested model could be: model2 <-glm(formula = exitus ~ age+gender, family = "binomial", na.action na.exclude) To compare these two GLM models the instruction is: anova(model1,model2, test="F") Similarly for GAM models model3 <-gam(formula = exitus ~ s(age)+gender, family = "binomial", na.action = na.exclude) "R" allows to compare these two models (GLM vs. GAM) anova(model2,model3, test="F") This instruction returns a p-value with no error or warning, but this test is based on maximum likelihood, and GAM models are not fitted with maximum likelihood criteria, thus I think this p-value is not correct. Please, I really appreciate any information about how to compare a GLM with a GAM model. [[alternative HTML version deleted]]
On Tue, 26 Oct 2004, SUBIRANA CACHINERO, ISAAC wrote:> I have a question about how to compare a GLM with a GAM model using anova > function.You don't say what gam() function you are using. There are at least two out there and they work in quite different ways.> A GLM is performed for example: > > model1 <-glm(formula = exitus ~ age+gender+diabetes, family = "binomial", > na.action = na.exclude) > > A second nested model could be: > > model2 <-glm(formula = exitus ~ age+gender, family = "binomial", na.action > na.exclude)<snip>> > Similarly for GAM models > > model3 <-gam(formula = exitus ~ s(age)+gender, family = "binomial", > na.action = na.exclude) > > "R" allows to compare these two models (GLM vs. GAM) > > anova(model2,model3, test="F") > > This instruction returns a p-value with no error or warning, but this test > is based on maximum likelihood, and GAM models are not fitted with maximum > likelihood criteria, thus I think this p-value is not correct.Probably not. If the number of degrees of freedom for age is small it may be quite a good approximation. If you are using mgcv::gam you will have seen a warning to this effect on the help page for anova.gam. If you need a more accurate test you could simulate from model2 and compare the simulated distribution of the p-value to the value in the observed data. -thomas
R does not contain a gam() function. *Two* contributed packages, gam and mgcv, do. Please do as the posting guide asks and clarify what you are talking about here. Your penultimate para is not logical: the tests are _not_ based on maximum likelihood if ML fitting is not used. However, there are other model comparison tests that apply to non-ML fitting. If you mean anova.gam() in ?mgcv, do read the help page which says WARNING: Unless the models have no penalized terms then these methods are only approximate. but there is also such a function in package gam. On Tue, 26 Oct 2004, SUBIRANA CACHINERO, ISAAC wrote:> I have a question about how to compare a GLM with a GAM model using anova > function. > > A GLM is performed for example: > > model1 <-glm(formula = exitus ~ age+gender+diabetes, family = "binomial", > na.action = na.exclude) > > A second nested model could be: > > model2 <-glm(formula = exitus ~ age+gender, family = "binomial", na.action > na.exclude) > > To compare these two GLM models the instruction is: > > anova(model1,model2, test="F") > > Similarly for GAM models > > model3 <-gam(formula = exitus ~ s(age)+gender, family = "binomial", > na.action = na.exclude) > > "R" allows to compare these two models (GLM vs. GAM) > > anova(model2,model3, test="F") > > This instruction returns a p-value with no error or warning, but this test > is based on maximum likelihood, and GAM models are not fitted with maximum > likelihood criteria, thus I think this p-value is not correct. > > Please, I really appreciate any information about how to compare a GLM with > a GAM model.PLEASE do read the posting guide before posting. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
For GAMs fitted using mgcv::gam, I think that there's an argument that the most self consistent comparison would be based on the GCV/UBRE score for the models, since this has usually been used to select the degree of smoothness of the GAM model (you can fit the GLM using gam(), so this is available for both models). As you (and the help files) point out, the p-values produced by anova.gam are only approximate, when the models are fitted using penalties on the likelihood. There are two issues here, I think. i) The calculations are conditional on the smoothing parameters, which have usually been estimated. ii) Given the smoothing parameters, the GAMs are estimated by penalized likelihood maximization, not MLE. (ii)is an issue when comparing two or more models using anova.gam, since the calculations use the MLE theory based distributional results, but with the model degrees of freedom set to the effective degrees of freedom of the model. This is not as outrageous as it sounds: the GAM can always be re-parameterized using the eigen-vectors of its penalty matrix. This parameterization is orthogonal, and has a diagonal penalty matrix with a elements that increase rather smoothly and steeply. This tends to mean that, for any given smoothing parameter, some model parameters are effectively zeroed while some are effectively un-penalized. If all parameters were in one of these two categories then using distributional theory for un-penalized GLMs, with model degrees of freedom set to the EDF of the GAM would be perfectly legitimate. Of course in reality there are almost always a number of parameters subject to intermediate amounts of penalization. The approach used in anova.gam is effectively saying that these are equivalent to a smaller number of unpenalized parameters, which can only be an approximation! Note that single argument calls to anova.gam do not use the MLE based results. Interestingly, taking a mixed model approach (e.g. mgcv::gamm, for simple version) doesn't make things much better: in that case you *can* get unpenalized MLE's, but testing for presence or absence of smooth terms involves LRT's at the boundary of the parameter space, while other tests are often conditional on the parameters of the random effect distributions...> I have a question about how to compare a GLM with a GAM model using anova > function. > > A GLM is performed for example: > > model1 <-glm(formula = exitus ~ age+gender+diabetes, family = "binomial", > na.action = na.exclude) > > A second nested model could be: > > model2 <-glm(formula = exitus ~ age+gender, family = "binomial", na.action > na.exclude) > > > > To compare these two GLM models the instruction is: > > anova(model1,model2, test="F") > > > > Similarly for GAM models > > model3 <-gam(formula = exitus ~ s(age)+gender, family = "binomial", > na.action = na.exclude) > > > > "R" allows to compare these two models (GLM vs. GAM) > > anova(model2,model3, test="F") > > This instruction returns a p-value with no error or warning, but this test > is based on maximum likelihood, and GAM models are not fitted with maximum > likelihood criteria, thus I think this p-value is not correct. > > Please, I really appreciate any information about how to compare a GLM with > a GAM model.best, Simon _____________________________________________________________________> Simon Wood simon at stats.gla.ac.uk www.stats.gla.ac.uk/~simon/ >> Department of Statistics, University of Glasgow, Glasgow, G12 8QQ >>> Direct telephone: (0)141 330 4530 Fax: (0)141 330 4814