Greetings all The help file for GAMM in mgcv indicates that the log likelihood for a GAMM reported using summary(my.gamm$lme) (as an example) is not correct. However, in a past R-help post (included below), there is some indication that the likelihood ratio test in anova.lme(mygamm$lme, mygamm1$lme) is valid. How can I tell if anova.lme results are meaningful (are AIC, BIC, and logLik estimates accurate)? The data include hydroacoustic estimates of fish biomass (lbloat) in 1,000 meter long intervals (elementary sampling units) from multiple transects (each 20-30 km long, tranf) in two different lakes and three different years. bloat.gamm1 <- gamm(lbloat ~ s(depth), correlation=corSpher(c(30000, 0.01),form = ~ x+y|tranf, nugget=TRUE), data=fish3) bloat.gamm2 <- gamm(lbloat ~ lakef + s(depth), correlation=corSpher(c(30000, 0.01),form = ~ x+y|tranf, nugget=TRUE), data=fish3) bloat.gamm3 <- gamm(lbloat ~ lakef + s(depth, by=lakef), correlation=corSpher(c(30000, 0.01),form = ~ x+y|tranf, nugget=TRUE), data=fish3)> anova.lme(bloat.gamm1$lme, bloat.gamm2$lme, bloat.gamm3$lme)Model df AIC BIC logLik Test L.Ratio p-value bloat.gamm1$lme 1 6 7916.315 7950.702 -3952.158 bloat.gamm2$lme 2 7 7902.718 7942.835 -3944.359 1 vs 2 15.597489 0.0001 bloat.gamm3$lme 3 9 7910.987 7962.567 -3946.494 2 vs 3 4.269119 0.1183 Thanks Dave Hi R user, I am using the gamm() function of the mgcv-package. Now I would like to decide on the random effects to include in the model. Within a GAMM framework, is it allowed to compare the following two models inv_1<-gamm(y~te(sat,inv),data=daten_final, random=list(proband=~1)) inv_2<-gamm(y~te(sat,inv),data=daten_final, random=list(proband=~sat)) with a likelihood ratio test for a traditional GLMM, like this: anova(inv_1$lme, inv_2$lme) The output is as follows: Model df AIC BIC logLik Test L.Ratio p-value inv_2$lme 1 10 21495.90 21557.59 -10737.95 inv_1$lme 2 8 23211.12 23260.46 -11597.56 1 vs 2 1719.214 <.0001 Or is this not in tune with the automatic smoothing parameter selection (i.e. it is not exactly the same for both model alternatives)? If not, how can I decide on the selection of random effects? This comparison is just as valid as it is for a regular linear mixed model, which is all that the GAMM is in this case --- the smoothing parameters are just variance components in your example. In general you have to be a bit careful with generalized likelihood ratio tests involving variance components, of course, since the null hypothesis often involves restricting some variance parameters to the edge of their possible range, which rather messes up the Taylor expansion about the null parameter values that underpins the large sample distributional results used in the glrt. Your example does involve such a problematic comparison, but the result is so clear cut here that there is not really any doubt that inv_2 is better in this case (I wonder if inv_1 even passes basic model checking?). See Pinheiro and Bates (2000) for more info. hope this is some use, Simon David Warner Research Fishery Biologist USGS Great Lakes Science Center 1451 Green Road Ann Arbor MI 48105 734.214.9392 [[alternative HTML version deleted]]
On Wed, 2008-11-19 at 09:33 -0500, David M Warner wrote:> Greetings all > The help file for GAMM in mgcv indicates that the log likelihood for a > GAMM reported using > > summary(my.gamm$lme) (as an example) is not correct.It is slightly more specific than that. It mentions that glmmPQL was used, and in your examples below, it isn't used as your GAMM's are Gaussian. Having said that, the log likelihood for the Gaussian GAMM will be based on whatever you set argument 'method' to. It defaults to "ML" which is full Maximum Likelihood and IIRC is the correct form for checking the fixed effects in a likelihood ratio test. REML produces unbiased estimates of the variances, say of random effects, whereas ML doesn't. If you are fitting Gaussian GAMMs then the book by Pinhiero and Bates that supports the nlme package (where the lme function comes from) will be very useful as the GAMM is just an LME. Simon Woods book on GAMs is perhaps the best place to find out about what gamm() is doing currently. HTH G> > However, in a past R-help post (included below), there is some indication > that the likelihood ratio test in anova.lme(mygamm$lme, mygamm1$lme) is > valid. > > How can I tell if anova.lme results are meaningful (are AIC, BIC, and > logLik estimates accurate)? > > The data include hydroacoustic estimates of fish biomass (lbloat) in 1,000 > meter long intervals (elementary sampling units) from multiple transects > (each 20-30 km long, tranf) in two different lakes and three different > years. > > bloat.gamm1 <- gamm(lbloat ~ s(depth), correlation=corSpher(c(30000, > 0.01),form = ~ x+y|tranf, nugget=TRUE), data=fish3) > > bloat.gamm2 <- gamm(lbloat ~ lakef + s(depth), > correlation=corSpher(c(30000, 0.01),form = ~ x+y|tranf, nugget=TRUE), > data=fish3) > > bloat.gamm3 <- gamm(lbloat ~ lakef + s(depth, by=lakef), > correlation=corSpher(c(30000, 0.01),form = ~ x+y|tranf, nugget=TRUE), > data=fish3) > > > anova.lme(bloat.gamm1$lme, bloat.gamm2$lme, bloat.gamm3$lme) > Model df AIC BIC logLik Test L.Ratio > p-value > bloat.gamm1$lme 1 6 7916.315 7950.702 -3952.158 > bloat.gamm2$lme 2 7 7902.718 7942.835 -3944.359 1 vs 2 15.597489 > 0.0001 > bloat.gamm3$lme 3 9 7910.987 7962.567 -3946.494 2 vs 3 4.269119 > 0.1183 > > Thanks > Dave > > > > > > > Hi R user, > > I am using the gamm() function of the mgcv-package. Now I would like to > decide on the random effects to include in the model. Within a GAMM > framework, is it allowed to compare the following two models > > inv_1<-gamm(y~te(sat,inv),data=daten_final, random=list(proband=~1)) > > inv_2<-gamm(y~te(sat,inv),data=daten_final, random=list(proband=~sat)) > > > with a likelihood ratio test for a traditional GLMM, like this: > > anova(inv_1$lme, inv_2$lme) > > The output is as follows: > > Model df AIC BIC logLik Test L.Ratio p-value > inv_2$lme 1 10 21495.90 21557.59 -10737.95 > inv_1$lme 2 8 23211.12 23260.46 -11597.56 1 vs 2 1719.214 <.0001 > > > Or is this not in tune with the automatic smoothing parameter selection > (i.e. it is not exactly the same for both model alternatives)? > > > If not, how can I decide on the selection of random effects? > > > > > This comparison is just as valid as it is for a regular linear mixed > model, > which is all that the GAMM is in this case --- the smoothing parameters > are > just variance components in your example. > > In general you have to be a bit careful with generalized likelihood ratio > tests involving variance components, of course, since the null hypothesis > > often involves restricting some variance parameters to the edge of their > possible range, which rather messes up the Taylor expansion about the null > > parameter values that underpins the large sample distributional results > used > in the glrt. Your example does involve such a problematic comparison, but > the > result is so clear cut here that there is not really any doubt that inv_2 > is > better in this case (I wonder if inv_1 even passes basic model checking?). > > See Pinheiro and Bates (2000) for more info. > > hope this is some use, > Simon > > > > > > > > > > > > > > David Warner > Research Fishery Biologist > USGS Great Lakes Science Center > 1451 Green Road > Ann Arbor MI 48105 > 734.214.9392 > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.-- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
David, Your examples assume gaussian errors and an identity link, so in this case your GAMMs are just linear mixed models and the likelihoods are fine. (This would not have been the case for non-gaussian errors and/or non-identity link, when PQL would have been used for fitting and the `likelihood' is actually that of a working model fittied to working data). The only further wrinkles to be aware of are (i) that likelihood ratio testing will only be valid if you select maximum likelihood rather than REML as the fitting method, since your models all differ in fixed effects structure, and (ii) that the lrt comparison of bloat.gamm2 and bloat.gamm3 will only be approximate, since the null hypothesis in that comparison is equivalent to setting some variance components (reciprocal smoothing parameters) to zero, with the usual problems which that entails. best, Simon On Wednesday 19 November 2008 14:33, David M Warner wrote:> Greetings all > The help file for GAMM in mgcv indicates that the log likelihood for a > GAMM reported using > > summary(my.gamm$lme) (as an example) is not correct. > > However, in a past R-help post (included below), there is some indication > that the likelihood ratio test in anova.lme(mygamm$lme, mygamm1$lme) is > valid. > > How can I tell if anova.lme results are meaningful (are AIC, BIC, and > logLik estimates accurate)? > > The data include hydroacoustic estimates of fish biomass (lbloat) in 1,000 > meter long intervals (elementary sampling units) from multiple transects > (each 20-30 km long, tranf) in two different lakes and three different > years. > > bloat.gamm1 <- gamm(lbloat ~ s(depth), correlation=corSpher(c(30000, > 0.01),form = ~ x+y|tranf, nugget=TRUE), data=fish3) > > bloat.gamm2 <- gamm(lbloat ~ lakef + s(depth), > correlation=corSpher(c(30000, 0.01),form = ~ x+y|tranf, nugget=TRUE), > data=fish3) > > bloat.gamm3 <- gamm(lbloat ~ lakef + s(depth, by=lakef), > correlation=corSpher(c(30000, 0.01),form = ~ x+y|tranf, nugget=TRUE), > data=fish3) > > > anova.lme(bloat.gamm1$lme, bloat.gamm2$lme, bloat.gamm3$lme) > > Model df AIC BIC logLik Test L.Ratio > p-value > bloat.gamm1$lme 1 6 7916.315 7950.702 -3952.158 > bloat.gamm2$lme 2 7 7902.718 7942.835 -3944.359 1 vs 2 15.597489 > 0.0001 > bloat.gamm3$lme 3 9 7910.987 7962.567 -3946.494 2 vs 3 4.269119 > 0.1183 > > Thanks > Dave > > > > > > > Hi R user, > > I am using the gamm() function of the mgcv-package. Now I would like to > decide on the random effects to include in the model. Within a GAMM > framework, is it allowed to compare the following two models > > inv_1<-gamm(y~te(sat,inv),data=daten_final, random=list(proband=~1)) > > inv_2<-gamm(y~te(sat,inv),data=daten_final, random=list(proband=~sat)) > > > with a likelihood ratio test for a traditional GLMM, like this: > > anova(inv_1$lme, inv_2$lme) > > The output is as follows: > > Model df AIC BIC logLik Test L.Ratio p-value > inv_2$lme 1 10 21495.90 21557.59 -10737.95 > inv_1$lme 2 8 23211.12 23260.46 -11597.56 1 vs 2 1719.214 <.0001 > > > Or is this not in tune with the automatic smoothing parameter selection > (i.e. it is not exactly the same for both model alternatives)? > > > If not, how can I decide on the selection of random effects? > > > > > This comparison is just as valid as it is for a regular linear mixed > model, > which is all that the GAMM is in this case --- the smoothing parameters > are > just variance components in your example. > > In general you have to be a bit careful with generalized likelihood ratio > tests involving variance components, of course, since the null hypothesis > > often involves restricting some variance parameters to the edge of their > possible range, which rather messes up the Taylor expansion about the null > > parameter values that underpins the large sample distributional results > used > in the glrt. Your example does involve such a problematic comparison, but > the > result is so clear cut here that there is not really any doubt that inv_2 > is > better in this case (I wonder if inv_1 even passes basic model checking?). > > See Pinheiro and Bates (2000) for more info. > > hope this is some use, > Simon > > > > > > > > > > > > > > David Warner > Research Fishery Biologist > USGS Great Lakes Science Center > 1451 Green Road > Ann Arbor MI 48105 > 734.214.9392 > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html and provide commented, minimal, > self-contained, reproducible code.--> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK > +44 1225 386603 www.maths.bath.ac.uk/~sw283