thr3ads.net - R help - [R] mgcv, testing gamm vs lme, which degrees of freedom? [Jun 2010]

If this information is useful, please help other people find it:
Share via:

Carlo Fezzi

2010-Jun-16 19:33 UTC

[R] mgcv, testing gamm vs lme, which degrees of freedom?

Dear all,

I am using the "mgcv" package by Simon Wood to estimate an additive
mixed
model in which I assume normal distribution for the residuals. I would
like to test this model vs a standard parametric mixed model, such as the
ones which are possible to estimate with "lme".

Since the smoothing splines can be written as random effects, is it
correct to use an (approximate) likelihood ratio test for this? If so,
which is the correct number of degrees of freedom? Sometime the function
LogLik() seems to provide strange results regarding the number of degrees
of freedom (df) for the gam, for instance in the example I copied below
the df for the "gamm" are equal to the ones for the "lme",
but the
summary(model.gam) seems to indicate a much higher edf for the gamm.

I would be very grateful to anybody who could point out a solution,

Best wishes,

Carlo

Example below:

----

rm(list = ls())
library(mgcv)
library(nlme)

set.seed(123)

x  <- runif(100,1,10)				# regressor
b0 <- rep(rnorm(10,mean=1,sd=2),each=10)	# random intercept
id <- rep(1:10, each=10) 			# identifier

y <- b0 + x - 0.1 * x^3 + rnorm(100,0,1)  # dependent variable

f1 <- lme(y ~ x + I(x^2), random = list(id=~1) , method="ML" )  #
lme model

f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML" )    # gamm

## same number of "df" according to logLik:
logLik(f1)
logLik(f2$lme)

## much higher edf according to summary:
summary(f2$gam)

-----------

Joris Meys

2010-Jun-18 13:59 UTC

head link

[R] mgcv, testing gamm vs lme, which degrees of freedom?

I see this question is still open, so let me try to give you some
answer. As far as I understood, the smoothing splines are not used as
pure random effects, but as a combination of random and fixed effects.
Very simplified, the first two basis functions (intercept and linear
effect) are added as a fixed effect, whereas all other basis functions
go to the random component. Hence, the df for the log likelihood will
be indeed smaller than in the case of gam, where all splines are
considered fixed effects.

I've struggled with this issue myself as well, and I still don't
understand it fully. I believe I have the concepts down by now, but
the mathematical background requires more study for me I'm afraid. Did
you read the book of Simon Wood already?

http://www.amazon.ca/Generalized-Additive-Models-Introduction-R/dp/1584884746

Simon Wood advises to compare models using the anova's for the
lme-part of the object, i.e.
anova(model1$lme, model2$lme).
where one model contains the smoothing term of interest and the other
doesn't. I base most of my model testing in the gamm context on these
tests. To be completely correct, this -apparently- only counts for
gamms using the identity link.

Cheers
Joris

On Wed, Jun 16, 2010 at 9:33 PM, Carlo Fezzi <c.fezzi at uea.ac.uk>
wrote:> Dear all,
>
> I am using the "mgcv" package by Simon Wood to estimate an
additive mixed
> model in which I assume normal distribution for the residuals. I would
> like to test this model vs a standard parametric mixed model, such as the
> ones which are possible to estimate with "lme".
>
> Since the smoothing splines can be written as random effects, is it
> correct to use an (approximate) likelihood ratio test for this? If so,
> which is the correct number of degrees of freedom? Sometime the function
> LogLik() seems to provide strange results regarding the number of degrees
> of freedom (df) for the gam, for instance in the example I copied below
> the df for the "gamm" are equal to the ones for the
"lme", but the
> summary(model.gam) seems to indicate a much higher edf for the gamm.
>
> I would be very grateful to anybody who could point out a solution,
>
> Best wishes,
>
> Carlo
>
> Example below:
>
> ----
>
> rm(list = ls())
> library(mgcv)
> library(nlme)
>
> set.seed(123)
>
> x ?<- runif(100,1,10) ? ? ? ? ? ? ? ? ? ? ? ? ? # regressor
> b0 <- rep(rnorm(10,mean=1,sd=2),each=10) ? ? ? ?# random intercept
> id <- rep(1:10, each=10) ? ? ? ? ? ? ? ? ? ? ? ?# identifier
>
> y <- b0 + x - 0.1 * x^3 + rnorm(100,0,1) ?# dependent variable
>
> f1 <- lme(y ~ x + I(x^2), random = list(id=~1) , method="ML" )
?# lme model
>
> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML" ) ? ?#
gamm
>
> ## same number of "df" according to logLik:
> logLik(f1)
> logLik(f2$lme)
>
> ## much higher edf according to summary:
> summary(f2$gam)
>
> -----------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

tel : +32 9 264 59 87
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

Simon Wood

2010-Jun-18 15:03 UTC

head link

[R] mgcv, testing gamm vs lme, which degrees of freedom?

On Wednesday 16 June 2010 20:33, Carlo Fezzi wrote:> Dear all,
>
> I am using the "mgcv" package by Simon Wood to estimate an
additive mixed
> model in which I assume normal distribution for the residuals. I would
> like to test this model vs a standard parametric mixed model, such as the
> ones which are possible to estimate with "lme".
>
> Since the smoothing splines can be written as random effects, is it
> correct to use an (approximate) likelihood ratio test for this? -- yes this is ok (subject to the usual caveats about testing on the boundary 
of the parameter space) but your 2 example models below will have  the same 
number of degrees of freedom!
> If so, 
> which is the correct number of degrees of freedom? --- The edf from the lme object, if you are testing using the log likelihood 
returned by the  lme representation of the model.
> Sometime the function 
> LogLik() seems to provide strange results regarding the number of degrees
> of freedom (df) for the gam, for instance in the example I copied below
> the df for the "gamm" are equal to the ones for the
"lme", but the
> summary(model.gam) seems to indicate a much higher edf for the gamm.--- the edf for the lme representation of the model counts only the fixed 
effects + the variance parameters (which includes smoothing parameters). Each 
smooth typically contributes only one or two fixed effect parameters, with 
the rest of the coefficients for the smooth treated as random effects.

--- the edf for the gam representation of the same model differs in that it 
also counts the *effective* number of parameters used to represent each 
smooth: this includes contributions from all those coefficients that the lme 
representation treated as strictly random. 

best,
Simon

> I would be very grateful to anybody who could point out a solution,
>
> Best wishes,
>
> Carlo
>
> Example below:
>
> ----
>
> rm(list = ls())
> library(mgcv)
> library(nlme)
>
> set.seed(123)
>
> x  <- runif(100,1,10)				# regressor
> b0 <- rep(rnorm(10,mean=1,sd=2),each=10)	# random intercept
> id <- rep(1:10, each=10) 			# identifier
>
> y <- b0 + x - 0.1 * x^3 + rnorm(100,0,1)  # dependent variable
>
> f1 <- lme(y ~ x + I(x^2), random = list(id=~1) , method="ML" )
# lme model
>
> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML" )    #
gamm
>
> ## same number of "df" according to logLik:
> logLik(f1)
> logLik(f2$lme)
>
> ## much higher edf according to summary:
> summary(f2$gam)
>
> -----------
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented, minimal,
> self-contained, reproducible code.
-- > Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283

Carlo Fezzi

2010-Jun-18 16:27 UTC

head link

[R] mgcv, testing gamm vs lme, which degrees of freedom?

Dear Simon,

thanks a lot for your prompt reply.

Unfortunately I am still confused about which is the correct way to test
the two models... as you point out: why in my example the two models have
the same degrees of freedom?

Intuitively it seems to me the gamm model is more flexible since, as I
understand also from you response, it should contain more random effects
than the other model because some of the smooth function parameters are
represented as such. This should not be taken into account when testing
one model vs the other?

Continuing with my example, the two models:

f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML")
f3 <- gamm(y ~ x + I(x^2), random = list(id=~1), method="ML" )

Can be tested with:

anova(f3$lme,f2$lme)

But why are the df the same? Model f2 appears to be more flexible and, as
such, should have more (random) parameters. Should not a test of one model
vs the other take this into account?

Sorry if this may sound dull, many thanks for your help,

Carlo


> On Wednesday 16 June 2010 20:33, Carlo Fezzi wrote:
>> Dear all,
>>
>> I am using the "mgcv" package by Simon Wood to estimate an
additive
>> mixed
>> model in which I assume normal distribution for the residuals. I would
>> like to test this model vs a standard parametric mixed model, such as
>> the
>> ones which are possible to estimate with "lme".
>>
>> Since the smoothing splines can be written as random effects, is it
>> correct to use an (approximate) likelihood ratio test for this?
> -- yes this is ok (subject to the usual caveats about testing on the
> boundary
> of the parameter space) but your 2 example models below will have  the
> same
> number of degrees of freedom!
>
>> If so,
>> which is the correct number of degrees of freedom?
> --- The edf from the lme object, if you are testing using the log
> likelihood
> returned by the  lme representation of the model.
>
>> Sometime the function
>> LogLik() seems to provide strange results regarding the number of
>> degrees
>> of freedom (df) for the gam, for instance in the example I copied below
>> the df for the "gamm" are equal to the ones for the
"lme", but the
>> summary(model.gam) seems to indicate a much higher edf for the gamm.
> --- the edf for the lme representation of the model counts only the fixed
> effects + the variance parameters (which includes smoothing parameters).
> Each
> smooth typically contributes only one or two fixed effect parameters, with
> the rest of the coefficients for the smooth treated as random effects.
>
> --- the edf for the gam representation of the same model differs in that
> it
> also counts the *effective* number of parameters used to represent each
> smooth: this includes contributions from all those coefficients that the
> lme
> representation treated as strictly random.
>
> best,
> Simon
>
>
>> I would be very grateful to anybody who could point out a solution,
>>
>> Best wishes,
>>
>> Carlo
>>
>> Example below:
>>
>> ----
>>
>> rm(list = ls())
>> library(mgcv)
>> library(nlme)
>>
>> set.seed(123)
>>
>> x  <- runif(100,1,10)				# regressor
>> b0 <- rep(rnorm(10,mean=1,sd=2),each=10)	# random intercept
>> id <- rep(1:10, each=10) 			# identifier
>>
>> y <- b0 + x - 0.1 * x^3 + rnorm(100,0,1)  # dependent variable
>>
>> f1 <- lme(y ~ x + I(x^2), random = list(id=~1) ,
method="ML" )  # lme
>> model
>>
>> f2 <- gamm(y ~ s(x), random = list(id=~1), method="ML" )  
# gamm
>>
>> ## same number of "df" according to logLik:
>> logLik(f1)
>> logLik(f2$lme)
>>
>> ## much higher edf according to summary:
>> summary(f2$gam)
>>
>> -----------
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html and provide commented,
>> minimal,
>> self-contained, reproducible code.
>
> --
>> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
>> +44 1225 386603  www.maths.bath.ac.uk/~sw283
>

Possibly Parallel Threads

Search for more seemingly similar threads

R help - Jun 2010 - mgcv, testing gamm vs lme, which degrees of freedom?

[R] mgcv, testing gamm vs lme, which degrees of freedom?

[R] mgcv, testing gamm vs lme, which degrees of freedom?

[R] mgcv, testing gamm vs lme, which degrees of freedom?

[R] mgcv, testing gamm vs lme, which degrees of freedom?

Possibly Parallel Threads