Hi folks,

I originally tried R-SIG-Mixed-Models for this one
(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004170.html),
but I think that the final steps to a solution aren't mixed-model
specific, so I thought I'd ask my final questions here.

I used gamm4 to fit a generalized additive mixed model to data from an
AxBxC design, where A is a random effect (human participants in an
experiment), B is a 2-level factor predictor variable, and C is a
continuous variable whose effect is likely non-linear. I tell gamm4 to
fit a smooth across C within each level of B independently, and I can
use predict.gam(..., se.fit = TRUE) to obtain predictions from the
fitted model as well as the standard error of each prediction. I'd
like to visualize the BxC interaction to see whether smoothing C
within each level of B was really necessary, and if so, where along
the C dimension it is that B affects the smooth. It's easy enough to
obtain the predicted B1 - B2 difference function, but I'm stuck on how
to convey the uncertainty of this function (e.g. computing the
confidence interval of the difference at each value of C).

One thought is that predict.gam(..., se.fit = TRUE) returns SE values,
so if I could find out the N on which these SEs are computed, I could
compute the difference CI as

sqrt( ( SD_B1^2 + SD_B2^2 ) / N ) * qt( .975, df = N - 1 )

However, I can't seem to figure out what value of N was used to
compute the SEs that predict.gam(..., se.fit = TRUE) produces. Can
anyone point me to where I might find N? Further, is N - 1 the proper
df for the call to qt()?

Finally, with a smooth function and 95% confidence intervals computed
at each of a large number of points, don't I run into a problem of an
inflated Type I error rate? Or does the fact that each point is not
independent of those next to it make this an inappropriate concern?

Cheers,

Mike

--
Mike Lawrence
Graduate Student
Department of Psychology, Dalhousie University

Looking to arrange a meeting? Check my public calendar:
http://tr.im/mikes_public_calendar

~ Certainty is folly... I think. ~
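For concreteness, a minimal sketch of the kind of gamm4 fit being
described; the data frame `dat` and the variables y, A, B and C are
hypothetical stand-ins for the actual data:

## minimal sketch of the fit described above; `dat`, y, A, B and C are
## hypothetical stand-ins for the real data
library(gamm4)
m <- gamm4(y ~ B + s(C, by = B),  # separate smooth of C per level of B
           random = ~ (1 | A),    # participants as random intercepts
           data = dat)

## pointwise predictions and standard errors on a grid of C values
newd <- expand.grid(C = seq(min(dat$C), max(dat$C), length = 100),
                    B = factor(levels(dat$B), levels = levels(dat$B)))
p <- predict(m$gam, newd, se.fit = TRUE)

(For what it's worth, the se.fit values don't come from a sample of
size N at all: they are derived from the posterior covariance matrix
of the model coefficients, as the lpmatrix code later in this thread
makes explicit.)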
On Thu, 2010-08-05 at 13:49 -0300, Mike Lawrence wrote:
> Hi folks,

I've been pondering this for a little while, drumming up the courage
to reply, whilst I've been away on fieldwork with patchy email
access... so here goes with the neck-sticking-out thing... ;-)

> I originally tried R-SIG-Mixed-Models for this one
> (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q3/004170.html),
> but I think that the final steps to a solution aren't mixed-model
> specific, so I thought I'd ask my final questions here.

Two things spring to mind:

1) I don't know enough to know if these models are nested, but there
should be a way to make them nested... Anyway, why not fit a model
with a 'by' term and a model without the 'by' term, and compare them
using a likelihood ratio test? Something like:

m1 <- gamm(y ~ B + s(C, by = B), data = foo,
           random = list(A = ~ 1), method = "ML", ...)
m2 <- gamm(y ~ s(C), data = foo,
           random = list(A = ~ 1), method = "ML", ...)
anova(m1$lme, m2$lme)

With s() smooths these won't be truly nested. You could also use te()
smooths, but I don't know the specifics of how you can ensure these
models are strictly nested. Simon Wood may be able to help you if you
go down this route.

2) Do the whole thing in gam(), which can also deal with simple random
effects. You'll need to look at ?gam.models to see how to set these
things up, but at least you won't have to deal with potential problems
of fitting these things via nlme::lme. You'd still need to work out
the nesting issue, and then fit the models with the same formulas as
above, but without the random bit. (A sketch of this follows after
this message.)

> [...]
>
> Finally, with a smooth function and 95% confidence intervals computed
> at each of a large number of points, don't I run into a problem of an
> inflated Type I error rate? Or does the fact that each point is not
> independent from those next to it make this an inappropriate concern?

IIRC, yes. These will be pointwise confidence intervals and won't hold
across the whole fitted smooth at the 95% level. Simon's book
demonstrates two methods that could be used to generate simultaneous
confidence intervals via simulation from the fitted model.
The first involves simulating from a multivariate normal distribution
with mean given by the coefficients of the model and covariance given
by their estimated covariance matrix. This will yield confidence
intervals that are conditional upon the smoothing parameters (i.e.
they will be a bit narrow, as they don't take into account the fact
that the smoothing parameters were estimated rather than known a
priori). To get round this, a parametric bootstrap can be used to
simulate from the model; an example for a Poisson model is given in
Simon's book for the mackerel egg data.

In the back of my mind I recall reading something that niggles me with
regard to your approach and how you would interpret these confidence
intervals (perhaps this applies just to the pointwise ones...?). I am
not 100% sure that you can do the sort of comparison you want by
evaluating whether the curves are different by virtue of falling
outside one another's confidence intervals...?

Anyway, I think the LRT approach with 'by' smooths might be a better
way to go. Apologies if you've already considered this, by the way;
I've not followed all the threads on R-SIG-Mixed-Models.

HTH

G

--
Dr. Gavin Simpson            [t] +44 (0)20 7679 0522
ECRC, UCL Geography,         [f] +44 (0)20 7679 0565
Pearson Building,            [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London         [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                [w] http://www.freshwaters.org.uk
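Regarding Gavin's option (2), a minimal sketch, assuming a version of
mgcv recent enough to represent simple random effects as random-effect
smooths via s(..., bs = "re"); foo, y, A, B and C are hypothetical as
before:

## minimal sketch of option (2): everything in mgcv::gam(), with the
## participant factor A entering as a random-effect smooth; assumes an
## mgcv version supporting s(..., bs = "re") and that A is a factor
library(mgcv)
m1 <- gam(y ~ B + s(C, by = B) + s(A, bs = "re"),
          data = foo, method = "ML")
m2 <- gam(y ~ s(C) + s(A, bs = "re"),
          data = foo, method = "ML")
anova(m1, m2, test = "Chisq")  # approximate test; nesting caveats still apply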
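And a minimal sketch of the first of the two simulation methods Gavin
mentions: draw coefficient vectors from their approximate multivariate
normal posterior, then widen the band until 95% of entire simulated
curves lie within it. A fitted mgcv::gam object `b` and a prediction
data frame `newd` are assumed:

## minimal sketch of a simultaneous interval via simulation from the
## approximate posterior of the coefficients; assumes a fitted
## mgcv::gam model `b` and a prediction data frame `newd`
library(MASS)                             # for mvrnorm()
Xp  <- predict(b, newd, type = "lpmatrix")
fit <- c(Xp %*% coef(b))                  # fitted curve
se  <- rowSums((Xp %*% vcov(b)) * Xp)^0.5 # pointwise s.e.
br  <- mvrnorm(10000, coef(b), vcov(b))   # posterior draws of the coefs
sims <- Xp %*% t(br)                      # one simulated curve per column
## maximum standardized deviation of each simulated curve from the fit
z <- apply(abs(sims - fit) / se, 2, max)
crit <- quantile(z, 0.95)                 # simultaneous critical value
upper <- fit + crit * se
lower <- fit - crit * se

As noted above, this is conditional on the estimated smoothing
parameters; the parametric bootstrap variant would instead simulate
new response vectors from the fitted model and refit, so that
smoothing-parameter uncertainty is also reflected.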
You can do this using predict.gam(..., type = "lpmatrix"). Here's some
code providing an example...

## example of smooths conditioned on factors, from ?gam.models
library(mgcv)
dat <- gamSim(4)

## fit model...
b <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = dat)
plot(b, pages = 1)

## now predict on an x2 grid at 2 levels of `fac'...
x2 <- seq(0, 1, length = 100); x3 <- c(x2, x2)
newd <- data.frame(x2 = x3, x0 = rep(.5, 200),
                   fac = c(rep("1", 100), rep("2", 100)))
Xp <- predict(b, newd, type = "lpmatrix")

## rows 1:100 of Xp are for fac == "1", rows 101:200 for fac == "2";
## take the row-wise difference, then zero the columns belonging to
## the parametric terms (1:3) and to the smooths not involved in the
## comparison (22:39), leaving only the columns of the two s(x2)
## smooths being compared
X <- Xp[1:100, ] - Xp[101:200, ]
X[, 1:3] <- 0; X[, 22:39] <- 0

## now X %*% coef(b) gives the difference between the smooths for
## fac == "1" and fac == "2"...
md <- X %*% coef(b)

## the following evaluates diag(X %*% vcov(b) %*% t(X))^0.5, the s.e.
## of the difference just computed (note the square root)...
se <- rowSums((X %*% vcov(b)) * X)^0.5

## so getting upper and lower (pointwise) confidence limits is easy
## (could use qt(.975, b$df.residual) in place of 2 below)
ul <- md + 2 * se; ll <- md - 2 * se

## plot the two smooths and the difference
par(mfrow = c(2, 2)); plot(b, select = 1); plot(b, select = 2)
plot(x2, md, ylim = range(c(ul, ll)), type = "l")
lines(x2, ul, col = 2); lines(x2, ll, col = 2)

On Thursday 05 August 2010 17:49, Mike Lawrence wrote:
> [original message quoted above; snipped]

--
Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
+44 1225 386603  www.maths.bath.ac.uk/~sw283