thr3ads.net - R help - [R] Why are the number of coefficients varying? [mgcv][gam] [Jan 2013]

If this information is useful, please help other people find it:
Share via:

Andrew Crane-Droesch

2013-Jan-28 05:36 UTC

[R] Why are the number of coefficients varying? [mgcv][gam]

Dear List,

I'm using gam in a multiple imputation framework -- specifying the knot 
locations, and saving the results of multiple models, each of which is 
fit with slightly different data (because some of it is predicted when 
missing).  In MI, coefficients from multiple models are averaged, as are 
variance-covariance matrices.  VCV's get an additional correction to 
account for how variable they are between each other.

For this to work in the context of a penalized spline model, the knots 
need to be specified identically for each model (this is assisted by 
context knowledge), and each model needs to have the same number of 
knots.  This is what I've done, below.  I run that code multiple times 
with slightly different (imputed) datasets, but the number of 
coefficients varies (between 263-265).

What gives?  Why don't all of my models have the same number of 
coefficients?

Thanks in advance!

Best,
Andrew

BCAR.knots = c(2,15,60,120)
INAR.knots = c(50,100,200,300)
bcph.knots = c(7.5,8.5,9.5,10.5)
htt.knots = c(350,450,550,650)
bc.prc.C.knots = c(.3,.45,.6,.8)
phi.knots = c(4.5,5.5,6.5,7.5)
CEC.knots = c(5,12,19,26)
soc.knots = c(10,20,30,40)
sand.knots = c(.2,.4,.6,.8)
clay.knots = c(.15,.3,.45,.6)
abslat.knots = c(10,20,30,45)
lon.knots = c(-50,0,50,125)

dum = as.vector(rep(1,length(trialid)))

doyee = NA
r.ints            = tryCatch(gam(RR ~
             as.factor(pot.trial)
         +    as.factor(year)
         +    as.factor(crop.legume)
         +    as.factor(crop.fruit)
         +    as.factor(feedstock)
         +    s(trialid,bs="re",by=dum)
         +
s(BCAR.imp,bs="cr",k=length(BCAR.knots),by=as.factor(pot.trial))
         + 
s(INAR.imp,bs="cr",k=length(INAR.knots),by=as.factor(crop.legume))
         + s(bcph.imp,bs="cr",k=length(bcph.knots),by=as.factor(year))
         +    s(htt.imp,bs="cr",k=length(htt.knots))
         + s(bc.prc.C.imp,bs="cr",k=length(bc.prc.C.knots))
         + s(phi.imp,bs="cr",k=length(phi.knots),by=as.factor(year))
         +    s(CEC.imp,bs="cr",k=length(CEC.knots))
         +    s(soc.imp,bs="cr",k=length(soc.knots))
         + s(sand.imp,bs="cr",k=length(sand.knots),by=as.factor(year))
         +    s(clay.imp,bs="cr",k=length(clay.knots))
         +    s(abslat,bs="cr",k=length(abslat.knots))
         +    te(INAR.imp,soc.imp,CEC.imp,k=c(4,4,4))
         +    te(soc.imp,BCAR.imp,k=c(4,4))
         +    te(phi.imp,bcph.imp,k=c(4,4))
         +    te(clay.imp,CEC.imp,k=c(4,4))
         + te(htt.imp,bc.prc.C.imp,k=c(4,4),by=as.factor(feedstock))
     ,data=grain.data
     ,knots=    list( 
BCAR.imp=BCAR.knots,INAR.imp=INAR.knots,bcph.imp=bcph.knots,htt.imp=htt.knots,bc.prc.C.imp=bc.prc.C.knots,
phi.imp=phi.knots,CEC.imp=CEC.knots,soc.imp=soc.knots,sand.imp=sand.knots,clay.imp=clay.knots,
             abslat.imp=abslat.knots
             )
     ,method = "REML"
     ,weights = 1/nsame
), error=function(e) e, finally=doyee)
if(inherits(r.ints, "error")) {r.ints=doyee; print("an error
happened
but it got handled.")}



-- 

*Andrew Crane-Droesch*
Energy and Resources Group
UC Berkeley
+1 215 435 2644
andrewcd@berkeley.edu
skype: andrew.crane-droesch
http://andrewcd.berkeley.edu



	[[alternative HTML version deleted]]

Simon Wood

2013-Jan-28 07:54 UTC

head link

[R] Why are the number of coefficients varying? [mgcv][gam]

Hi Andrew,

Do you know which coefficients are missing (i.e. which terms have 
missing coefficients)? This would help to narrow down the possibilities. 
Is it certain that all levels of all factors occur in all replicates? If 
not then you will get different numbers of coefficients (in the 
parametric part of the model).

btw. what is tryCatch catching here?

best,
simon

On 28/01/13 05:36, Andrew Crane-Droesch wrote:> Dear List,
>
> I'm using gam in a multiple imputation framework -- specifying the knot
> locations, and saving the results of multiple models, each of which is
> fit with slightly different data (because some of it is predicted when
> missing).  In MI, coefficients from multiple models are averaged, as are
> variance-covariance matrices.  VCV's get an additional correction to
> account for how variable they are between each other.
>
> For this to work in the context of a penalized spline model, the knots
> need to be specified identically for each model (this is assisted by
> context knowledge), and each model needs to have the same number of
> knots.  This is what I've done, below.  I run that code multiple times
> with slightly different (imputed) datasets, but the number of
> coefficients varies (between 263-265).
>
> What gives?  Why don't all of my models have the same number of
> coefficients?
>
> Thanks in advance!
>
> Best,
> Andrew
>
> BCAR.knots = c(2,15,60,120)
> INAR.knots = c(50,100,200,300)
> bcph.knots = c(7.5,8.5,9.5,10.5)
> htt.knots = c(350,450,550,650)
> bc.prc.C.knots = c(.3,.45,.6,.8)
> phi.knots = c(4.5,5.5,6.5,7.5)
> CEC.knots = c(5,12,19,26)
> soc.knots = c(10,20,30,40)
> sand.knots = c(.2,.4,.6,.8)
> clay.knots = c(.15,.3,.45,.6)
> abslat.knots = c(10,20,30,45)
> lon.knots = c(-50,0,50,125)
>
> dum = as.vector(rep(1,length(trialid)))
>
> doyee = NA
> r.ints            = tryCatch(gam(RR ~
>               as.factor(pot.trial)
>           +    as.factor(year)
>           +    as.factor(crop.legume)
>           +    as.factor(crop.fruit)
>           +    as.factor(feedstock)
>           +    s(trialid,bs="re",by=dum)
>           +
s(BCAR.imp,bs="cr",k=length(BCAR.knots),by=as.factor(pot.trial))
>           +
>
s(INAR.imp,bs="cr",k=length(INAR.knots),by=as.factor(crop.legume))
>           +
s(bcph.imp,bs="cr",k=length(bcph.knots),by=as.factor(year))
>           +    s(htt.imp,bs="cr",k=length(htt.knots))
>           + s(bc.prc.C.imp,bs="cr",k=length(bc.prc.C.knots))
>           +
s(phi.imp,bs="cr",k=length(phi.knots),by=as.factor(year))
>           +    s(CEC.imp,bs="cr",k=length(CEC.knots))
>           +    s(soc.imp,bs="cr",k=length(soc.knots))
>           +
s(sand.imp,bs="cr",k=length(sand.knots),by=as.factor(year))
>           +    s(clay.imp,bs="cr",k=length(clay.knots))
>           +    s(abslat,bs="cr",k=length(abslat.knots))
>           +    te(INAR.imp,soc.imp,CEC.imp,k=c(4,4,4))
>           +    te(soc.imp,BCAR.imp,k=c(4,4))
>           +    te(phi.imp,bcph.imp,k=c(4,4))
>           +    te(clay.imp,CEC.imp,k=c(4,4))
>           + te(htt.imp,bc.prc.C.imp,k=c(4,4),by=as.factor(feedstock))
>       ,data=grain.data
>       ,knots=    list(
>
BCAR.imp=BCAR.knots,INAR.imp=INAR.knots,bcph.imp=bcph.knots,htt.imp=htt.knots,bc.prc.C.imp=bc.prc.C.knots,
>
phi.imp=phi.knots,CEC.imp=CEC.knots,soc.imp=soc.knots,sand.imp=sand.knots,clay.imp=clay.knots,
>               abslat.imp=abslat.knots
>               )
>       ,method = "REML"
>       ,weights = 1/nsame
> ), error=function(e) e, finally=doyee)
> if(inherits(r.ints, "error")) {r.ints=doyee; print("an error
happened
> but it got handled.")}
>
>
>

-- 
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283

Andrew Crane-Droesch

2013-Jan-29 00:20 UTC

head link

[R] Why are the number of coefficients varying? [mgcv][gam]

Hi Simon,

Thanks for replying.

On further investigation, I can't reproduce this error on my local 
machine -- it only occurs when sending to a cluster (to run the multiple 
imputations in parallel) that I've got access to.  I send to a friend's 
web server, and I get the same sort of error (but a different set of 
results!) that the cluster gave me.  The seed is set identically across 
the three machines.  gam.check indicates convergence after 16 iterations 
locally, but 21 iterations on both remote machines.  And both remote 
machines give results that penalize the random effects, and the first, 
second and fourth spline terms effectively to zero (res.df ~1e-7).

I then checked versions.  My local machine has mgcv 1.7.22, the cluster 
has 1.7.3, and the server has 1.7.12.  My local machine has R 2.15.1, 
the cluster has 2.12.2, and the server has 2.14.1.  I updated the 
server's R version, and the result was fixed.  Will see if the people 
who manage the cluster can update the cluster.

The tryCatch is there because my imputation models that feed the gams 
are not bug-free.

Thanks anyway for replying.

Best,
Andrew

Maybe Matching Threads

Search for more reasonably related threads

R help - Jan 2013 - Why are the number of coefficients varying? [mgcv][gam]

[R] Why are the number of coefficients varying? [mgcv][gam]

[R] Why are the number of coefficients varying? [mgcv][gam]

[R] Why are the number of coefficients varying? [mgcv][gam]

Maybe Matching Threads