Andrew Crane-Droesch
2013-Jan-28 05:36 UTC
[R] Why are the number of coefficients varying? [mgcv][gam]
Dear List, I'm using gam in a multiple imputation framework -- specifying the knot locations, and saving the results of multiple models, each of which is fit with slightly different data (because some of it is predicted when missing). In MI, coefficients from multiple models are averaged, as are variance-covariance matrices. VCV's get an additional correction to account for how variable they are between each other. For this to work in the context of a penalized spline model, the knots need to be specified identically for each model (this is assisted by context knowledge), and each model needs to have the same number of knots. This is what I've done, below. I run that code multiple times with slightly different (imputed) datasets, but the number of coefficients varies (between 263-265). What gives? Why don't all of my models have the same number of coefficients? Thanks in advance! Best, Andrew BCAR.knots = c(2,15,60,120) INAR.knots = c(50,100,200,300) bcph.knots = c(7.5,8.5,9.5,10.5) htt.knots = c(350,450,550,650) bc.prc.C.knots = c(.3,.45,.6,.8) phi.knots = c(4.5,5.5,6.5,7.5) CEC.knots = c(5,12,19,26) soc.knots = c(10,20,30,40) sand.knots = c(.2,.4,.6,.8) clay.knots = c(.15,.3,.45,.6) abslat.knots = c(10,20,30,45) lon.knots = c(-50,0,50,125) dum = as.vector(rep(1,length(trialid))) doyee = NA r.ints = tryCatch(gam(RR ~ as.factor(pot.trial) + as.factor(year) + as.factor(crop.legume) + as.factor(crop.fruit) + as.factor(feedstock) + s(trialid,bs="re",by=dum) + s(BCAR.imp,bs="cr",k=length(BCAR.knots),by=as.factor(pot.trial)) + s(INAR.imp,bs="cr",k=length(INAR.knots),by=as.factor(crop.legume)) + s(bcph.imp,bs="cr",k=length(bcph.knots),by=as.factor(year)) + s(htt.imp,bs="cr",k=length(htt.knots)) + s(bc.prc.C.imp,bs="cr",k=length(bc.prc.C.knots)) + s(phi.imp,bs="cr",k=length(phi.knots),by=as.factor(year)) + s(CEC.imp,bs="cr",k=length(CEC.knots)) + s(soc.imp,bs="cr",k=length(soc.knots)) + s(sand.imp,bs="cr",k=length(sand.knots),by=as.factor(year)) + s(clay.imp,bs="cr",k=length(clay.knots)) + s(abslat,bs="cr",k=length(abslat.knots)) + te(INAR.imp,soc.imp,CEC.imp,k=c(4,4,4)) + te(soc.imp,BCAR.imp,k=c(4,4)) + te(phi.imp,bcph.imp,k=c(4,4)) + te(clay.imp,CEC.imp,k=c(4,4)) + te(htt.imp,bc.prc.C.imp,k=c(4,4),by=as.factor(feedstock)) ,data=grain.data ,knots= list( BCAR.imp=BCAR.knots,INAR.imp=INAR.knots,bcph.imp=bcph.knots,htt.imp=htt.knots,bc.prc.C.imp=bc.prc.C.knots, phi.imp=phi.knots,CEC.imp=CEC.knots,soc.imp=soc.knots,sand.imp=sand.knots,clay.imp=clay.knots, abslat.imp=abslat.knots ) ,method = "REML" ,weights = 1/nsame ), error=function(e) e, finally=doyee) if(inherits(r.ints, "error")) {r.ints=doyee; print("an error happened but it got handled.")} -- *Andrew Crane-Droesch* Energy and Resources Group UC Berkeley +1 215 435 2644 andrewcd@berkeley.edu skype: andrew.crane-droesch http://andrewcd.berkeley.edu [[alternative HTML version deleted]]
Simon Wood
2013-Jan-28 07:54 UTC
[R] Why are the number of coefficients varying? [mgcv][gam]
Hi Andrew, Do you know which coefficients are missing (i.e. which terms have missing coefficients)? This would help to narrow down the possibilities. Is it certain that all levels of all factors occur in all replicates? If not then you will get different numbers of coefficients (in the parametric part of the model). btw. what is tryCatch catching here? best, simon On 28/01/13 05:36, Andrew Crane-Droesch wrote:> Dear List, > > I'm using gam in a multiple imputation framework -- specifying the knot > locations, and saving the results of multiple models, each of which is > fit with slightly different data (because some of it is predicted when > missing). In MI, coefficients from multiple models are averaged, as are > variance-covariance matrices. VCV's get an additional correction to > account for how variable they are between each other. > > For this to work in the context of a penalized spline model, the knots > need to be specified identically for each model (this is assisted by > context knowledge), and each model needs to have the same number of > knots. This is what I've done, below. I run that code multiple times > with slightly different (imputed) datasets, but the number of > coefficients varies (between 263-265). > > What gives? Why don't all of my models have the same number of > coefficients? > > Thanks in advance! > > Best, > Andrew > > BCAR.knots = c(2,15,60,120) > INAR.knots = c(50,100,200,300) > bcph.knots = c(7.5,8.5,9.5,10.5) > htt.knots = c(350,450,550,650) > bc.prc.C.knots = c(.3,.45,.6,.8) > phi.knots = c(4.5,5.5,6.5,7.5) > CEC.knots = c(5,12,19,26) > soc.knots = c(10,20,30,40) > sand.knots = c(.2,.4,.6,.8) > clay.knots = c(.15,.3,.45,.6) > abslat.knots = c(10,20,30,45) > lon.knots = c(-50,0,50,125) > > dum = as.vector(rep(1,length(trialid))) > > doyee = NA > r.ints = tryCatch(gam(RR ~ > as.factor(pot.trial) > + as.factor(year) > + as.factor(crop.legume) > + as.factor(crop.fruit) > + as.factor(feedstock) > + s(trialid,bs="re",by=dum) > + s(BCAR.imp,bs="cr",k=length(BCAR.knots),by=as.factor(pot.trial)) > + > s(INAR.imp,bs="cr",k=length(INAR.knots),by=as.factor(crop.legume)) > + s(bcph.imp,bs="cr",k=length(bcph.knots),by=as.factor(year)) > + s(htt.imp,bs="cr",k=length(htt.knots)) > + s(bc.prc.C.imp,bs="cr",k=length(bc.prc.C.knots)) > + s(phi.imp,bs="cr",k=length(phi.knots),by=as.factor(year)) > + s(CEC.imp,bs="cr",k=length(CEC.knots)) > + s(soc.imp,bs="cr",k=length(soc.knots)) > + s(sand.imp,bs="cr",k=length(sand.knots),by=as.factor(year)) > + s(clay.imp,bs="cr",k=length(clay.knots)) > + s(abslat,bs="cr",k=length(abslat.knots)) > + te(INAR.imp,soc.imp,CEC.imp,k=c(4,4,4)) > + te(soc.imp,BCAR.imp,k=c(4,4)) > + te(phi.imp,bcph.imp,k=c(4,4)) > + te(clay.imp,CEC.imp,k=c(4,4)) > + te(htt.imp,bc.prc.C.imp,k=c(4,4),by=as.factor(feedstock)) > ,data=grain.data > ,knots= list( > BCAR.imp=BCAR.knots,INAR.imp=INAR.knots,bcph.imp=bcph.knots,htt.imp=htt.knots,bc.prc.C.imp=bc.prc.C.knots, > phi.imp=phi.knots,CEC.imp=CEC.knots,soc.imp=soc.knots,sand.imp=sand.knots,clay.imp=clay.knots, > abslat.imp=abslat.knots > ) > ,method = "REML" > ,weights = 1/nsame > ), error=function(e) e, finally=doyee) > if(inherits(r.ints, "error")) {r.ints=doyee; print("an error happened > but it got handled.")} > > >-- Simon Wood, Mathematical Science, University of Bath BA2 7AY UK +44 (0)1225 386603 http://people.bath.ac.uk/sw283
Andrew Crane-Droesch
2013-Jan-29 00:20 UTC
[R] Why are the number of coefficients varying? [mgcv][gam]
Hi Simon, Thanks for replying. On further investigation, I can't reproduce this error on my local machine -- it only occurs when sending to a cluster (to run the multiple imputations in parallel) that I've got access to. I send to a friend's web server, and I get the same sort of error (but a different set of results!) that the cluster gave me. The seed is set identically across the three machines. gam.check indicates convergence after 16 iterations locally, but 21 iterations on both remote machines. And both remote machines give results that penalize the random effects, and the first, second and fourth spline terms effectively to zero (res.df ~1e-7). I then checked versions. My local machine has mgcv 1.7.22, the cluster has 1.7.3, and the server has 1.7.12. My local machine has R 2.15.1, the cluster has 2.12.2, and the server has 2.14.1. I updated the server's R version, and the result was fixed. Will see if the people who manage the cluster can update the cluster. The tryCatch is there because my imputation models that feed the gams are not bug-free. Thanks anyway for replying. Best, Andrew