Xing Zhao
2014-Mar-17 23:39 UTC
[R] mgcv, should include an intercept for the 'by' varying coefficient model, which is unconstrained
Dear Dr. Wood and other mgcv experts,

In ?gam.models, it says that a smooth with a numeric "by" variable is generally not subject to an identifiability constraint. I ran the example from ?gam.models with and without an intercept and found some differences (code below).

I think the problem might become serious when several varying coefficient terms are specified in one model, such as

gam(y ~ s(x0,by=x1) + s(x0,by=x2) + s(x0,by=x3),data=dat)

In this case, none of the three terms is constrained, as they generally will not meet the three conditions for a constraint. I can still fit the model as written above, but is it safe? Is there a best way to implement the model?

Thank you for your help.

Best,
Xing

require(mgcv)
set.seed(10)
## simulate data from y = f(x2)*x1 + error
dat <- gamSim(3,n=400)
b <- gam(y ~ s(x2,by=x1),data=dat)
b1 <- gam(y ~ s(x2,by=x1)-1,data=dat)

> range(fitted(b)-fitted(b1))
[1] -0.13027648  0.08117196
> summary(dat$f-fitted(b))
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -0.5265  0.2628  1.2290  1.7710  2.6280  8.8580
> summary(dat$f-fitted(b1))
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 -0.4618  0.2785  1.2250  1.7390  2.5370  8.7310
> summary(dat$y-fitted(b))
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -6.23500 -1.32700 -0.06752  0.00000  1.54900  7.01800
> summary(dat$y-fitted(b1))
     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
 -6.26700 -1.40300 -0.09908 -0.03199  1.51900  6.96700
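One further check that might help frame the question is to look at the intercept that gam() estimates in the model that includes one, and to compare the two fits by AIC. The following is only a sketch on the same simulated data as above, not a recommendation of either model. It uses the standard mgcv calls summary(), coef() and AIC(), and assumes that the single smooth keeps its full basis because no centering constraint is applied.

```r
library(mgcv)

set.seed(10)
## simulate data from y = f(x2)*x1 + error (same example as in ?gam.models)
dat <- gamSim(3, n = 400)

## same two models as above: with and without an intercept
b  <- gam(y ~ s(x2, by = x1), data = dat)
b1 <- gam(y ~ s(x2, by = x1) - 1, data = dat)

## estimate, standard error and p-value of the intercept in b
print(summary(b)$p.table)

## b should carry exactly one more coefficient than b1 (the intercept)
print(length(coef(b)) - length(coef(b1)))

## compare the two fits
print(AIC(b, b1))
```

If the intercept estimate in b is small relative to its standard error, the two fits would be expected to differ little, which seems consistent with the narrow range of fitted(b)-fitted(b1) reported above.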