willow1980
2009-May-05 15:53 UTC
[R] A question about using “by” in GAM model fitting of interaction between smooth terms and factor
I am a little confused by the following help text on how to fit a GAM with an interaction between a factor and smooth terms, from http://rss.acs.unt.edu/Rdoc/library/mgcv/html/gam.models.html:

"Sometimes models of the form:
E(y)=b0+f(x)z
need to be estimated (where f is a smooth function, as usual.) The appropriate formula is:
y~z+s(x,by=z)
- the by argument ensures that the smooth function gets multiplied by covariate z, but GAM smooths are centred (average value zero), so the z+ term is needed as well (f is being represented by a constant plus a centred smooth). If we'd wanted:
E(y)=f(x)z
then the appropriate formula would be: y~z+s(x,by=z)-1."

When I tried both formulas, I found that they gave the same results. That is, "y~z+s(x,by=z)" and "y~z+s(x,by=z)-1" gave the same fit. This is my output:

###########################################################################
anova(model1,model2,test="Chisq")
Analysis of Deviance Table

Model 1: FLBS ~ SES + s(FAFR, by = SES) + s(byear, by = SES) + s(FAFR, byear, by = SES)
Model 2: FLBS ~ SES + s(FAFR, by = SES) + s(byear, by = SES) + s(FAFR, byear, by = SES) - 1
   Resid. Df Resid. Dev         Df  Deviance P(>|Chi|)
1 1.2076e+03     1458.4
2 1.2076e+03     1458.4 1.9099e-11 5.030e-10 2.074e-10
###########################################################################

Does this conflict with the statement above that "If we'd wanted: E(y)=f(x)z then the appropriate formula would be: y~z+s(x,by=z)-1."?

Also, if you are familiar with GAM modelling, please have a look at my modelling process. I want to study how one factor, together with two smooth terms, influences the response. In model 2, I also fitted the interaction between the two smooth terms, together with the interaction of this interaction with the factor. Is model 2 reasonable? I find the plot of model 2 rather complicated to interpret.

Thank you very much for helping!
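The observation above can be reproduced on simulated data. The following is only a sketch: it assumes a version of mgcv that supports factor `by` variables, and the data and the names y, x, z and dat are invented for illustration, not taken from the poster's data set.

```r
library(mgcv)

set.seed(1)
n <- 400

# Invented data: a two-level factor z and a covariate x.
z <- factor(sample(c("low", "high"), n, replace = TRUE))
x <- runif(n)

# A different smooth function of x in each level of z.
y <- ifelse(z == "low", sin(2 * pi * x), 1 + x^2) + rnorm(n, sd = 0.3)
dat <- data.frame(y = y, x = x, z = z)

# The two formulas from the question, with z as a factor.
m1 <- gam(y ~ z + s(x, by = z), data = dat)
m2 <- gam(y ~ z + s(x, by = z) - 1, data = dat)

# Compare the fits: fitted values and deviances.
print(all.equal(fitted(m1), fitted(m2), tolerance = 1e-6))
print(c(deviance(m1), deviance(m2)))

# Compare the parametric coefficients, which are set up differently.
print(coef(m1)[1:2])  # intercept plus a contrast for the second level
print(coef(m2)[1:2])  # one coefficient per level of z
```

If the two fits agree to within numerical tolerance while the first two coefficients differ, that matches the anova output shown in the post.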
Simon Wood
2009-May-06 07:50 UTC
[R] A question about using “by” in GAM model fitting of interaction between smooth terms and factor
The problem here is that the help page you are looking at appears to be from an earlier version of `mgcv' than the one you are using (it is from a version that did not support factor `by' variables). Take a look at ?gam.models for the version that you are actually using.

Your models give the same fit because, when `z' is a factor, ~z and ~z-1 differ only in the identifiability constraints used (this holds for all linear-type models).

As far as model reasonableness is concerned: it's a bit difficult to say without knowing the context. The only thing that stands out is that you are using an isotropic `s' term for the interaction. This is fine if `byear' and `FAFR' are really naturally on the same scale, but if not, tensor product smooths (`te') may be preferable, as they are independent of the relative scaling of the variables. For plot interpretability, I'd drop the `main effect' smooths and just leave in the interaction.

best,
Simon

--
Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
+44 1225 386603 www.maths.bath.ac.uk/~sw283
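The suggestion to keep only the interaction term, and to consider `te' when the covariates are on different scales, might look like the sketch below. It reuses the variable names from the post (FLBS, SES, FAFR, byear), but the data are simulated and the value ranges are assumptions made up for illustration. It fits each model twice, once with `byear' rescaled, to see how much the fitted values depend on the relative scaling.

```r
library(mgcv)

set.seed(2)
n <- 600

# Simulated stand-ins: names follow the post, values are invented.
SES   <- factor(sample(c("poor", "rich"), n, replace = TRUE))
FAFR  <- runif(n, 15, 40)       # assumed to be on a scale of tens
byear <- runif(n, 1700, 1900)   # assumed to be a calendar year
mu    <- 2 + 0.5 * (SES == "rich") + sin(FAFR / 5) * (byear - 1800) / 100
FLBS  <- mu + rnorm(n, sd = 0.3)
dat   <- data.frame(FLBS, SES, FAFR, byear)

# Interaction only, one surface per level of SES.
m_iso <- gam(FLBS ~ SES + s(FAFR, byear, by = SES), data = dat)   # isotropic
m_te  <- gam(FLBS ~ SES + te(FAFR, byear, by = SES), data = dat)  # tensor product

# Refit after changing the units of one covariate.
dat2   <- transform(dat, byear = byear / 100)
m_iso2 <- gam(FLBS ~ SES + s(FAFR, byear, by = SES), data = dat2)
m_te2  <- gam(FLBS ~ SES + te(FAFR, byear, by = SES), data = dat2)

# Largest change in fitted values caused by the rescaling.
print(max(abs(fitted(m_iso) - fitted(m_iso2))))
print(max(abs(fitted(m_te)  - fitted(m_te2))))
```

The tensor product fit should change very little under the rescaling (up to the numerical precision of smoothing parameter estimation), whereas the isotropic fit will generally change. Calling plot(m_te, pages = 1) then gives one surface per level of SES, which may be easier to read than the plots of the full model.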