Christos Argyropoulos
2010-Apr-14 18:19 UTC
[R] Selecting derivative order penalty for thin plate spline regression (GAM - mgcv)
Hi,

I am using GAMs (package mgcv) to smooth event rates in a penalized regression setting, and I was wondering if/how one can select the order of the derivative penalty.

For my particular problem the order of the penalty (parameter "m" inside the "s" terms of the formula argument) appears to have a larger effect on the AIC/deviance of the estimated model than the number (or even the location!) of the knots for the covariate of interest. In particular, the estimated smooth changes shape from a linear one (default "m" (=2) value for a TP smooth or a P-spline smooth) with an edf of 2.06 to a non-linear one with an edf of 4.8-5.1 when "m" is raised to 3. The estimated shape of the smooth did not change further when I tried higher values of m and different bases (thin plate, P-spline).

The overall significance of the smooth term changes, but is <0.05 in both cases; however, the interpretations afforded by the shapes of the smooths are different.

Smoothing the same dataset with a different approach to GAMs (BayesX) results in shapes that are more like the ones I have been getting with m>=3 rather than m=2 (I have not tried the conditional autoregressive regressions of WinBUGS yet).

Any suggestion on how to proceed to test the optimal order of the penalty would be appreciated. The 2 approaches I am thinking of trying are:

a) Use un-penalized smoothing regressions and compare the 2 models with ANOVA.

b) First, fit the "m=2" model and extract the smoothing parameters of all other smooth terms from that model. Second, fit a model in which the smooth of the covariate of interest is set to "m=3", fixing the parameters of all other smooth terms appearing in the model statement to the values estimated in the first step. Then I could compare the (m=2) vs. (m=3) models with ANOVA, as the 2 models are properly nested within each other.

Any other ideas?
Sincerely,
Christos Argyropoulos
University of Pittsburgh
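For readers following along, the setup described in the question can be sketched roughly as below. This is only an illustration under stated assumptions: the data are simulated (the poster's dataset is not available), the variable names `events`, `x` and `exposure` are hypothetical, and the Poisson family with a log-exposure offset is just one plausible way of modelling event rates. The only point is to show where "m" goes in the "s" term and how the edf of the resulting smooths can be read off.

```r
library(mgcv)

## Simulated event-rate data; NOT the data from the original post.
set.seed(1)
n        <- 400
x        <- runif(n)
exposure <- runif(n, 50, 150)               # hypothetical exposure (e.g. person-time)
rate     <- exp(-3 + sin(2 * pi * x))       # made-up "true" rate
events   <- rpois(n, exposure * rate)
dat      <- data.frame(events, x, exposure)

## Same model twice; only the order of the derivative penalty differs.
fit_m2 <- gam(events ~ s(x, bs = "tp", m = 2) + offset(log(exposure)),
              family = poisson, data = dat)
fit_m3 <- gam(events ~ s(x, bs = "tp", m = 3) + offset(log(exposure)),
              family = poisson, data = dat)

## Effective degrees of freedom of the smooth under each penalty order.
edf <- c(m2 = as.numeric(summary(fit_m2)$edf),
         m3 = as.numeric(summary(fit_m3)$edf))
print(edf)
```

Calling `plot(fit_m2)` and `plot(fit_m3)` would then show the two estimated shapes side by side; whether they differ as sharply as in the post depends entirely on the data.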
Simon Wood
2010-Apr-15 14:53 UTC
[R] Selecting derivative order penalty for thin plate spline regression (GAM - mgcv)
Christos,

I would base the choice of `m' on the AIC or GCV scores (or on the REML or marginal likelihood scores, if these have been used for smoothness selection).

I don't think the m=2 basis will be strictly nested within the m=3 basis, will it? So that rules out your option a. Option b is poor since the smoothing parameters really have a different meaning in the two cases. Choosing `m' according to the same criterion you used for smoothness selection seems like the most self-consistent approach.

best,
Simon

--
Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
+44 1225 386603 www.maths.bath.ac.uk/~sw283
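The suggestion in the reply above, choosing `m' by the same criterion used for smoothness selection, might look something like the following. It is a minimal sketch, not the author's code: the data are simulated, the names `y`, `x`, `orders` and `scores` are hypothetical, and `method = "ML"` with `k = 10` is an arbitrary choice made for the illustration. The score is read from the `gcv.ubre` component of the fitted object, which holds the minimized smoothness selection criterion (here the negative log marginal likelihood, so smaller is better).

```r
library(mgcv)

## Simulated counts; NOT the data from the original post.
set.seed(2)
n   <- 400
x   <- runif(n)
y   <- rpois(n, exp(0.5 + sin(2 * pi * x)))
dat <- data.frame(y, x)

## Candidate orders for the derivative penalty of the thin plate spline.
orders <- 2:4

## Fit one model per order, using marginal likelihood for smoothness selection.
fits <- lapply(orders, function(m) {
  gam(y ~ s(x, bs = "tp", m = m, k = 10),
      family = poisson, data = dat, method = "ML")
})

## Collect the criterion used for smoothness selection, plus AIC for reference.
scores <- data.frame(
  m   = orders,
  ML  = vapply(fits, function(f) as.numeric(f$gcv.ubre), numeric(1)),
  AIC = vapply(fits, AIC, numeric(1))
)
print(scores)

## Pick the order with the smallest score on the selection criterion.
best_m <- scores$m[which.min(scores$ML)]
```

If GCV had been used for smoothness selection instead (by fitting with `method = "GCV.Cp"`), the same `gcv.ubre` component would hold the GCV score and the comparison would proceed in the same way.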