Hi,

I'm using gam() to fit a spline model for a data set that has two predictor variables (say A and B). The results indicate that the higher-order interaction terms are significant. The R^2 jumps from 0.5 to 0.9 when I change the maximum order for the interaction from 10 to 15 (i.e. (AB)^10 to (AB)^15). Is there any way of finding out which of the terms in the model are really "significant" so that I could drop some of the terms from the model?

Thanks,
nirmal
Doing subset selection purely on the basis of statistical significance in this way is known to be very problematic, to put it mildly. I'd suggest reading Prof. Harrell's recent book, Regression Modeling Strategies, on how to do appropriate model selection.

Andy

> -----Original Message-----
> From: Nirmal Govind [mailto:nirmalg at psu.edu]
> Sent: Monday, April 21, 2003 7:24 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] significant terms in spline model using GAM
On Mon, 21 Apr 2003 19:24:02 -0400 (EDT) Nirmal Govind <nirmalg at psu.edu> wrote:

> Is there any way of finding out which of the terms in the model are really
> "significant" so that I could drop some of the terms from the model?

Dropping insignificant terms in that way is usually bad statistical practice.

---
Frank E Harrell Jr
Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem.
Dept. of Health Evaluation Sciences
U. Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
> I'm using gam() to fit a spline model for a data set that has two predictor
> variables (say A and B). The results indicate that the higher order interaction
> terms are significant. The R^2 jumps from .5 to .9 when I change the maximum
> order for the interaction from 10 to 15 (i.e. (AB)^10 to (AB)^15).

This is perhaps not the best way of thinking about the interaction terms: there are certainly no terms like (AB)^10 or (AB)^15 in the basis produced by s(A,B,k=10) or s(A,B,k=15).

> Is there any way of finding out which of the terms in the model are really
> "significant" so that I could drop some of the terms from the model?

The default model selection used by gam() is GCV, a mean square error criterion, and I'm not sure how useful it is to mix model selection by hypothesis testing with GCV model selection. I think your results indicate that, in GCV terms, your original choice of k=10 was too restrictive.

If you want to do model selection by hypothesis testing, you can: s(A,B,k=10,fx=TRUE) is nested within s(A,B,k=15,fx=TRUE), for example. However, the process is not automated; you would have to construct F-ratios (or deviance differences) yourself from the response data and the fitted values.

best,
Simon

_____________________________________________________________________
> Simon Wood simon at stats.gla.ac.uk www.stats.gla.ac.uk/~simon/
>> Department of Statistics, University of Glasgow, Glasgow, G12 8QQ
>>> Direct telephone: (0)141 330 4530 Fax: (0)141 330 4814
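
For anyone following up on the last suggestion, here is a rough sketch of how the F-ratio for the nested fixed-dimension fits might be built by hand. It assumes the mgcv version of gam(), a Gaussian response, and simulated data; the names dat, m10, m15, f_ratio and p_value are made up for illustration and are not from the original posts. With fx=TRUE the smooths are unpenalized, so the residual degrees of freedom are taken simply as n minus the number of coefficients.

```r
library(mgcv)  # assumption: gam() and s() come from the mgcv package

set.seed(1)
n <- 400

# Simulated data, for illustration only (not the poster's data)
dat <- data.frame(A = runif(n), B = runif(n))
dat$y <- sin(2 * pi * dat$A) * cos(2 * pi * dat$B) + rnorm(n, sd = 0.3)

# fx = TRUE requests unpenalized regression splines of fixed basis dimension,
# so the k = 10 model is nested within the k = 15 model
m10 <- gam(y ~ s(A, B, k = 10, fx = TRUE), data = dat)
m15 <- gam(y ~ s(A, B, k = 15, fx = TRUE), data = dat)

# Residual sums of squares from the response data and the fitted values
rss10 <- sum((dat$y - fitted(m10))^2)
rss15 <- sum((dat$y - fitted(m15))^2)

# Residual degrees of freedom: n minus number of coefficients
# (valid here because nothing is penalized)
df10 <- n - length(coef(m10))
df15 <- n - length(coef(m15))

# F-ratio for the extra basis functions in the larger model
f_ratio <- ((rss10 - rss15) / (df10 - df15)) / (rss15 / df15)
p_value <- pf(f_ratio, df10 - df15, df15, lower.tail = FALSE)

cat("F =", f_ratio, "on", df10 - df15, "and", df15, "df; p =", p_value, "\n")
```

A small p-value here would only say that the k=15 basis fits better than the k=10 basis as a whole; it does not single out individual "terms", and the cautions above about selecting terms by significance still apply.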