Hi,

I was wondering if anyone knows how to work out the number of knots that should be used for each variable when fitting GAMs with the mgcv package? Any help or references would be much appreciated.

Thanks
Kathryn Baldwin
This is the "knotty" problem of overfitting/balancing variance and bias. There is no easy answer, but one obvious place to look for words of wisdom is the Hastie/Tibshirani book, "Generalized Additive Models," #43 in the Chapman and Hall little green book series. See especially Chapter 3.

Bert Gunter
Genentech Nonclinical Statistics
South San Francisco, CA 94404

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch On Behalf Of Kathryn Baldwin
Sent: Tuesday, November 28, 2006 2:40 PM
To: r-help at stat.math.ethz.ch
Subject: [R] GAMS and Knots

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi Kathryn,

I very warmly recommend Simon Wood's book on the subject. Here is a link to the book information on Amazon (about which no recommendation should be inferred!)

http://www.amazon.com/Generalized-Additive-Models-Statistical-Science/dp/1584884746/sr=8-2/qid=1164758688/ref=pd_bbs_sr_2/103-6675834-9672605?ie=UTF8&s=books

Cheers
Andrew

--
Andrew Robinson
Department of Mathematics and Statistics    Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
http://www.ms.unimelb.edu.au/~andrewpr
http://blogs.mbs.edu/fishing-in-the-bay/
mgcv chooses the amount of smoothing for each term automatically, using a Generalised Cross-Validation (GCV) routine, so the number of knots does not have to be tuned exactly; it only needs to be large enough.

For further info on using mgcv for GAMs, take a look at:

Simon N. Wood. mgcv: GAMs and generalized ridge regression for R. R News, 1(2):20-25, June 2001.

And Simon's new book:

Simon N. Wood. Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC, Boca Raton, FL, 2006. ISBN 1-58488-474-6.

HTH

G

--
Gavin Simpson                 [t] +44 (0)20 7679 0522
ECRC & ENSIS, UCL Geography,  [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/cv/
London, UK. WC1E 6BT.         [w] http://www.ucl.ac.uk/~ucfagls/
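As a rough illustration of what this automatic selection looks like in practice, here is a minimal sketch. It assumes simulated data from mgcv's own gamSim() example generator rather than anything from Kathryn's problem, and the object names dat and fit are made up for the example. No number of knots is specified; the fit just reports what GCV chose.

```r
library(mgcv)

set.seed(1)
# Illustrative data only: four covariates x0..x3 and a response y
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

# No k given, so each s() term uses its default basis dimension;
# method = "GCV.Cp" asks for GCV-based smoothing parameter selection
fit <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3),
           data = dat, method = "GCV.Cp")

fit$sp            # smoothing parameters selected for each smooth
fit$gcv.ubre      # the minimised GCV score
summary(fit)$edf  # effective degrees of freedom of each smooth
```

The effective degrees of freedom reported at the end are usually the more useful thing to look at than the raw number of knots.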
The number of knots is really one of the modelling assumptions. Provided you don't make it restrictively small, the results should not be very sensitive to this assumption, because the actual degrees of freedom of each smooth are determined by how heavily the smooth is penalized, not simply by the number of knots. The degree of penalization is selected automatically by GCV or AIC/UBRE.

See ?choose.k in the `mgcv' help files for more information on this, as well as information on checking whether the assumed number of knots was large enough.

Simon

--
Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
+44 1225 386603  www.maths.bath.ac.uk/~sw283
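One way to see this insensitivity, and to check whether k was large enough, is sketched below. It assumes simulated data from mgcv's gamSim(), and the names dat, fit_k10 and fit_k20 are hypothetical. The same model is fitted with two basis dimensions and the effective degrees of freedom are compared. If they barely change when k is doubled, the smaller k was probably adequate.

```r
library(mgcv)

set.seed(2)
# Illustrative data only, not from the original question
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

# Same model, two different basis dimensions for every smooth
fit_k10 <- gam(y ~ s(x0, k = 10) + s(x1, k = 10) +
                   s(x2, k = 10) + s(x3, k = 10), data = dat)
fit_k20 <- gam(y ~ s(x0, k = 20) + s(x1, k = 20) +
                   s(x2, k = 20) + s(x3, k = 20), data = dat)

# Effective degrees of freedom per smooth under each choice of k
rbind(k10 = summary(fit_k10)$edf,
      k20 = summary(fit_k20)$edf)

# Residual diagnostics, including a basis dimension check;
# this also draws diagnostic plots
gam.check(fit_k10)
```

A smooth whose effective degrees of freedom sit close to k - 1 in the smaller fit is the one to look at more carefully, along the lines described in ?choose.k.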