r-help.20.trevva at spamgourmet.com
2012-Jun-21 00:28 UTC
[R] MGCV: Use of irls.reg option
Hi,

In the help files in the mgcv package for the gam.control() function, there is an option irls.reg. The help files describe this option as:

"For most models this should be 0. The iteratively re-weighted least squares method by which GAMs are fitted can fail to converge in some circumstances. For example, data with many zeroes can cause problems in a model with a log link, because a mean of zero corresponds to an infinite range of linear predictor values. Such convergence problems are caused by a fundamental lack of identifiability, but do not show up as lack of identifiability in the penalized linear model problems that have to be solved at each stage of iteration. In such circumstances it is possible to apply a ridge regression penalty to the model to impose identifiability, and irls.reg is the size of the penalty."

I am trying to fit a Poisson model with a log link and am having problems similar to those described. In particular, the model has a spatial s(lon,lat) term, and there are a lot of zeros around the edges of my domain which are making the TPRS do strange things. It sounds like irls.reg might be the answer to my problems.

My question is how to use it. What is an appropriate value? I can't seem to find any more information than that provided, and I don't know if I really understand what it is doing. Are there any examples or references on this that I have overlooked during my googling that could help?

Best wishes,

Mark Payne
DTU Aqua, Copenhagen, Denmark
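The help text quoted above says that a mean of zero corresponds to an infinite range of linear predictor values under a log link. A minimal sketch of that behaviour follows, using base R's glm() rather than mgcv, on made-up toy data (the names dat, g and y are hypothetical and not from the thread): one group contains only zero counts, so its fitted linear predictor drifts to a large negative value instead of settling at a finite estimate.

```r
# Toy data (hypothetical): group "a" has positive counts, group "b" is all zeros.
dat <- data.frame(
  g = factor(rep(c("a", "b"), each = 20)),
  y = c(rep(c(2, 3, 1, 4), times = 5), rep(0, 20))
)

# Poisson GLM with a log link. Warnings are suppressed because glm() may
# complain about fitted rates that are numerically zero.
fit <- suppressWarnings(
  glm(y ~ g, family = poisson(link = "log"), data = dat)
)

# The maximum-likelihood mean for group "b" is exactly 0, i.e. a linear
# predictor of -Inf. IRLS cannot reach that, so it stops at some large
# negative value once the deviance no longer changes appreciably.
eta_b <- unname(coef(fit)["(Intercept)"] + coef(fit)["gb"])
mu_b  <- exp(eta_b)

print(c(eta_b = eta_b, mu_b = mu_b))
```

In a smooth such as s(lon,lat) the same mechanism would act locally: wherever the data are all zeros, the likelihood keeps rewarding a more negative linear predictor, and only the smoothing penalty (or an extra ridge penalty of the kind irls.reg describes) holds it back.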
Hi Mark,

irls.reg is kind of `legacy code'. Does model fitting actually fail for your example, or is it just that the estimated spatial smooth looks unpleasant?

best,
Simon
r-help.20.trevva at spamgourmet.com
2012-Jun-22 08:13 UTC
[R] MGCV: Use of irls.reg option
Hi Simon,

Thanks for taking the time to reply. Please let me explain a few more details.

The problem that I am working on is essentially the same as the Bristol Channel Sole Egg distribution example in your book and in the "soap" paper, but instead it is Herring Larvae in the English Channel - same same, but different. The model structure is:

    mdl <- gam(nlarv ~ s(lon, lat) + s(day_of_year) + factor(year), data = dat, family = poisson(link = "log"))

i.e. a multiplicative structure with a Poisson observation model.

Now, the problem is that at the edges of the distribution in space (lon, lat), and to a lesser extent in time, the observations are rich in zeros as we move away from the main spawning grounds. The model appears to be converging ok, but the residuals look horrible. In particular, there are some extremely large residuals around the edges (Pearson residuals of 1000 or so), where we get a few larvae in a region where they are otherwise unlikely. When I look at the TPRS (on the linear predictor scale) it appears to be heading towards minus infinity. Essentially we end up in a situation where we observe a single larva but the expected mean number is 1e-10, which creates these very large residuals.

This was where I happened across the irls.reg argument. The description in the help file (i.e. lack of identifiability) sounds very much like the problem that I am having, which is what inspired the question.

I've also tried using the "soap" smoother in place of the TPRS. The problem is not as severe, and I can limit it by making the boundaries of the soap film extremely tight around the non-zero data, but the same underlying problem is still lurking in the corners...

Do you have any suggestions as to how I can get around this edge-effects problem?

Mark
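As a rough worked illustration of why a single larva in a near-zero region produces the very large residuals described in the message above: assuming the standard Pearson residual for a Poisson model (variance equal to the mean), the residual for an observation of one count grows like the inverse square root of the fitted mean. The numeric values below are illustrative only and are not taken from the actual fit.

```latex
r_i = \frac{y_i - \hat{\mu}_i}{\sqrt{\hat{\mu}_i}},
\qquad
y_i = 1,\ \hat{\mu}_i \ll 1
\;\Longrightarrow\;
r_i \approx \frac{1}{\sqrt{\hat{\mu}_i}}

\hat{\mu}_i = 10^{-6}  \;\Rightarrow\; r_i \approx 10^{3},
\qquad
\hat{\mu}_i = 10^{-10} \;\Rightarrow\; r_i \approx 10^{5}
```

So a residual of order 1000 does not require a badly wrong count, only a fitted mean that has been pushed several orders of magnitude below one.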