thr3ads.net - R help - [R] MGCV: Use of irls.reg option [Jun 2012]

r-help.20.trevva at spamgourmet.com

2012-Jun-21 00:28 UTC

[R] MGCV: Use of irls.reg option

Hi,

In the help files in the mgcv package for the gam.control() function,
there is an option irls.reg. The help files describe this option as:

For most models this should be 0. The iteratively re-weighted least squares
method by which GAMs are fitted can fail to converge in some circumstances.
For example, data with many zeroes can cause problems in a model with a log
link, because a mean of zero corresponds to an infinite range of linear
predictor values. Such convergence problems are caused by a fundamental lack
of identifiability, but do not show up as lack of identifiability in the
penalized linear model problems that have to be solved at each stage of
iteration. In such circumstances it is possible to apply a ridge regression
penalty to the model to impose identifiability, and irls.reg is the size of
the penalty.
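
[Editor's sketch] To make the quoted description concrete, here is a minimal base-R illustration of IRLS for a Poisson log-link model with an optional ridge penalty. It is not mgcv's implementation: the function name irls_poisson, the toy data, and the choice to leave the intercept unpenalized are all assumptions made for illustration. Group B contains only zeros, so without a penalty its coefficient keeps drifting toward minus infinity; with a small ridge penalty it settles at a finite value.

```r
# Toy IRLS for a Poisson GLM with log link and an optional ridge penalty.
# Hypothetical helper, written only to illustrate the idea in the help text.
irls_poisson <- function(X, y, ridge = 0, n_iter = 20) {
  beta <- rep(0, ncol(X))
  # Ridge penalty matrix; the intercept (first column) is left unpenalized.
  P <- diag(c(0, rep(ridge, ncol(X) - 1)), nrow = ncol(X))
  for (i in seq_len(n_iter)) {
    eta <- drop(X %*% beta)      # linear predictor
    mu  <- exp(eta)              # fitted mean under the log link
    z   <- eta + (y - mu) / mu   # working response
    XtW <- t(X * mu)             # X'W, with working weights W = diag(mu)
    beta <- drop(solve(XtW %*% X + P, XtW %*% z))
  }
  beta
}

# Group A has positive counts, group B has only zeros.
y <- c(3, 5, 4, 0, 0, 0)
X <- cbind(1, c(0, 0, 0, 1, 1, 1))

beta_plain <- irls_poisson(X, y, ridge = 0)    # group B effect keeps decreasing
beta_ridge <- irls_poisson(X, y, ridge = 0.1)  # group B effect stays finite
print(rbind(plain = beta_plain, ridge = beta_ridge))
```

In the unpenalized fit the group B coefficient drops by roughly one unit per iteration and never settles, which is the "infinite range of linear predictor values" the help text refers to. The ridge value 0.1 is arbitrary and is not a recommendation.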

I am trying to fit a Poisson GLM with a log link and am having problems
similar to those described. In particular, the model has a spatial
s(lon,lat) term, and there are a lot of zeros around the edges of my
domain, which make the TPRS (thin plate regression spline) do strange
things. It sounds like irls.reg might be the answer to my problems. My
question is how to use it: what is an appropriate value? I can't seem to
find any more information than that provided, and I am not sure I really
understand what it is doing. Are there any examples or references on this
that I have overlooked during my googling and that could help?
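
[Editor's sketch] On the purely syntactic part of the question: control settings such as irls.reg are built with gam.control() and passed to gam() through its control argument. The snippet below only shows that mechanism. The value 0.1 is an arbitrary placeholder, not a recommended setting, and it assumes the installed mgcv version still accepts this argument.

```r
library(mgcv)

# Build a control list with a non-zero ridge penalty for the IRLS step.
# 0.1 is an arbitrary placeholder, not a recommendation.
ctrl <- gam.control(irls.reg = 0.1)
print(ctrl$irls.reg)

# The control list would then be passed to gam(), for example:
# mdl <- gam(y ~ s(x), data = dat, family = poisson(link = "log"),
#            control = ctrl)
```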

Best wishes,

Mark Payne
DTU Aqua,
Copenhagen, Denmark

Simon Wood

2012-Jun-21 16:11 UTC

[R] MGCV: Use of irls.reg option

Hi Mark,

irls.reg is kind of `legacy code'. Does model fitting actually fail for
your example, or is it just that the estimated spatial smooth looks
unpleasant?

best,
Simon


r-help.20.trevva at spamgourmet.com

2012-Jun-22 08:13 UTC

[R] MGCV: Use of irls.reg option

Hi Simon,

Thanks for taking the time to reply. Please let me explain a few more details.

The problem that I am working on is essentially the same as the
Bristol Channel Sole Egg distribution example in your book and in the
"soap" paper but instead it is Herring Larvae in the English Channel -
same same, but different. The model structure is:

mdl <- gam(nlarv ~ s(lon, lat) + s(doy) + factor(year),  # doy: day of year
           data = dat, family = poisson(link = "log"))

i.e. a multiplicative structure with a Poisson observation model. Now,
the problem is that at the edges of the distribution in space (lon,
lat), and to a lesser extent in time, the observations are rich in zeros
as we move away from the main spawning grounds. The model appears to
be converging OK, but the residuals look horrible. In particular,
there are some extremely large residuals around the edges (Pearson
residuals of 1000 or so), where we get a few larvae in a region where
they are otherwise unlikely. When I look at the TPRS (on the linear
predictor scale), it appears to be heading towards minus infinity.
Essentially we end up in a situation where we observe a single larva
but the expected mean number is 1e-10, which creates these very large
residuals. This was where I happened across the irls.reg argument:
the description in the help file (i.e. lack of identifiability) sounds
very much like the problem that I am having, which is what inspired
the question.
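
[Editor's sketch] The size of these residuals follows directly from the definition of the Pearson residual for a Poisson observation, (y - mu) / sqrt(mu), since the Poisson variance equals the mean. The short base-R calculation below uses a hypothetical helper, pearson_resid, to show how a single observed larva (y = 1) produces ever larger residuals as the fitted mean shrinks.

```r
# Pearson residual for a Poisson observation: (y - mu) / sqrt(mu),
# because the Poisson variance equals the mean.
pearson_resid <- function(y, mu) (y - mu) / sqrt(mu)

# One observed larva against progressively smaller fitted means.
mu  <- c(1, 1e-2, 1e-6, 1e-10)
res <- pearson_resid(1, mu)
print(data.frame(mu = mu, pearson = res))
```

A residual of about 1000 corresponds to a fitted mean near 1e-6, and a fitted mean of 1e-10 gives a residual near 1e5, so a smooth heading towards minus infinity on the linear predictor scale is enough to explain the pattern described.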

I've also tried using the "soap" smoother in place of the TPRS. The
problem is not as severe, and I can limit it by making the boundaries
of the soap film extremely tight around the non-zero data, but the same
underlying problem is still lurking in the corners...

Do you have any suggestions as to how I can get around this
edge-effects problem?

Mark