thr3ads.net - R help - [R] mgcv::gam() scale parameter estimates for quasibinomial error models [Apr 2021]

If this information is useful, please help other people find it:
Share via:

John Maindonald

2021-Apr-15 05:39 UTC

[R] mgcv::gam() scale parameter estimates for quasibinomial error models

For both glm() and mgcv::gam() quasibinomial error models, the summary
object has a dispersion value that has the same role as sigma^2 in the
summary object for lm() model fits.

Where some fitted probabilities are small, the `gam()` default scale parameter 
estimates, returned as `scale` (and `sig2`) in the gam object and as
?dispersion"
in the summary object, can differ wildly from the Pearson values that `glm()`
works with, and that can be obtained by calling `gam()` with a suitable control
setting (see the code that now follows.)

The following demonstrates the issue:

  ## ?mgcv? version 1.8-34
  Cholera <- HistData:: Cholera
  library(mgcv)
  form <- cbind(cholera_deaths, popn-cholera_deaths) ~ 
     water + elevation + poor_rate
  default.gam <- gam(form, data=Cholera, family=quasibinomial)
  pearson.gam <- update(quasibin.gam,
control=gam.control(scale.est=?pearson"))

  c(Pearson=pearson.gam$scale, Default=default.gam$scale)
  ##  Pearson     Default 
  ## 33.545829  2.919535 

My own calculation (from either fitted model), returns 30.07 for the
(Fletcher 2012) version of the dispersion that was, according to 
Wood?s ?Generalized Additive Models? (2nd edn, 2017, p.111),
returned as the GAM scale estimate at the time when the book was 
written. 

The default scale estimates returned by `gam()` vary wildly, relative 
to the relatively stable ?pearson" estimates, in data that are simulated
to have comparable dispersion estimates.

For the Cholera data, it would make good sense to fit a model
with quasipoisson errors to the death counts, using log(popn) as an
offset.  The GAM model then uses, with default setting for `scale.est`, 
a scale parameter that is close to that returned by "pearson.gam?. 
SEs (as well as coefficients) are similar to those returned by 
"pearson.gam?. 

The detailed calculations that are documented in the following 
documents may be of some general interest.
  https://www.dropbox.com/s/vl9usat07urbgel/quasibin-gam.pdf?dl=0
  or https://www.dropbox.com/s/s83mh1mut5xc3gk/quasibin-gam.html?dl=0

I am posting this here now before posting a bug report in case I have
missed something important.  It would be useful to be directed to the 
mgcv code used for calculation of what is returned as the Fletcher statistic.

  Rmd file: https://www.dropbox.com/s/ghmsdcvgxp068bs/quasibin-gam.Rmd?dl=0
  .bib file: https://www.dropbox.com/s/r1yjqx0sni2pzjy/quasi.bib?dl=0

John Maindonald             email: john.maindonald at anu.edu.au

Simon Wood

2021-Apr-15 08:56 UTC

head link

[R] mgcv::gam() scale parameter estimates for quasibinomial error models

Thanks John. It's a bug in weights handling. mgcv will give wrong scale 
parameter estimates for weighted models where the scale parameter is 
unknown, (except Gaussian, fortunately). quasibinomial with trials > 1 
is one such case, because the weights are used to store the number of 
trials. Otherwise the problem would only arise if weights were 
explicitly provided in a non-Gaussian model with unknown scale 
parameter. Fixed for 1. 8-35. best,

Simon

On 15/04/2021 06:39, John Maindonald wrote:> For both glm() and mgcv::gam() quasibinomial error models, the summary
> object has a dispersion value that has the same role as sigma^2 in the
> summary object for lm() model fits.
>
> Where some fitted probabilities are small, the `gam()` default scale
parameter
> estimates, returned as `scale` (and `sig2`) in the gam object and as
?dispersion"
> in the summary object, can differ wildly from the Pearson values that
`glm()`
> works with, and that can be obtained by calling `gam()` with a suitable
control
> setting (see the code that now follows.)
>
> The following demonstrates the issue:
>
>    ## ?mgcv? version 1.8-34
>    Cholera <- HistData:: Cholera
>    library(mgcv)
>    form <- cbind(cholera_deaths, popn-cholera_deaths) ~
>       water + elevation + poor_rate
>    default.gam <- gam(form, data=Cholera, family=quasibinomial)
>    pearson.gam <- update(quasibin.gam,
control=gam.control(scale.est=?pearson"))
>
>    c(Pearson=pearson.gam$scale, Default=default.gam$scale)
>    ##  Pearson     Default
>    ## 33.545829  2.919535
>
> My own calculation (from either fitted model), returns 30.07 for the
> (Fletcher 2012) version of the dispersion that was, according to
> Wood?s ?Generalized Additive Models? (2nd edn, 2017, p.111),
> returned as the GAM scale estimate at the time when the book was
> written.
>
> The default scale estimates returned by `gam()` vary wildly, relative
> to the relatively stable ?pearson" estimates, in data that are
simulated
> to have comparable dispersion estimates.
>
> For the Cholera data, it would make good sense to fit a model
> with quasipoisson errors to the death counts, using log(popn) as an
> offset.  The GAM model then uses, with default setting for `scale.est`,
> a scale parameter that is close to that returned by "pearson.gam?.
> SEs (as well as coefficients) are similar to those returned by
> "pearson.gam?.
>
> The detailed calculations that are documented in the following
> documents may be of some general interest.
>    https://www.dropbox.com/s/vl9usat07urbgel/quasibin-gam.pdf?dl=0
>    or https://www.dropbox.com/s/s83mh1mut5xc3gk/quasibin-gam.html?dl=0
>
> I am posting this here now before posting a bug report in case I have
> missed something important.  It would be useful to be directed to the
> mgcv code used for calculation of what is returned as the Fletcher
statistic.
>
>    Rmd file:
https://www.dropbox.com/s/ghmsdcvgxp068bs/quasibin-gam.Rmd?dl=0
>    .bib file: https://www.dropbox.com/s/r1yjqx0sni2pzjy/quasi.bib?dl=0
>
> John Maindonald             email: john.maindonald at anu.edu.au
>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
Simon Wood, School of Mathematics, University of Edinburgh,
https://www.maths.ed.ac.uk/~swood34/

R help - Apr 2021 - mgcv::gam() scale parameter estimates for quasibinomial error models

[R] mgcv::gam() scale parameter estimates for quasibinomial error models

[R] mgcv::gam() scale parameter estimates for quasibinomial error models