R help - [R] GAM, GLM, Logit, infinite or missing values in 'x' [Jan 2008]


Anders Schwartz Corr

2008-Jan-08 06:35 UTC

[R] GAM, GLM, Logit, infinite or missing values in 'x'

Hi,

I'm running gam (mgcv version 1.3-29) and a logit glm (stats, R 2.6.1) on
the same models/data, and I got error messages for the gam() model and 
warnings for the glm() model.

R-help suggested that the glm() warning messages are due to the model 
perfectly predicting binary output. Perhaps the model overfits the data? I 
inspected my data and it was not immediately obvious to me (though I guess 
it will be to some of the more pointed of you) how this would be the case.
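The glm() warning is easy to reproduce on toy data. The sketch below uses hypothetical data (not Anders's) in which one covariate perfectly separates the two outcome classes, and collects the warnings glm() emits instead of letting them pile up. The exact set of warnings (for example, whether non-convergence is also reported) may vary across R versions.

```r
# Toy data (hypothetical): every x > 5 has outcome 1 and every x <= 5 has
# outcome 0, so x perfectly separates the two classes.
x <- 1:10
y <- as.integer(x > 5)

# Collect warnings as they occur rather than printing them at the end.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)
# Fitted probabilities are pushed to (numerically) 0 and 1 ...
print(range(fitted(fit)))
# ... and the slope estimate and its standard error are very large.
print(summary(fit)$coefficients)
```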

The gam() errors vanish when I delete one covariate (it doesn't matter 
which one). Can I write a loop into the code such that if an error is 
returned (is.error() doesn't seem to exist unfortunately) then I pare off 
one of the covariates and rerun the gam()? That would be ideal. I could 
set options(error = f), where f() reruns the gam with
one fewer covariate until it works, but the gam is in a bunch of loops 
that would break given the error and I would like to figure out another 
option.

My glm and gam models are below. Any suggestions are very much 
appreciated.

Best,

Anders
> form.logit
outbinary ~ a_norm_total2 + I(a_norm_total2^2) + prop + igoprop +
     gpconc + ter + open + igototal + cinc.nmc + demsOnumstat +
     diversity + cincOter + polity2
> form.glogit
outbinary ~ s(a_norm_total2) + s(prop) + s(prop, by = a_norm_total2) +
     igoprop + gpconc + ter + open + igototal + cinc.nmc + demsOnumstat +
     diversity + cincOter + polity2

GAM error message:
avt.2glogit <- gam(form.glogit, data = dataS, na.action = na.omit, family = binomial)
Error in eigen(hess1, symmetric = TRUE) :
   infinite or missing values in 'x'
Calls: gam -> gam.outer -> newton -> eigen

GLM warnings:
There were 29 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In glm.fit(x = X, y = Y, weights = weights, start = start,  ... :
   fitted probabilities numerically 0 or 1 occurred
2: In glm.fit(x = X, y = Y, weights = weights, start = start,  ... :
   fitted probabilities numerically 0 or 1 occurred
3: In glm.fit(x = X, y = Y, weights = weights, start = start,  ... :
   fitted probabilities numerically 0 or 1 occurred
4: In glm.fit(x = X, y = Y, weights = weights, start = start,  ... :
   fitted probabilities numerically 0 or 1 occurred

Prof Brian Ripley

2008-Jan-08 08:18 UTC

On Tue, 8 Jan 2008, Anders Schwartz Corr wrote:
>
> Hi,
>
> I'm running gam (mgcv version 1.3-29) and glm (logit) (stats R 2.61) on
> the same models/data, and I got error messages for the gam() model and
> warnings for the glm() model.
>
> R-help suggested that the glm() warning messages are due to the model
> perfectly predicting binary output. Perhaps the model overfits the data? I
> inspected my data and it was not immediately obvious to me (though I guess
> it will be to some of the more pointed of you) how this would be the case.
Only the clairvoyant, given that you didn't supply the data.  But this 
concept of complete/partial separation is well-known in certain fields 
(more in AI than in statistics). See my PRNN book (Pattern Recognition and Neural Networks) for a comprehensive
account, and

@Book{Santner.Duffy.89,
   author       = "T. J. Santner and D. E. Duffy",
   title        = "The Statistical Analysis of Discrete Data",
   publisher    = "Springer-Verlag",
   address      = "New York",
   year         = "1989",
   ISBN         = "0-387-97018-5",
   comment      = "Reference from MASS",
}

for a statistical book that covers it.
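For a single covariate the two kinds of separation can be checked directly: complete separation means all cases with y = 0 lie strictly on one side of some threshold and all cases with y = 1 strictly on the other, while quasi-complete (partial) separation allows the classes to meet at the threshold. The helper below is a hypothetical sketch for one numeric covariate only; in a model with several covariates, separation can occur along a linear combination of them, which this check will not detect.

```r
# Hypothetical helper: classify separation of a binary y by ONE numeric covariate x.
separation_1d <- function(x, y) {
  stopifnot(length(x) == length(y), all(y %in% c(0, 1)), any(y == 0), any(y == 1))
  x0 <- x[y == 0]
  x1 <- x[y == 1]
  # A strict gap between the classes (in either direction): complete separation.
  if (max(x0) < min(x1) || max(x1) < min(x0)) return("complete")
  # The classes touch only at a shared boundary value: quasi-complete separation.
  if (max(x0) <= min(x1) || max(x1) <= min(x0)) return("quasi-complete")
  "none"
}

separation_1d(1:10, rep(0:1, each = 5))                 # "complete"
separation_1d(c(1, 2, 3, 3, 4, 5), c(0, 0, 0, 1, 1, 1)) # "quasi-complete"
separation_1d(1:10, rep(0:1, times = 5))                # "none"
```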
> The gam() errors vanish when I delete one covariate (it doesn't matter
> which one). Can I write a loop into the code such that if an error is
> returned (is.error() doesn't seem to exist unfortunately) then I pare off
> one of the covariates and rerun the gam()? That would be ideal. I could
> set options(error = f()) in which f() reruns the gam with
> one fewer covariate until it works, but the gam is in a bunch of loops
> that would break given the error and I would like to figure out another
> option.

See ?try : try comes very close to is.error.
See also ?tryCatch
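A minimal sketch of the loop Anders describes, using tryCatch() so that a failed gam() call returns NULL instead of aborting the surrounding loops. The helper names fit_with_fallback() and is_error(), their arguments, and the choice to drop the last covariate first are all hypothetical; which covariate to drop is a modelling decision. The toy data use a non-existent column as a stand-in for the eigen() error.

```r
library(mgcv)

# Hypothetical helper: build `response ~ 1 + smooths + covars`, try to fit it
# with gam(), and on error drop the LAST remaining covariate and try again.
# Returns the first successful fit, or NULL if even the smooth-only model fails.
fit_with_fallback <- function(response, smooths, covars, data, ...) {
  repeat {
    rhs <- paste(c("1", smooths, covars), collapse = " + ")
    form <- as.formula(paste(response, "~", rhs))
    fit <- tryCatch(
      gam(form, data = data, ...),
      error = function(e) {
        message("gam() failed with ", length(covars), " covariate(s): ",
                conditionMessage(e))
        NULL
      }
    )
    if (!is.null(fit) || length(covars) == 0) return(fit)
    covars <- covars[-length(covars)]  # pare off one covariate and retry
  }
}

# The try() idiom: try() returns an object of class "try-error" when the
# expression fails, so this behaves like the missing is.error().
is_error <- function(x) inherits(x, "try-error")

# Usage on toy data: "not_a_column" is not in `dat`, so the first attempt
# fails and the helper retries without it.
set.seed(2)
dat <- data.frame(x1 = runif(200), x2 = runif(200))
dat$y <- rbinom(200, 1, plogis(sin(2 * pi * dat$x1)))
fit <- fit_with_fallback("y", "s(x1)", c("x2", "not_a_column"), dat,
                         family = binomial)
all.vars(fit$formula)
is_error(try(stop("boom"), silent = TRUE))
```

Because the error is caught inside the helper, an outer loop over models keeps running; checking is.null(fit) afterwards tells you whether any attempt succeeded.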
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Simon Wood

2008-Jan-08 09:04 UTC

Anders,

A very flexible logistic regression model is quite often able to predict some
subset of the data `perfectly', with the unfortunate consequence that the
corresponding linear predictor is not well defined. My guess would be that
the extra flexibility of the gam is what is causing slightly more trouble
with `gam' than with `glm'.
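To see why the linear predictor is not well defined, take toy data (hypothetical) that one covariate separates perfectly and evaluate the logistic log-likelihood along the direction beta * (x - 5.5). Every observation then sits on the correct side of zero, so each log-likelihood term increases with beta: the log-likelihood climbs towards 0 without ever attaining a maximum, and the fitting iterations can only push the coefficients off towards infinity.

```r
# Toy data: y = 1 exactly when x > 5.5.
x <- 1:10
y <- as.integer(x > 5)

# Binomial log-likelihood for the logit model with linear predictor beta * (x - 5.5).
# plogis(eta, log.p = TRUE) is log P(y = 1); plogis(-eta, log.p = TRUE) is log P(y = 0).
loglik <- function(beta) {
  eta <- beta * (x - 5.5)
  sum(plogis(eta[y == 1], log.p = TRUE)) + sum(plogis(-eta[y == 0], log.p = TRUE))
}

betas <- c(1, 2, 5, 10, 20)
ll <- sapply(betas, loglik)
print(data.frame(beta = betas, loglik = ll))
# The values increase strictly with beta and stay below 0: there is no finite MLE.
```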

That said, the `gam' fitting methods are supposed to deal with the numerical
consequences of such indefiniteness more elegantly than is the case for your 
example. Any chance that you could send me the data so that I can dig into 
this a bit more, and see if whatever is causing the error could be dealt with 
more gracefully (or at least trapped earlier)? [If so, a text file usually 
seems most painless, and I of course undertake not to do anything with the 
data except investigate this issue].

best,
Simon

-- 
Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
+44 1225 386603  www.maths.bath.ac.uk/~sw283

Simon Wood

2008-Feb-05 10:57 UTC

Anders,

Thanks for sending the data. The fix is to reduce the convergence tolerance
`epsilon' in the `control' argument to `gam' (1e-8 is fine). I'll put a trap
and an informative error message into a future mgcv release.
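In code, the fix amounts to passing a tighter IRLS tolerance through gam.control(). The data and model below are a hypothetical stand-in, since the real data were not posted to the list; only the control = gam.control(epsilon = 1e-8) part corresponds to the advice above, and the default tolerance in current mgcv releases may differ from the one in 1.3-29.

```r
library(mgcv)

# Hypothetical stand-in data: a smooth effect of x1 and a linear effect of x2.
set.seed(1)
n <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- rbinom(n, 1, plogis(2 * sin(2 * pi * dat$x1) + dat$x2 - 0.5))

# Tighter convergence tolerance for the IRLS iterations.
ctrl <- gam.control(epsilon = 1e-8)
fit <- gam(y ~ s(x1) + x2, family = binomial, data = dat, control = ctrl)
summary(fit)

# Applied to the model from the thread this would read:
# gam(form.glogit, data = dataS, na.action = na.omit, family = binomial,
#     control = gam.control(epsilon = 1e-8))
```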

Here's what happens. The model is quite ill conditioned (there's near 
collinearity in the parametric terms) and there is quite slow convergence of 
the IRLS scheme used for fitting. With the default convergence tolerances the 
parameter values are insufficiently converged to be fed into the iteration 
that finds the derivatives of the AIC/UBRE score with respect to the 
smoothing parameters. The derivative iterations diverge (ill conditioning 
introduces sensitivity to the slight error in the parameter estimates from 
the IRLS, and it doesn't help that the linear predictor is "practically
infinite" in places). Tightening the convergence tolerances in the IRLS 
improves the parameter estimates sufficiently for the derivative iterations 
to converge. 
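The role of the tolerance can be illustrated with a bare-bones IRLS for an unpenalized logistic regression. This is a sketch of the general technique, not mgcv's penalized fitting code: the helper irls_logit() is hypothetical, and its stopping rule (relative change in deviance below epsilon) is modelled on the one glm.fit() uses. A looser epsilon stops the iteration earlier, leaving the coefficients further from the converged values, and any later computation that assumes a converged fit inherits that error.

```r
# Hypothetical helper: plain (unpenalized) IRLS for logistic regression.
# X: model matrix (n x p); y: 0/1 response vector.
irls_logit <- function(X, y, epsilon = 1e-8, maxit = 100) {
  beta <- rep(0, ncol(X))
  dev_old <- Inf
  for (iter in seq_len(maxit)) {
    eta <- drop(X %*% beta)
    mu <- plogis(eta)
    w <- mu * (1 - mu)          # IRLS weights for the logit link
    z <- eta + (y - mu) / w     # working response
    # Weighted least-squares update: solve (X' W X) beta = X' W z
    beta <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    mu <- plogis(drop(X %*% beta))
    dev <- -2 * sum(y * log(mu) + (1 - y) * log(1 - mu))
    # Stop when the relative change in deviance falls below epsilon.
    if (abs(dev - dev_old) / (abs(dev) + 0.1) < epsilon) break
    dev_old <- dev
  }
  list(coef = beta, iter = iter, deviance = dev)
}

set.seed(42)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + 1.2 * x))
X <- cbind(1, x)

loose <- irls_logit(X, y, epsilon = 1e-2)
tight <- irls_logit(X, y, epsilon = 1e-12)
ref <- coef(glm(y ~ x, family = binomial,
                control = glm.control(epsilon = 1e-12, maxit = 100)))

# Iterations used, and distance from the tightly converged glm() estimates.
c(loose = loose$iter, tight = tight$iter)
c(loose = max(abs(loose$coef - ref)), tight = max(abs(tight$coef - ref)))
```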

best,
Simon 



