thr3ads.net - R help - [R] Predicted values from glm() when linear predictor is NA. [Aug 2022]

If this information is useful, please help other people find it:
Share via:

David Winsemius

2022-Jul-28 01:25 UTC

[R] Predicted values from glm() when linear predictor is NA.

On 7/27/22 17:26, Rolf Turner wrote:> I have a data frame with a numeric ("TrtTime") and a categorical
> ("Lifestage") predictor.
>
> Level "L1" of Lifestage occurs only with a single value of
TrtTime,
> explicitly 12, whence it is not possible to estimate a TrtTime
"slope"
> when Lifestage is "L1".
>
> Indeed, when I fitted the model
>
>      fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
>                 data=demoDat)
>
> I got:
>
>> as.matrix(coef(fit))
>>                                    [,1]
>> (Intercept)                -0.91718302
>> TrtTime                     0.88846195
>> LifestageEgg + L1         -45.36420974
>> LifestageL1                14.27570572
>> LifestageL1 + L2           -0.30332697
>> LifestageL3                -3.58672631
>> TrtTime:LifestageEgg + L1   8.10482459
>> TrtTime:LifestageL1                 NA
>> TrtTime:LifestageL1 + L2    0.05662651
>> TrtTime:LifestageL3         1.66743472
> That is, TrtTime:LifestageL1 is NA, as expected.
>
> I would have thought that fitted or predicted values corresponding to
> Lifestage = "L1" would thereby be NA, but this is not the case:
>
>> predict(fit)[demoDat$Lifestage=="L1"]
>>        26       65      131
>> 24.02007 24.02007 24.02007
>>
>> fitted(fit)[demoDat$Lifestage=="L1"]
>>   26  65 131
>>    1   1   1
> That is, the predicted values on the scale of the linear predictor are
> large and positive, rather than being NA.
>
> What this amounts to, it seems to me, is saying that if the linear
> predictor in a Binomial glm is NA, then "success" is a certainty.
> This strikes me as being a dubious proposition.  My gut feeling is that
> misleading results could be produced.
The NA is most likely caused by aliasing, so some other combination of 
factors a perfect surrogate for every case with that level of the 
interaction. The `predict.glm` function always requires a complete set 
of values to construct a case. Whether apparent incremental linear 
prediction of that interaction term is large or small will depend on the 
degree of independent contribution of the surrogate levels of other 
variables..


David.
>
> Can anyone explain to me a rationale for this behaviour pattern?
> Is there some justification for it that I am not currently seeing?
> Any other comments?  (Please omit comments to the effect of "You are
as
> thick as two short planks!". :-) )
>
> I have attached the example data set in a file "demoDat.txt",
should
> anyone want to experiment with it.  The file was created using dput() so
> you should access it (if you wish to do so) via something like
>
>      demoDat <- dget("demoDat.txt")
>
> Thanks for any enlightenment.
>
> cheers,
>
> Rolf Turner
>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

Rolf Turner

2022-Jul-28 02:19 UTC

head link

[R] Predicted values from glm() when linear predictor is NA.

On Wed, 27 Jul 2022 18:25:23 -0700
David Winsemius <dwinsemius at comcast.net> wrote:

<SNIP>
> The NA is most likely caused by aliasing, so some other combination
> of factors a perfect surrogate for every case with that level of the 
> interaction. 
<SNIP>

No, I think it's much simpler than that.  Essentially it boils down to
the fact that if y is 1:10 and x is rep(1,10) then

    lm(y ~ x)

gives a slope estimate of NA.  As it clearly should.

cheers,

Rolf

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

Martin Maechler

2022-Aug-02 07:10 UTC

head link

[R] Predicted values from glm() when linear predictor is NA.

>>>>> Rolf Turner 
>>>>>     on Thu, 28 Jul 2022 14:19:49 +1200 writes:
    > On Wed, 27 Jul 2022 18:25:23 -0700 David Winsemius
    > <dwinsemius at comcast.net> wrote:

    > <SNIP>

    >> The NA is most likely caused by aliasing, so some other
    >> combination of factors a perfect surrogate for every case
    >> with that level of the interaction.

    > <SNIP>

    > No, I think it's much simpler than that.  Essentially it
    > boils down to the fact that if y is 1:10 and x is
    > rep(1,10) then

    >     lm(y ~ x)

    > gives a slope estimate of NA.  As it clearly should.

*and*  predict() should surely *not* be NA, either, right ??

You can easily confirm with

 x <- rep(1,7); y <- 1:7; summary(fm <- lm(y~x))
 predict(fm)

which gives all '4'  (= mean(y))

Note that this *is* aliasing:  x is aliased with the intercept,
and it fits entirely your situation:

If a predictor variable is superfluous, i.e., more precisely, a
column of the X design matrix is an exact linear combination of
previous columns,
prediction is no problem at all and gives exactly what you'd get
if you omit that superfluous column/variable.

Are you sure this is not simply your situation?
Martin

    > cheers,
    > Rolf

    > -- 
    > Honorary Research Fellow Department of Statistics
    > University of Auckland Phone: +64-9-373-7599 ext. 88276

R help - Aug 2022 - Predicted values from glm() when linear predictor is NA.

[R] Predicted values from glm() when linear predictor is NA.

[R] Predicted values from glm() when linear predictor is NA.

[R] Predicted values from glm() when linear predictor is NA.