thr3ads.net - R help - [R] Predicted values from glm() when linear predictor is NA. [Jul 2022]

If this information is useful, please help other people find it:
Share via:

Jeff Newmiller

2022-Jul-28 05:31 UTC

[R] Predicted values from glm() when linear predictor is NA.

But "disappearing" is not what NA is supposed to do normally. Why is
it being treated that way here?

On July 27, 2022 7:04:20 PM PDT, John Fox <jfox at mcmaster.ca>
wrote:>Dear Rolf,
>
>The coefficient of TrtTime:LifestageL1 isn't estimable (as you explain)
and by setting it to NA, glm() effectively removes it from the model. An
equivalent model is therefore
>
>> fit2 <- glm(cbind(Dead,Alive) ~ TrtTime + Lifestage +
>+               I((Lifestage == "Egg + L1")*TrtTime) +
>+               I((Lifestage == "L1 + L2")*TrtTime) +
>+               I((Lifestage == "L3")*TrtTime),
>+             family=binomial, data=demoDat)
>Warning message:
>glm.fit: fitted probabilities numerically 0 or 1 occurred
>
>> cbind(coef(fit, complete=FALSE), coef(fit2))
>                                  [,1]         [,2]
>(Intercept)                -0.91718302  -0.91718302
>TrtTime                     0.88846195   0.88846195
>LifestageEgg + L1         -45.36420974 -45.36420974
>LifestageL1                14.27570572  14.27570572
>LifestageL1 + L2           -0.30332697  -0.30332697
>LifestageL3                -3.58672631  -3.58672631
>TrtTime:LifestageEgg + L1   8.10482459   8.10482459
>TrtTime:LifestageL1 + L2    0.05662651   0.05662651
>TrtTime:LifestageL3         1.66743472   1.66743472
>
>There is no problem computing fitted values for the model, specified either
way. That the fitted values when Lifestage == "L1" all round to 1 on
the probability scale is coincidental -- that is, a consequence of the data.
>
>I hope this helps,
> John
>
>On 2022-07-27 8:26 p.m., Rolf Turner wrote:
>> 
>> I have a data frame with a numeric ("TrtTime") and a
categorical
>> ("Lifestage") predictor.
>> 
>> Level "L1" of Lifestage occurs only with a single value of
TrtTime,
>> explicitly 12, whence it is not possible to estimate a TrtTime
"slope"
>> when Lifestage is "L1".
>> 
>> Indeed, when I fitted the model
>> 
>>      fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage,
family=binomial,
>>                 data=demoDat)
>> 
>> I got:
>> 
>>> as.matrix(coef(fit))
>>>                                    [,1]
>>> (Intercept)                -0.91718302
>>> TrtTime                     0.88846195
>>> LifestageEgg + L1         -45.36420974
>>> LifestageL1                14.27570572
>>> LifestageL1 + L2           -0.30332697
>>> LifestageL3                -3.58672631
>>> TrtTime:LifestageEgg + L1   8.10482459
>>> TrtTime:LifestageL1                 NA
>>> TrtTime:LifestageL1 + L2    0.05662651
>>> TrtTime:LifestageL3         1.66743472
>> 
>> That is, TrtTime:LifestageL1 is NA, as expected.
>> 
>> I would have thought that fitted or predicted values corresponding to
>> Lifestage = "L1" would thereby be NA, but this is not the
case:
>> 
>>> predict(fit)[demoDat$Lifestage=="L1"]
>>>        26       65      131
>>> 24.02007 24.02007 24.02007
>>> 
>>> fitted(fit)[demoDat$Lifestage=="L1"]
>>>   26  65 131
>>>    1   1   1
>> 
>> That is, the predicted values on the scale of the linear predictor are
>> large and positive, rather than being NA.
>> 
>> What this amounts to, it seems to me, is saying that if the linear
>> predictor in a Binomial glm is NA, then "success" is a
certainty.
>> This strikes me as being a dubious proposition.  My gut feeling is that
>> misleading results could be produced.
>> 
>> Can anyone explain to me a rationale for this behaviour pattern?
>> Is there some justification for it that I am not currently seeing?
>> Any other comments?  (Please omit comments to the effect of "You
are as
>> thick as two short planks!". :-) )
>> 
>> I have attached the example data set in a file "demoDat.txt",
should
>> anyone want to experiment with it.  The file was created using dput()
so
>> you should access it (if you wish to do so) via something like
>> 
>>      demoDat <- dget("demoDat.txt")
>> 
>> Thanks for any enlightenment.
>> 
>> cheers,
>> 
>> Rolf Turner
>> 
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
-- 
Sent from my phone. Please excuse my brevity.

John Fox

2022-Jul-28 12:45 UTC

head link

[R] Predicted values from glm() when linear predictor is NA.

Dear Jeff,

On 2022-07-28 1:31 a.m., Jeff Newmiller wrote:> But "disappearing" is not what NA is supposed to do normally. Why
is it being treated that way here?
NA has a different meaning here than in data.

By default, in glm() the argument singular.ok is TRUE, and so estimates 
are provided even when there are singularities, and even though the 
singularities are resolved arbitrarily.

In this model, the columns of the model matrix labelled LifestageL1 and 
TrtTime:LifestageL1 are perfectly collinear -- the second is 12 times 
the first (both have 0s in the same rows and either 1 or 12 in three of 
the rows) -- and thus both can't be estimated simultaneously, but the 
model can be estimated by eliminating one or the other (effectively 
setting its coefficient to 0), or by taking any linear combination of 
the two regressors (i.e., using any regressor with 0s and some other 
value). The fitted values under the model are invariant with respect to 
this arbitrary choice.

My apologies if I'm stating the obvious and misunderstand your objection.

Best,
  John
> 
> On July 27, 2022 7:04:20 PM PDT, John Fox <jfox at mcmaster.ca>
wrote:
>> Dear Rolf,
>>
>> The coefficient of TrtTime:LifestageL1 isn't estimable (as you
explain) and by setting it to NA, glm() effectively removes it from the model.
An equivalent model is therefore
>>
>>> fit2 <- glm(cbind(Dead,Alive) ~ TrtTime + Lifestage +
>> +               I((Lifestage == "Egg + L1")*TrtTime) +
>> +               I((Lifestage == "L1 + L2")*TrtTime) +
>> +               I((Lifestage == "L3")*TrtTime),
>> +             family=binomial, data=demoDat)
>> Warning message:
>> glm.fit: fitted probabilities numerically 0 or 1 occurred
>>
>>> cbind(coef(fit, complete=FALSE), coef(fit2))
>>                                   [,1]         [,2]
>> (Intercept)                -0.91718302  -0.91718302
>> TrtTime                     0.88846195   0.88846195
>> LifestageEgg + L1         -45.36420974 -45.36420974
>> LifestageL1                14.27570572  14.27570572
>> LifestageL1 + L2           -0.30332697  -0.30332697
>> LifestageL3                -3.58672631  -3.58672631
>> TrtTime:LifestageEgg + L1   8.10482459   8.10482459
>> TrtTime:LifestageL1 + L2    0.05662651   0.05662651
>> TrtTime:LifestageL3         1.66743472   1.66743472
>>
>> There is no problem computing fitted values for the model, specified
either way. That the fitted values when Lifestage == "L1" all round to
1 on the probability scale is coincidental -- that is, a consequence of the
data.
>>
>> I hope this helps,
>> John
>>
>> On 2022-07-27 8:26 p.m., Rolf Turner wrote:
>>>
>>> I have a data frame with a numeric ("TrtTime") and a
categorical
>>> ("Lifestage") predictor.
>>>
>>> Level "L1" of Lifestage occurs only with a single value
of TrtTime,
>>> explicitly 12, whence it is not possible to estimate a TrtTime
"slope"
>>> when Lifestage is "L1".
>>>
>>> Indeed, when I fitted the model
>>>
>>>       fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage,
family=binomial,
>>>                  data=demoDat)
>>>
>>> I got:
>>>
>>>> as.matrix(coef(fit))
>>>>                                     [,1]
>>>> (Intercept)                -0.91718302
>>>> TrtTime                     0.88846195
>>>> LifestageEgg + L1         -45.36420974
>>>> LifestageL1                14.27570572
>>>> LifestageL1 + L2           -0.30332697
>>>> LifestageL3                -3.58672631
>>>> TrtTime:LifestageEgg + L1   8.10482459
>>>> TrtTime:LifestageL1                 NA
>>>> TrtTime:LifestageL1 + L2    0.05662651
>>>> TrtTime:LifestageL3         1.66743472
>>>
>>> That is, TrtTime:LifestageL1 is NA, as expected.
>>>
>>> I would have thought that fitted or predicted values corresponding
to
>>> Lifestage = "L1" would thereby be NA, but this is not the
case:
>>>
>>>> predict(fit)[demoDat$Lifestage=="L1"]
>>>>         26       65      131
>>>> 24.02007 24.02007 24.02007
>>>>
>>>> fitted(fit)[demoDat$Lifestage=="L1"]
>>>>    26  65 131
>>>>     1   1   1
>>>
>>> That is, the predicted values on the scale of the linear predictor
are
>>> large and positive, rather than being NA.
>>>
>>> What this amounts to, it seems to me, is saying that if the linear
>>> predictor in a Binomial glm is NA, then "success" is a
certainty.
>>> This strikes me as being a dubious proposition.  My gut feeling is
that
>>> misleading results could be produced.
>>>
>>> Can anyone explain to me a rationale for this behaviour pattern?
>>> Is there some justification for it that I am not currently seeing?
>>> Any other comments?  (Please omit comments to the effect of
"You are as
>>> thick as two short planks!". :-) )
>>>
>>> I have attached the example data set in a file
"demoDat.txt", should
>>> anyone want to experiment with it.  The file was created using
dput() so
>>> you should access it (if you wish to do so) via something like
>>>
>>>       demoDat <- dget("demoDat.txt")
>>>
>>> Thanks for any enlightenment.
>>>
>>> cheers,
>>>
>>> Rolf Turner
>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
> -- 
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

R help - Jul 2022 - Predicted values from glm() when linear predictor is NA.

[R] Predicted values from glm() when linear predictor is NA.

[R] Predicted values from glm() when linear predictor is NA.