thr3ads.net - R help - [R] Predicted values from glm() when linear predictor is NA. [Jul 2022]

If this information is useful, please help other people find it:
Share via:

Rolf Turner

2022-Jul-28 00:26 UTC

[R] Predicted values from glm() when linear predictor is NA.

I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

    fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
               data=demoDat)

I got:
> as.matrix(coef(fit))
>                                   [,1]
> (Intercept)                -0.91718302
> TrtTime                     0.88846195
> LifestageEgg + L1         -45.36420974
> LifestageL1                14.27570572
> LifestageL1 + L2           -0.30332697
> LifestageL3                -3.58672631
> TrtTime:LifestageEgg + L1   8.10482459
> TrtTime:LifestageL1                 NA
> TrtTime:LifestageL1 + L2    0.05662651
> TrtTime:LifestageL3         1.66743472
That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to
Lifestage = "L1" would thereby be NA, but this is not the case:
> predict(fit)[demoDat$Lifestage=="L1"]
>       26       65      131 
> 24.02007 24.02007 24.02007
>
> fitted(fit)[demoDat$Lifestage=="L1"]
>  26  65 131 
>   1   1   1
That is, the predicted values on the scale of the linear predictor are
large and positive, rather than being NA.

What this amounts to, it seems to me, is saying that if the linear
predictor in a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition.  My gut feeling is that
misleading results could be produced.

Can anyone explain to me a rationale for this behaviour pattern?
Is there some justification for it that I am not currently seeing?
Any other comments?  (Please omit comments to the effect of "You are as
thick as two short planks!". :-) )

I have attached the example data set in a file "demoDat.txt", should
anyone want to experiment with it.  The file was created using dput() so
you should access it (if you wish to do so) via something like

    demoDat <- dget("demoDat.txt")

Thanks for any enlightenment.

cheers,

Rolf Turner

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: demoDat.txt
URL:
<https://stat.ethz.ch/pipermail/r-help/attachments/20220728/dd771e34/attachment.txt>

Andrew Robinson

2022-Jul-28 00:41 UTC

head link

[R] [EXT] Predicted values from glm() when linear predictor is NA.

Hi Rolf,

that's an interesting observation.

I agree that it is counter-intuitive that the fitted values / predictions are
not NA.

However, I don't agree with your comment that <<if the linear
predictor in a Binomial glm is NA, then "success" is a
certainty>> - that seems to be a peculiarity of these data - note for
these three observations there are thousands dead and zero alive and that you
get similar outcomes if you omit TrtTime altogether ...
> predict(glm(cbind(Dead,Alive) ~ Lifestage,
family=binomial,data=demoDat))[demoDat$Lifestage=="L1"] 26 65 131
20.02007 20.02007 20.02007>
Cheers,

Andrew

--
Andrew Robinson
Chief Executive Officer, CEBRA and Professor of Biosecurity,
School/s of BioSciences and Mathematics & Statistics
University of Melbourne, VIC 3010 Australia
Tel: (+61) 0403 138 955
Email: apro at unimelb.edu.au
Website: https://researchers.ms.unimelb.edu.au/~apro at unimelb/

I acknowledge the Traditional Owners of the land I inhabit, and pay my respects
to their Elders.
On 28 Jul 2022, 10:27 AM +1000, Rolf Turner <r.turner at auckland.ac.nz>,
wrote:
External email: Please exercise caution


I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
data=demoDat)

I got:

as.matrix(coef(fit))
[,1]
(Intercept) -0.91718302
TrtTime 0.88846195
LifestageEgg + L1 -45.36420974
LifestageL1 14.27570572
LifestageL1 + L2 -0.30332697
LifestageL3 -3.58672631
TrtTime:LifestageEgg + L1 8.10482459
TrtTime:LifestageL1 NA
TrtTime:LifestageL1 + L2 0.05662651
TrtTime:LifestageL3 1.66743472

That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to
Lifestage = "L1" would thereby be NA, but this is not the case:

predict(fit)[demoDat$Lifestage=="L1"]
26 65 131
24.02007 24.02007 24.02007

fitted(fit)[demoDat$Lifestage=="L1"]
26 65 131
1 1 1

That is, the predicted values on the scale of the linear predictor are
large and positive, rather than being NA.

What this amounts to, it seems to me, is saying that if the linear
predictor in a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition. My gut feeling is that
misleading results could be produced.

Can anyone explain to me a rationale for this behaviour pattern?
Is there some justification for it that I am not currently seeing?
Any other comments? (Please omit comments to the effect of "You are as
thick as two short planks!". :-) )

I have attached the example data set in a file "demoDat.txt", should
anyone want to experiment with it. The file was created using dput() so
you should access it (if you wish to do so) via something like

demoDat <- dget("demoDat.txt")

Thanks for any enlightenment.

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

	[[alternative HTML version deleted]]

Ebert,Timothy Aaron

2022-Jul-28 00:42 UTC

head link

[R] Predicted values from glm() when linear predictor is NA.

Time is often used in this sort of problem, but really time is not relevant. A
better choice is accumulated thermal units. The insect will molt when X thermal
units have been accumulated. This is often expressed as degree days, but could
as easily be other units like degree seconds. However, I suspect that fine time
units exceeds the accuracy of the measurement system. A growth chamber might
maintain 28 C, but the temperature the insect experiences might be somewhat
different thereby introducing additional variability in the outcome. No thermal
units accumulated, no development, so that is not an issue.
    This approach allows one to predict life stage over a large temperature
range. Accuracy can be improved if one knows the lower development threshold. At
high temperatures development stops, and a mortality function can be added.

Tim

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Rolf Turner
Sent: Wednesday, July 27, 2022 8:26 PM
To: r-help <r-help at r-project.org>
Subject: [R] Predicted values from glm() when linear predictor is NA.

[External Email]

I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
explicitly 12, whence it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".

Indeed, when I fitted the model

    fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
               data=demoDat)

I got:
> as.matrix(coef(fit))
>                                   [,1]
> (Intercept)                -0.91718302
> TrtTime                     0.88846195
> LifestageEgg + L1         -45.36420974
> LifestageL1                14.27570572
> LifestageL1 + L2           -0.30332697
> LifestageL3                -3.58672631
> TrtTime:LifestageEgg + L1   8.10482459
> TrtTime:LifestageL1                 NA
> TrtTime:LifestageL1 + L2    0.05662651
> TrtTime:LifestageL3         1.66743472
That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to Lifestage
= "L1" would thereby be NA, but this is not the case:
> predict(fit)[demoDat$Lifestage=="L1"]
>       26       65      131
> 24.02007 24.02007 24.02007
>
> fitted(fit)[demoDat$Lifestage=="L1"]
>  26  65 131
>   1   1   1
That is, the predicted values on the scale of the linear predictor are large and
positive, rather than being NA.

What this amounts to, it seems to me, is saying that if the linear predictor in
a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition.  My gut feeling is that
misleading results could be produced.

Can anyone explain to me a rationale for this behaviour pattern?
Is there some justification for it that I am not currently seeing?
Any other comments?  (Please omit comments to the effect of "You are as
thick as two short planks!". :-) )

I have attached the example data set in a file "demoDat.txt", should
anyone want to experiment with it.  The file was created using dput() so you
should access it (if you wish to do so) via something like

    demoDat <- dget("demoDat.txt")

Thanks for any enlightenment.

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

David Winsemius

2022-Jul-28 01:25 UTC

head link

[R] Predicted values from glm() when linear predictor is NA.

On 7/27/22 17:26, Rolf Turner wrote:> I have a data frame with a numeric ("TrtTime") and a categorical
> ("Lifestage") predictor.
>
> Level "L1" of Lifestage occurs only with a single value of
TrtTime,
> explicitly 12, whence it is not possible to estimate a TrtTime
"slope"
> when Lifestage is "L1".
>
> Indeed, when I fitted the model
>
>      fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
>                 data=demoDat)
>
> I got:
>
>> as.matrix(coef(fit))
>>                                    [,1]
>> (Intercept)                -0.91718302
>> TrtTime                     0.88846195
>> LifestageEgg + L1         -45.36420974
>> LifestageL1                14.27570572
>> LifestageL1 + L2           -0.30332697
>> LifestageL3                -3.58672631
>> TrtTime:LifestageEgg + L1   8.10482459
>> TrtTime:LifestageL1                 NA
>> TrtTime:LifestageL1 + L2    0.05662651
>> TrtTime:LifestageL3         1.66743472
> That is, TrtTime:LifestageL1 is NA, as expected.
>
> I would have thought that fitted or predicted values corresponding to
> Lifestage = "L1" would thereby be NA, but this is not the case:
>
>> predict(fit)[demoDat$Lifestage=="L1"]
>>        26       65      131
>> 24.02007 24.02007 24.02007
>>
>> fitted(fit)[demoDat$Lifestage=="L1"]
>>   26  65 131
>>    1   1   1
> That is, the predicted values on the scale of the linear predictor are
> large and positive, rather than being NA.
>
> What this amounts to, it seems to me, is saying that if the linear
> predictor in a Binomial glm is NA, then "success" is a certainty.
> This strikes me as being a dubious proposition.  My gut feeling is that
> misleading results could be produced.
The NA is most likely caused by aliasing, so some other combination of 
factors a perfect surrogate for every case with that level of the 
interaction. The `predict.glm` function always requires a complete set 
of values to construct a case. Whether apparent incremental linear 
prediction of that interaction term is large or small will depend on the 
degree of independent contribution of the surrogate levels of other 
variables..


David.
>
> Can anyone explain to me a rationale for this behaviour pattern?
> Is there some justification for it that I am not currently seeing?
> Any other comments?  (Please omit comments to the effect of "You are
as
> thick as two short planks!". :-) )
>
> I have attached the example data set in a file "demoDat.txt",
should
> anyone want to experiment with it.  The file was created using dput() so
> you should access it (if you wish to do so) via something like
>
>      demoDat <- dget("demoDat.txt")
>
> Thanks for any enlightenment.
>
> cheers,
>
> Rolf Turner
>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

R help - Jul 2022 - Predicted values from glm() when linear predictor is NA.

[R] Predicted values from glm() when linear predictor is NA.

[R] [EXT] Predicted values from glm() when linear predictor is NA.

[R] Predicted values from glm() when linear predictor is NA.

[R] Predicted values from glm() when linear predictor is NA.