Ebert,Timothy Aaron
2022-Jul-28 00:42 UTC
[R] Predicted values from glm() when linear predictor is NA.
Time is often used in this sort of problem, but really time is not relevant. A better choice is accumulated thermal units. The insect will molt when X thermal units have been accumulated. This is often expressed as degree days, but could as easily be other units like degree seconds. However, I suspect that fine time units exceeds the accuracy of the measurement system. A growth chamber might maintain 28 C, but the temperature the insect experiences might be somewhat different thereby introducing additional variability in the outcome. No thermal units accumulated, no development, so that is not an issue. This approach allows one to predict life stage over a large temperature range. Accuracy can be improved if one knows the lower development threshold. At high temperatures development stops, and a mortality function can be added. Tim -----Original Message----- From: R-help <r-help-bounces at r-project.org> On Behalf Of Rolf Turner Sent: Wednesday, July 27, 2022 8:26 PM To: r-help <r-help at r-project.org> Subject: [R] Predicted values from glm() when linear predictor is NA. [External Email] I have a data frame with a numeric ("TrtTime") and a categorical ("Lifestage") predictor. Level "L1" of Lifestage occurs only with a single value of TrtTime, explicitly 12, whence it is not possible to estimate a TrtTime "slope" when Lifestage is "L1". Indeed, when I fitted the model fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial, data=demoDat) I got:> as.matrix(coef(fit)) > [,1] > (Intercept) -0.91718302 > TrtTime 0.88846195 > LifestageEgg + L1 -45.36420974 > LifestageL1 14.27570572 > LifestageL1 + L2 -0.30332697 > LifestageL3 -3.58672631 > TrtTime:LifestageEgg + L1 8.10482459 > TrtTime:LifestageL1 NA > TrtTime:LifestageL1 + L2 0.05662651 > TrtTime:LifestageL3 1.66743472That is, TrtTime:LifestageL1 is NA, as expected. I would have thought that fitted or predicted values corresponding to Lifestage = "L1" would thereby be NA, but this is not the case:> predict(fit)[demoDat$Lifestage=="L1"] > 26 65 131 > 24.02007 24.02007 24.02007 > > fitted(fit)[demoDat$Lifestage=="L1"] > 26 65 131 > 1 1 1That is, the predicted values on the scale of the linear predictor are large and positive, rather than being NA. What this amounts to, it seems to me, is saying that if the linear predictor in a Binomial glm is NA, then "success" is a certainty. This strikes me as being a dubious proposition. My gut feeling is that misleading results could be produced. Can anyone explain to me a rationale for this behaviour pattern? Is there some justification for it that I am not currently seeing? Any other comments? (Please omit comments to the effect of "You are as thick as two short planks!". :-) ) I have attached the example data set in a file "demoDat.txt", should anyone want to experiment with it. The file was created using dput() so you should access it (if you wish to do so) via something like demoDat <- dget("demoDat.txt") Thanks for any enlightenment. cheers, Rolf Turner -- Honorary Research Fellow Department of Statistics University of Auckland Phone: +64-9-373-7599 ext. 88276
Rolf Turner
2022-Jul-28 02:09 UTC
[R] Predicted values from glm() when linear predictor is NA.
On Thu, 28 Jul 2022 00:42:51 +0000 "Ebert,Timothy Aaron" <tebert at ufl.edu> wrote:> Time is often used in this sort of problem, but really time is not > relevant. A better choice is accumulated thermal units. The insect > will molt when X thermal units have been accumulated. This is often > expressed as degree days, but could as easily be other units like > degree seconds. However, I suspect that fine time units exceeds the > accuracy of the measurement system. A growth chamber might maintain > 28 C, but the temperature the insect experiences might be somewhat > different thereby introducing additional variability in the outcome. > No thermal units accumulated, no development, so that is not an > issue. This approach allows one to predict life stage over a large > temperature range. Accuracy can be improved if one knows the lower > development threshold. At high temperatures development stops, and a > mortality function can be added.Very cogent comments in respect of dealing with the underlying practical problem, but I am not so much concerned with the practical problem at the moment but rather with the workings of the software that I am using. cheers, Rolf P.S. I am at several removes from the data set(s) that I am messing about with. But if my understanding is correct (always an assumption of which to be sceptical!) these data were collected with the temperature being held *constant*, whence time and accumulated thermal units would be equivalent. Is it not so? R. -- Honorary Research Fellow Department of Statistics University of Auckland Phone: +64-9-373-7599 ext. 88276