David Winsemius
2022-Jul-28 01:25 UTC
[R] Predicted values from glm() when linear predictor is NA.
On 7/27/22 17:26, Rolf Turner wrote:> I have a data frame with a numeric ("TrtTime") and a categorical > ("Lifestage") predictor. > > Level "L1" of Lifestage occurs only with a single value of TrtTime, > explicitly 12, whence it is not possible to estimate a TrtTime "slope" > when Lifestage is "L1". > > Indeed, when I fitted the model > > fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial, > data=demoDat) > > I got: > >> as.matrix(coef(fit)) >> [,1] >> (Intercept) -0.91718302 >> TrtTime 0.88846195 >> LifestageEgg + L1 -45.36420974 >> LifestageL1 14.27570572 >> LifestageL1 + L2 -0.30332697 >> LifestageL3 -3.58672631 >> TrtTime:LifestageEgg + L1 8.10482459 >> TrtTime:LifestageL1 NA >> TrtTime:LifestageL1 + L2 0.05662651 >> TrtTime:LifestageL3 1.66743472 > That is, TrtTime:LifestageL1 is NA, as expected. > > I would have thought that fitted or predicted values corresponding to > Lifestage = "L1" would thereby be NA, but this is not the case: > >> predict(fit)[demoDat$Lifestage=="L1"] >> 26 65 131 >> 24.02007 24.02007 24.02007 >> >> fitted(fit)[demoDat$Lifestage=="L1"] >> 26 65 131 >> 1 1 1 > That is, the predicted values on the scale of the linear predictor are > large and positive, rather than being NA. > > What this amounts to, it seems to me, is saying that if the linear > predictor in a Binomial glm is NA, then "success" is a certainty. > This strikes me as being a dubious proposition. My gut feeling is that > misleading results could be produced.The NA is most likely caused by aliasing, so some other combination of factors a perfect surrogate for every case with that level of the interaction. The `predict.glm` function always requires a complete set of values to construct a case. Whether apparent incremental linear prediction of that interaction term is large or small will depend on the degree of independent contribution of the surrogate levels of other variables.. David.> > Can anyone explain to me a rationale for this behaviour pattern? > Is there some justification for it that I am not currently seeing? > Any other comments? (Please omit comments to the effect of "You are as > thick as two short planks!". :-) ) > > I have attached the example data set in a file "demoDat.txt", should > anyone want to experiment with it. The file was created using dput() so > you should access it (if you wish to do so) via something like > > demoDat <- dget("demoDat.txt") > > Thanks for any enlightenment. > > cheers, > > Rolf Turner > > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
Rolf Turner
2022-Jul-28 02:19 UTC
[R] Predicted values from glm() when linear predictor is NA.
On Wed, 27 Jul 2022 18:25:23 -0700 David Winsemius <dwinsemius at comcast.net> wrote: <SNIP>> The NA is most likely caused by aliasing, so some other combination > of factors a perfect surrogate for every case with that level of the > interaction.<SNIP> No, I think it's much simpler than that. Essentially it boils down to the fact that if y is 1:10 and x is rep(1,10) then lm(y ~ x) gives a slope estimate of NA. As it clearly should. cheers, Rolf -- Honorary Research Fellow Department of Statistics University of Auckland Phone: +64-9-373-7599 ext. 88276
Martin Maechler
2022-Aug-02 07:10 UTC
[R] Predicted values from glm() when linear predictor is NA.
>>>>> Rolf Turner >>>>> on Thu, 28 Jul 2022 14:19:49 +1200 writes:> On Wed, 27 Jul 2022 18:25:23 -0700 David Winsemius > <dwinsemius at comcast.net> wrote: > <SNIP> >> The NA is most likely caused by aliasing, so some other >> combination of factors a perfect surrogate for every case >> with that level of the interaction. > <SNIP> > No, I think it's much simpler than that. Essentially it > boils down to the fact that if y is 1:10 and x is > rep(1,10) then > lm(y ~ x) > gives a slope estimate of NA. As it clearly should. *and* predict() should surely *not* be NA, either, right ?? You can easily confirm with x <- rep(1,7); y <- 1:7; summary(fm <- lm(y~x)) predict(fm) which gives all '4' (= mean(y)) Note that this *is* aliasing: x is aliased with the intercept, and it fits entirely your situation: If a predictor variable is superfluous, i.e., more precisely, a column of the X design matrix is an exact linear combination of previous columns, prediction is no problem at all and gives exactly what you'd get if you omit that superfluous column/variable. Are you sure this is not simply your situation? Martin > cheers, > Rolf > -- > Honorary Research Fellow Department of Statistics > University of Auckland Phone: +64-9-373-7599 ext. 88276