>>>>> GILLIBERT, Andre
>>>>> on Sat, 14 Jan 2023 16:05:31 +0000 writes:
> Dear developers,
> I found an inconsistency in the predict.lm() function between offset
> and non-offset terms of the formula, but I am not sure whether that is
> intentional or a bug.
> The problem can be shown in a simple example:
> mod <- local({
> y <- rep(0,10)
> x <- rep(c(0,1), each=5)
> list(lm(y ~ x), lm(y ~ offset(x)))
> })
> # works fine, using the x variable of the local environment
> predict(mod[[1]], newdata=data.frame(z=1:10))
> # error 'x' not found, because it seeks x in the global environment
> predict(mod[[2]], newdata=data.frame(z=1:10))
> I would expect either both predict() to use the local x
> variable or the global x variable, but the current
> behavior is inconsistent.
> In the worst case, both variables may exist but refer to
> different data, which seems to be very dangerous in my
> opinion.
> The problem becomes obvious from the source code of
> model.frame.default() and predict.lm().
> predict.lm() calls model.frame().
> For a non-offset variable, the source code of model.frame.default()
> shows:
> variables <- eval(predvars, data, env)
> Where env is the environment of the formula parameter.
> Consequently, non-offset variables are evaluated in the context of the
> data frame, then in the environment of the formula/terms of the model.
> For offset variables, the source code of predict.lm() contains:
> eval(attr(tt, "variables")[[i + 1]], newdata)
> It is not executed in the environment of the formula/terms of the
> model.
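A minimal sketch of the two lookup rules described above, using eval() directly rather than predict.lm(), so that it does not depend on the R version. The names f, off_expr, with_env and without_env are illustrative only, and the sketch assumes that no variable x exists in the calling environment.

```r
# Formula created inside local(), so its environment holds x
f <- local({
  x <- rep(c(0, 1), each = 5)
  y ~ offset(x)
})
tt <- terms(f)
newdata <- data.frame(z = 1:10)   # deliberately has no column named x

# The call offset(x), extracted the same way as in the quoted predict.lm() line
off_expr <- attr(tt, "variables")[[attr(tt, "offset") + 1]]

# Three-argument eval(): newdata first, then the formula environment
with_env <- eval(off_expr, newdata, environment(tt))

# Two-argument eval(): newdata first, then the calling frame;
# x is not visible there, so the error message is captured instead
without_env <- tryCatch(eval(off_expr, newdata),
                        error = function(e) conditionMessage(e))
```

Here with_env holds the local x, while without_env holds an error message, because the two-argument form never consults the formula environment.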
> The inconsistency could easily be fixed by a patch to predict.lm(),
> replacing
> eval(attr(tt, "variables")[[i + 1]], newdata)
> by
> eval(attr(tt, "variables")[[i + 1]], newdata, environment(Terms))
> The same modification would have to be done two lines after:
> offset <- offset + eval(object$call$offset, newdata,
> environment(Terms))
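The proposed change can be sketched as a stand-alone helper. This is only an illustration under stated assumptions: offset_with_env() is a hypothetical function, not part of the stats package and not the actual patch, and it uses environment(terms(object)) where the proposal writes environment(Terms).

```r
# Hypothetical helper (not in stats): compute the offset for newdata,
# passing the environment of the model terms as the enclosure of eval()
offset_with_env <- function(object, newdata) {
  tt <- terms(object)
  env <- environment(tt)
  offset <- rep(0, nrow(newdata))
  # offset() terms written in the formula
  for (i in attr(tt, "offset")) {
    offset <- offset + eval(attr(tt, "variables")[[i + 1]], newdata, env)
  }
  # offset supplied through the offset= argument of lm()
  if (!is.null(object$call$offset)) {
    offset <- offset + eval(object$call$offset, newdata, env)
  }
  offset
}

# Same setup as the example in the report: x lives only in the local environment
fit <- local({
  y <- rep(0, 10)
  x <- rep(c(0, 1), each = 5)
  lm(y ~ offset(x))
})
res <- offset_with_env(fit, data.frame(z = 1:10))
```

With this lookup order, a column x in newdata would still take precedence, and the local x is used only when newdata does not provide one.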
> However, fixing this inconsistency could break code that relies on the
> old behavior.
> What do you think of that?
As I worked last week on the bugzilla issue about
predict.lm(),
https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16158
and before that on another small detail there,
I had indeed noticed, just from code reading,
that there seem to be several small inconsistencies in
predict.lm(), also between the two branches se.fit=FALSE vs se.fit=TRUE.
In the meantime, you have filed a new bugzilla issue about this,
https://bugs.r-project.org/show_bug.cgi?id=18456
so we (and everyone interested) will continue the discussion
there.
Thank you for contributing to make R better by this!
Best regards,
Martin
> --
> Sincerely
> André GILLIBERT
--
Martin Maechler
ETH Zurich and R Core team