thr3ads.net - R devel - [Rd] Bug or feature? [Jan 2023]

If this information is useful, please help other people find it:
Share via:

GILLIBERT, Andre

2023-Jan-14 16:05 UTC

[Rd] Bug or feature?

Dear developers,


I found an inconsistency in the predict.lm() function between offset and
non-offset terms of the formula, but I am not sure whether that is intentional
or a bug.


The problem can be shown in a simple example:


mod <- local({
    y <- rep(0,10)
    x <- rep(c(0,1), each=5)
    list(lm(y ~ x), lm(y ~ offset(x)))
})
# works fine, using the x variable of the local environment
predict(mod[[1]], newdata=data.frame(z=1:10))
# error 'x' not found, because it seeks x in the global environment
predict(mod[[2]], newdata=data.frame(z=1:10))

I would expect either both predict() to use the local x variable or the global x
variable, but the current behavior is inconsistent.

In the worse case, both variables may exist but refer to different data, which
seems to be very dangerous in my opinion.


The problem becomes obvious from the source code of model.frame.default() and
predict.lm()

predict.lm() calls model.frame()

For a non-offset variable, the source code of model.frame.default shows:


variables <- eval(predvars, data, env)


Where env is the environment of the formula parameter.

Consequently, non-offset variables are evaluated in the context of the data
frame, then in the environment of the formula/terms of the model.


For offset variables, the source code of predict.lm() contains:

eval(attr(tt, "variables")[[i + 1]], newdata)


It is not executed in the environment of the formula/terms of the model.


The inconsistency could easily be fixed by a patch to predict.lm() by replacing
eval(attr(tt, "variables")[[i + 1]], newdata) by eval(attr(tt,
"variables")[[i + 1]], newdata, environment(Terms))

The same modification would have to be done two lines after:

offset <- offset + eval(object$call$offset, newdata, environment(Terms))


However, fixing this inconsistency could break code that rely on the old
behavior.


What do you think of that?


--

Sincerely

Andr? GILLIBERT

	[[alternative HTML version deleted]]

Martin Maechler

2023-Jan-17 08:33 UTC

head link

[Rd] Bug or feature?

>>>>> GILLIBERT, Andre 
>>>>>     on Sat, 14 Jan 2023 16:05:31 +0000 writes:
    > Dear developers,
    > I found an inconsistency in the predict.lm() function between offset
and non-offset terms of the formula, but I am not sure whether that is
intentional or a bug.


    > The problem can be shown in a simple example:

    > mod <- local({
    >   y <- rep(0,10)
    >   x <- rep(c(0,1), each=5)
    >   list(lm(y ~ x), lm(y ~ offset(x)))
    > })
    > # works fine, using the x variable of the local environment
    > predict(mod[[1]], newdata=data.frame(z=1:10))
    > # error 'x' not found, because it seeks x in the global
environment
    > predict(mod[[2]], newdata=data.frame(z=1:10))

    > I would expect either both predict() to use the local x
    > variable or the global x variable, but the current
    > behavior is inconsistent.

    > In the worse case, both variables may exist but refer to
    > different data, which seems to be very dangerous in my
    > opinion.

    > The problem becomes obvious from the source code of
model.frame.default() and predict.lm()

    > predict.lm() calls model.frame()

    > For a non-offset variable, the source code of model.frame.default
shows:


    > variables <- eval(predvars, data, env)


    > Where env is the environment of the formula parameter.

    > Consequently, non-offset variables are evaluated in the context of the
data frame, then in the environment of the formula/terms of the model.


    > For offset variables, the source code of predict.lm() contains:

    > eval(attr(tt, "variables")[[i + 1]], newdata)


    > It is not executed in the environment of the formula/terms of the
model.


    > The inconsistency could easily be fixed by a patch to predict.lm() by
replacing eval(attr(tt, "variables")[[i + 1]], newdata) by
eval(attr(tt, "variables")[[i + 1]], newdata, environment(Terms))

    > The same modification would have to be done two lines after:

    > offset <- offset + eval(object$call$offset, newdata,
environment(Terms))

    > However, fixing this inconsistency could break code that rely on the
old behavior.

    > What do you think of that?

As I've worked last week on the  bugzilla issue about
predict.lm(), recently,
   https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16158

and before that on another small detail there,
I indeed had noticed -- just from code reading -- 
that there seem to be several small inconsistencies in
predict.lm();  also, between the two branches  se.fit=FALSE vs  se.fit=TRUE

In the mean time, you have filed a new bugzilla isse about this,

   https://bugs.r-project.org/show_bug.cgi?id=18456

so we (and everyone interested) will continue the discussion
there.

Thank you for contributing to make R better by this!

Best regards,
Martin


    > --

    > Sincerely
    > Andr? GILLIBERT

--
Martin Maechler
ETH Zurich  and  R Core team

Seemingly Similar Threads

Search for more seemingly similar threads

R devel - Jan 2023 - Bug or feature?

[Rd] Bug or feature?

[Rd] Bug or feature?

Seemingly Similar Threads