thr3ads.net - R devel - [Rd] lm() takes weights from formula environment [Aug 2020]

John Mount

2020-Aug-09 18:13 UTC

[Rd] lm() takes weights from formula environment

I know programmers can reason this out from R's late parameter evaluation
rules plus the explicit match.call()/eval() that lm() uses to work with the
passed-in formula and data frame. But from a statistical user's point of view
this seems counterproductive. At best it works as if the user is passing in
the name of the weights variable instead of its values (I know this is the
obvious consequence of NSE).

lm() takes instance weights from the formula environment. Usually that
environment is the interactive environment or a close child of the interactive
environment and we are lucky enough to have no intervening name collisions so we
don't have a problem. However it makes programming over formulas for lm() a
bit tricky. Here is an example of the issue.

Is there any recommended discussion on this and how to work around it? In my own
work I explicitly set the formula environment and put the weights in that
environment.


d <- data.frame(x = 1:3, y = c(3, 3, 4))
w <- c(1, 5, 1)

# works
lm(y ~ x, data = d, weights = w)  

# fails, as weights are taken from the formula environment
fn <- function() {  # deliberately set up formula with bad value in environment
  w <- c(-1, -1, -1, -1)  # bad weights
  # captures bad weights with as.formula(env = parent.frame()) default
  f <- as.formula(y ~ x)
  return(f)
}
lm(fn(), data = d, weights = w)
# Error in model.frame.default(formula = fn(), data = d, weights = w, drop.unused.levels = TRUE) :
#   variable lengths differ (found for '(weights)')
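
The workaround mentioned above (setting the formula environment explicitly and putting the weights there) could look roughly like the sketch below. The helper name fit_weighted and the variable name .fit_weights are hypothetical, chosen for illustration; they do not come from the thread or from any package.

```r
d <- data.frame(x = 1:3, y = c(3, 3, 4))
w <- c(1, 5, 1)

fn <- function() {
  w <- c(-1, -1, -1, -1)  # bad weights living in the formula's environment
  y ~ x
}

# Hypothetical helper: fit lm() with the weights placed where lm() will look.
fit_weighted <- function(formula, data, weights) {
  # Child of the formula's original environment, so any other free
  # variables used in the formula still resolve as before.
  env <- new.env(parent = environment(formula))
  assign(".fit_weights", weights, envir = env)
  environment(formula) <- env
  # .fit_weights is not a column of data, so it is found in env.
  lm(formula, data = data, weights = .fit_weights)
}

fit <- fit_weighted(fn(), d, w)
weights(fit)  # the weights actually used in the fit
```

The unusual variable name is there to reduce the chance of colliding with a column of the data frame, which would otherwise take precedence.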

Duncan Murdoch

2020-Aug-09 18:56 UTC

This is fairly clearly documented in ?lm:

"All of weights, subset and offset are evaluated in the same way as 
variables in formula, that is first in data and then in the environment 
of formula."

There are lots of possible places to look for weights, but this seems to 
me like a pretty sensible search order.  In most cases the environment 
of the formula will have a parent environment chain that eventually 
leads to the global environment, so (with no conflicts) your strategy of 
defining w there will sometimes work, but looks pretty unreliable.

When you say you want to work around this search order, I think the 
obvious way is to add your w vector to your d dataframe.  That way it is 
guaranteed to be found even if there's a conflicting variable in the 
formula environment, or the global environment.
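
A minimal sketch of this suggestion, reusing d, w, and fn() from the original post. The only assumption is that the new column is named w, matching the symbol passed as weights.

```r
d <- data.frame(x = 1:3, y = c(3, 3, 4))
w <- c(1, 5, 1)

fn <- function() {
  w <- c(-1, -1, -1, -1)  # conflicting variable in the formula environment
  y ~ x
}

# Store the weights as a column; data is searched before the formula
# environment, so the conflicting w inside fn() is never reached.
d$w <- w
fit <- lm(fn(), data = d, weights = w)
weights(fit)
```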

Duncan Murdoch

John Mount

2020-Aug-09 19:01 UTC

Doesn't this preclude "y ~ ." style notations?
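
To make the concern concrete, here is a small sketch, assuming the weights are stored in a column named w as suggested. With "y ~ ." the dot expands to every column of the data frame other than the response, so the weights column also becomes a predictor unless it is removed from the formula.

```r
d <- data.frame(x = 1:3, y = c(3, 3, 4), w = c(1, 5, 1))

# The dot expands to x + w, so w is used both as weights and as a predictor.
fit_dot <- lm(y ~ ., data = d, weights = w)
names(coef(fit_dot))

# One possible way around it: subtract the weights column from the formula.
fit_drop <- lm(y ~ . - w, data = d, weights = w)
names(coef(fit_drop))
```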

John Mount

2020-Aug-10 17:42 UTC

I wish I had started with "I am disappointed that lm() doesn't continue
its search for weights into the calling environment" or "the fact that
lm() looks only in the formula environment and data frame for weights
doesn't seem consistent with how other values are treated."

But I did not. So I do apologize both for that and for the negative tone on my part.


Simplified example:

d <- data.frame(x = 1:3, y = c(1, 2, 1))
w <- c(1, 10, 1)
f <- as.formula(y ~ x)
lm(f, data = d, weights = w)  # works

# fails
environment(f) <- baseenv()
lm(f, data = d, weights = w)
# Error in eval(extras, data, env) : object 'w' not found
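
One way to check ahead of time whether a name is reachable from a formula's environment is exists(), which starts at the given environment and walks up its parents. This is only a diagnostic sketch: it does not cover the data frame, which lm() searches first.

```r
w <- c(1, 10, 1)
f <- y ~ x

# f was created here, so its environment (or a parent) contains w.
found_before <- exists("w", envir = environment(f))

# The parent of baseenv() is the empty environment, so the search
# never reaches the environment where w was defined.
environment(f) <- baseenv()
found_after <- exists("w", envir = environment(f))

c(before = found_before, after = found_after)
```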
