thr3ads.net - R devel - [Rd] lm() takes weights from formula environment [Aug 2020]

John Mount

2020-Aug-09 19:01 UTC

[Rd] lm() takes weights from formula environment

Doesn't this preclude "y ~ ." style notations?
> On Aug 9, 2020, at 11:56 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> 
> This is fairly clearly documented in ?lm:
> 
> "All of weights, subset and offset are evaluated in the same way as
> variables in formula, that is first in data and then in the environment of
> formula."
> 
> There are lots of possible places to look for weights, but this seems to me
> like a pretty sensible search order.  In most cases the environment of the
> formula will have a parent environment chain that eventually leads to the
> global environment, so (with no conflicts) your strategy of defining w there
> will sometimes work, but looks pretty unreliable.
> 
> When you say you want to work around this search order, I think the obvious
> way is to add your w vector to your d dataframe.  That way it is guaranteed
> to be found even if there's a conflicting variable in the formula
> environment, or the global environment.
> 
> Duncan Murdoch
> 
> On 09/08/2020 2:13 p.m., John Mount wrote:
>> I know programmers can reason this out from R's late parameter
>> evaluation rules PLUS the explicit match.call()/eval() lm() does to work
>> with the passed-in formula and data frame. But, from a statistical user's
>> point of view, this seems to be counter-productive. At best it works as if
>> the user is passing in the name of the weights variable instead of values
>> (I know this is the obvious consequence of NSE).
>> lm() takes instance weights from the formula environment. Usually that
>> environment is the interactive environment or a close child of the
>> interactive environment, and we are lucky enough to have no intervening
>> name collisions, so we don't have a problem. However, it makes programming
>> over formulas for lm() a bit tricky. Here is an example of the issue.
>> Is there any recommended discussion on this and how to work around it?
>> In my own work I explicitly set the formula environment and put the
>> weights in that environment.
>> d <- data.frame(x = 1:3, y = c(3, 3, 4))
>> w <- c(1, 5, 1)
>> # works
>> lm(y ~ x, data = d, weights = w)
>> # fails, as weights are taken from formula environment
>> fn <- function() {  # deliberately set up formula with bad value in environment
>>   w <- c(-1, -1, -1, -1)  # bad weights
>>   f <- as.formula(y ~ x)  # captures bad weights with as.formula(env = parent.frame()) default
>>   return(f)
>> }
>> lm(fn(), data = d, weights = w)
>> # Error in model.frame.default(formula = fn(), data = d, weights = w, drop.unused.levels = TRUE) :
>> #   variable lengths differ (found for '(weights)')
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
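
The two workarounds mentioned in the quoted messages (adding w to the data frame, and explicitly setting the formula environment) can be sketched as follows. This is a minimal illustration, not code from the thread: the names d_w, env, fit_col, fit_env and fit_ref are hypothetical, and fn() is a simplified version of the function in the original post.

```r
d <- data.frame(x = 1:3, y = c(3, 3, 4))
w <- c(1, 5, 1)

# Formula factory whose environment holds a conflicting `w`, as in the post.
fn <- function() {
  w <- c(-1, -1, -1, -1)  # bad weights, wrong length
  y ~ x                   # formula environment is this function's frame
}

# Workaround 1: carry the weights as a column of the data frame.
# `data` is searched before the formula environment, so the column is used.
d_w <- cbind(d, w = w)    # d_w is a hypothetical name
fit_col <- lm(fn(), data = d_w, weights = w)

# Workaround 2: give the formula an environment that holds the intended
# weights. The new environment's parent is the old one, so any other
# variables the formula relies on are still reachable.
f <- fn()
env <- new.env(parent = environment(f))
assign("w", w, envir = env)
environment(f) <- env
fit_env <- lm(f, data = d, weights = w)

# Reference fit from the interactive environment, for comparison.
fit_ref <- lm(y ~ x, data = d, weights = w)
all.equal(coef(fit_col), coef(fit_ref))
all.equal(coef(fit_env), coef(fit_ref))
```

The first approach relies only on the documented search order (data first, then the formula environment). The second leaves the data frame untouched but modifies the formula object.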

Duncan Murdoch

2020-Aug-09 19:07 UTC

On 09/08/2020 3:01 p.m., John Mount wrote:
> Doesn't this preclude "y ~ ." style notations?

Yes, but you can use "y ~ . - w".

Duncan Murdoch
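
A small sketch of the "y ~ . - w" suggestion, assuming the weights are stored in a column named w of the data frame from the original post. The object names fit_dot and fit_ref are hypothetical.

```r
d <- data.frame(x = 1:3, y = c(3, 3, 4))
d$w <- c(1, 5, 1)  # weights stored as a column, per the earlier suggestion

# "." expands to every column of d except the response, which would
# include w as a predictor; "- w" removes it from the model terms again.
fit_dot <- lm(y ~ . - w, data = d, weights = w)

# Same model with the predictor written out explicitly.
fit_ref <- lm(y ~ x, data = d, weights = w)

names(coef(fit_dot))
all.equal(coef(fit_dot), coef(fit_ref))
```

Without the "- w" term, w would enter the model as a regressor as well as supplying the weights.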

Duncan Murdoch

2020-Aug-09 22:05 UTC

On 09/08/2020 3:07 p.m., Duncan Murdoch wrote:
> On 09/08/2020 3:01 p.m., John Mount wrote:
>> Doesn't this preclude "y ~ ." style notations?
>
> Yes, but you can use "y ~ . - w".

And as was pointed out to me offline, often one doesn't have a simple
vector w giving the weights; instead one computes the weights from the
predictors.  So if weights = f(pred), the original "y ~ ." would be
fine.

Duncan Murdoch
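
To illustrate the weights = f(pred) case, here is a sketch in which the weights are computed from the predictor inside the call. The choice of 1 / x as the weight function is purely an assumption for illustration, and fit_dot and fit_ref are hypothetical names.

```r
d <- data.frame(x = 1:3, y = c(3, 3, 4))

# The weights expression is evaluated first in `data`, so `x` refers to
# the column d$x. No extra column is added, so "y ~ ." picks up only x.
fit_dot <- lm(y ~ ., data = d, weights = 1 / x)

# Equivalent call with the predictor and weights spelled out.
fit_ref <- lm(y ~ x, data = d, weights = 1 / d$x)

all.equal(coef(fit_dot), coef(fit_ref))
```

Because the weights never exist as a separate variable, there is nothing for a conflicting name in the formula environment to shadow, provided every variable used in the weights expression is a column of the data frame.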
