thr3ads.net - R devel - [Rd] model.weights and model.offset: request for adjustment [Feb 2022]

If this information is useful, please help other people find it:
Share via:

tim@t@yior m@iii@g oii hidde@eieph@@ts@co@uk

2022-Feb-03 11:30 UTC

[Rd] model.weights and model.offset: request for adjustment

> On 03/02/2022 11:14 Martin Maechler <maechler at stat.math.ethz.ch>
wrote:
> 
>  
> >>>>> Ben Bolker 
> >>>>>     on Tue, 1 Feb 2022 21:21:46 -0500 writes:
> 
>     > The model.weights() and model.offset() functions from the
'stats'
>     > package index possibly-missing elements of a data frame via $,
e.g.
> 
>     > x$"(offset)"
>     > x$"(weights)"
> 
>     > This returns NULL without comment when x is a data frame:
> 
>     > x <- data.frame(a=1)
>     > x$"(offset)"  ## NULL
>     > x$"(weights)"  ## NULL
> 
>     > However, when x is a tibble we get a warning as well:
> 
>     > x <- tibble::as_tibble(x)
>     > x$"(offset)"
>     > ## NULL
>     > ## Warning message:
>     > ## Unknown or uninitialised column: `(offset)`.
> 
>     > I know it's not R-core's responsibility to manage forward 
>     > compatibility with tibbles, but in this case [[-indexing would
seem to
>     > be better practice in any case.
> 
> Yes, I would agree:  we should use  [[ instead of $ here
> in order to force exact matching just as principle
> 
> Importantly, because  also  mf[["(weights)"]]
> will return  NULL without a warning for a model/data frame, and
> it seems it does so also for tibbles.
> 
>     > Might a patch be accepted ... ?
> 
> That would not be necessary.
> 
> There's one remaining problem however:
> `$` access is clearly faster than `[[` for small data frames
> (because `$` is a primitive function doing everything in C, 
>  whereas `[[` calls the R level data frame method ).
> 
> Faster in both cases, i.e., when there *is* a column and when there
> is none (and NULL is returned), e.g., for the first case
> 
> > system.time(for(i in 1:20000) df[["a"]])
>    user  system elapsed 
>   0.064   0.000   0.065 
> > system.time(for(i in 1:20000) df$a)
>    user  system elapsed 
>   0.009   0.000   0.009 
> 
> So that's probably been the reason why  `$`  has been prefered?
Would .subset2(df, "a) be preferable?
R> df <- mtcars
R> system.time(for(i in 1:20000) df[["hp"]])
   user  system elapsed 
  0.078   0.000   0.078 
R> system.time(for(i in 1:20000) df$hp)
   user  system elapsed 
  0.011   0.000   0.010 
R> system.time(for(i in 1:20000) .subset2(df,"hp"))
   user  system elapsed 
  0.004   0.000   0.004 
Tim

Martin Maechler

2022-Feb-03 14:21 UTC

head link

[Rd] model.weights and model.offset: request for adjustment

>>>>> tim taylor 
>>>>>     on Thu, 3 Feb 2022 11:30:17 +0000 (GMT) writes:
    >> On 03/02/2022 11:14 Martin Maechler <maechler at
stat.math.ethz.ch> wrote:
    >> 
    >> 
    >> >>>>> Ben Bolker 
    >> >>>>>     on Tue, 1 Feb 2022 21:21:46 -0500 writes:
    >> 
    >> > The model.weights() and model.offset() functions from the
'stats'
    >> > package index possibly-missing elements of a data frame via $,
e.g.
    >> 
    >> > x$"(offset)"
    >> > x$"(weights)"
    >> 
    >> > This returns NULL without comment when x is a data frame:
    >> 
    >> > x <- data.frame(a=1)
    >> > x$"(offset)"  ## NULL
    >> > x$"(weights)"  ## NULL
    >> 
    >> > However, when x is a tibble we get a warning as well:
    >> 
    >> > x <- tibble::as_tibble(x)
    >> > x$"(offset)"
    >> > ## NULL
    >> > ## Warning message:
    >> > ## Unknown or uninitialised column: `(offset)`.
    >> 
    >> > I know it's not R-core's responsibility to manage
forward
    >> > compatibility with tibbles, but in this case [[-indexing would
seem to
    >> > be better practice in any case.
    >> 
    >> Yes, I would agree:  we should use  [[ instead of $ here
    >> in order to force exact matching just as principle
    >> 
    >> Importantly, because  also  mf[["(weights)"]]
    >> will return  NULL without a warning for a model/data frame, and
    >> it seems it does so also for tibbles.
    >> 
    >> > Might a patch be accepted ... ?
    >> 
    >> That would not be necessary.
    >> 
    >> There's one remaining problem however:
    >> `$` access is clearly faster than `[[` for small data frames
    >> (because `$` is a primitive function doing everything in C, 
    >> whereas `[[` calls the R level data frame method ).
    >> 
    >> Faster in both cases, i.e., when there *is* a column and when there
    >> is none (and NULL is returned), e.g., for the first case
    >> 
    >> > system.time(for(i in 1:20000) df[["a"]])
    >> user  system elapsed 
    >> 0.064   0.000   0.065 
    >> > system.time(for(i in 1:20000) df$a)
    >> user  system elapsed 
    >> 0.009   0.000   0.009 
    >> 
    >> So that's probably been the reason why  `$`  has been prefered?

    > Would .subset2(df, "a) be preferable?

    R> df <- mtcars
    R> system.time(for(i in 1:20000) df[["hp"]])
    > user  system elapsed 
    > 0.078   0.000   0.078 
    R> system.time(for(i in 1:20000) df$hp)
    > user  system elapsed 
    > 0.011   0.000   0.010 
    R> system.time(for(i in 1:20000) .subset2(df,"hp"))
    > user  system elapsed 
    > 0.004   0.000   0.004 

    > Tim

Yes, I think that's a very good idea --

notably, as interestingly it seems to work with tibble's very
well, too.

Martin

R devel - Feb 2022 - model.weights and model.offset: request for adjustment

[Rd] model.weights and model.offset: request for adjustment

[Rd] model.weights and model.offset: request for adjustment