Ben Bolker
2022-Feb-02 02:21 UTC
[Rd] model.weights and model.offset: request for adjustment
The model.weights() and model.offset() functions from the 'stats' package index possibly-missing elements of a data frame via $, e.g.

    x$"(offset)"
    x$"(weights)"

This returns NULL without comment when x is a data frame:

    x <- data.frame(a=1)
    x$"(offset)"  ## NULL
    x$"(weights)" ## NULL

However, when x is a tibble we get a warning as well:

    x <- tibble::as_tibble(x)
    x$"(offset)"
    ## NULL
    ## Warning message:
    ## Unknown or uninitialised column: `(offset)`.

I know it's not R-core's responsibility to manage forward compatibility with tibbles, but in this case [[-indexing would seem to be better practice in any case.

Might a patch be accepted ... ?

cheers
Ben Bolker
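The behaviour described above can be checked in base R alone. The following is a minimal sketch, not the code in 'stats': the data frame `d`, its columns, and the data frame `p` are made up for illustration, and the partial-matching lines assume a default session (no `warnPartialMatchDollar` option set).

```r
## Illustrative data; names are assumptions, not from the original message.
d <- data.frame(y = c(1, 2, 3), x = c(2, 4, 5), w = c(1, 2, 1))

## model.frame() stores supplied weights in a column named "(weights)"
mf <- model.frame(y ~ x, data = d, weights = w)

## Column present: `$` and `[[` return the same vector
w_dollar  <- mf$"(weights)"
w_bracket <- mf[["(weights)"]]

## Column absent (no offset was supplied): both return NULL silently
## for a plain data frame
o_dollar  <- mf$"(offset)"
o_bracket <- mf[["(offset)"]]

## `$` matches names partially, `[[` matches exactly by default
p <- data.frame(weights_long = 1)
partial_dollar  <- p$wei       # partial match on "weights_long"
partial_bracket <- p[["wei"]]  # NULL: no column is named exactly "wei"
```

The last two lines show why `[[` is the safer choice regardless of tibbles: it never picks up a column whose name merely starts with the requested string.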
Martin Maechler
2022-Feb-03 11:14 UTC
[Rd] model.weights and model.offset: request for adjustment
>>>>> Ben Bolker
>>>>>     on Tue, 1 Feb 2022 21:21:46 -0500 writes:

    > The model.weights() and model.offset() functions from the 'stats'
    > package index possibly-missing elements of a data frame via $, e.g.

    > x$"(offset)"
    > x$"(weights)"

    > This returns NULL without comment when x is a data frame:

    > x <- data.frame(a=1)
    > x$"(offset)"  ## NULL
    > x$"(weights)" ## NULL

    > However, when x is a tibble we get a warning as well:

    > x <- tibble::as_tibble(x)
    > x$"(offset)"
    > ## NULL
    > ## Warning message:
    > ## Unknown or uninitialised column: `(offset)`.

    > I know it's not R-core's responsibility to manage forward
    > compatibility with tibbles, but in this case [[-indexing would seem to
    > be better practice in any case.

Yes, I would agree: we should use [[ instead of $ here in order to force exact matching, just as a matter of principle.

Importantly, mf[["(weights)"]] will also return NULL without a warning for a model/data frame, and it seems it does so for tibbles as well.

    > Might a patch be accepted ... ?

That would not be necessary.

There's one remaining problem, however: `$` access is clearly faster than `[[` for small data frames (because `$` is a primitive function doing everything in C, whereas `[[` calls the R-level data frame method).

It is faster in both cases, i.e., when there *is* a column and when there is none (and NULL is returned), e.g., for the first case:

    > system.time(for(i in 1:20000) df[["a"]])
       user  system elapsed
      0.064   0.000   0.065
    > system.time(for(i in 1:20000) df$a)
       user  system elapsed
      0.009   0.000   0.009

So that's probably been the reason why `$` has been preferred?

Martin

    > cheers
    > Ben Bolker
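For anyone wanting to repeat the timing comparison, here is a hedged sketch covering both the present-column and missing-column cases. The message does not show how `df` was created, so a one-column data frame is assumed; the missing column name `zz` is also made up. Absolute timings will differ by machine and R version, so no expected output is given.

```r
## Assumption: `df` is a small one-column data frame (not shown in the thread).
df <- data.frame(a = 1)
n  <- 20000

## Case 1: the column exists
t_bracket_hit <- system.time(for (i in seq_len(n)) df[["a"]])
t_dollar_hit  <- system.time(for (i in seq_len(n)) df$a)

## Case 2: the column does not exist (both forms return NULL)
t_bracket_miss <- system.time(for (i in seq_len(n)) df[["zz"]])
t_dollar_miss  <- system.time(for (i in seq_len(n)) df$zz)

## Collect the elapsed times (seconds) side by side
timings <- rbind(
  bracket_hit  = t_bracket_hit,
  dollar_hit   = t_dollar_hit,
  bracket_miss = t_bracket_miss,
  dollar_miss  = t_dollar_miss
)[, "elapsed"]
print(timings)
```

The loop count matches the 20000 iterations used above, so the per-call difference is the reported gap divided by 20000.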