Chris Black
2024-Feb-07 01:09 UTC
[Rd] [EXTERNAL] Re: NOTE: multiple local function definitions for ?fun? with different formal arguments
Hopefully to too much of a tangent: A related problem this check doesn?t catch
is accidental top-level redefinitions in package code, such as
## a.R:
helper <- function() 1
f <- function() {
helper()
}
# ?cool, f() must return 1"
## b.R:
helper <- function(x) 2
g <- function() {
helper()
}
# ?cool, g() must return 2"
## Runtime:
# > c(pkg::f(), pkg::g())
# [1] 2 2
# ?oh right, only the last definition of helper() is used?
I?ve seen several variants of this issue in code from folks who are new to
package development, especially if they're naively refactoring something
that started out as an interactively-run analysis. Collaborators who are puzzled
by it get my ?packages are collections of objects not sequences of expressions,
yes that needs to be in your mental model, here?s the link to RWE again? talk,
but I would be happy to be able to point them to a check result to go along with
it.
I don?t think this is grounds on its own to change a 20-year precedent, but in
case anyone is collecting wishlist reasons to make the check look harder...
Thanks,
Chris
> On Feb 6, 2024, at 3:17 PM, Martin Morgan <mtmorgan.xyz at gmail.com>
wrote:
>
> I went looking and found this in codetools, where it's been for 20
years
>
>
https://gitlab.com/luke-tierney/codetools/-/blame/master/R/codetools.R?ref_type=heads#L951
>
> I think the call stack in codetools is checkUsagePackage ->
checkUsageEnv -> checkUsage, and these are similarly established. The call
from the tools package
https://github.com/wch/r-source/blame/95146f0f366a36899e4277a6a722964a51b93603/src/library/tools/R/QC.R#L4585
is also quite old.
>
> I'm not sure this had been said explicitly, but perhaps the original
intent was to protect against accidentally redefining a local function.
Obviously one could do this with a local variable too, though that might less
often be an error?
>
> toto <- function(mode) {
> tata <- function(a, b) a * b # intended
> tata <- function(a, b) a / b # oops
> ?
> }
>
> Another workaround is to actually name the local functions
>
> toto <- function(mode) {
> tata <- function(a, b) a * b
> titi <- function(u, v, w) (u + v) / w
> if (mode == 1)
> tata
> else
> titi
> }
>
> ? or to use a switch statement
>
> toto <- function(mode) {
> ## fun <- switch(?) for use of `fun()` in toto
> switch(
> mode,
> tata = function(a, b) a * b,
> titi = function(u, v, w) (u + v) / w,
> stop("unknown `mode = '", mode, "'`")
> )
> }
>
> ? or similarly to write `fun <- if ? else ?`, assigning the result of
the `if` to `fun`. I guess this last formulation points to the fact that a more
careful analysis of Herv?'s original code means that `fun` can only take one
value (only one branch of the `if` can be taken) so there can only be one
version of `fun` in any invocation of `toto()`.
>
> Perhaps the local names (and string-valued 'mode') are suggestive
of special case, so serve as implicit documentation?
>
> Adding `?` to `tata` doesn't seem like a good idea; toto(1)(3, 5, 7) no
longer signals an error.
>
> There seems to be a lot in common with S3 and S4 methods, where `toto`
corresponds to the generic, `tata` and `titi` to methods. This
'dispatch' is brought out by using `switch()`. There is plenty of
opportunity for thinking that you're invoking one method but actually
you're invoking the other. For instance with dplyr, I like that I can tbl
|> print(n = 2) so much that I find myself doing this with data.frame df
|> print(n = 2), which is an error (`n` partially matches `na.print`, and 2
is not a valid value); both methods silently ignore the typo print(m = 2).
>
> Martin Morgan
>
> From: R-devel <r-devel-bounces at r-project.org> on behalf of Henrik
Bengtsson <henrik.bengtsson at gmail.com>
> Date: Tuesday, February 6, 2024 at 4:34?PM
> To: Izmirlian, Grant (NIH/NCI) [E] <izmirlig at mail.nih.gov>
> Cc: r-devel at r-project.org <r-devel at r-project.org>
> Subject: Re: [Rd] [EXTERNAL] Re: NOTE: multiple local function definitions
for ?fun? with different formal arguments
> Here's a dummy example that I think illustrates the problem:
>
> toto <- function() {
> if (runif(1) < 0.5)
> function(a) a
> else
> function(a,b) a+b
> }
>
>> fcn <- toto()
>> fcn(1,2)
> [1] 3
>> fcn <- toto()
>> fcn(1,2)
> [1] 3
>> fcn <- toto()
>> fcn(1,2)
> Error in fcn(1, 2) : unused argument (2)
>
> How can you use the returned function, if you get different arguments?
>
> In your example, you cannot use the returned function without knowing
> 'mode', or by inspecting the returned function. So, the warning is
> there to alert you to a potential bug. Anecdotally, I'm pretty sure
> this R CMD check NOTE has caught at least one such bug in one of
> my/our packages.
>
> If you want to keep the current design pattern, one approach could be
> to add ... to your function definitions:
>
> toto <- function(mode)
> {
> if (mode == 1)
> fun <- function(a, b, ...) a*b
> else
> fun <- function(u, v, w) (u + v) / w
> fun
> }
>
> to make sure that toto() returns functions that accept the same
> minimal number of arguments.
>
> /Henrik
>
> On Tue, Feb 6, 2024 at 1:15?PM Izmirlian, Grant (NIH/NCI) [E] via
> R-devel <r-devel at r-project.org> wrote:
>>
>> Because functions get called and therefore, the calling sequence
matters. It?s just protecting you from yourself, but as someone pointed out,
there?s a way to silence such notes.
>> G
>>
>>
>> From: Herv? Pag?s <hpages.on.github at gmail.com>
>> Sent: Tuesday, February 6, 2024 2:40 PM
>> To: Izmirlian, Grant (NIH/NCI) [E] <izmirlig at mail.nih.gov>;
Duncan Murdoch <murdoch.duncan at gmail.com>; r-devel at r-project.org
>> Subject: Re: [EXTERNAL] Re: [Rd] NOTE: multiple local function
definitions for ?fun? with different formal arguments
>>
>>
>> On 2/6/24 11:19, Izmirlian, Grant (NIH/NCI) [E] wrote:
>> The note refers to the fact that the function named ?fun? appears to be
defined in two different ways.
>>
>> Sure I get that. But how is that any different from a variable being
defined in two different ways like in
>>
>> if (mode == 1)
>> x <- -8
>> else
>> x <- 55
>>
>> This is such a common and perfectly fine pattern. Why would this be
considered a potential hazard when the variable is a function?
>>
>> H.
>>
>> From: Herv? Pag?s <hpages.on.github at
gmail.com><mailto:hpages.on.github at gmail.com>
>> Sent: Tuesday, February 6, 2024 2:17 PM
>> To: Duncan Murdoch <murdoch.duncan at
gmail.com><mailto:murdoch.duncan at gmail.com>; Izmirlian, Grant
(NIH/NCI) [E] <izmirlig at mail.nih.gov><mailto:izmirlig at
mail.nih.gov>; r-devel at r-project.org<mailto:r-devel at
r-project.org>
>> Subject: [EXTERNAL] Re: [Rd] NOTE: multiple local function definitions
for ?fun? with different formal arguments
>>
>>
>> Thanks. Workarounds are interesting but... what's the point of the
NOTE in the first place?
>>
>> H.
>> On 2/4/24 09:07, Duncan Murdoch wrote:
>> On 04/02/2024 10:55 a.m., Izmirlian, Grant (NIH/NCI) [E] via R-devel
wrote:
>>
>>
>> Well you can see that yeast is exactly weekday you have. The way out
is to just not name the result
>>
>> I think something happened to your explanation...
>>
>>
>>
>>
>> toto <- function(mode)
>> {
>> ifelse(mode == 1,
>> function(a,b) a*b,
>> function(u, v, w) (u + v) / w)
>> }
>>
>> It's a bad idea to use ifelse() when you really want if() ... else
... . In this case it works, but it doesn't always. So the workaround
should be
>>
>>
>> toto <- function(mode)
>> {
>> if(mode == 1)
>> function(a,b) a*b
>> else
>> function(u, v, w) (u + v) / w
>> }
>>
>>
>>
>>
>>
>>
>> ________________________________
>> From: Grant Izmirlian <izmirlidroid at
gmail.com><mailto:izmirlidroid at gmail.com>
>> Date: Sun, Feb 4, 2024, 10:44 AM
>> To: "Izmirlian, Grant (NIH/NCI) [E]" <izmirlig at
mail.nih.gov><mailto:izmirlig at mail.nih.gov>
>> Subject: Fwd: [EXTERNAL] R-devel Digest, Vol 252, Issue 2
>>
>> Hi,
>>
>> I just ran into this 'R CMD check' NOTE for the first time:
>>
>> * checking R code for possible problems ... NOTE
>> toto: multiple local function definitions for ?fun? with different
>> formal arguments
>>
>> The "offending" code is something like this (simplified from
the real code):
>>
>> toto <- function(mode)
>> {
>> if (mode == 1)
>> fun <- function(a, b) a*b
>> else
>> fun <- function(u, v, w) (u + v) / w
>> fun
>> }
>>
>> Is that NOTE really intended? Hard to see why this code would be
>> considered "wrong".
>>
>> I know it's just a NOTE but still...
>>
>> I agree it's a false positive, but the issue is that you have a
function object in your function which can't be called unconditionally. The
workaround doesn't create such an object.
>>
>> Recognizing that your function never tries to call fun requires global
inspection of toto(), and most of the checks are based on local inspection.
>>
>> Duncan Murdoch
>>
>> ______________________________________________
>> R-devel at r-project.org<mailto:R-devel at r-project.org> mailing
list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> --
>>
>> Herv? Pag?s
>>
>>
>>
>> Bioconductor Core Team
>>
>> hpages.on.github at gmail.com<mailto:hpages.on.github at
gmail.com>
>> CAUTION: This email originated from outside of the organization. Do not
click links or open attachments unless you recognize the sender and are
confident the content is safe.
>>
>>
>> --
>>
>> Herv? Pag?s
>>
>>
>>
>> Bioconductor Core Team
>>
>> hpages.on.github at gmail.com<mailto:hpages.on.github at
gmail.com>
>> CAUTION: This email originated from outside of the organization. Do not
click links or open attachments unless you recognize the sender and are
confident the content is safe.
>>
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
Duncan Murdoch
2024-Feb-07 14:05 UTC
[Rd] [EXTERNAL] Re: NOTE: multiple local function definitions for ?fun? with different formal arguments
I agree a note about this sort of change might be good.
I think it wouldn't be too hard to write such a check to detect simple
assignments using <- or =. If you also wanted to detect method
redefinitions, or redefinitions of functions stored in lists, etc., it
would be harder.
There's unexported code in the pkgload package that will get you the
list of R files in the correct collation order: pkgload:::find_code .
I don't know of such a function exported by some other package, but
there might be one. Once you have that list, you could parse each file
and look for top level assignments to a name, then look for duplicates
in the vector of names.
Here's a little script that finds cases where an R source file makes an
assignment to a variable with the same name as one that was used earlier:
# Assume we are in the top level directory of a package.
Rfiles <- pkgload:::find_code()
allnames <- character()
for (f in Rfiles) {
exprs <- parse(f)
names <- character(length(exprs))
for (i in seq_along(exprs)) {
expr <- exprs[[i]]
if (is.name(expr[[1]]) &&
deparse(expr[[1]]) %in% c("<-", "=") &&
is.name(expr[[2]])) {
names[i] <- deparse(expr[[2]])
}
}
names <- names[names != ""]
prev <- length(allnames)
allnames <- c(allnames, names)
dups <- which(duplicated(allnames))
dups <- dups[dups > prev]
if (any(dups)) {
cat("Duplicated names in ", basename(f), ":\n")
cat(paste(allnames[dups], collapse = ", "), "\n")
}
}
It could be made more fancy to report the locations of both the original
and the dup if you feel motivated.
Duncan Murdoch
On 06/02/2024 8:09 p.m., Chris Black wrote:> Hopefully to too much of a tangent: A related problem this check doesn?t
catch is accidental top-level redefinitions in package code, such as
>
> ## a.R:
> helper <- function() 1
>
> f <- function() {
> helper()
> }
> # ?cool, f() must return 1"
>
> ## b.R:
> helper <- function(x) 2
>
> g <- function() {
> helper()
> }
> # ?cool, g() must return 2"
>
> ## Runtime:
> # > c(pkg::f(), pkg::g())
> # [1] 2 2
> # ?oh right, only the last definition of helper() is used?
>
> I?ve seen several variants of this issue in code from folks who are new to
package development, especially if they're naively refactoring something
that started out as an interactively-run analysis. Collaborators who are puzzled
by it get my ?packages are collections of objects not sequences of expressions,
yes that needs to be in your mental model, here?s the link to RWE again? talk,
but I would be happy to be able to point them to a check result to go along with
it.
>
> I don?t think this is grounds on its own to change a 20-year precedent, but
in case anyone is collecting wishlist reasons to make the check look harder...
>
> Thanks,
> Chris
>
>> On Feb 6, 2024, at 3:17 PM, Martin Morgan <mtmorgan.xyz at
gmail.com> wrote:
>>
>> I went looking and found this in codetools, where it's been for 20
years
>>
>>
https://gitlab.com/luke-tierney/codetools/-/blame/master/R/codetools.R?ref_type=heads#L951
>>
>> I think the call stack in codetools is checkUsagePackage ->
checkUsageEnv -> checkUsage, and these are similarly established. The call
from the tools package
https://github.com/wch/r-source/blame/95146f0f366a36899e4277a6a722964a51b93603/src/library/tools/R/QC.R#L4585
is also quite old.
>>
>> I'm not sure this had been said explicitly, but perhaps the
original intent was to protect against accidentally redefining a local function.
Obviously one could do this with a local variable too, though that might less
often be an error?
>>
>> toto <- function(mode) {
>> tata <- function(a, b) a * b # intended
>> tata <- function(a, b) a / b # oops
>> ?
>> }
>>
>> Another workaround is to actually name the local functions
>>
>> toto <- function(mode) {
>> tata <- function(a, b) a * b
>> titi <- function(u, v, w) (u + v) / w
>> if (mode == 1)
>> tata
>> else
>> titi
>> }
>>
>> ? or to use a switch statement
>>
>> toto <- function(mode) {
>> ## fun <- switch(?) for use of `fun()` in toto
>> switch(
>> mode,
>> tata = function(a, b) a * b,
>> titi = function(u, v, w) (u + v) / w,
>> stop("unknown `mode = '", mode,
"'`")
>> )
>> }
>>
>> ? or similarly to write `fun <- if ? else ?`, assigning the result
of the `if` to `fun`. I guess this last formulation points to the fact that a
more careful analysis of Herv?'s original code means that `fun` can only
take one value (only one branch of the `if` can be taken) so there can only be
one version of `fun` in any invocation of `toto()`.
>>
>> Perhaps the local names (and string-valued 'mode') are
suggestive of special case, so serve as implicit documentation?
>>
>> Adding `?` to `tata` doesn't seem like a good idea; toto(1)(3, 5,
7) no longer signals an error.
>>
>> There seems to be a lot in common with S3 and S4 methods, where `toto`
corresponds to the generic, `tata` and `titi` to methods. This
'dispatch' is brought out by using `switch()`. There is plenty of
opportunity for thinking that you're invoking one method but actually
you're invoking the other. For instance with dplyr, I like that I can tbl
|> print(n = 2) so much that I find myself doing this with data.frame df
|> print(n = 2), which is an error (`n` partially matches `na.print`, and 2
is not a valid value); both methods silently ignore the typo print(m = 2).
>>
>> Martin Morgan
>>
>> From: R-devel <r-devel-bounces at r-project.org> on behalf of
Henrik Bengtsson <henrik.bengtsson at gmail.com>
>> Date: Tuesday, February 6, 2024 at 4:34?PM
>> To: Izmirlian, Grant (NIH/NCI) [E] <izmirlig at mail.nih.gov>
>> Cc: r-devel at r-project.org <r-devel at r-project.org>
>> Subject: Re: [Rd] [EXTERNAL] Re: NOTE: multiple local function
definitions for ?fun? with different formal arguments
>> Here's a dummy example that I think illustrates the problem:
>>
>> toto <- function() {
>> if (runif(1) < 0.5)
>> function(a) a
>> else
>> function(a,b) a+b
>> }
>>
>>> fcn <- toto()
>>> fcn(1,2)
>> [1] 3
>>> fcn <- toto()
>>> fcn(1,2)
>> [1] 3
>>> fcn <- toto()
>>> fcn(1,2)
>> Error in fcn(1, 2) : unused argument (2)
>>
>> How can you use the returned function, if you get different arguments?
>>
>> In your example, you cannot use the returned function without knowing
>> 'mode', or by inspecting the returned function. So, the
warning is
>> there to alert you to a potential bug. Anecdotally, I'm pretty
sure
>> this R CMD check NOTE has caught at least one such bug in one of
>> my/our packages.
>>
>> If you want to keep the current design pattern, one approach could be
>> to add ... to your function definitions:
>>
>> toto <- function(mode)
>> {
>> if (mode == 1)
>> fun <- function(a, b, ...) a*b
>> else
>> fun <- function(u, v, w) (u + v) / w
>> fun
>> }
>>
>> to make sure that toto() returns functions that accept the same
>> minimal number of arguments.
>>
>> /Henrik
>>
>> On Tue, Feb 6, 2024 at 1:15?PM Izmirlian, Grant (NIH/NCI) [E] via
>> R-devel <r-devel at r-project.org> wrote:
>>>
>>> Because functions get called and therefore, the calling sequence
matters. It?s just protecting you from yourself, but as someone pointed out,
there?s a way to silence such notes.
>>> G
>>>
>>>
>>> From: Herv? Pag?s <hpages.on.github at gmail.com>
>>> Sent: Tuesday, February 6, 2024 2:40 PM
>>> To: Izmirlian, Grant (NIH/NCI) [E] <izmirlig at
mail.nih.gov>; Duncan Murdoch <murdoch.duncan at gmail.com>; r-devel at
r-project.org
>>> Subject: Re: [EXTERNAL] Re: [Rd] NOTE: multiple local function
definitions for ?fun? with different formal arguments
>>>
>>>
>>> On 2/6/24 11:19, Izmirlian, Grant (NIH/NCI) [E] wrote:
>>> The note refers to the fact that the function named ?fun? appears
to be defined in two different ways.
>>>
>>> Sure I get that. But how is that any different from a variable
being defined in two different ways like in
>>>
>>> if (mode == 1)
>>> x <- -8
>>> else
>>> x <- 55
>>>
>>> This is such a common and perfectly fine pattern. Why would this be
considered a potential hazard when the variable is a function?
>>>
>>> H.
>>>
>>> From: Herv? Pag?s <hpages.on.github at
gmail.com><mailto:hpages.on.github at gmail.com>
>>> Sent: Tuesday, February 6, 2024 2:17 PM
>>> To: Duncan Murdoch <murdoch.duncan at
gmail.com><mailto:murdoch.duncan at gmail.com>; Izmirlian, Grant
(NIH/NCI) [E] <izmirlig at mail.nih.gov><mailto:izmirlig at
mail.nih.gov>; r-devel at r-project.org<mailto:r-devel at
r-project.org>
>>> Subject: [EXTERNAL] Re: [Rd] NOTE: multiple local function
definitions for ?fun? with different formal arguments
>>>
>>>
>>> Thanks. Workarounds are interesting but... what's the point of
the NOTE in the first place?
>>>
>>> H.
>>> On 2/4/24 09:07, Duncan Murdoch wrote:
>>> On 04/02/2024 10:55 a.m., Izmirlian, Grant (NIH/NCI) [E] via
R-devel wrote:
>>>
>>>
>>> Well you can see that yeast is exactly weekday you have. The way
out is to just not name the result
>>>
>>> I think something happened to your explanation...
>>>
>>>
>>>
>>>
>>> toto <- function(mode)
>>> {
>>> ifelse(mode == 1,
>>> function(a,b) a*b,
>>> function(u, v, w) (u + v) / w)
>>> }
>>>
>>> It's a bad idea to use ifelse() when you really want if() ...
else ... . In this case it works, but it doesn't always. So the workaround
should be
>>>
>>>
>>> toto <- function(mode)
>>> {
>>> if(mode == 1)
>>> function(a,b) a*b
>>> else
>>> function(u, v, w) (u + v) / w
>>> }
>>>
>>>
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Grant Izmirlian <izmirlidroid at
gmail.com><mailto:izmirlidroid at gmail.com>
>>> Date: Sun, Feb 4, 2024, 10:44 AM
>>> To: "Izmirlian, Grant (NIH/NCI) [E]" <izmirlig at
mail.nih.gov><mailto:izmirlig at mail.nih.gov>
>>> Subject: Fwd: [EXTERNAL] R-devel Digest, Vol 252, Issue 2
>>>
>>> Hi,
>>>
>>> I just ran into this 'R CMD check' NOTE for the first time:
>>>
>>> * checking R code for possible problems ... NOTE
>>> toto: multiple local function definitions for ?fun? with different
>>> formal arguments
>>>
>>> The "offending" code is something like this (simplified
from the real code):
>>>
>>> toto <- function(mode)
>>> {
>>> if (mode == 1)
>>> fun <- function(a, b) a*b
>>> else
>>> fun <- function(u, v, w) (u + v) / w
>>> fun
>>> }
>>>
>>> Is that NOTE really intended? Hard to see why this code would be
>>> considered "wrong".
>>>
>>> I know it's just a NOTE but still...
>>>
>>> I agree it's a false positive, but the issue is that you have a
function object in your function which can't be called unconditionally. The
workaround doesn't create such an object.
>>>
>>> Recognizing that your function never tries to call fun requires
global inspection of toto(), and most of the checks are based on local
inspection.
>>>
>>> Duncan Murdoch
>>>
>>> ______________________________________________
>>> R-devel at r-project.org<mailto:R-devel at r-project.org>
mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>
>>> --
>>>
>>> Herv? Pag?s
>>>
>>>
>>>
>>> Bioconductor Core Team
>>>
>>> hpages.on.github at gmail.com<mailto:hpages.on.github at
gmail.com>
>>> CAUTION: This email originated from outside of the organization. Do
not click links or open attachments unless you recognize the sender and are
confident the content is safe.
>>>
>>>
>>> --
>>>
>>> Herv? Pag?s
>>>
>>>
>>>
>>> Bioconductor Core Team
>>>
>>> hpages.on.github at gmail.com<mailto:hpages.on.github at
gmail.com>
>>> CAUTION: This email originated from outside of the organization. Do
not click links or open attachments unless you recognize the sender and are
confident the content is safe.
>>>
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel