thr3ads.net - R devel - [Rd] Suggestion: default print method for S3 generics could offer some insights on '...' among registered methods [Jun 2025]

Mikael Jagan

2025-Jun-10 03:44 UTC

[Rd] Suggestion: default print method for S3 generics could offer some insights on '...' among registered methods

I don't really understand the premise.  Any function F with '...' as a formal
argument can pass '...' to another function G.  The actual arguments matching
'...' in the call to F will be matched to the formal arguments of G.  So the
maintainer of F may want to alert the user of F to the existence of G, and
the user of F may want to consult the documentation of G.
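A minimal illustration of this point, using two hypothetical functions `f()` and `g()` that do not appear in the thread: `f()` never names `scale`, yet a `scale =` argument supplied in a call to `f()` is matched through '...' to the formal of `g()`. Printing `f` shows the `g(x, ...)` call in its body, so a reader can see where '...' goes.

```r
# Hypothetical G: owns the formal argument 'scale'
g <- function(x, scale = 1) x * scale

# Hypothetical F: has no 'scale' formal, but forwards '...' to g()
f <- function(x, ...) g(x, ...)

f(2)              # 2
f(2, scale = 10)  # 20: 'scale' is matched via '...' to g()'s formal
```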

Whether F is S3 generic and G is registered as a method for F seems irrelevant.

That is a conceptual issue.  There are practical issues, too:

     * print.default is used "everywhere".  Backwards incompatible changes to
       default behaviour have the potential to break a lot of code out there.

     * Testing that a function F is S3 generic seems nontrivial.  You have to
       deal with internally generic functions and for closures recurse through
       body(F) looking for a call to UseMethod.

     * I would not want the output of print(F) to depend on details external to
       F or the method call, such as the state of the table of registered S3
       methods which changes as packages are loaded.  AFAIK, it is intended that
       options() is the only exception to the rule.

     * More harmonious would be to implement the feature ("give me more
       information about S3 methods") as an option (disabled by default) of
       utils::.S3methods if not as a new function altogether.
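The second point can be made concrete with a rough sketch of the closure case only, assuming base R. `looks_like_s3_generic()` is a hypothetical helper, not an existing API. Instead of hand-written recursion it relies on `all.names()`, which already walks the whole body expression. It also reproduces the gap described above: `rbind()` dispatches internally, so scanning its body does not reveal that it is generic.

```r
# Hypothetical helper (not part of base R or utils): a heuristic test for
# whether a closure calls UseMethod() anywhere in its body.
looks_like_s3_generic <- function(f) {
  # Primitives have no R-level body to inspect
  if (!is.function(f) || is.primitive(f)) return(FALSE)
  # all.names() recursively collects every symbol in the expression,
  # including function names in calls nested inside {, if, etc.
  "UseMethod" %in% all.names(body(f))
}

looks_like_s3_generic(print)          # TRUE: body is UseMethod("print")
looks_like_s3_generic(as.data.frame)  # TRUE: UseMethod() follows an if ()
looks_like_s3_generic(rbind)          # FALSE, although rbind() dispatches
                                      # internally via .Internal()
looks_like_s3_generic(length)         # FALSE: primitive, internally generic
```

Because it only matches a symbol, the heuristic would also report a false positive for a body that merely uses a variable named `UseMethod`.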

Mikael
> Date: Fri, 6 Jun 2025 11:59:08 -0700
> From: Michael Chirico <michaelchirico4 at gmail.com>
> 
> There is a big difference in how to think of '...' for non-generic
> functions like data.frame() vs. S3 generics.
> 
> In the former, it means "any number of inputs" [e.g. columns]; in the
> latter, it means "any number of inputs [think c()], as well as any
> arguments that might be interpreted by class implementations".
> 
> Understanding the difference for a given generic can require carefully
> reading lots of documentation. print(<generic>), which is useful for
> so many other contexts, can be a dead end.
> 
> One idea is to extend the print() method to suggest to the reader
> which other arguments are available (among registered methods). Often
> ?<generic> will include the most common implementation, but not always
> so.
> 
> For rbind (in a --vanilla session), we currently have one method,
> rbind.data.frame, that offers three arguments not present in the
> generic: make.row.names, stringsAsFactors, and factor.exclude. The
> proposal would be to mention this in the print(rbind) output somehow,
> e.g.
> 
>> print(rbind)
> function (..., deparse.level = 1)
> .Internal(rbind(deparse.level, ...))
> <bytecode: 0x73d4fd824e20>
> <environment: namespace:base>
> 
> +Other arguments implemented by methods
> +  factor.exclude: rbind.data.frame
> +  make.row.names: rbind.data.frame
> +  stringsAsFactors: rbind.data.frame
> 
> I suggest grouping by argument, not generic, although something like
> this could be OK too:
> 
> +Signatures of other methods
> +  rbind.data.frame(..., deparse.level = 1, make.row.names = TRUE, stringsAsFactors = FALSE,
> +      factor.exclude = TRUE)
> 
> Where it gets more interesting is when there are many methods, e.g.
> for as.data.frame (again, in a --vanilla session):
> 
>> print(as.data.frame)
> function (x, row.names = NULL, optional = FALSE, ...)
> {
>      if (is.null(x))
>          return(as.data.frame(list()))
>      UseMethod("as.data.frame")
> }
> <bytecode: 0x73d4fc1e70d0>
> <environment: namespace:base>
> 
> +Other arguments implemented by methods
> +  base: as.data.frame.table
> +  check.names: as.data.frame.list
> +  col.names: as.data.frame.list
> +  cut.names: as.data.frame.list
> +  fix.empty.names: as.data.frame.list
> +  make.names: as.data.frame.matrix, as.data.frame.model.matrix
> +  new.names: as.data.frame.list
> +  nm: as.data.frame.bibentry, as.data.frame.complex, as.data.frame.Date,
> +    as.data.frame.difftime, as.data.frame.factor, as.data.frame.integer,
> +    as.data.frame.logical, as.data.frame.noquote, as.data.frame.numeric,
> +    as.data.frame.numeric_version, as.data.frame.ordered,
> +    as.data.frame.person, as.data.frame.POSIXct, as.data.frame.raw
> +  responseName: as.data.frame.table
> +  sep: as.data.frame.table
> +  stringsAsFactors: as.data.frame.character, as.data.frame.list,
> +    as.data.frame.matrix, as.data.frame.table
> 
> Or
> 
> +Signatures of other methods
> +  as.data.frame.aovproj(x, ...)
> +  as.data.frame.array(x, row.names = NULL, optional = FALSE, ...)
> +  as.data.frame.AsIs(x, row.names = NULL, optional = FALSE, ...)
> +  as.data.frame.bibentry(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.character(x, ..., stringsAsFactors = FALSE)
> +  as.data.frame.citation(x, row.names = NULL, optional = FALSE, ...)
> +  as.data.frame.complex(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.data.frame(x, row.names = NULL, ...)
> +  as.data.frame.Date(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.default(x, ...)
> +  as.data.frame.difftime(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.factor(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.ftable(x, row.names = NULL, optional = FALSE, ...)
> +  as.data.frame.integer(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.list(x, row.names = NULL, optional = FALSE, ..., cut.names = FALSE,
> +      col.names = names(x), fix.empty.names = TRUE, new.names = !missing(col.names),
> +      check.names = !optional, stringsAsFactors = FALSE)
> +  as.data.frame.logical(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.logLik(x, ...)
> +  as.data.frame.matrix(x, row.names = NULL, optional = FALSE, make.names = TRUE,
> +      ..., stringsAsFactors = FALSE)
> +  as.data.frame.model.matrix(x, row.names = NULL, optional = FALSE, make.names = TRUE,
> +      ...)
> +  as.data.frame.noquote(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.numeric(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.numeric_version(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.ordered(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.person(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.POSIXct(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.POSIXlt(x, row.names = NULL, optional = FALSE, ...)
> +  as.data.frame.raw(x, row.names = NULL, optional = FALSE, ..., nm = deparse1(substitute(x)))
> +  as.data.frame.table(x, row.names = NULL, ..., responseName = "Freq", stringsAsFactors = TRUE,
> +      sep = "", base = list(LETTERS))
> +  as.data.frame.ts(x, ...)
> 
> Obviously that's a bit more cluttered, but as.data.frame() should be a
> pretty unusual case. It also highlights better the differences in the
> two approaches: the former economizes on space and focuses on what
> sorts of arguments are available; the latter shows the defaults, does
> not hide the arguments shared with the generic, and will always
> produce as many lines as there are methods.
> 
> There are other edge cases to think through (multiple registrations,
> interactions with S4, primitives, ...), but I want to first check with
> the list if this looks workable & valuable enough to pursue.
> 
> Mike C
> 
> ----
> 
> Code that helped with the above:
> 
> f = as.data.frame
> # NB: methods() and getAnywhere() require {utils}
> m = methods(f)
> generic_args = names(formals(f))
> f_methods = lapply(m, \(fn) getAnywhere(fn)$objs[[1L]])
> names(f_methods) = m
> # lapply() rather than sapply(): sapply() would simplify the result to a
> # matrix or vector if every method added the same number of arguments
> new_args = lapply(f_methods, \(g) setdiff(names(formals(g)), generic_args))
> with( # group by argument name
>    data.frame(method = rep(names(new_args), lengths(new_args)),
>               arg = unlist(new_args), row.names = NULL),
>    {tbl = tapply(method, arg, toString); writeLines(paste0(names(tbl), ": ", tbl))}
> )
> # format(args(g)) ends with a "NULL" line for the empty body; drop it
> signatures = sapply(f_methods, \(g) paste(head(format(args(g)), -1), collapse = "\n"))
> writeLines(paste0(names(signatures), gsub("^\\s*function\\s*", "", signatures)))

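The rbind() example from the original proposal can be checked in a plain session, assuming base R only. The point is the second meaning of '...': rbind() itself has no make.row.names formal, its data frame method does, and the argument reaches the method through rbind()'s '...'. The data frame `df` is made up for illustration.

```r
# Base R only; 'df' is a made-up example data frame.
# For a non-generic such as data.frame(), '...' simply collects inputs:
df <- data.frame(a = 1:2, b = c("x", "y"))  # 'a' and 'b' become columns
ncol(df)  # 2

# rbind() has no formal called make.row.names ...
"make.row.names" %in% names(formals(rbind))             # FALSE
# ... but its data frame method does
"make.row.names" %in% names(formals(rbind.data.frame))  # TRUE

# so the argument travels through rbind()'s '...' to rbind.data.frame(),
# which consumes it rather than treating it as another input
out <- rbind(df, df, make.row.names = FALSE)
dim(out)  # 4 2
```
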
Michael Chirico

2025-Jun-10 16:15 UTC

[Rd] Suggestion: default print method for S3 generics could offer some insights on '...' among registered methods

Thanks for the thoughtful reply Mikael.
> Any function F with '...' as a formal argument can pass '...' to another function G.

Yes, that's true. The difference is that in print(F) we can _usually_
pick out at a glance how '...' is being used: we can see which G
receives '...'.

For S3 generics, we quickly reach the dead end of UseMethod(), so F
being S3 generic is in fact _highly_ relevant.
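One way to get past the UseMethod() dead end by hand, assuming only base R and utils: `utils::getS3method()` retrieves a specific registered method, whose formals can then be compared with the generic's. The class "list" is chosen purely for illustration, and "no_such_class" is a hypothetical class name with no method.

```r
# print(as.data.frame) ends at UseMethod("as.data.frame"), so look up
# one method explicitly instead (getS3method() lives in utils)
m <- utils::getS3method("as.data.frame", "list")

# Arguments the list method accepts beyond the generic's formals;
# callers of as.data.frame() can only reach these through '...'
extra <- setdiff(names(formals(m)), names(formals(as.data.frame)))
print(extra)

# optional = TRUE returns NULL instead of signalling an error
# when no method is registered for the (hypothetical) class
is.null(utils::getS3method("as.data.frame", "no_such_class", optional = TRUE))
```

This covers one method at a time; the proposal is essentially about doing the same comparison for every registered method at once.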

Yes, the practical issues you raise are interesting and knotty (I
especially have in mind [1] and [2]), but ultimately I think we could
come up with something useful. Whether it becomes the default can
depend on how useful it turns out to be and on the empirical risk of
backward incompatibility (which I suspect is low).

Mike C

[1] utils::isS3stdGeneric, which has a large number of false negatives:
    https://stat.ethz.ch/R-manual/R-devel/library/utils/html/isS3stdGen.html
[2] tools::nonS3methods, which maintains an onerous list of S3 method lookalikes:
    https://stat.ethz.ch/R-manual/R-devel/library/tools/html/QC.html
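The false negatives mentioned in [1] are easy to reproduce. As far as its documentation suggests, `utils::isS3stdGeneric()` only checks whether the first non-brace expression of the body is a UseMethod() call, so a generic that does any work before dispatching is reported as non-standard. A minimal check, assuming a default R session:

```r
# print() is a textbook generic: its body is just UseMethod("print"),
# so the result is TRUE (named with the generic, "print")
utils::isS3stdGeneric(print)

# as.data.frame() checks is.null(x) before calling UseMethod(), so it is
# not "standard" in this sense and the result is FALSE, even though it
# clearly dispatches on S3 methods
utils::isS3stdGeneric(as.data.frame)
```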

