thr3ads.net - R devel - [Rd] Undocumented 'use.names' argument to c() [Sep 2016]

Henrik Bengtsson

2016-Sep-23 17:54 UTC

[Rd] Undocumented 'use.names' argument to c()

I'd vote for it to stay.  It could of course surprise someone who'd
expect c(list(a=1), b=2, use.names = FALSE) to generate list(a=1, b=2,
use.names=FALSE).   On the upside, there is the performance gain from
using use.names=FALSE.  The benchmarks below show that combining the
names attributes themselves takes ~20-25 times longer than combining
the integers themselves.  Also, not surprisingly, use.names=FALSE
avoids some memory allocations.

> options(digits = 2)
>
> a <- b <- c <- d <- 1:1e4
> names(c) <- c
> names(d) <- d
>
> stats <- microbenchmark::microbenchmark(
+   c(a, b, use.names=FALSE),
+   c(c, d, use.names=FALSE),
+   c(a, d, use.names=FALSE),
+   c(a, b, use.names=TRUE),
+   c(a, d, use.names=TRUE),
+   c(c, d, use.names=TRUE),
+   unit = "ms"
+ )
>
> stats
Unit: milliseconds
                       expr   min    lq  mean median    uq   max neval
 c(a, b, use.names = FALSE) 0.031 0.032 0.049  0.034 0.036 1.474   100
 c(c, d, use.names = FALSE) 0.031 0.031 0.035  0.034 0.035 0.064   100
 c(a, d, use.names = FALSE) 0.031 0.031 0.049  0.034 0.035 1.452   100
  c(a, b, use.names = TRUE) 0.031 0.031 0.055  0.034 0.036 2.094   100
  c(a, d, use.names = TRUE) 0.510 0.526 0.588  0.549 0.617 1.998   100
  c(c, d, use.names = TRUE) 0.780 0.815 0.886  0.841 0.944 1.430   100
>
> profmem::profmem(c(c, d, use.names=FALSE))
Rprofmem memory profiling of:
c(c, d, use.names = FALSE)

Memory allocations:
      bytes      calls
1     80040 <internal>
total 80040
>
> profmem::profmem(c(c, d, use.names=TRUE))
Rprofmem memory profiling of:
c(c, d, use.names = TRUE)

Memory allocations:
       bytes      calls
1      80040 <internal>
2     160040 <internal>
total 240080
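
The possible surprise mentioned at the top can be sketched as follows. This is a minimal illustration, assuming an R version in which c() treats use.names as an option (the behaviour discussed in this thread); the variable names x and y are only for the example.

```r
# Sketch only; assumes c() honours use.names as discussed in this thread.
x <- c(list(a = 1), b = 2, use.names = FALSE)

# use.names is consumed as an option rather than appended as a third
# element, and the names "a" and "b" are dropped.
length(x)
names(x)

# To really get an element called "use.names", put it inside a list first.
y <- c(list(a = 1), list(b = 2, use.names = FALSE))
names(y)
```

Wrapping the extra element in list() keeps it out of c()'s own argument matching, which is one way to avoid the ambiguity.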

/Henrik

On Fri, Sep 23, 2016 at 10:25 AM, William Dunlap via R-devel
<r-devel at r-project.org> wrote:
> In Splus c() and unlist() called the same C code, but with a different
> 'sys_index' code (the last argument to .Internal) and c() did not
> consider an argument named 'use.names' special.
>
>> c
> function(..., recursive = F)
> .Internal(c(..., recursive = recursive), "S_unlist", TRUE, 1)
>> unlist
> function(data, recursive = T, use.names = T)
> .Internal(unlist(data, recursive = recursive, use.names = use.names),
> "S_unlist", TRUE, 2)
>> c(A=1,B=2,use.names=FALSE)
>  A B use.names
>  1 2         0
>
> The C code used sys_index==2 to mean 'the last argument is the
> 'use.names' argument'; if sys_index==1, only the recursive argument
> was considered special.
>
> Sys.funs.c:
>  405 S_unlist(vector *ent, vector *arglist, s_evaluator *S_evaluator)
>  406 {
>  407         int which = sys_index; boolean named, recursive, names;
>  ...
>  419         args = arglist->value.tree; n = arglist->length;
>  ...
>  424         names = which==2 ? logical_value(args[--n], ent, S_evaluator)
> : (which == 1);
>
> Thus there is no historical reason for giving c() the use.names argument.
>
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Sep 23, 2016 at 9:37 AM, Suharto Anggono Suharto Anggono via
> R-devel <r-devel at r-project.org> wrote:
>
>> In S-PLUS 3.4 help on 'c'
>> (http://www.uni-muenster.de/ZIV.BennoSueselbeck/s-html/helpfiles/c.html),
>> there is no 'use.names' argument.
>>
>> Because 'c' is a generic function, I don't think that changing formal
>> arguments is good.
>>
>> In R devel r71344, 'use.names' is not an argument of functions 'c.Date',
>> 'c.POSIXct' and 'c.difftime'.
>>
>> Could 'use.names' be documented to be accepted by the default method of
>> 'c', but not listed as a formal argument of 'c'? Or, could the code that
>> handles the argument name 'use.names' be removed?
>> ----------------
>> >>>>> David Winsemius <dwinsemius at comcast.net>
>> >>>>>     on Tue, 20 Sep 2016 23:46:48 -0700 writes:
>>
>>     >> On Sep 20, 2016, at 7:18 PM, Karl Millar via R-devel
>>     >> <r-devel at r-project.org> wrote:
>>     >>
>>     >> 'c' has an undocumented 'use.names' argument.  I'm not sure if
>>     >> this is a documentation or implementation bug.
>>
>>     > It came up on stackoverflow a couple of years ago:
>>
>>     > http://stackoverflow.com/questions/24815572/why-does-function-c-accept-an-undocumented-argument/24815653#24815653
>>
>>     > At the time it appeared to me to be a documentation lag.
>>
>> Thank you, Karl and David,
>> yes it is a documentation glitch ... and a bit more:  Experts know that
>> print()ing of primitive functions is, eehm, "special".
>>
>> I've committed a change to R-devel ... (with the intent to port
>> to R-patched).
>>
>> Martin
>>
>>     >>
>>     >>> c(a = 1)
>>     >> a
>>     >> 1
>>     >>> c(a = 1, use.names = F)
>>     >> [1] 1
>>     >>
>>     >> Karl
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>

Karl Millar

2016-Sep-23 18:12 UTC

[Rd] Undocumented 'use.names' argument to c()

I'd expect that a lot of the performance overhead could be eliminated
by simply improving the underlying code.  IMHO, we should ignore it in
deciding the API that we want here.

On Fri, Sep 23, 2016 at 10:54 AM, Henrik Bengtsson
<henrik.bengtsson at gmail.com> wrote:
> I'd vote for it to stay.  It could of course surprise someone who'd
> expect c(list(a=1), b=2, use.names = FALSE) to generate list(a=1, b=2,
> use.names=FALSE).   On the upside, there is the performance gain from
> using use.names=FALSE.  The benchmarks below show that combining the
> names attributes themselves takes ~20-25 times longer than combining
> the integers themselves.  Also, not surprisingly, use.names=FALSE
> avoids some memory allocations.

Martin Maechler

2016-Sep-24 14:12 UTC

[Rd] Undocumented 'use.names' argument to c()

>>>>> Karl Millar via R-devel <r-devel at r-project.org>
>>>>>     on Fri, 23 Sep 2016 11:12:49 -0700 writes:
    > I'd expect that a lot of the performance overhead could be eliminated
    > by simply improving the underlying code.  IMHO, we should ignore it in
    > deciding the API that we want here.

I agree partially.  Even if the underlying code can be made
faster, the 'use.names = FALSE' version will still be faster
than the default, notably in some "long" cases.

More further down.

    > On Fri, Sep 23, 2016 at 10:54 AM, Henrik Bengtsson
    > <henrik.bengtsson at gmail.com> wrote:
    >> I'd vote for it to stay.  It could of course surprise someone who'd
    >> expect c(list(a=1), b=2, use.names = FALSE) to generate list(a=1, b=2,
    >> use.names=FALSE).   On the upside, there is the performance gain from
    >> using use.names=FALSE.  The benchmarks below show that combining the
    >> names attributes themselves takes ~20-25 times longer than combining
    >> the integers themselves.  Also, not surprisingly, use.names=FALSE
    >> avoids some memory allocations.
    >> 
    >> On Fri, Sep 23, 2016 at 10:25 AM, William Dunlap via R-devel
    >> <r-devel at r-project.org> wrote:
    >>> In Splus c() and unlist() called the same C code, but with a different
    >>> 'sys_index' code (the last argument to .Internal) and c() did not
    >>> consider an argument named 'use.names' special.

Thank you, Bill, very much, for making the historical context
clear, and giving us the facts, there.

OTOH, it is also true in R, that  c() and unlist() share code
.. quite a bit less though .. but more importantly, the very
original C code of Ross Ihaka (and possibly Robert Gentleman)
had explicitly considered both extra arguments 'recursive' and
'use.names', and not just the first.

The fact that c() has always been a .Primitive function, and that
primitives have no formals(), contributed to what I think was a
documentation glitch early on; when, quite a bit later, we added a
fake argument list for printing, the then-current documentation was
used.

This was the reason for declaring it a documentation "hole"
rather than something we do not want.
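
The point about primitives having no formals() can be checked with a small sketch in base R. What args(c) prints differs between R versions, so no output is shown for it here.

```r
# c() is a primitive: there is no R-level closure, hence no formals.
is.primitive(c)
formals(c)

# For comparison, unlist() is an ordinary closure with real formals.
is.primitive(unlist)
names(formals(unlist))

# args() shows the argument list used for display; its exact content
# depends on the R version, so inspect it rather than rely on it.
args(c)
```

Because the displayed argument list is maintained separately from the C code that actually matches the arguments, the two can drift apart, which is consistent with the "documentation hole" reading above.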

(read on)

    >>> 
    >>> On Fri, Sep 23, 2016 at 9:37 AM, Suharto Anggono Suharto Anggono via
    >>> R-devel <r-devel at r-project.org> wrote:
    >>>
    >>>> In S-PLUS 3.4 help on 'c'
    >>>> (http://www.uni-muenster.de/ZIV.BennoSueselbeck/s-html/helpfiles/c.html),
    >>>> there is no 'use.names' argument.
    >>>>
    >>>> Because 'c' is a generic function, I don't think that changing formal
    >>>> arguments is good.
    >>>>
    >>>> In R devel r71344, 'use.names' is not an argument of functions 'c.Date',
    >>>> 'c.POSIXct' and 'c.difftime'.

You are right, Suharto, that methods for c() currently have no
such argument.

But again because c() is primitive and has a '...' at the
beginning, this does not explicitly hurt, currently, does it?
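
A minimal sketch of what happens when a c() method does not declare use.names. The class "myclass" and its method are hypothetical, made up for this illustration; they are not part of R.

```r
# Hypothetical S3 class and method, for illustration only.
c.myclass <- function(..., recursive = FALSE) {
  args <- list(...)
  # Report the argument names the method actually received.
  names(args)
}

x <- structure(1, class = "myclass")

# c() dispatches on the class of its first argument. use.names is not a
# formal of c.myclass, so it is matched into '...' and no error is raised.
seen <- c(x, use.names = FALSE)
seen
```

So the call does not fail, but such a method sees use.names as just another element of '...'. Whether that is harmless depends on what the method then does with its arguments.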

    >>>> Could 'use.names' be documented to be accepted by the default method of
    >>>> 'c', but not listed as a formal argument of 'c'?
    >>>> Or, could the code that handles the argument name
    >>>> 'use.names' be removed?

In principle, of course both could happen, and if one of these
two was preferable to the current state, I'd tend to the first one:
Consider 'use.names [= FALSE]' just an argument of the default
method for c(),  so existing c() methods would not have a strong need
for updating.

Notably, as the S4 generic for c,
via lines 48-49 of src/library/methods/R/BasicFunsList.R

, "c" = structure(function(x, ..., recursive = FALSE)
                      standardGeneric("c"),
                  signature="x")

has never had 'recursive' as part of the signature..
(and yes, that line 48 does need an update too !!!).

Martin
