Suharto Anggono Suharto Anggono
2016-Sep-25 14:12 UTC
[Rd] Undocumented 'use.names' argument to c()
From comments in http://stackoverflow.com/questions/24815572/why-does-function-c-accept-an-undocumented-argument/24815653 :
The code of c() and unlist() was formerly shared but has been (long time passing) separated. From July 30, 1998, is where do_c got split into do_c and do_unlist.

With the implementation of 'c.Date' in R devel r71350, an argument named 'use.names' is included for concatenation. So, it doesn't follow the documented 'c'. But 'c.Date' is not explicitly documented in Dates.Rd, which has 'c.Date' as an alias.

--------------------------------------------
On Sat, 24/9/16, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

 Subject: Re: [Rd] Undocumented 'use.names' argument to c()
 To: "Karl Millar" <kmillar at google.com>
 Date: Saturday, 24 September, 2016, 9:12 PM

>>>>> Karl Millar via R-devel <r-devel at r-project.org>
>>>>>     on Fri, 23 Sep 2016 11:12:49 -0700 writes:

> I'd expect that a lot of the performance overhead could be eliminated
> by simply improving the underlying code. IMHO, we should ignore it in
> deciding the API that we want here.

I agree partially. Even if the underlying code can be made faster, the 'use.names = FALSE' version will still be faster than the default, notably in some "long" cases.

More further down.

> On Fri, Sep 23, 2016 at 10:54 AM, Henrik Bengtsson
> <henrik.bengtsson at gmail.com> wrote:
>> I'd vote for it to stay. It could of course surprise someone who'd
>> expect c(list(a=1), b=2, use.names = FALSE) to generate list(a=1, b=2,
>> use.names=FALSE). On the upside, there is the performance gain from using
>> use.names=FALSE. The benchmarks below show that the combining of the
>> names attributes themselves takes ~20-25 times longer than the
>> combining of the integers themselves. Also, to no surprise,
>> use.names=FALSE avoids some memory allocations.
>>
>> > options(digits = 2)
>> >
>> > a <- b <- c <- d <- 1:1e4
>> > names(c) <- c
>> > names(d) <- d
>> >
>> > stats <- microbenchmark::microbenchmark(
>> +   c(a, b, use.names=FALSE),
>> +   c(c, d, use.names=FALSE),
>> +   c(a, d, use.names=FALSE),
>> +   c(a, b, use.names=TRUE),
>> +   c(a, d, use.names=TRUE),
>> +   c(c, d, use.names=TRUE),
>> +   unit = "ms"
>> + )
>> >
>> > stats
>> Unit: milliseconds
>>                        expr   min    lq  mean median    uq   max neval
>>  c(a, b, use.names = FALSE) 0.031 0.032 0.049  0.034 0.036 1.474   100
>>  c(c, d, use.names = FALSE) 0.031 0.031 0.035  0.034 0.035 0.064   100
>>  c(a, d, use.names = FALSE) 0.031 0.031 0.049  0.034 0.035 1.452   100
>>   c(a, b, use.names = TRUE) 0.031 0.031 0.055  0.034 0.036 2.094   100
>>   c(a, d, use.names = TRUE) 0.510 0.526 0.588  0.549 0.617 1.998   100
>>   c(c, d, use.names = TRUE) 0.780 0.815 0.886  0.841 0.944 1.430   100
>>
>> > profmem::profmem(c(c, d, use.names=FALSE))
>> Rprofmem memory profiling of:
>> c(c, d, use.names = FALSE)
>>
>> Memory allocations:
>>       bytes      calls
>> 1     80040 <internal>
>> total 80040
>>
>> > profmem::profmem(c(c, d, use.names=TRUE))
>> Rprofmem memory profiling of:
>> c(c, d, use.names = TRUE)
>>
>> Memory allocations:
>>        bytes      calls
>> 1      80040 <internal>
>> 2     160040 <internal>
>> total 240080
>>
>> /Henrik
>>
>> On Fri, Sep 23, 2016 at 10:25 AM, William Dunlap via R-devel
>> <r-devel at r-project.org> wrote:
>>> In Splus c() and unlist() called the same C code, but with a different
>>> 'sys_index' code (the last argument to .Internal) and c() did not consider
>>> an argument named 'use.names' special.

Thank you, Bill, very much, for making the historical context clear, and giving us the facts, there.

OTOH, it is also true in R, that c() and unlist() share code .. quite a bit less though .. but more importantly, the very original C code of Ross Ihaka (and possibly Robert Gentleman) had explicitly considered both extra arguments 'recursive' and 'use.names', and not just the first.

The fact that c() has always been a .Primitive function, and that these have no formals(), contributed to what I think was a documentation glitch early on; when, quite a bit later, we added a fake argument list for printing, the then-current documentation was used.

This was the reason for declaring it a documentation "hole" rather than something we do not want.

(read on)

>>> > c
>>> function(..., recursive = F)
>>> .Internal(c(..., recursive = recursive), "S_unlist", TRUE, 1)
>>> > unlist
>>> function(data, recursive = T, use.names = T)
>>> .Internal(unlist(data, recursive = recursive, use.names = use.names),
>>> "S_unlist", TRUE, 2)
>>> > c(A=1,B=2,use.names=FALSE)
>>>         A         B use.names
>>>         1         2         0
>>>
>>> The C code used sys_index==2 to mean "the last argument is the 'use.names'
>>> argument"; if sys_index==1, only the recursive argument was considered
>>> special.
>>>
>>> Sys.funs.c:
>>>  405 S_unlist(vector *ent, vector *arglist, s_evaluator *S_evaluator)
>>>  406 {
>>>  407         int which = sys_index; boolean named, recursive, names;
>>>  ...
>>>  419         args = arglist->value.tree; n = arglist->length;
>>>  ...
>>>  424         names = which==2 ? logical_value(args[--n], ent, S_evaluator)
>>>                  : (which == 1);
>>>
>>> Thus there is no historical reason for giving c() the use.names argument.
>>>
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>> On Fri, Sep 23, 2016 at 9:37 AM, Suharto Anggono Suharto Anggono via
>>> R-devel <r-devel at r-project.org> wrote:
>>>
>>>> In S-PLUS 3.4 help on 'c'
>>>> (http://www.uni-muenster.de/ZIV.BennoSueselbeck/s-html/helpfiles/c.html),
>>>> there is no 'use.names' argument.
>>>>
>>>> Because 'c' is a generic function, I don't think that changing formal
>>>> arguments is good.
>>>>
>>>> In R devel r71344, 'use.names' is not an argument of functions 'c.Date',
>>>> 'c.POSIXct' and 'c.difftime'.

You are right, Suharto, that methods for c() currently have no such argument.

But again, because c() is primitive and has a '...' at the beginning, this does not explicitly hurt, currently, does it?

>>>> Could 'use.names' be documented to be accepted by the default method of
>>>> 'c', but not listed as a formal argument of 'c'?
>>>> Or, could the code that handles the argument name
>>>> 'use.names' be removed?

In principle, of course both could happen, and if one of these two was preferable to the current state, I'd tend to the first one: Consider 'use.names [= FALSE]' just an argument of the default method for c(), so existing c() methods would not have a strong need for updating.

Notably, as the S4 generic for c, via lines 48-49 of src/library/methods/R/BasicFunsList.R

  , "c" = structure(function(x, ..., recursive = FALSE)
                    standardGeneric("c"), signature="x")

has never had 'recursive' as part of the signature.. (and yes, that line 48 does need an update too !!!).

Martin

>>>> ----------------
>>>>
>>>> >>>>> David Winsemius <dwinsemius at comcast.net>
>>>> >>>>>     on Tue, 20 Sep 2016 23:46:48 -0700 writes:
>>>>
>>>> >> On Sep 20, 2016, at 7:18 PM, Karl Millar via R-devel <r-devel at r-project.org> wrote:
>>>> >>
>>>> >> 'c' has an undocumented 'use.names' argument. I'm not sure if this is
>>>> >> a documentation or implementation bug.
>>>>
>>>> > It came up on stackoverflow a couple of years ago:
>>>>
>>>> > http://stackoverflow.com/questions/24815572/why-does-function-c-accept-an-undocumented-argument/24815653#24815653
>>>>
>>>> > At the time it appeared to me to be a documentation lag.
>>>>
>>>> Thank you, Karl and David,
>>>> yes it is a documentation glitch ... and a bit more: Experts know that
>>>> print()ing of primitive functions is, eehm, "special".
>>>>
>>>> I've committed a change to R-devel ... (with the intent to port
>>>> to R-patched).
>>>>
>>>> Martin
>>>>
>>>> >>
>>>> >> > c(a = 1)
>>>> >> a
>>>> >> 1
>>>> >> > c(a = 1, use.names = F)
>>>> >> [1] 1
>>>> >>
>>>> >> Karl
>>>>
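
For readers who want to reproduce the behaviour under discussion, here is a minimal sketch in base R. It assumes an R version whose default c() code handles 'use.names' as described in this thread; the variable names are only illustrative and do not come from the original posts.

```r
# Two small named vectors (illustrative data only).
x <- c(a = 1, b = 2)
y <- c(c = 3)

# Default: the names attributes of the arguments are combined as well.
named <- c(x, y)

# With use.names = FALSE the argument is consumed by c() itself and the
# result carries no names.
unnamed <- c(x, y, use.names = FALSE)

# The case Henrik mentions: 'use.names' does not become a list element.
lst <- c(list(a = 1), b = 2, use.names = FALSE)

print(names(named))
print(names(unnamed))
print(length(lst))
```

This only exercises the default (internal) code path; a class-specific c() method may treat an argument named 'use.names' differently, which is the concern raised about 'c.Date' above.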
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Sun, 25 Sep 2016 14:12:10 +0000 writes:

    > From comments in
    > http://stackoverflow.com/questions/24815572/why-does-function-c-accept-an-undocumented-argument/24815653
    > : The code of c() and unlist() was formerly shared but
    > has been (long time passing) separated. From July 30,
    > 1998, is where do_c got split into do_c and do_unlist.

    > With the implementation of 'c.Date' in R devel r71350, an
    > argument named 'use.names' is included for
    > concatenation. So, it doesn't follow the documented
    > 'c'. But, 'c.Date' is not explicitly documented in
    > Dates.Rd, that has 'c.Date' as an alias.

I do not see any c.Date in R-devel with a 'use.names'; it's a base function, hence not hidden ..

As mentioned before, 'use.names' is used in unlist() in quite a few places, and such an argument also exists for lengths() and all.equal.list(), and now c().
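
The reply above points out that 'use.names' is already an established argument elsewhere in base R. As a small illustration, the following sketch shows it for unlist() and lengths(); the example list and variable names are made up for this purpose and are not from the thread.

```r
# A small named list (illustrative data only).
lst <- list(a = 1:3, b = letters[1:2])

# lengths() keeps the list names by default ...
len_named <- lengths(lst)
# ... and drops them when use.names = FALSE.
len_plain <- lengths(lst, use.names = FALSE)

# unlist() behaves analogously.
flat_named <- unlist(list(a = 1, b = 2))
flat_plain <- unlist(list(a = 1, b = 2), use.names = FALSE)

print(len_named)
print(len_plain)
print(flat_named)
print(flat_plain)
```

In both functions the argument only controls whether names are carried over to the result; the values themselves are the same either way.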