thr3ads.net - R devel - [Rd] RFC: tapply(*, ..., init.value = NA) [Feb 2017]


Suharto Anggono

2017-Feb-01 16:17 UTC

[Rd] RFC: tapply(*, ..., init.value = NA)

On 'aggregate.data.frame', the URL should be
https://stat.ethz.ch/pipermail/r-help/2016-May/438631.html .

vector(typeof(ans)) (or vector(storage.mode(ans))) has length zero and
can be used to initialize the array.
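A quick check of this suggestion, as a sketch: 'ans_int' and 'ans_raw' are made-up stand-ins for the unlisted per-group results 'ans'. When array() is given zero-length data, it fills the array with NA of the matching type, and with 00 for raw, which has no NA.

```r
# Stand-ins for the unlisted tapply() result 'ans' (illustrative values only)
ans_int <- c(10L, 20L)
ans_raw <- as.raw(c(1, 2))

# Zero-length vectors of the same type used as array() data:
# array() fills zero-length data with NA of that type (00 for raw)
m_int <- array(vector(typeof(ans_int)), dim = c(2, 2))
m_raw <- array(vector(typeof(ans_raw)), dim = 3)

print(m_int)  # 2 x 2 integer matrix, all NA
print(m_raw)  # raw array 00 00 00
```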

Instead of if(missing(default)), if(identical(default, NA)) could be
used. The documentation could then say, for example: "If default = NA
(the default), NA of the appropriate storage mode (0 for raw) is
automatically used."
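A minimal sketch of the identical(default, NA) idea, assuming 'ans' is a non-NULL atomic vector of per-group results. The helper name 'init_value' is hypothetical and this is not R's actual tapply() source. It shows that omitting 'default' and passing default = NA explicitly now behave the same, including for raw.

```r
# Hypothetical helper (not R's tapply() code): choose the data used to
# initialize the result array, testing identical(default, NA) instead of
# missing(default). Assumes 'ans' is a non-NULL atomic vector.
init_value <- function(ans, default = NA) {
  if (identical(default, NA))
    vector(typeof(ans))  # zero-length: array() then fills with NA of this type (00 for raw)
  else
    default
}

ans <- as.raw(1:3)
m1 <- array(init_value(ans), dim = 3)                # 'default' omitted
m2 <- array(init_value(ans, default = NA), dim = 3)  # 'default = NA' given explicitly
identical(m1, m2)  # TRUE: both calls take the same branch
typeof(m1)         # "raw"
```

Note that identical() is strict about type: default = NA_integer_ does not match the logical NA and would be used as given.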
--------------------------------------------
On Wed, 1/2/17, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

 Subject: Re: [Rd] RFC: tapply(*, ..., init.value = NA)

 Cc: R-devel at r-project.org
 Date: Wednesday, 1 February, 2017, 12:14 AM
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Tue, 31 Jan 2017 15:43:53 +0000 writes:
    > Function 'aggregate.data.frame' in R has taken a different
    > route. With drop=FALSE, the function is also applied to the subset
    > corresponding to a combination of grouping variables that doesn't
    > appear in the data (example 2 in
    > https://stat.ethz.ch/pipermail/r-devel/2017-January/073678.html).

Interesting point (I couldn't easily find 'the example 2' though).
However, aggregate.data.frame() is a considerably more
sophisticated function, and one goal was to change tapply() as
little as possible for compatibility (and maintenance!) reasons.

[snip]

    > With the code using
    >    if(missing(default)) ,
    > I consider the stated default value of 'default',
    >    default = NA ,
    > misleading because the code doesn't use it. 

I know, and I had also thought about it and decided to keep it
in the spirit of "self documentation", because "in spirit" the
default still *is* NA.

    > Also,
    >  tapply(1:3, 1:3, as.raw)
    > is not the same as
    >  tapply(1:3, 1:3, as.raw, default = NA) .
    > The accurate statement is the code in
    > if(missing(default)) ,
    > but it involves the local variable 'ans'.

exactly.  But putting that whole expression in there would look
confusing to those using  str(tapply), args(tapply) or similar
inspection to quickly get a glimpse of the function user "interface".
That's why we typically don't do that and rather slightly cheat
with the formal default, for the above "didactical" purposes.

If you are puristic about this, then missing() should almost never
be used when the function argument has a formal default.

I don't have too strong an opinion here, and we do have quite a
few other cases where the formal default argument is not always
used because of   if(missing(.))  clauses.

I think I could be convinced to drop the '= NA' from the formal
argument list..
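The missing() variant discussed above can be sketched as follows. The helper 'init_value_missing' is hypothetical and only modelled on the discussion, not on R's actual code. It shows Suharto's point: args() and formals() display NA, yet omitting 'default' and passing default = NA take different branches.

```r
# Hypothetical helper modelled on the if(missing(default)) variant
# (not R's actual code): the formal default NA is what args() and
# formals() display, but it is only used when passed explicitly.
init_value_missing <- function(ans, default = NA) {
  if (missing(default))
    ans[0L][1L]  # computed from 'ans': NA of its type, 00 for raw
  else
    default
}

ans <- as.raw(1:3)
typeof(init_value_missing(ans))                # "raw"
typeof(init_value_missing(ans, default = NA))  # "logical"
formals(init_value_missing)$default           # NA
```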

    > As far as I know, the result of function 'array' is not a
    > classed object, and the default method of `[<-` will be used in
    > the 'tapply' code portion.

    > As far as I know, the result of 'lapply' is a list without
    > class. So 'unlist' applied to it uses the default method, and the
    > 'unlist' result is a vector or a factor.

You may be right here
  ((or not:  If a package author makes array() into an S3 generic and defines
    S3method(array, *) and she or another make tapply() into a
    generic with methods,  are we really sure that this code
    would not be used ??))

still, the as.raw example did not easily work without a warning
when using as.vector() .. or similar.
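The two claims about the intermediate objects can be checked directly in base R, as a small sketch with made-up data: a plain array() result carries no class, lapply() returns a plain list, and unlist() on a plain list of factors returns a factor whose levels are the union of the element levels.

```r
# array() result: has a dim attribute but no class
m <- array(NA, dim = 3)
is.object(m)      # FALSE, so the default `[<-` applies

# lapply() result: a plain, unclassed list (here of factors)
parts <- lapply(1:2, function(i) factor(letters[i]))
is.object(parts)  # FALSE

# unlist() on that list combines the factors into one factor
u <- unlist(parts)
is.factor(u)      # TRUE
levels(u)         # "a" "b"
```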

    > With the change, the result of

    >  tapply(1:3, 1:3, factor, levels=3:1)

    > is of mode "character". The values come from the internal codes,
    > not from the factor levels. That is worse than before the change,
    > when the result really was the internal codes, as integer.

I agree that this change is not desirable.
One could argue that it was quite a "lucky coincidence" that the
previous code returned the internal integer codes though..
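A sketch of how the "character" result can arise, reconstructed from the report under the assumption that the patched code initialized the array with ans[0L][1L]; it is not the committed tapply() code. The unlisted factor yields a factor NA, and array() converts classed data with as.vector(), which gives a character array.

```r
# Per-group results unlisted into one factor with levels "3" "2" "1"
f <- unlist(lapply(1:3, factor, levels = 3:1))
as.integer(f)               # internal codes: 3 2 1

init <- f[0L][1L]           # a factor NA, not a plain logical NA
m <- array(init, dim = 3)   # array() converts the factor with as.vector()
typeof(m)                   # "character"

# Filling the cells as tapply() does; per the report, the stored
# values are the internal codes rather than the level labels
m[] <- f
m
```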

[snip]

    > To initialize array, a zero-length vector can also be used.

yes, of course; but my  ans[0L][1L]  was meant to get the
correct mode-specific version of NA .. which works for raw (by
getting '00' because "raw" has *no* NA!).

So it seems I need an additional   !is.factor(ans)  there ...
a bit ugly.
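A minimal sketch of the ans[0L][1L] trick with the extra !is.factor() guard mentioned above. The helper name 'typed_na' and the plain-NA fallback for factors are assumptions for illustration. Indexing a zero-length vector out of range returns NA of that vector's type, or 00 for raw.

```r
# Hypothetical helper: NA of the same type as 'ans' via ans[0L][1L];
# factors fall back to a plain logical NA (assumed behaviour of the guard)
typed_na <- function(ans) {
  if (!is.factor(ans)) ans[0L][1L] else NA
}

typed_na(c(1.5, 2.5))   # NA_real_
typed_na(c("a", "b"))   # NA_character_
typed_na(as.raw(1:2))   # 00, since raw has no NA
typed_na(factor("a"))   # logical NA, not a factor NA
```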

---------

[snip]

Martin Maechler

2017-Feb-04 15:48 UTC


>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Wed, 1 Feb 2017 16:17:06 +0000 writes:
    > On 'aggregate.data.frame', the URL should be
    > https://stat.ethz.ch/pipermail/r-help/2016-May/438631.html .

thank you. Yes, using 'drop' makes sense there where the result
is always "linear(ized)" or "one-dimensional".
For tapply() that's only the case for 1D-index.
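To illustrate why dropping is awkward for tapply() with more than one index, here is a small example with made-up data: with two grouping variables the result is a matrix, so an unobserved combination is a cell that has to hold some value rather than a row that can be left out.

```r
x  <- c(1, 2, 3)
g1 <- c("a", "a", "b")
g2 <- c("x", "y", "x")

# One dimension per index: a 2 x 2 matrix with dimnames a/b and x/y
m <- tapply(x, list(g1, g2), sum)
m  # the ("b", "y") cell has no data and is NA
```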

    > vector(typeof(ans)) (or vector(storage.mode(ans))) has
    > length zero and can be used to initialize array.  

Yes, unless ans is NULL.
You have convinced me that it is nicer.
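A sketch of the NULL caveat: when no group has data, the unlisted result is NULL, and typeof(NULL) is "NULL", so there is no vector type to take a zero-length template from. The guarded helper 'init_data' and its NA fallback are hypothetical.

```r
# Unlisting an empty list of per-group results gives NULL
ans_empty <- unlist(list())
is.null(ans_empty)   # TRUE
typeof(ans_empty)    # "NULL"

# Hypothetical guarded helper: typed zero-length vector, else plain NA
init_data <- function(ans) {
  if (!is.null(ans)) vector(typeof(ans)) else NA
}

init_data(ans_empty)        # NA
init_data(c(TRUE, FALSE))   # logical(0)
```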

    > Instead of if(missing(default)) , if(identical(default,
    > NA)) could be used. The documentation could then say, for
    > example: "If default = NA (the default), NA of appropriate
    > storage mode (0 for raw) is automatically used."

After some thought (and experiments), I have reverted and no
longer use if(missing). You are right that it is not needed
(and even potentially confusing) here.

Changes are in svn c72106.

Martin Maechler


