Suharto Anggono Suharto Anggono
2016-Sep-10 02:36 UTC
[Rd] table(exclude = NULL) always includes NA
Looking at the code of function 'table' in R devel r71227, I see that
the part "remove NA level if it was added only for excluded in factor(a,
exclude=.)" is not quite right.
In
is.na(a) <- match(a0, c(exclude,NA), nomatch=0L) ,
I think that what is intended is
a[a0 %in% c(exclude,NA)] <- NA .
So, it should be
is.na(a) <- match(a0, c(exclude,NA), nomatch=0L) > 0L
or
is.na(a) <- as.logical(match(a0, c(exclude,NA), nomatch=0L)) .
The parallel code
is.na(a) <- match(a0, exclude, nomatch=0L)
is to be treated similarly.
Example that gives wrong result in R devel r71225:
table(3:1, exclude = 1)
table(3:1, exclude = 1, useNA = "always")
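To illustrate the point about `is.na<-`, here is a minimal sketch using the same data as the example above. The names a_wrong and a_right are made up for illustration, and this is not the code in table() itself. It only shows that the integer result of match() is used as a set of positions, while the logical version marks the elements that actually matched:

```r
a0 <- 3:1
exclude <- 1
a <- factor(a0)

# match() returns positions in c(exclude, NA), or 0 where there is no match
idx <- match(a0, c(exclude, NA), nomatch = 0L)   # 0 0 1

# Used directly, idx acts as an index: element 1 (the value 3) is set to NA
a_wrong <- a
is.na(a_wrong) <- idx

# Converted to logical, it marks the element that matched (the value 1)
a_right <- a
is.na(a_right) <- idx > 0L

is.na(a_wrong)   # TRUE FALSE FALSE
is.na(a_right)   # FALSE FALSE TRUE
```

as.logical(idx) gives the same logical vector as idx > 0L here, since the only non-matching value is 0.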
--------------------------------------------
On Tue, 16/8/16, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Subject: Re: [Rd] table(exclude = NULL) always includes NA
Cc: "Martin Maechler" <maechler at stat.math.ethz.ch>
Date: Tuesday, 16 August, 2016, 5:42 PM
>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Mon, 15 Aug 2016 12:35:41 +0200 writes:
>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Mon, 15 Aug 2016 11:07:43 +0200 writes:
>>>>> on Sun, 14 Aug 2016 03:42:08 +0000 writes:
>>> useNA <- if (missing(useNA) && !missing(exclude)
>>>            && !(NA %in% exclude)) "ifany"
>>> An example where it changes the 'table' result for non-factor input, from
>>> https://stat.ethz.ch/pipermail/r-help/2005-April/069053.html :
>>> x <- c(1,2,3,3,NA)
>>> table(as.integer(x), exclude=NaN)
>>> I bring the example up, in case the change in result is not intended.
>> Thanks a lot, Suharto.
>> To me, the example is convincing that the changes (which I committed
>> Friday), svn rev 71087 & 71088, are a clear improvement:
>> (As you surely know, but not all the other readers:)
>> Before the change, the above example gave *different* results
>> for 'x' and 'as.integer(x)', the integer case *not* counting the NAs,
>> whereas with the change in effect, they are the same:
>>> x <- as.integer(dx <- c(1,2,3,3,NA))
>>> table(x, exclude=NaN); table(dx, exclude=NaN)
>> x
>>    1    2    3 <NA>
>>    1    1    2    1
>> dx
>>    1    2    3 <NA>
>>    1    1    2    1
>>>
>> --
>> But the change has affected 6-8 (of the 8000+) CRAN packages
>> which I am investigating now and probably will be in contact with
>> the package maintainers after that.
> There has been another bug in table(), since the time 'useNA'
> was introduced, which gives (in released R, R-patched, or R-devel):
>> table(1:3, exclude = 1, useNA = "ifany")
>    2    3 <NA>
>    1    1    1
>>
> and that bug now (in R-devel, after my changes) triggers in
> cases it did not previously, notably in
> table(1:3, exclude = 1)
> which now does set 'useNA = "ifany"' and so gives the same silly
> result as the one above.
> The reason for this bug is that addNA(..) is called (in all R
> versions mentioned) in this case, but it should not.
> I'm currently testing yet another amendment..
which was not sufficient... so I had to do *much* more work.
The result is code which functions -- I hope -- uniformly better
than the current code, but unfortunately, code that is much longer.
After all I came to the conclusion that using addNA() was not
good enough [I did not yet consider *changing* addNA() itself,
even though the only place we use it in R's own packages is
inside table()] and so for now have code in table() that does
the equivalent of addNA() *but* does remember if addNA() did add
an NA level or not.
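As a rough sketch of the idea (not the code actually committed to table()): the first lines show how calling addNA() after factor(*, exclude=) makes the excluded value reappear as <NA>, and the helper add_na_remember(), a hypothetical name used only here, does the equivalent of addNA() while also reporting whether it had to add the NA level.

```r
# Why calling addNA() unconditionally is a problem:
a <- factor(1:3, exclude = 1)  # the excluded value 1 becomes NA; levels "2", "3"
b <- addNA(a)                  # adds an NA level, so the excluded 1 gets counted
table(b)                       # three cells: "2", "3" and <NA>

# Hypothetical helper (NOT the code in table()): same effect as addNA(),
# but it also remembers whether the NA level was newly added.
add_na_remember <- function(f) {
  stopifnot(is.factor(f))
  added <- !anyNA(levels(f))
  if (added)
    f <- factor(f, levels = c(levels(f), NA), exclude = NULL)
  list(f = f, added = added)
}

r <- add_na_remember(a)
r$added        # TRUE: the NA level was not there before
levels(r$f)    # "2" "3" NA
```

With the 'added' flag available, a caller can later drop the NA level again when it was introduced only because of 'exclude', which is the bookkeeping that plain addNA() does not offer.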
I also have extended the regression tests considerably,
*and* example(table) now reverts to giving output identical to
R 3.3.1 (which it no longer did in R-devel (r 71088)).
I'm still investigating the CRAN package fallout (from the above
change 4 days ago) but plan to commit my (unfortunately
somewhat extensive) changes.
Also, I think this will become the first in this year's R-devel
SIGNIFICANT USER-VISIBLE CHANGES:
  * 'table()' has been amended to be more internally consistent
    and become back compatible to R <= 2.7.2 again.
    Consequently, 'table(1:2, exclude=NULL)' no longer contains
    a zero count for '<NA>', but 'useNA = "always"' continues to
    do so.
--
Martin
>>>>> Suharto Anggono Suharto Anggono <suharto_anggono at yahoo.com>
>>>>> on Sat, 10 Sep 2016 02:36:54 +0000 writes:
> Looking at the code of function 'table' in R devel r71227, I see that
> the part "remove NA level if it was added only for excluded in factor(a,
> exclude=.)" is not quite right.
> In
> is.na(a) <- match(a0, c(exclude,NA), nomatch=0L) ,
> I think that what is intended is
> a[a0 %in% c(exclude,NA)] <- NA .
yes.
> So, it should be
> is.na(a) <- match(a0, c(exclude,NA), nomatch=0L) > 0L
> or
> is.na(a) <- as.logical(match(a0, c(exclude,NA), nomatch=0L)) .
> The parallel code
> is.na(a) <- match(a0, exclude, nomatch=0L)
> is to be treated similarly.
indeed. I may have been very wrongly thinking that `is.na<-`
coerced its value to logical... or otherwise not thinking at all ;-)
> Example that gives wrong result in R devel r71225:
> table(3:1, exclude = 1)
> table(3:1, exclude = 1, useNA = "always")
> --------------------------------------------
Thanks a lot, Suharto. You are entirely correct.
I'm amazed that table(*, exclude = *) seems so rarely used /
tested, that this has gone undetected for almost four weeks.
It is fixed now with svn r71230.
Martin