Suharto Anggono
2016-Aug-17 03:16 UTC
[Rd] table(exclude = NULL) always includes NA
The quirk seen in table(1:3, exclude = 1, useNA = "ifany") is actually somewhat documented, and is still present in R-devel r71104.

In the R help on 'table', "Details" section:

  It is best to supply factors rather than rely on coercion. In particular, ‘exclude’ will be used in coercion to a factor, and so values (not levels) which appear in ‘exclude’ before coercion will be mapped to ‘NA’ rather than be discarded.

Another part, above it:

  ‘useNA’ controls if the table includes counts of ‘NA’ values: .... Note that levels specified in ‘exclude’ are mapped to ‘NA’ and so included in ‘NA’ counts.

The last statement is actually not true for an argument that is already a factor.

--------------------------------------------
On Tue, 16/8/16, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

 Subject: Re: [Rd] table(exclude = NULL) always includes NA
 Cc: "Martin Maechler" <maechler at stat.math.ethz.ch>
 Date: Tuesday, 16 August, 2016, 5:42 PM

>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Mon, 15 Aug 2016 12:35:41 +0200 writes:

>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Mon, 15 Aug 2016 11:07:43 +0200 writes:

>>>>>     on Sun, 14 Aug 2016 03:42:08 +0000 writes:

>>> useNA <- if (missing(useNA) && !missing(exclude) && !(NA %in% exclude)) "ifany"

>>> An example where it changes the 'table' result for non-factor input, from https://stat.ethz.ch/pipermail/r-help/2005-April/069053.html :

>>> x <- c(1,2,3,3,NA)
>>> table(as.integer(x), exclude=NaN)

>>> I bring the example up in case the change in result is not intended.

>> Thanks a lot, Suharto.
>> To me, the example is convincing that the changes (I committed
>> on Friday), svn rev 71087 & 71088, are a clear improvement:

>> (As you surely know, but not all the other readers:)
>> Before the change, the above example gave *different* results
>> for 'x' and 'as.integer(x)', the integer case *not* counting the NAs,
>> whereas with the change in effect, they are the same:

>>> x <- as.integer(dx <- c(1,2,3,3,NA))
>>> table(x, exclude=NaN); table(dx, exclude=NaN)
>> x
>>    1    2    3 <NA>
>>    1    1    2    1
>> dx
>>    1    2    3 <NA>
>>    1    1    2    1
>>>

>> --
>> But the change has affected 6-8 (of the 8000+) CRAN packages,
>> which I am investigating now; I will probably contact the
>> package maintainers after that.

> There has been another bug in table(), since the time 'useNA'
> was introduced, which gives (in released R, R-patched, or R-devel):

>> table(1:3, exclude = 1, useNA = "ifany")

>    2    3 <NA>
>    1    1    1
>>

> and that bug now (in R-devel, after my changes) triggers in
> cases it did not previously, notably in

>      table(1:3, exclude = 1)

> which now does set 'useNA = "ifany"' and so gives the same silly
> result as the one above.

> The reason for this bug is that addNA(..) is called (in all R
> versions mentioned) in this case, but it should not be.
> I'm currently testing yet another amendment..

which was not sufficient... so I had to do *much* more work.

The result is code which functions, I hope, uniformly better
than the current code, but which is unfortunately much longer.

After all, I came to the conclusion that using addNA() was not
good enough [I did not yet consider *changing* addNA() itself,
even though the only place we use it in R's own packages is
inside table()], and so for now I have code in table() that does
the equivalent of addNA() *but* remembers whether it did add
an NA level or not.

I also have extended the regression tests considerably,
*and* example(table) now again gives output identical to
R 3.3.1 (which it no longer did in R-devel r71088).
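To make the addNA() problem concrete, here is a small sketch built only on base R's factor(), addNA() and tabulate(), so it does not depend on which version of table() is installed. Coercing 1:3 with exclude = 1 maps the value 1 to an NA code instead of dropping it, and a blind addNA() then gives that code a level of its own, which is where the spurious <NA> count comes from. The helper add_na_tracked() is hypothetical (it is not the code committed to table()); it only illustrates the idea of doing what addNA() does while remembering whether an NA level was actually added.

```r
# Coercion with 'exclude': the *value* 1 becomes an NA code, it is not dropped.
f <- factor(1:3, exclude = 1)
levels(f)     # "2" "3"
is.na(f)      # TRUE FALSE FALSE

# A blind addNA() gives that NA code a level of its own, so it gets counted;
# this is the spurious <NA> = 1 seen in table(1:3, exclude = 1, useNA = "ifany").
g <- addNA(f)
tabulate(as.integer(g), nlevels(g))   # 1 1 1

# Hypothetical helper (not base R, not the committed table() code): does what
# addNA() does, but also reports whether it had to create the NA level.
add_na_tracked <- function(x, ifany = FALSE) {
  if (!is.factor(x)) x <- factor(x)
  if (ifany && !anyNA(x)) return(list(f = x, added = FALSE))
  ll <- levels(x)
  added <- !anyNA(ll)          # TRUE only if no NA level existed yet
  if (added) ll <- c(ll, NA)
  list(f = factor(x, levels = ll, exclude = NULL), added = added)
}

add_na_tracked(f)$added                          # TRUE: NA level was created
add_na_tracked(g)$added                          # FALSE: g already has one
add_na_tracked(factor(2:3), ifany = TRUE)$added  # FALSE: no NA to add a level for
```

With the "added" flag available, a caller can tell an NA level it introduced itself apart from one the user supplied, which is the distinction the thread says plain addNA() loses.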
I'm still investigating the CRAN package fallout (from the above
change 4 days ago) but plan to commit my (unfortunately
somewhat extensive) changes.

Also, I think this will become the first in this year's R-devel
SIGNIFICANT USER-VISIBLE CHANGES:

    • ‘table()’ has been amended to be more internally consistent
      and become back compatible to R <= 2.7.2 again.
      Consequently, ‘table(1:2, exclude=NULL)’ no longer contains
      a zero count for ‘<NA>’, but ‘useNA = "always"’ continues to
      do so.

--
Martin
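The default-useNA expression quoted earlier in the thread also accounts for the NEWS item: supplying an exclude that does not contain NA switches useNA to "ifany", so table(1:2, exclude = NULL) gets no <NA> column because 1:2 has no NA, while an explicit useNA = "always" is left alone. The sketch below wraps that expression in a hypothetical helper, choose_useNA() (not part of base R), so its decisions can be checked in isolation; it never calls table(), so it makes no claim about any particular R version.

```r
# Hypothetical helper (not in base R): evaluates the default for 'useNA'
# quoted in the thread. The default for 'exclude' is illustrative only; it is
# never evaluated when 'exclude' is missing, because && short-circuits.
choose_useNA <- function(exclude = c(NA, NaN),
                         useNA = c("no", "ifany", "always")) {
  if (missing(useNA) && !missing(exclude) && !(NA %in% exclude)) {
    "ifany"
  } else {
    match.arg(useNA)
  }
}

choose_useNA()                                  # "no": neither argument given
choose_useNA(exclude = NULL)                    # "ifany": NA is not excluded
choose_useNA(exclude = NaN)                     # "ifany": NA does not match NaN
choose_useNA(exclude = NA)                      # "no": NA is still excluded
choose_useNA(exclude = NULL, useNA = "always")  # "always": explicit value wins
```

The exclude = NaN case is the one from Suharto's r-help example: match() treats NA and NaN as distinct, so NA %in% NaN is FALSE and the NA ends up counted for both double and integer input.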
>>>>> Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Wed, 17 Aug 2016 03:16:52 +0000 writes:

    > The quirk as in table(1:3, exclude = 1, useNA = "ifany") is actually somewhat documented, and still in R devel r71104.

Yes, the documentation needs updating, too; thank you.

    > In R help on 'table', in "Details" section:

    > It is best to supply factors rather than rely on coercion. In particular, ‘exclude’ will be used in coercion to a factor, and so values (not levels) which appear in ‘exclude’ before coercion will be mapped to ‘NA’ rather than be discarded.

    > Another part, above it:

    > ‘useNA’ controls if the table includes counts of ‘NA’ values: .... Note that levels specified in ‘exclude’ are mapped to ‘NA’ and so included in ‘NA’ counts.

    > The last statement is actually not true for an argument that is already a factor.

You are right. I plan to basically drop both these parts.
So, whereas the code got more complicated, at least the
documentation becomes simpler (because the function behaves
more "logically").

One more thing: I plan to add this paragraph to the 'Examples:' section:

## "pathological" case:
d.patho <- addNA(c(1,NA,1:2,1:3))[-7]; is.na(d.patho) <- 3:4
d.patho
## just 3 consecutive NA's ? --- well, have *two* kinds of NAs here :
as.integer(d.patho) # 1 4 NA NA 1 2
##
## In R >= 3.4.0, table() allows to differentiate:
table(d.patho)                   # counts the "unusual" NA
table(d.patho, useNA = "ifany")  # counts all three
table(d.patho, exclude = NULL)   #  (ditto)
table(d.patho, exclude = NA)     # counts none

If you read this and try it in R-devel (svn r >= 71101),

> table(d.patho)                   # counts the "unusual" NA
d.patho
   1    2    3 <NA>
   2    1    0    1
> table(d.patho, useNA = "ifany")  # counts all three
d.patho
   1    2    3 <NA>
   2    1    0    3
> table(d.patho, exclude = NULL)   #  (ditto)
d.patho
   1    2    3 <NA>
   2    1    0    3
> table(d.patho, exclude = NA)     # counts none
d.patho
1 2 3
2 1 0
>

you may find that indeed one could desire "more symmetry":
namely, we would want a way to count only the two "value-NA"s,
i.e., to return the 4th possible result

> table(d.patho, ......)
d.patho
   1    2    3 <NA>
   2    1    0    2

From a UI point of view, this should probably be achieved by a
fourth 'useNA' option .... but I'm *not* jumping to do that
right now; I *will* update the table help page soon.

Martin
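The "4th possible result" Martin describes is not something table() offers in this thread. As a sketch of what it would have to count, the following uses two hypothetical helpers, na_kinds() and count_value_na_only() (neither is part of R), built only on as.integer(), levels() and tabulate(). They separate entries whose integer code is NA (the "value-NA"s) from entries whose code points at the NA level.

```r
# The "pathological" factor from the proposed example: it has an NA *level*
# (code 4) and, after is.na<-, two NA *codes*.
d.patho <- addNA(c(1, NA, 1:2, 1:3))[-7]
is.na(d.patho) <- 3:4
as.integer(d.patho)   # 1 4 NA NA 1 2

# Hypothetical helper (not in base R): tally the two kinds of NA separately.
na_kinds <- function(f) {
  codes <- as.integer(f)
  value_na <- is.na(codes)                         # the code itself is NA
  level_na <- !value_na & is.na(levels(f)[codes])  # code points at the NA level
  c(value = sum(value_na), level = sum(level_na))
}

# Hypothetical helper: counts per non-NA level, plus only the value-NAs
# under "<NA>"; entries pointing at the NA level are left out.
count_value_na_only <- function(f) {
  codes <- as.integer(f)
  lev <- levels(f)
  real <- !is.na(lev)                             # levels that are not NA
  counts <- tabulate(codes, nbins = length(lev))  # tabulate() ignores NA codes
  out <- c(counts[real], sum(is.na(codes)))
  names(out) <- c(lev[real], "<NA>")
  out
}

na_kinds(d.patho)             # value = 2, level = 1
count_value_na_only(d.patho)  # 1 = 2, 2 = 1, 3 = 0, <NA> = 2
```

Whether such a count should become a fourth 'useNA' value is left open in the thread; the helpers only reproduce the numbers of the 4th result listed above.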
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel