thr3ads.net - R devel - [Rd] 'droplevels' inappropriate change [Aug 2016]


Suharto Anggono Suharto Anggono

2016-Aug-21 10:44 UTC

[Rd] 'droplevels' inappropriate change

In R devel r71124, if 'x' is a factor, droplevels(x) gives
factor(x, exclude = NULL). In R 3.3.1, it gives factor(x).

If a factor 'x' has NA values and the levels of 'x' don't contain NA,
factor(x) gives the expected result for droplevels(x), but
factor(x, exclude = NULL) doesn't. As I said in
https://stat.ethz.ch/pipermail/r-devel/2016-May/072796.html ,
factor(x, exclude = NULL) adds NA as a level.
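A small sketch of the difference, assuming a factor with an NA value but no NA level (the values "a", "b", "c" are only illustrative):

```r
# A factor with an NA *value* but no NA *level*; level "c" is unused
x <- factor(c("a", "b", NA), levels = c("a", "b", "c"))

f1 <- factor(x)                  # what droplevels(x) gives in R 3.3.1
f2 <- factor(x, exclude = NULL)  # what droplevels(x) gives in R devel r71124

levels(f1)  # "a" "b"    : unused "c" dropped, NA stays a missing value
levels(f2)  # "a" "b" NA : NA has been turned into a level
is.na(f1)   # FALSE FALSE  TRUE
is.na(f2)   # FALSE FALSE FALSE
```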

Using
factor(x, exclude = if(anyNA(levels(x))) NULL else NA ) ,
as in the code of the function `[.factor` (in the same file, factor.R, as
'droplevels'), is better.
It is also possible simply to use
x[, drop = TRUE] .
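A minimal sketch of the suggested rule wrapped in a helper; the name drop_unused_levels() is hypothetical and not part of base R:

```r
# Hypothetical helper (not base R): drop unused levels of a factor using the
# exclude rule that `[.factor` uses when drop = TRUE.
drop_unused_levels <- function(x) {
  stopifnot(is.factor(x))
  # Keep NA as a level only when 'x' already has an NA level;
  # otherwise NA values stay plain missing values.
  factor(x, exclude = if (anyNA(levels(x))) NULL else NA)
}

# Case 1: NA value, no NA level -> same result as factor(x)
x1 <- factor(c("a", "b", NA), levels = c("a", "b", "c"))
d1 <- drop_unused_levels(x1)
levels(d1)  # "a" "b"

# Case 2: an existing NA level is kept, the unused level "b" is dropped
x2 <- factor(c("a", NA), levels = c("a", "b", NA), exclude = NULL)
d2 <- drop_unused_levels(x2)
levels(d2)  # "a" NA
```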

For a factor 'x' that has an NA level and also NA values,
factor(x, exclude = NULL) is not perfect, though. It changes the NA values
so that they are associated with the NA factor level.

Martin Maechler

2016-Aug-22 10:30 UTC


[Rd] 'droplevels' inappropriate change

>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Sun, 21 Aug 2016 10:44:18 +0000 writes:
    > In R devel r71124, if 'x' is a factor, droplevels(x) gives
    > factor(x, exclude = NULL) .  In R 3.3.1, it gives
    > factor(x) .

    > If a factor 'x' has NA and levels of 'x' doesn't contain
    > NA, factor(x) gives the expected result for droplevels(x)
    > , but factor(x, exclude = NULL) doesn't. As I said in
    > https://stat.ethz.ch/pipermail/r-devel/2016-May/072796.html
    > , factor(x, exclude = NULL) adds NA as a level.

    > Using factor(x, exclude = if(anyNA(levels(x))) NULL else NA ) , 
    > like in the code of function `[.factor` (in the
    > same file, factor.R, as 'droplevels'), is better.  It is
    > possible just to use x[, drop = TRUE] .

You are right.  The change to droplevels() [in svn rev 71113 ]
was not thorough enough, and I will commit a change that uses

    factor(x, exclude = if(anyNA(levels(x))) NULL else NA )

------

    > For a factor 'x' that has NA level and also NA value,

i.e., one like this ?

x <- factor(c(1, 2, NA, NA), exclude = NULL) ; is.na(x)[2] <- TRUE
x # << two "different" NA's (in codes | w/ level) looking the same in print()
stopifnot(identical(x, structure(as.integer(c(1, NA, 3, 3)),
                                 .Label = c("1", "2", NA), class = "factor")))


    > factor(x, exclude = NULL) is not perfect, though. It
    > change NA to be associated with NA factor level.

Yes, it does, but why is that not good?
The result of calling factor() on a factor 'f' should be either 'f'
*or* a more regular version of 'f'.

Now, for the above 'x' --- which I call "pathological", as it
has two kinds of NA's but the user does not easily see that ---
I am happy that both

  factor(x)               # and
  factor(x, exclude = NULL)

produce a "regularized" version of x:

  > dput(x)
  structure(c(1L, NA, 3L, 3L), .Label = c("1", "2", NA), class = "factor")
  > dput(factor(x))
  structure(c(1L, NA, NA, NA), .Label = "1", class = "factor")
  > dput(factor(x, exclude=NULL))
  structure(c(1L, 2L, 2L, 2L), .Label = c("1", NA), class = "factor")
  >
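Applying the exclude rule proposed for droplevels() to this pathological 'x' is a useful cross-check. Because levels(x) contain NA, the rule selects exclude = NULL, so the result is the same regularized factor as factor(x, exclude = NULL). A minimal sketch, reusing the construction of 'x' from above (the name 'd' is illustrative):

```r
# The pathological factor: position 2 has an NA *code*,
# positions 3 and 4 point to the NA *level*
x <- factor(c(1, 2, NA, NA), exclude = NULL); is.na(x)[2] <- TRUE

# The exclude rule proposed for droplevels(); levels(x) contain NA here,
# so it picks exclude = NULL
d <- factor(x, exclude = if (anyNA(levels(x))) NULL else NA)

as.integer(d)  # 1 2 2 2 : the NA code is merged into the NA level
levels(d)      # "1" NA  : the unused level "2" is dropped
```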
