Suharto Anggono Suharto Anggono
2016-Aug-21 10:44 UTC
[Rd] 'droplevels' inappropriate change
In R-devel r71124, if 'x' is a factor, droplevels(x) gives factor(x, exclude = NULL). In R 3.3.1, it gives factor(x).

If a factor 'x' has NA values and the levels of 'x' do not contain NA, factor(x) gives the expected result for droplevels(x), but factor(x, exclude = NULL) does not: as I said in https://stat.ethz.ch/pipermail/r-devel/2016-May/072796.html, factor(x, exclude = NULL) adds NA as a level.

Using factor(x, exclude = if(anyNA(levels(x))) NULL else NA), as in the code of function `[.factor` (in the same file as 'droplevels', factor.R), is better. It is also possible simply to use x[, drop = TRUE].

For a factor 'x' that has an NA level and also NA values, factor(x, exclude = NULL) is not perfect, though: it changes those NA values so that they are associated with the NA factor level.
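To make the difference concrete, here is a small R sketch. The factor `x` is a made-up example (one NA value, one unused level "c", no NA level), and the checks assume a current R in which `[.factor` with drop = TRUE uses the exclude expression quoted above. The sketch writes the subsetting route as x[drop = TRUE] rather than the x[, drop = TRUE] form used in the message.

```r
# Hypothetical example: one NA value, an unused level "c", and no NA level.
x <- factor(c("a", "b", NA, "a"), levels = c("a", "b", "c"))

# factor(x): the unused level "c" is dropped and the NA value stays a plain NA.
f_default <- factor(x)
print(levels(f_default))       # [1] "a" "b"
print(sum(is.na(f_default)))   # [1] 1

# factor(x, exclude = NULL): NA is promoted to a level, so is.na()
# no longer reports the missing value.
f_null <- factor(x, exclude = NULL)
print(levels(f_null))          # [1] "a" "b" NA
print(sum(is.na(f_null)))      # [1] 0

# The proposed expression keeps NA as a level only if x already had one;
# here it falls back to exclude = NA and therefore matches factor(x).
f_fix <- factor(x, exclude = if (anyNA(levels(x))) NULL else NA)
stopifnot(identical(f_fix, f_default))

# Subsetting with drop = TRUE goes through `[.factor`, which applies
# the same exclude expression (assumed current R behaviour).
stopifnot(identical(x[drop = TRUE], f_fix))
```

The point of the conditional is that an NA level is preserved only when the input already had one, so a factor with ordinary missing values keeps reporting them through is.na().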
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Sun, 21 Aug 2016 10:44:18 +0000 writes:

    > In R devel r71124, if 'x' is a factor, droplevels(x) gives
    > factor(x, exclude = NULL) . In R 3.3.1, it gives
    > factor(x) .
    > If a factor 'x' has NA and levels of 'x' doesn't contain
    > NA, factor(x) gives the expected result for droplevels(x)
    > , but factor(x, exclude = NULL) doesn't. As I said in
    > https://stat.ethz.ch/pipermail/r-devel/2016-May/072796.html
    > , factor(x, exclude = NULL) adds NA as a level.
    > Using factor(x, exclude = if(anyNA(levels(x))) NULL else NA ) ,
    > like in the code of function `[.factor` (in the
    > same file, factor.R, as 'droplevels'), is better. It is
    > possible just to use x[, drop = TRUE] .

You are right. The change to droplevels() [in svn rev 71113] was not thorough enough, and I will commit a change that uses

    factor(x, exclude = if(anyNA(levels(x))) NULL else NA )

------

    > For a factor 'x' that has NA level and also NA value,

i.e., one like this?

    x <- factor(c(1, 2, NA, NA), exclude = NULL)
    is.na(x)[2] <- TRUE
    x # << two "different" NA's (in codes | w/ level) looking the same in print()
    stopifnot(identical(x, structure(as.integer(c(1, NA, 3, 3)),
                                     .Label = c("1", "2", NA),
                                     class = "factor")))

    > factor(x, exclude = NULL) is not perfect, though. It
    > change NA to be associated with NA factor level.

Yes, it does, but why is that not good? The result of calling factor() on a factor 'f' should either be 'f' *or* a more regular version of 'f'.
Now, for the above 'x' (which I call "pathological", as it has two kinds of NA's but the user does not easily see that), I am happy that both

    factor(x) # and
    factor(x, exclude = NULL)

produce a "regularized" version of x:

    > dput(x)
    structure(c(1L, NA, 3L, 3L), .Label = c("1", "2", NA), class = "factor")
    > dput(factor(x))
    structure(c(1L, NA, NA, NA), .Label = "1", class = "factor")
    > dput(factor(x, exclude=NULL))
    structure(c(1L, 2L, 2L, 2L), .Label = c("1", NA), class = "factor")
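The dput() results above can also be checked mechanically. This is a minimal sketch that rebuilds the "pathological" x exactly as in the reply and compares it against structures transcribed from the printed output; the last step applies the exclude expression that was to be committed for droplevels(), which here selects exclude = NULL because levels(x) contains NA.

```r
# The "pathological" factor: codes c(1, NA, 3, 3), where level 3 is an
# NA level, so x holds two kinds of NA that print the same way.
x <- factor(c(1, 2, NA, NA), exclude = NULL)
is.na(x)[2] <- TRUE

# factor(x) drops the NA level and the now unused level "2";
# every missing value becomes a plain NA code.
stopifnot(identical(factor(x),
                    structure(c(1L, NA, NA, NA), .Label = "1",
                              class = "factor")))

# factor(x, exclude = NULL) keeps a single NA level and maps both kinds
# of missing value onto it.
stopifnot(identical(factor(x, exclude = NULL),
                    structure(c(1L, 2L, 2L, 2L), .Label = c("1", NA),
                              class = "factor")))

# levels(x) contains NA, so the proposed expression chooses exclude = NULL.
fixed <- factor(x, exclude = if (anyNA(levels(x))) NULL else NA)
stopifnot(identical(fixed, factor(x, exclude = NULL)))
```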