Suharto Anggono Suharto Anggono
2016-Aug-27 03:17 UTC
[Rd] 'droplevels' inappropriate change
In R devel r71157, 'droplevels' documentation, in "Arguments" section, says this about argument 'exclude'. passed to factor(); factor levels which should be excluded from the result even if present. Note that this was implicitly NA in R <= 3.3.1 which did drop NA levels even when present in x, contrary to the documentation. The current default is compatible with x[ , drop=FALSE]. The part x[ , drop=FALSE] should be x[ , drop=TRUE] Saying that 'exclude' is factor levels is not quite true for NA element. NA may be not an original level, but NA in 'exclude' affects the result. For a factor 'x', factor(x, exclude = exclude) doesn't really work for excluding in general. See, for example, https://stat.ethz.ch/pipermail/r-help/2005-September/079336.html . factor(factor(c("a","b","c")), exclude="c") However, this excludes "2": factor(factor(2:3), exclude=2) Rather unexpectedly, this excludes NA: factor(factor(c("a",NA), exclude=NULL), exclude="c") For a factor 'x', factor(x, exclude = exclude) can only exclude integer-like or NA levels. An explanation is in https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html .
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org> >>>>> on Sat, 27 Aug 2016 03:17:32 +0000 writes:> In R devel r71157, 'droplevels' documentation, in "Arguments" section, says this about argument 'exclude'. > passed to factor(); factor levels which should be excluded from the result even if present. Note that this was implicitly NA in R <= 3.3.1 which did drop NA levels even when present in x, contrary to the documentation. The current default is compatible with x[ , drop=FALSE]. > The part > x[ , drop=FALSE] > should be > x[ , drop=TRUE] Yes, definitely, thank you! a "typo" by me. .. fixed now. > Saying that 'exclude' is factor levels is not quite true for NA element. NA may be not an original level, but NA in 'exclude' affects the result. > For a factor 'x', factor(x, exclude = exclude) doesn't really work for excluding in general. See, for example, https://stat.ethz.ch/pipermail/r-help/2005-September/079336.html . > factor(factor(c("a","b","c")), exclude="c") > However, this excludes "2": > factor(factor(2:3), exclude=2) > Rather unexpectedly, this excludes NA: > factor(factor(c("a",NA), exclude=NULL), exclude="c") > For a factor 'x', factor(x, exclude = exclude) can only exclude integer-like or NA levels. An explanation is in https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html . Well, Peter Dalgaard (in that R-devel e-mail, a bit more than 5 years ago) is confirming the problem there, and suggesting (as you, right?) that actually `factor()` is not behaving correctly here. And your persistence is finally getting close to convince me that it is not just droplevels(), but factor() itself which needs care here. Interestingly, the following patch *does* pass 'make check-all' (after small change in tests/reg-tests-1b.R which is ok), and leads to behavior which is much closer to the documentation, notably for your two examples above would give what one would expect. (( If the R-Hub would support experiments with branches of R-devel from R-core members, I could just create such a branch and R Hub would run 'R CMD check <pkg>' for thousands of CRAN packages and provide a web page with the *differences* in the package check results ... so we could see ... )) I do agree that we should strongly consider such a change. Martin -------------- next part -------------- A non-text attachment was scrubbed... Name: factor-excl.diff Type: text/x-diff Size: 637 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20160827/b62dd60f/attachment.bin>
>>>>> Martin Maechler <maechler at stat.math.ethz.ch> >>>>> on Sat, 27 Aug 2016 18:55:37 +0200 writes:>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org> >>>>> on Sat, 27 Aug 2016 03:17:32 +0000 writes:>> In R devel r71157, 'droplevels' documentation, in "Arguments" section, says this about argument 'exclude'. >> passed to factor(); factor levels which should be excluded from the result even if present. Note that this was implicitly NA in R <= 3.3.1 which did drop NA levels even when present in x, contrary to the documentation. The current default is compatible with x[ , drop=FALSE]. >> The part >> x[ , drop=FALSE] >> should be >> x[ , drop=TRUE] > Yes, definitely, thank you! > a "typo" by me. .. fixed now. >> Saying that 'exclude' is factor levels is not quite true for NA element. NA may be not an original level, but NA in 'exclude' affects the result. >> For a factor 'x', factor(x, exclude = exclude) doesn't really work for excluding in general. See, for example, https://stat.ethz.ch/pipermail/r-help/2005-September/079336.html . >> factor(factor(c("a","b","c")), exclude="c") >> However, this excludes "2": >> factor(factor(2:3), exclude=2) >> Rather unexpectedly, this excludes NA: >> factor(factor(c("a",NA), exclude=NULL), exclude="c") >> For a factor 'x', factor(x, exclude = exclude) can only exclude integer-like or NA levels. An explanation is in https://stat.ethz.ch/pipermail/r-help/2011-April/276274.html . > Well, Peter Dalgaard (in that R-devel e-mail, a bit more than 5 > years ago) is confirming the problem there, and suggesting (as > you, right?) that actually `factor()` is not behaving > correctly here. > And your persistence is finally getting close to convince me > that it is not just droplevels(), but factor() itself which > needs care here. > Interestingly, the following patch *does* pass 'make check-all' > (after small change in tests/reg-tests-1b.R which is ok), > and leads to behavior which is much closer to the documentation, > notably for your two examples above would give what one would > expect. > (( If the R-Hub would support experiments with branches of R-devel > from R-core members, I could just create such a branch and R Hub > would run 'R CMD check <pkg>' for thousands of CRAN packages > and provide a web page with the *differences* in the package > check results ... so we could see ... )) > I do agree that we should strongly consider such a change. as nobody has commented, I've been liberal and have taken these no comments as consent. Hence I have committed ------------------------------------------------------------------------ r71178 | maechler | 2016-08-31 09:45:40 +0200 (Wed, 31 Aug 2016) | 1 line Changed paths: M /trunk/doc/NEWS.Rd M /trunk/src/library/base/R/factor.R M /trunk/src/library/base/man/factor.Rd M /trunk/tests/reg-tests-1b.R M /trunk/tests/reg-tests-1c.R factor(x, exclude) more "rational" when x or exclude are character ------------------------------------------------------------------------ which apart from documentation, examples, and regression tests is just the patch below. Martin Maechler ETH Zurich > --- factor.R (revision 71157) > +++ factor.R (working copy) > @@ -28,8 +28,12 @@ > levels <- unique(y[ind]) > } > force(ordered) # check if original x is an ordered factor > - exclude <- as.vector(exclude, typeof(x)) # may result in NA > - x <- as.character(x) > + if(!is.character(x)) { > + if(!is.character(exclude)) > + exclude <- as.vector(exclude, typeof(x)) # may result in NA > + x <- as.character(x) > + } else > + exclude <- as.vector(exclude, typeof(x)) # may result in NA > ## levels could be a long vectors, but match will not handle that. > levels <- levels[is.na(match(levels, exclude))] > f <- match(x, levels)
Possibly Parallel Threads
- Coercion of 'exclude' in function 'factor' (was 'droplevels' inappropriate change)
- 'droplevels' inappropriate change
- 'droplevels' inappropriate change
- Coercion of 'exclude' in function 'factor' (was 'droplevels' inappropriate change)
- [bug] droplevels() also drop object attributes (comment…)