Martin Maechler
2016-Jun-04 17:32 UTC
[Rd] factors with non-unique ("duplicated") levels have been deprecated since 2009 -- are *more* deprecated now -- and why you should be hesitant to misuse suppressWarnings()
From this bug report (it's a proposal for a speedup only, not a bug), https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16895#c6, the fact that you can construct factors with non-unique, aka "duplicated", levels in R has been raised again.

As mentioned there, we had a small discussion here (on 'R-devel') a bit more than 7 years ago, where I said that R core had indeed decided that factors with duplicated levels would be deprecated from R version 2.10.0 on ... which is a while ago now.

As factors are not S4 objects, there is no really formal class definition and no inherent class validation, but even back in 2009 we changed `levels<-` such that it raised a warning when the levels were not unique:

> aba <- c("a","b","a"); x <- factor(aba, levels=aba)
Warning message:
In `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
  duplicated levels in factors are deprecated
>

We've finally decided to make this an error in R-devel (which is planned for release, probably as R 3.4.0, in April 2017):

> aba <- c("a","b","a"); x <- factor(aba, levels=aba)
Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
  factor level [3] is duplicated
>

If you know R well, you'll know that it is still very easy to construct factors in R with invalid levels. For this reason, *printing* such factors now also produces a warning:

> f
[1] 1 2 2 3 3 2 2 1
Levels: 1 2 2 3
Warning message:
In print.factor(x) : duplicated level [3] in factor
>

----------------------------------------------------------------------------------------

We have found at least two packages that are affected by this change, in that they no longer pass 'R CMD check' on R-devel:

1) plyr: there it is just a check which previously tested for the *warning* mentioned above, which is now an error. So only that check must be amended (quite easily).

2) MicroDatosEs: now fails in example(censo2010).
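For code that is hit by the new error, a minimal base-R sketch of two things may help: passing unique levels to factor() in the first place, and checking (and repairing) a factor whose "levels" attribute was set directly, which bypasses the check in `levels<-`. The helper has_valid_levels() is a hypothetical name used only for illustration; it is not part of base R.

```r
# Building a factor with unique levels: unique() drops the repeated "a".
aba <- c("a", "b", "a")
x <- factor(aba, levels = unique(aba))
levels(x)      # "a" "b"
as.integer(x)  # 1 2 1

# Setting the "levels" attribute directly bypasses the check in `levels<-`,
# so an invalid factor can still be created this way (we do not print it).
f <- factor(c(1, 2, 3, 4))
attr(f, "levels") <- c("1", "2", "2", "3")

# Hypothetical helper (not in base R): TRUE iff 'f' is a factor whose
# levels are all distinct.
has_valid_levels <- function(f) is.factor(f) && !anyDuplicated(levels(f))
has_valid_levels(x)  # TRUE
has_valid_levels(f)  # FALSE

# One way to repair 'f': map each integer code to its level string, then
# rebuild the factor over the unique levels (first-appearance order kept).
g <- factor(levels(f)[as.integer(f)], levels = unique(levels(f)))
levels(g)            # "1" "2" "3"
has_valid_levels(g)  # TRUE
```

The repair merges codes that pointed to the same level string, which is usually what the duplicated levels were meant to express; if that is not the intended meaning, failing with an error is the safer choice.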
And that is the reason for this posting: I would claim that it is not primarily the fault of the 'MicroDatosEs' maintainer, but actually of a package it depends on, 'memisc'. That package has a "nice" S4 method for producing a factor from an "item.vector" (though I would find an as(..) method [defined via setAs(..)] much more natural than an 'as.factor()' method):

> selectMethod("as.factor", "item.vector")
Method Definition:

function (x)
{
    labels <- x@value.labels
    if (length(labels)) {
        values <- labels@values
        labels <- labels@.Data
    }
    else {
        values <- labels <- sort(unique(x@.Data))
    }
    filter <- x@value.filter
    use.levels <- if (length(filter)) is.valid2(values, filter) else TRUE
    f <- suppressWarnings(factor(x@.Data, levels = values[use.levels],
        labels = labels[use.levels]))
    if (length(attr(x, "contrasts")))
        contrasts(f) <- contrasts(x)
    f
}
<environment: namespace:memisc>

The suppressWarnings(..) has "ensured" all these years since 2009 that users and package writers were never alerted to the programming "glitch" of not ensuring that the levels/labels were correct. They should have seen that factor() was sometimes called in situations where it produced an invalid factor, namely one where some levels were duplicated; the memisc authors could then have ensured that the above method produces correct factors.

Martin Maechler, R core team / ETH Zurich
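In the spirit of the memisc discussion above, here is a sketch of one way a method like that could avoid handing duplicated labels to factor() at all, so that no warning needs to be suppressed: first map each value to its label, then build the factor over the unique labels. The function factor_unique_labels() is hypothetical (not a memisc or base R function), and merging values that share a label is an assumed policy; raising an error instead would be equally defensible.

```r
# Hypothetical helper (not part of memisc or base R): build a factor whose
# levels are guaranteed unique, even when several values share one label.
factor_unique_labels <- function(x, values, labels = as.character(values)) {
  stopifnot(length(values) == length(labels))
  if (anyDuplicated(values))
    stop("duplicated 'values' cannot be mapped unambiguously")
  # Look up each observation's label; values not listed become NA,
  # as they would with factor(x, levels = values).
  lab <- labels[match(x, values)]
  # unique() keeps first-appearance order, so the levels are distinct.
  factor(lab, levels = unique(labels))
}

x <- c(1, 2, 3, 9, 2)
f <- factor_unique_labels(x, values = c(1, 2, 3),
                          labels = c("low", "high", "high"))
levels(f)      # "low" "high"
as.integer(f)  # 1 2 2 NA 2
```

Because the levels are unique by construction, any warning that factor() might still emit signals a genuine problem rather than an expected one, so there is no reason to wrap the call in suppressWarnings().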