Thaler, Thorn, LAUSANNE, Applied Mathematics
2011-Oct-21 12:57 UTC
[Rd] droplevels: drops contrasts as well
Dear all,
Today I figured out that there is a neat function called droplevels,
which, well, drops unused levels in a data frame. I tried the function
with some of my data sets and it turned out that not only were the unused
levels dropped, but also the contrasts I had set via C(). I had a look
into the code, and this behaviour arises from the fact that droplevels
simply uses factor() to drop the unused levels. The new factor carries no
"contrasts" attribute, so the default contrasts set by options("contrasts")
are used instead.
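A minimal reproduction of the behaviour described above, using a hypothetical toy factor `f` (not from the original post) and base R's droplevels(), C() and contrasts(). The commented values assume a standard R session with the stats package attached.

```r
## Hypothetical toy factor: level "d" is declared but never used.
f <- factor(c("a", "b", "c", "a"), levels = c("a", "b", "c", "d"))

## Set sum-to-zero contrasts; C(f, sum) stores the name "contr.sum".
f_sum <- C(f, sum)
attr(f_sum, "contrasts")   # "contr.sum"

## droplevels() rebuilds the factor with factor(), dropping level "d" ...
g <- droplevels(f_sum)
levels(g)                  # "a" "b" "c"

## ... and the contrasts attribute set via C() is gone, so model code
## falls back to the default from options("contrasts").
attr(g, "contrasts")       # NULL
contrasts(g)
```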
I think this behaviour is annoying, because if one does not look
carefully enough, one loses the contrasts silently. Hence may I suggest
changing the code of droplevels to something like the following:
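droplevels.data.frame <- function (x, except = NULL, ...) {
    ## which columns are factors (and not exempted via 'except')
    ix <- vapply(x, is.factor, NA)
    if (!is.null(except))
        ix[except] <- FALSE
    ## remember the contrasts attribute of each factor column
    co <- lapply(x[ix], function(fa) attr(fa, "contrasts"))
    x[ix] <- mapply(function(fa, co) {
        ## C() needs at least two levels, so fall back to plain factor()
        if (nlevels(factor(fa)) < 2) {
            factor(fa)
        } else {
            ## drop unused levels, then re-attach the stored contrasts
            C(factor(fa), co)
        }
    }, x[ix], co, SIMPLIFY = FALSE)
    x
}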
which keeps the original contrasts AND drops the unused levels?
Similarly, droplevels.factor should be changed to
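droplevels.factor <- function (x, ...) {
    ## keep the contrasts set on the original factor
    co <- attr(x, "contrasts")
    ## C() needs at least two levels, so fall back to plain factor()
    if (nlevels(factor(x)) < 2) {
        factor(x)
    } else {
        ## drop unused levels, then re-attach the stored contrasts
        C(factor(x), co)
    }
}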
The nlevels() check is necessary since C() does not work on factors with
fewer than two levels.
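The need for the guard can be sketched with a hypothetical factor in which only one level is actually used. Here the guard is written as `< 2`, which also covers a factor with no levels at all; the exact error text comes from base R and may vary between versions.

```r
## Hypothetical toy factor: only level "a" is used, "b" is unused.
f1 <- factor(c("a", "a"), levels = c("a", "b"))

## Re-applying contrasts after dropping the unused level fails, because
## C() (via `contrasts<-`) refuses factors with fewer than two levels.
err <- tryCatch(C(factor(f1), sum), error = function(e) e)
inherits(err, "error")     # TRUE
conditionMessage(err)

## A guard on the number of levels falls back to plain factor() instead.
g1 <- if (nlevels(factor(f1)) < 2) factor(f1) else C(factor(f1), sum)
levels(g1)                 # "a"
```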
Any comments appreciated.
KR,
-Thorn
On Fri, Oct 21, 2011 at 5:57 AM, Thaler, Thorn, LAUSANNE, Applied Mathematics <Thorn.Thaler at rdls.nestle.com> wrote:
> Hence may I suggest
> to change the code of droplevels to something like the following:

This silently changes the contrasts -- e.g., if the first level of the
factor is one of the empty levels, the reference level used by
contr.treatment() will change. Also, if the contrasts are a matrix rather
than specifying a contrast function, the matrix will be invalid for the
new factor.

I think just having a warning would be better -- in general it's not
clear what (if anything) it means to have the same contrasts on factors
with different numbers of levels.

   -thomas

--
Thomas Lumley
Professor of Biostatistics
University of Auckland
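Both objections can be checked directly. The sketch below uses a hypothetical helper `droplevels_keep` that mirrors the proposed droplevels.factor (guard written as `< 2`), toy factors not taken from the thread, and assumes the default options("contrasts") and an attached stats package. It then sketches the warning-based alternative under another hypothetical name, `droplevels_warn`; neither function is part of R.

```r
## Hypothetical helper (not part of R): mirrors the proposed
## droplevels.factor, i.e. drop unused levels and re-apply old contrasts.
droplevels_keep <- function(x) {
    co <- attr(x, "contrasts")
    if (nlevels(factor(x)) < 2) factor(x) else C(factor(x), co)
}

## (1) Reference level shift: level "a" is empty, so re-applying
##     treatment contrasts silently moves the baseline from "a" to "b".
f <- C(factor(c("b", "c", "b"), levels = c("a", "b", "c")), treatment)
colnames(contrasts(f))                   # "b" "c"  (baseline "a")
colnames(contrasts(droplevels_keep(f)))  # "c"      (baseline "b")

## (2) Matrix contrasts: passing a function such as contr.sum to C()
##     stores the evaluated 3 x 2 matrix, whose row count no longer
##     matches the 2 remaining levels, so re-applying it fails.
h <- C(factor(c("b", "c", "b"), levels = c("a", "b", "c")), contr.sum)
dim(attr(h, "contrasts"))                # 3 2
res <- tryCatch(droplevels_keep(h), error = function(e) e)
inherits(res, "error")                   # TRUE

## Warning-based alternative (hypothetical name): keep base droplevels()
## but tell the user that the contrasts set on the factor are discarded.
droplevels_warn <- function(x, ...) {
    if (!is.null(attr(x, "contrasts")))
        warning("droplevels() discards the contrasts set on this factor; ",
                "re-set them with C() if still needed")
    droplevels(x, ...)
}
```

The warning version leaves the choice of new contrasts to the user, which avoids guessing what the old contrasts should mean for a factor with fewer levels.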