Nathaniel Smith
2010-Jun-11 17:29 UTC
[R] passing contrasts=FALSE to contrast functions -- why does this exist?
Hello,

I've noticed that all the contrast functions, like contr.treatment, contr.poly, etc., take a logical argument called 'contrasts'. The default is TRUE, in which case they do their normal thing and return an n x (n-1) matrix whose columns are linearly independent of the intercept. If contrasts=FALSE, they instead return a full-rank n x n matrix. This is usually the identity matrix, corresponding to "dummy" coding, but contr.poly returns orthogonal polynomials that include the zeroth-order constant term, instead of starting with the linear term as it normally would.

Why does this argument exist?

My initial theory was that it was added to support the smart handling of redundancy in model matrix construction. Depending on what other terms exist in a formula, R sometimes chooses to contrast code a factor in n-1 columns, and sometimes chooses to dummy code it in n columns. So it would make sense to call the contrast function with contrasts=TRUE in the former case and contrasts=FALSE in the latter. That way, a contrast function that for some reason wanted a full-rank coding *besides* dummy coding could provide one (as contr.poly does).

But in fact, when R decides it wants dummy coding, it doesn't call the contrast function at all; it just dummy codes unconditionally:

  > a <- factor(c("a", "b", "c"))
  > trace(contr.treatment)
  > invisible(model.matrix(~ a))      # contrast coded
  trace: ctrfn(levels(x), contrasts = contrasts)
  > invisible(model.matrix(~ 0 + a))  # dummy coded
  >

In fact, I can't find any code anywhere in R that ever uses contrasts=FALSE. So what's going on? Is this a bug, and *should* R be using contrasts=FALSE to "dummy code" factors?

Confusedly yours,
-- Nathaniel
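For readers who want to see the two codings side by side, here is a minimal base-R sketch (it only uses the stats package, which is attached by default). It calls contr.treatment and contr.poly with and without contrasts = FALSE and compares the result with what model.matrix(~ 0 + a) builds when R chooses dummy coding. The variable names (ct, cf, pt, pf, mm) are illustrative only, and the behaviour described in the comments reflects my understanding of the stats package, so it is worth checking against your own R version.

```r
n <- 3

# Default treatment contrasts: n x (n-1), the baseline level gets no column
ct <- contr.treatment(n)
# Full-rank version: n x n identity matrix, i.e. plain dummy coding
cf <- contr.treatment(n, contrasts = FALSE)

# Orthogonal polynomials: only the linear and quadratic terms by default
pt <- contr.poly(n)
# With contrasts = FALSE the constant (degree-0) column is kept as well
pf <- contr.poly(n, contrasts = FALSE)

print(ct)
print(cf)
print(pf)

# What model.matrix() builds when it decides to dummy code the factor;
# contr.treatment is never consulted for this
a <- factor(c("a", "b", "c"))
mm <- model.matrix(~ 0 + a)
print(mm)

# Here each level appears exactly once and in order, so the dummy-coded
# model matrix coincides with the identity matrix from contrasts = FALSE
print(all(unname(mm[, ]) == unname(cf)))
```

The final comparison only lines up because the toy factor has one observation per level in level order; with real data the model matrix has one row per observation, but its columns are still the n indicator columns of dummy coding.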