Hi:
(1) lm() drops columns in a rank-deficient model matrix X to make X'X
nonsingular - this is called a full-rank reparameterization of the linear
model.
(2) How many columns of X are dropped depends on its rank, which in turn
depends on the number of constraints in the model matrix. This is related to
the degrees of freedom associated with each term in the corresponding linear
model. The default contrasts are
> options()$contrasts
        unordered           ordered
"contr.treatment"      "contr.poly"
Other choices include contr.helmert() and contr.sum(). See the help page
?contrasts for further details. See also section 6.2 of Venables and
Ripley's _Modern Applied Statistics with S, 4th ed._ for further information
on the connection between the columns of the model matrix in ANOVA and the
defined sets of contrasts.
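As a quick sketch of what these contrast functions produce (the 3-level
factor here is only an illustration, not part of the example that follows),
each one returns a matrix with one row per level and one column fewer than
the number of levels:

```r
# Contrast matrices for a hypothetical 3-level factor.
# Each has 3 rows (one per level) and 2 columns (independent contrasts).
ct <- contr.treatment(3)  # level 1 is the baseline: its row is all zeros
cs <- contr.sum(3)        # each column sums to zero over the levels
ch <- contr.helmert(3)    # each level is compared with the earlier levels

print(ct)
print(cs)
print(ch)
```

Whichever coding is chosen, a factor with k levels contributes k - 1 columns
to the model matrix when an intercept is present.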
Under the default contrasts, the column for the first level of each factor
is dropped from the main effect terms. Here's a simple example of a balanced
2-way ANOVA with interaction:
d <- data.frame(a = factor(rep(letters[1:3], each = 4)),
                b = factor(rep(rep(1:2, each = 2), 3)),
                c = rnorm(12))
> d
a b c
1 a 1 -0.77367688
2 a 1 -0.79069791
3 a 2 0.69257133
4 a 2 2.46788204
5 b 1 0.38892289
6 b 1 -0.03521033
7 b 2 -0.01071611
8 b 2 -0.74209425
9 c 1 1.36974281
10 c 1 -1.22775441
11 c 2 0.29621976
12 c 2 0.28208192
m <- aov(c ~ a * b, data = d)
model.matrix(m)
(Intercept) ab ac b2 ab:b2 ac:b2
1 1 0 0 0 0 0
2 1 0 0 0 0 0
3 1 0 0 1 0 0
4 1 0 0 1 0 0
5 1 1 0 0 0 0
6 1 1 0 0 0 0
7 1 1 0 1 1 0
8 1 1 0 1 1 0
9 1 0 1 0 0 0
10 1 0 1 0 0 0
11 1 0 1 1 0 1
12 1 0 1 1 0 1
attr(,"assign")
[1] 0 1 1 2 3 3
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"
attr(,"contrasts")$b
[1] "contr.treatment"
Notice that the first column of each main effect is dropped, and that the
interaction columns are the products of the retained a columns with the
retained b columns. The attr() components verify that the contrast method
used for this matrix is the default contr.treatment.
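If you want to check these claims yourself, here is a minimal sketch. The
seed and the names X, X_full, full and fit_full are my own choices for
illustration. It confirms that the interaction columns are products of the
main effect columns, and compares the treatment-coded matrix with a fully
indicator-coded one, which has more columns but the same rank:

```r
set.seed(1)  # arbitrary seed; the response does not affect the design
d <- data.frame(a = factor(rep(letters[1:3], each = 4)),
                b = factor(rep(rep(1:2, each = 2), 3)),
                c = rnorm(12))

# Treatment-coded model matrix, as used by aov()/lm() by default
X <- model.matrix(~ a * b, data = d)
prod_ok <- all(X[, "ab:b2"] == X[, "ab"] * X[, "b2"],
               X[, "ac:b2"] == X[, "ac"] * X[, "b2"])

# Full indicator coding: one column per level, so columns are redundant
full <- list(a = contrasts(d$a, contrasts = FALSE),
             b = contrasts(d$b, contrasts = FALSE))
X_full <- model.matrix(~ a * b, data = d, contrasts.arg = full)

rank_trt  <- qr(X)$rank
rank_full <- qr(X_full)$rank

# Fitting on the redundant matrix: lm() reports NA for aliased columns
fit_full  <- lm(c ~ 0 + X_full, data = d)
n_aliased <- sum(is.na(coef(fit_full)))

print(c(cols_trt = ncol(X), rank_trt = rank_trt,
        cols_full = ncol(X_full), rank_full = rank_full,
        aliased = n_aliased))
```

The NA coefficients in fit_full are how lm() signals the columns it has set
aside in a rank-deficient fit.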
> anova(m)
Analysis of Variance Table
Response: c
Df Sum Sq Mean Sq F value Pr(>F)
a 2 0.5001 0.25003 0.2827 0.7633
b 1 1.3700 1.36999 1.5489 0.2597
a:b 2 4.5647 2.28235 2.5804 0.1554
Residuals 6 5.3070 0.88450
Examination of the degrees of freedom tells us that there are two
independent contrasts for a, one independent contrast for b and two
independent contrasts for the interaction of a and b, which are shown in
model.matrix(m) above.
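The correspondence between degrees of freedom and model matrix columns can
be read off the "assign" attribute. A small sketch, assuming the same data
layout as above (the seed and the helper names term_labels, cols and df are
mine):

```r
set.seed(1)  # arbitrary seed; only the design matters here
d <- data.frame(a = factor(rep(letters[1:3], each = 4)),
                b = factor(rep(rep(1:2, each = 2), 3)),
                c = rnorm(12))
m <- aov(c ~ a * b, data = d)

# "assign" maps each column to a term (0 = intercept, 1 = a, 2 = b, 3 = a:b)
term_labels <- attr(terms(m), "term.labels")
cols <- table(factor(attr(model.matrix(m), "assign"),
                     levels = seq_along(term_labels),
                     labels = term_labels))

# Degrees of freedom from the ANOVA table (last entry is Residuals)
df <- anova(m)$Df

print(cols)
print(df)
```

The number of columns assigned to each term matches that term's Df in the
ANOVA table.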
If you want to reset the baseline factor level, see ?relevel. Also look
into the contrast package on CRAN.
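For instance, a short sketch of relevel() on the factor a from the example
(the column name a_ref_c is just an illustrative choice):

```r
d <- data.frame(a = factor(rep(letters[1:3], each = 4)))

# Make "c" the first (baseline) level instead of "a"
d$a_ref_c <- relevel(d$a, ref = "c")

cols_default  <- colnames(model.matrix(~ a, data = d))
cols_releveled <- colnames(model.matrix(~ a_ref_c, data = d))

print(levels(d$a_ref_c))
print(cols_default)
print(cols_releveled)
```

After releveling, it is the column for level "c" that is omitted under
contr.treatment, so the remaining coefficients are comparisons with "c".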
HTH,
Dennis
On Wed, Jul 21, 2010 at 8:40 AM, Anirban Mukherjee
<am253@cornell.edu> wrote:
> Hi all,
>
> If presented with a singular design matrix, lm drops columns to make the
> design matrix non-singular. What algorithm is used to select which (and how
> many) column(s) to drop? Particularly, given a factor, how does lm choose
> levels of the factor to discard?
>
> Thanks for the help.
>
> Best,
> Anirban
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>