Justin Thong
2016-Aug-22 01:44 UTC
[R] Intercept in Model Matrix (Parameters not what I expected)
I have something which has been bugging me, and I have even asked this on Cross Validated but did not get a response. Let's construct a simple example. Below is the code.

A<-gl(2,4) #factor of 2 levels
B<-gl(4,2) #factor of 4 levels
y<-rnorm(8) #response; not defined in the original post, so an arbitrary one is assumed here
df<-data.frame(y,A,B)

As you can see, B is nested within A. The peculiar result I am interested in is the output of the model matrix when I fit a nested model.

*How does R decide what is included inside the intercept?* Since we are using dummy coding, the coefficients of the model are interpreted as the difference between a particular level and the reference level (the intercept) for a single-factor model. I understand that for model ~A, A1 becomes the intercept, and that for model ~A+B, A1 and B1 (both) become the intercept.

*I do not get why, when we use a nested model, A1:B2 appears as a column inside the model matrix. Why isn't the first parameter of the interaction subspace A1:B1 or A2:B1?* I think I am missing the concept. I think the intercept is A1. *Hence, why do we not compare the levels of A1:B1 and A1 (intercept), or A2:B1 and A1 (intercept)?*

#nested model
> mod<-aov(y~A+A:B)
> model.matrix(mod)
  (Intercept) A2 A1:B2 A2:B2 A1:B3 A2:B3 A1:B4 A2:B4
1           1  0     0     0     0     0     0     0
2           1  0     0     0     0     0     0     0
3           1  0     1     0     0     0     0     0
4           1  0     1     0     0     0     0     0
5           1  1     0     0     0     1     0     0
6           1  1     0     0     0     1     0     0
7           1  1     0     0     0     0     0     1
8           1  1     0     0     0     0     0     1

--
Yours sincerely,
Justin
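One way to see where these column names come from is to inspect the terms object and the model matrix directly. The sketch below is only illustrative: the response y is a placeholder (the post does not define one), and the names f, X and zero_cols are assumptions. As I understand R's documented coding rule, a factor inside an interaction is coded by contrasts when the term with that factor removed is already in the model, and by a full set of indicator columns otherwise. Here A is in the model, so within A:B the factor B gets treatment contrasts (dropping B1) while A keeps both levels, which would explain why A1:B1 and A2:B1 never appear.

```r
# Design from the post; y is an assumed placeholder response.
set.seed(1)
A <- gl(2, 4)   # factor with 2 levels
B <- gl(4, 2)   # factor with 4 levels, nested within A
y <- rnorm(8)
df <- data.frame(y, A, B)

# "factors" attribute of the terms object:
# 1 = coded by contrasts, 2 = coded by indicators for all levels.
f <- attr(terms(~ A + A:B), "factors")
print(f)

# Model matrix for the nested formula.
X <- model.matrix(~ A + A:B, data = df)
print(colnames(X))

# (A, B) combinations that never occur in the data give all-zero columns.
zero_cols <- colnames(X)[colSums(X != 0) == 0]
print(zero_cols)

# Cross-tabulation showing which cells are actually observed.
print(table(df$A, df$B))
```

Under this reading, the intercept corresponds to the cell A1 with B1, A2 shifts to A2 with its first observed level of B, and the A:B columns are differences from B1 within each level of A. Columns such as A2:B2 exist only because R builds every A-by-B product, not because those cells are present in the data.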
Bert Gunter
2016-Aug-22 15:14 UTC
[R] Intercept in Model Matrix (Parameters not what I expected)
Justin:

As you have not yet received any reply...

Your question is mostly about statistics (linear models) and, as such, is typically off topic here. Briefly, you do seem confused about contrasts in linear models, but I am confused about your confusion, and so may be of little help. However...

Note that in your little 8-run example design, the response lives in 8 dimensions, and so your model matrix can have at most 8 independent columns. ~(A+B) has 4, which, using contr.treatment contrasts, could be Intercept, A2, B2, B3 (since (B3+B4) - (B2+B1) is confounded with (A2 - A1), where these are "dummy" encodings of 0 and 1). Adding all pairwise products of the non-intercept columns would not give you any more, as each product is either all 0's or a copy of an existing column (A2*B3 is the same as B3). I do not know the algorithm that lm/aov uses to choose which of the contrasts to estimate, but it makes no difference: there can only be 3 beyond the intercept, and all others are linear combinations of these.

If this is not useful to you, either:

1. Hope for a response here that is more helpful;
2. Consult a local statistical expert;
3. Read up on linear models (there are multiple books and internet sources);
4. Post on stats.stackexchange.com again.

Cheers,
Bert

## Note to others: if I have erred in any of the above, PLEASE CORRECT.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
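The rank argument above can be checked numerically. This is a minimal sketch, not a statement about the internals of lm/aov; the names X_add, X_nest, rank_add, rank_nest and fit_add, and the random response y, are assumptions made for illustration.

```r
# Same design as in the original post.
A <- gl(2, 4)
B <- gl(4, 2)

X_add  <- model.matrix(~ A + B)     # Intercept, A2, B2, B3, B4
X_nest <- model.matrix(~ A + A:B)   # Intercept, A2, and six A:B columns

# Number of linearly independent columns in each matrix.
rank_add  <- qr(X_add)$rank
rank_nest <- qr(X_nest)$rank
print(c(additive = rank_add, nested = rank_nest))

# The confounding in dummy coding: the A2 column equals B3 + B4.
print(all(X_add[, "A2"] == X_add[, "B3"] + X_add[, "B4"]))

# lm() reports a coefficient it cannot estimate as NA.
set.seed(1)
y <- rnorm(8)        # assumed placeholder response
fit_add <- lm(y ~ A + B)
print(coef(fit_add))
```

Both matrices should come out with the same rank, since the design has only four distinct (A, B) cells; the extra columns of the nested model matrix add names but no new estimable parameters.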