Justin Thong
2016-Jul-26 15:05 UTC
[R] Linear Dependence of Model Matrix and How Fitted Values/Sums of Squares Follow
Below are the covariates for the model ~x1+x2+x3+x4+x5+x6. When I fit this model, the coefficient of x6 is not estimable. *Is this simply because adding more columns to the model matrix eventually leads to linear dependence, so that the more terms there are in the model formula, the more likely the model matrix is to be linearly dependent?* For the model ~x1+x2+x3+x4+x5 all the coefficients are estimable, so I guess this example supports my statement.

That being said, if not all coefficients are estimated, how does R compute the fitted values and the anova table? *Does it just ignore the existence of x6 and treat the model as ~x1+x2+x3+x4+x5, or is there something deeper that I do not understand?* I ask because the sums of squares and fitted values seem to be the same for ~x1+x2+x3+x4+x5 as for ~x1+x2+x3+x4+x5+x6.

However, this is not so clear-cut for models with factors, because a factor is represented in the model matrix by a parameter for each level. Consider a factor F with 2 levels and a factor G with 3 levels. The problem is that R has a way of excluding certain rows from the anova table. Again, it appears to exclude the rows associated with parameters that are not estimable, but this is not entirely clear in my mind.

Look at the small example below for the model ~F*G with two factors. As you can see, the interaction parameters F2:G2 and F2:G3 are not estimable. From what I was told, F1 and G1 are contained within the (Intercept) parameter, so F1:G1, F1:G2 and F2:G1 are not considered. You can see from the anova table that the interaction row F:G is omitted. My main problem is why it is omitted. *Does that mean that if all the parameters associated with a particular term (excluding the ones associated with the intercept) are not estimable, then the row for that term is left out of the anova table? How many non-estimable parameters must there be for the row of a term to be left out?*
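For the numeric-covariate question, a minimal sketch of how the rank deficiency might be checked. It relies on one observation about the posted covariates: every row appears to sum to 12, which would make x1+...+x6 an exact multiple of the intercept column. The data frame `d` below is a small hypothetical stand-in with the same property (three covariates instead of six), and `y` is simulated because the response is not given in this post.

```r
# Hypothetical stand-in design: each row's covariates sum to 12,
# mirroring the structure of the covariates posted here.
d <- data.frame(
  x1 = c(12,  0,  0, 6, 6, 0, 4, 3),
  x2 = c( 0, 12,  0, 6, 0, 6, 4, 3),
  x3 = c( 0,  0, 12, 0, 6, 6, 4, 6)
)
stopifnot(all(rowSums(d) == 12))

set.seed(1)
d$y <- rnorm(nrow(d), mean = 2)   # simulated response (assumption)

full    <- lm(y ~ x1 + x2 + x3, data = d)
reduced <- lm(y ~ x1 + x2, data = d)

X <- model.matrix(full)
ncol(X)        # number of columns in the model matrix
qr(X)$rank     # numerical rank; smaller than ncol(X) here
coef(full)     # the coefficient of x3 is reported as NA

# Both fits span the same column space, so the fitted values agree.
all.equal(fitted(full), fitted(reduced))

# The anova table has no row for the aliased term.
anova(full)
```

In this sketch the last covariate is aliased not because there are "too many" columns but because of the exact constraint x1 + x2 + x3 = 12 * (Intercept); dropping the intercept or any one covariate removes the dependence.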
If the answer to the second question is that fitted values and sums of squares are calculated by ignoring the non-estimable parameters, then the rows of sums of squares must disappear for some reason other than non-estimability.

Sorry for the generally wordy question. I may not be thinking about this in the correct manner, and I would appreciate it if anyone has an answer, and perhaps even some generalisations regarding the use of the QR decomposition.

(There is more code below this data)

    x1 x2 x3 x4 x5 x6
1   12  0  0  0  0  0
2   12  0  0  0  0  0
3   12  0  0  0  0  0
4   12  0  0  0  0  0
5    0 12  0  0  0  0
6    0 12  0  0  0  0
7    0 12  0  0  0  0
8    0 12  0  0  0  0
9    0  0 12  0  0  0
10   0  0 12  0  0  0
11   0  0 12  0  0  0
12   0  0 12  0  0  0
13   0  0  0 12  0  0
14   0  0  0 12  0  0
15   0  0  0 12  0  0
16   0  0  0 12  0  0
17   0  0  0  0 12  0
18   0  0  0  0 12  0
19   0  0  0  0 12  0
20   0  0  0  0 12  0
21   0  0  0  0  0 12
22   0  0  0  0  0 12
23   0  0  0  0  0 12
24   0  0  0  0  0 12
25   6  6  0  0  0  0
26   6  6  0  0  0  0
27   6  6  0  0  0  0
28   6  6  0  0  0  0
29   6  0  6  0  0  0
30   6  0  6  0  0  0
31   6  0  6  0  0  0
32   6  0  6  0  0  0
33   6  0  0  6  0  0
34   6  0  0  6  0  0
35   6  0  0  6  0  0
36   6  0  0  6  0  0
37   6  0  0  0  6  0
38   6  0  0  0  6  0
39   6  0  0  0  6  0
40   6  0  0  0  6  0
41   6  0  0  0  0  6
42   6  0  0  0  0  6
43   6  0  0  0  0  6
44   6  0  0  0  0  6
45   0  6  6  0  0  0
46   0  6  6  0  0  0
47   0  6  6  0  0  0
48   0  6  6  0  0  0
49   0  6  0  6  0  0
50   0  6  0  6  0  0
51   0  6  0  6  0  0
52   0  6  0  6  0  0
53   0  6  0  0  6  0
54   0  6  0  0  6  0
55   0  6  0  0  6  0
56   0  6  0  0  6  0
57   0  6  0  0  0  6
58   0  6  0  0  0  6
59   0  6  0  0  0  6
60   0  6  0  0  0  6
61   0  0  6  6  0  0
62   0  0  6  6  0  0
63   0  0  6  6  0  0
64   0  0  6  6  0  0
65   0  0  6  0  6  0
66   0  0  6  0  6  0
67   0  0  6  0  6  0
68   0  0  6  0  6  0
69   0  0  6  0  0  6
70   0  0  6  0  0  6
71   0  0  6  0  0  6
72   0  0  6  0  0  6
73   0  0  0  6  6  0
74   0  0  0  6  6  0
75   0  0  0  6  6  0
76   0  0  0  6  6  0
77   0  0  0  6  0  6
78   0  0  0  6  0  6
79   0  0  0  6  0  6
80   0  0  0  6  0  6
81   0  0  0  0  6  6
82   0  0  0  0  6  6
83   0  0  0  0  6  6
84   0  0  0  0  6  6
85   4  4  4  0  0  0
86   4  4  4  0  0  0
87   4  4  4  0  0  0
88   4  4  4  0  0  0
89   4  4  0  4  0  0
90   4  4  0  4  0  0
91   4  4  0  4  0  0
92   4  4  0  4  0  0
93   4  4  0  0  4  0
94   4  4  0  0  4  0
95   4  4  0  0  4  0
96   4  4  0  0  4  0
97   4  4  0  0  0  4
98   4  4  0  0  0  4
99   4  4  0  0  0  4
100  4  4  0  0  0  4
101  4  0  4  4  0  0
102  4  0  4  4  0  0
103  4  0  4  4  0  0
104  4  0  4  4  0  0
105  4  0  4  0  4  0
106  4  0  4  0  4  0
107  4  0  4  0  4  0
108  4  0  4  0  4  0
109  4  0  4  0  0  4
110  4  0  4  0  0  4
111  4  0  4  0  0  4
112  4  0  4  0  0  4
113  4  0  0  4  4  0
114  4  0  0  4  4  0
115  4  0  0  4  4  0
116  4  0  0  4  4  0
117  4  0  0  4  0  4
118  4  0  0  4  0  4
119  4  0  0  4  0  4
120  4  0  0  4  0  4
121  4  0  0  0  4  4
122  4  0  0  0  4  4
123  4  0  0  0  4  4
124  4  0  0  0  4  4
125  0  4  4  4  0  0
126  0  4  4  4  0  0
127  0  4  4  4  0  0
128  0  4  4  4  0  0
129  0  4  4  0  4  0
130  0  4  4  0  4  0
131  0  4  4  0  4  0
132  0  4  4  0  4  0
133  0  4  4  0  0  4
134  0  4  4  0  0  4
135  0  4  4  0  0  4
136  0  4  4  0  0  4
137  0  4  0  4  4  0
138  0  4  0  4  4  0
139  0  4  0  4  4  0
140  0  4  0  4  4  0
141  0  4  0  4  0  4
142  0  4  0  4  0  4
143  0  4  0  4  0  4
144  0  4  0  4  0  4
145  0  4  0  0  4  4
146  0  4  0  0  4  4
147  0  4  0  0  4  4
148  0  4  0  0  4  4
149  0  0  4  4  4  0
150  0  0  4  4  4  0
151  0  0  4  4  4  0
152  0  0  4  4  4  0
153  0  0  4  4  0  4
154  0  0  4  4  0  4
155  0  0  4  4  0  4
156  0  0  4  4  0  4
157  0  0  4  0  4  4
158  0  0  4  0  4  4
159  0  0  4  0  4  4
160  0  0  4  0  4  4
161  0  0  0  4  4  4
162  0  0  0  4  4  4
163  0  0  0  4  4  4
164  0  0  0  4  4  4

> F <- factor(c(rep(1,3), rep(2,3)))
> G <- factor(c(rep(1,2), rep(2,2), rep(3,2)))
> H <- F <- factor(c(rep(1,3), rep(2,3)))
> y <- rnorm(6, 2)
> test3 <- aov(y ~ F*G)
> model.matrix(test3)
  (Intercept) F2 G2 G3 F2:G2 F2:G3
1           1  0  0  0     0     0
2           1  0  0  0     0     0
3           1  0  1  0     0     0
4           1  1  1  0     1     0
5           1  1  0  1     0     1
6           1  1  0  1     0     1
attr(,"assign")
[1] 0 1 2 2 3 3
attr(,"contrasts")
attr(,"contrasts")$F
[1] "contr.treatment"

attr(,"contrasts")$G
[1] "contr.treatment"

> alias(test3)
Model :
y ~ F * G

Complete :
      (Intercept) F2 G2 G3
F2:G2  0           1  0 -1
F2:G3  0           0  0  1

> summary(test3)
            Df Sum Sq Mean Sq F value Pr(>F)
F            1 0.0479  0.0479   0.059  0.830
G            2 0.9762  0.4881   0.604  0.624
Residuals    2 1.6175  0.8087

-- 
Yours sincerely,
Justin
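
For the factor example, a minimal sketch of how the pivoted QR decomposition stored by lm() can be inspected to see which model-matrix columns are retained and which term each one belongs to. The names `fit`, `keep`, `term_id` and `term_labels` are introduced only for illustration, and the seed is arbitrary because the response in this post came from rnorm(). The tabulation is meant to mirror the Df column of the anova table, not to restate R's internal code.

```r
# Same factors as in the post (F: 2 levels, G: 3 levels, 6 observations).
F <- factor(c(1, 1, 1, 2, 2, 2))
G <- factor(c(1, 1, 2, 2, 3, 3))
set.seed(1)                    # arbitrary seed (assumption)
y <- rnorm(6, mean = 2)

fit <- lm(y ~ F * G)
X   <- model.matrix(fit)

fit$rank                       # number of linearly independent columns
keep <- fit$qr$pivot[seq_len(fit$rank)]
colnames(X)[keep]              # columns retained by the pivoted QR
colnames(X)[-keep]             # columns pushed to the end (aliased)

# Map each retained column to its term: 0 = intercept, 1 = F, 2 = G, 3 = F:G.
term_id     <- attr(X, "assign")[keep]
term_labels <- c("(Intercept)", attr(terms(fit), "term.labels"))
table(term_labels[term_id + 1])   # retained columns per term

anova(fit)                     # compare the Df column with the table above
```

On this reading, a term would vanish from the table only when none of its columns is retained; if just some of its columns were aliased, the row would remain with a reduced Df.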