To R-help list: I would like to use lm() and lmRob() to estimate the parameters of a fixed-effects model that includes nested factors with unequal numbers of levels, in some cases without also including the nesting factor in the model. When I specify options(contrasts=c("contr.sum","contr.poly")), the model matrix generated by model.matrix() can be incorrect. I realize that I can create the desired matrix myself, and use lm.fit(), or lmRob.fit.compute() but there are two problems with this: First, we are warned against using those functions directly, and, second, whereas the models I want to apply are the same for about 100 different data frames, each with about 500 observations, a different model matrix would have to be constructed for each data frame, increasing the chance of error. ------------------------------------------------------------------- Here is a toy example: Dataframe diff.df: G D T2 1 g1 d1 -60 2 g1 d2 -50 3 g1 d3 -40 4 g2 d1 20 5 g2 d2 40 6 g2 d3 60 7 g2 d4 80 (G, D factors; T2 numeric) After options(contrasts=c("contr.sum","contr.poly")) neither of the following produces the desired model matrix: model.matrix(T2 ~ G + D%in%G, diff.df) model.matrix(T2 ~ D%in%G, diff.df) For example, omitting row numbers, the second command produces: Icpt Dd1:Gg1 Dd2:Gg1 Dd3:Gg1 Dd4:Gg1 Dd1:Gg2 Dd2:Gg2 Dd3:Gg2 Dd4:Gg2 1 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 whereas the correct matrix is: Icpt Gg1:D1 Gg1:D2 Gg2:D1 Gg2:D2 Gg2:D3 1 1 0 0 0 0 1 0 1 0 0 0 1 -1 -1 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 1 0 0 0 0 1 1 0 0 -1 -1 -1 ------------------------------------------------------------------- Other behavior of model.matrix() with nested factors in tiny examples: When the nested factors had unequal numbers of levels and the nesting factor was not included in the model, as above, the matrix generated for contr.treatment was also wrong. When the nested factors had unequal numbers of levels and the nesting factor was included in the model, the matrix for contr.sum was wrong, while the matrix for contr.treatment was wrong only in having an extra column of zeros. When the nested factors had equal numbers of levels and the nesting factor was not included in the model, the correct matrix was generated for neither contr.sum nor contr.treatment. When the nested factors had equal numbers of levels and the nesting factor was included in the model, the correct matrix was generated for both contr.sum and contr.treatment. ------------------------------------------------------------------- My questions: (1) If there is a way to use lm() and lmRob() in such cases so that they generate the correct contrast matrices (and hence the desired parameter estimates), what is it? (2) If there is no way to do this, is the best alternative for the user to create the desired model matrices "by hand" and provide them as arguments to lim.fit() and lmRob.fit.compute()? (3) If one uses lm.fit() and lmRob.fit.compute() directly in this way, then, given that one is warned against doing so, what are the dangers? Many thanks. Saul Sternberg Psychology University of Pennsylvania ---------------------------------------------------------------- R%%>sessionInfo() R version 2.12.0 (2010-10-15) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base loaded via a namespace (and not attached): [1] tools_2.12.0