Thaler, Thorn, LAUSANNE, Applied Mathematics
2010-Jul-12 10:14 UTC
[R] Checking formulae: are lower order terms included
Dear all, I have a very rudimental function which takes a vector of terms and returns a list of all possible models which can be made using the given terms. For example for the set c("1", "x", "y", "x:y") I'd get: ~ 1 ~ x ~ y ~ x:y ~ 1 + x ~ 1 + y ~ 1 + x:y ~ x + y ~ x + x:y ~ y + x:y ~ 1 + x + y ~ 1 + x + x:y ~ 1 + y + x:y ~ x + y + x:y ~ 1 + x + y + x:y Some of the models don't make sense, since the interaction term is included while one of the main effect (or even both) is not. So what I'm looking for is a way to check whether a given formula makes sense, i.e. if there is an interaction, there should be all the main effects included in the model. Ideally, this would work as well for higher order interaction (e.g. x:y:z in the model => x, y, z, x:y, y:z and x:z should be there as well). So does anybody know how I could solve this issue? Maybe there is even a function solving this problem. I had a look at terms.formula and there is this variable-terms matrix. The help says: The entries are 0 if the variable does not occur in the term, 1 if it does occur and should be coded by contrasts, and 2 if it occurs and should be coded via dummy variables for all levels (as when an intercept or lower-order term is missing). This seems to be what I need, but I'm not sure whether I understood the help correctly:> attr(terms(as.formula("a ~ x:y + z:w + z:y + w + z")),"factors")w z x:y z:w y:z a 0 0 0 0 0 x 0 0 2 0 0 y 0 0 2 0 1 z 0 1 0 1 1 w 1 0 0 1 0 so the value of [x, x:y] is 2, because "y" itself is not in the model, while [y, y:z] gives 1 since "z" is in the model. But why does [z, y:z] give 1? y is not in the model so in my understanding the value should be 2. However, would it be sufficient to test any(attr(terms(my.formula), "factors") == 2) to see whether the given model is not valid? Thanks for your input. BR Thorn Thorn Thaler Applied Mathematics Group Nestl? Research Center PO Box 44 CH-1000 Lausanne 26, Switzerland Phone: + 41 21 785 8220 Fax: + 41 21 785 8556
RICHARD M. HEIBERGER
2010-Jul-12 13:40 UTC
[R] Checking formulae: are lower order terms included
Thorn, All the formulas you listed make sense. Remember that the notation x:y is an indexing scheme for dummy variables. It is not a statement about degrees of freedom. The dummy variables specified may be linearly dependent on earlier dummy variables in the system x/y is correctly expanded to x + x:y and in an anova setting has a-1 and a(b-1) degrees of freedom. x*y is correctly expanded to x + y + x:y and in an anova setting has a-1, b-1, and (a-1)(b-1) degrees of freedom. Rich [[alternative HTML version deleted]]