Thaler, Thorn, LAUSANNE, Applied Mathematics
2010-Jul-12 10:14 UTC
[R] Checking formulae: are lower order terms included
Dear all,
I have a very rudimental function which takes a vector of terms and returns a
list of all possible models which can be made using the given terms. For example
for the set c("1", "x", "y", "x:y")
I'd get:
~ 1
~ x
~ y
~ x:y
~ 1 + x
~ 1 + y
~ 1 + x:y
~ x + y
~ x + x:y
~ y + x:y
~ 1 + x + y
~ 1 + x + x:y
~ 1 + y + x:y
~ x + y + x:y
~ 1 + x + y + x:y
Some of the models don't make sense, since the interaction term is included
while one of the main effect (or even both) is not. So what I'm looking for
is a way to check whether a given formula makes sense, i.e. if there is an
interaction, there should be all the main effects included in the model.
Ideally, this would work as well for higher order interaction (e.g. x:y:z in the
model => x, y, z, x:y, y:z and x:z should be there as well).
So does anybody know how I could solve this issue? Maybe there is even a
function solving this problem. I had a look at terms.formula and there is this
variable-terms matrix. The help says:
The entries are 0 if the variable does not occur in the term, 1 if it does occur
and should be coded by contrasts, and 2 if it occurs and should be coded via
dummy variables for all levels (as when an intercept or lower-order term is
missing).
This seems to be what I need, but I'm not sure whether I understood the help
correctly:
> attr(terms(as.formula("a ~ x:y + z:w + z:y + w +
z")),"factors")
w z x:y z:w y:z
a 0 0 0 0 0
x 0 0 2 0 0
y 0 0 2 0 1
z 0 1 0 1 1
w 1 0 0 1 0
so the value of [x, x:y] is 2, because "y" itself is not in the model,
while [y, y:z] gives 1 since "z" is in the model. But why does [z,
y:z] give 1? y is not in the model so in my understanding the value should be 2.
However, would it be sufficient to test
any(attr(terms(my.formula), "factors") == 2)
to see whether the given model is not valid?
Thanks for your input.
BR Thorn
Thorn Thaler
Applied Mathematics Group
Nestl? Research Center
PO Box 44
CH-1000 Lausanne 26, Switzerland
Phone: + 41 21 785 8220
Fax: + 41 21 785 8556
RICHARD M. HEIBERGER
2010-Jul-12 13:40 UTC
[R] Checking formulae: are lower order terms included
Thorn, All the formulas you listed make sense. Remember that the notation x:y is an indexing scheme for dummy variables. It is not a statement about degrees of freedom. The dummy variables specified may be linearly dependent on earlier dummy variables in the system x/y is correctly expanded to x + x:y and in an anova setting has a-1 and a(b-1) degrees of freedom. x*y is correctly expanded to x + y + x:y and in an anova setting has a-1, b-1, and (a-1)(b-1) degrees of freedom. Rich [[alternative HTML version deleted]]