thr3ads.net - R help - [R] Order of formula terms in model.matrix [Jan 2016]

Lars Bishop

2016-Jan-17 13:42 UTC

[R] Order of formula terms in model.matrix

I'd appreciate your help in understanding the following.

It is not clear to me from the model.matrix documentation why simply
changing the order of terms in the formula may change the number of resulting
columns. Please note I'm purposely not including main effects in the model
formula in this case.
 

set.seed(1)
x1 <- rnorm(100)
f1 <- factor(sample(letters[1:3], 100, replace = TRUE))
trt <- sample(c(-1,1), 100, replace = TRUE)
df <- data.frame(x1=x1, f1=f1, trt=trt)

dim(model.matrix( ~ x1:trt + f1:trt, data = df))
[1] 100 4

dim(model.matrix(~ f1:trt + x1:trt, data = df))
[1] 100 5


Thanks,
Lars.

Charles C. Berry

2016-Jan-17 18:34 UTC

On Sun, 17 Jan 2016, Lars Bishop wrote:
> I'd appreciate your help on understanding the following.
>
> It is not very clear to me from the model.matrix documentation, why 
> simply changing the order of terms in the formula may change the number 
> of resulting columns. Please note I'm purposely not including main
> effects in the model formula in this case.

IIRC, there are some heuristics involved harking back to the White Book. I 
recall there have been discussions of whether and how this could be fixed 
before on this list and/or R-devel, but I cannot seem to lay my browser on 
them right now.

> 
>
> set.seed(1)
> x1 <- rnorm(100)
> f1 <- factor(sample(letters[1:3], 100, replace = TRUE))
> trt <- sample(c(-1,1), 100, replace = TRUE)
> df <- data.frame(x1=x1, f1=f1, trt=trt)
>
> dim(model.matrix( ~ x1:trt + f1:trt, data = df))
> [1] 100 4
>
> dim(model.matrix(~ f1:trt + x1:trt, data = df))
> [1] 100 5
>
By `x1:trt' I guess you mean the same thing as `I(x1*trt)'.

If you use the latter form, the issue you raise goes away.

Note that `I(some.expr)' gives you the ability to force the behavior of 
model.matrix to be exactly what you want by suitably crafting `some.expr', 
heuristics notwithstanding.

HTH,

Chuck
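
A minimal sketch of the I() suggestion above, reusing the data from the
original post. The names m_a and m_b are made up for illustration; the only
point is that both term orders give the same number of columns once the
numeric product is wrapped in I().

```r
# Same simulated data as in the original post
set.seed(1)
x1  <- rnorm(100)
f1  <- factor(sample(letters[1:3], 100, replace = TRUE))
trt <- sample(c(-1, 1), 100, replace = TRUE)
df  <- data.frame(x1 = x1, f1 = f1, trt = trt)

# I(x1 * trt) is evaluated as a single numeric variable,
# not as an interaction term involving trt
m_a <- model.matrix(~ I(x1 * trt) + f1:trt, data = df)
m_b <- model.matrix(~ f1:trt + I(x1 * trt), data = df)

# Compare the number of columns under the two orderings
ncol(m_a)
ncol(m_b)
```

As far as I can tell, this works because I(x1 * trt) no longer looks like a
term that contains trt, so the coding chosen for f1 in f1:trt does not depend
on where the product appears in the formula.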

Lars Bishop

2016-Jan-17 19:53 UTC

This is very helpful, thanks!

Lars.


peter dalgaard

2016-Jan-17 23:39 UTC

> On 17 Jan 2016, at 19:34, Charles C. Berry <ccberry at ucsd.edu> wrote:
>
> IIRC, there are some heuristics involved harking back to the White Book. I
> recall there have been discussions of whether and how this could be fixed
> before on this list and/or R-devel, but I cannot seem to lay my browser on
> them right now.
>
And IIRC: yup, and one of the issues is that 
(a) some rules work left-to-right
(b) the logic is oblivious to the factor/vector distinction

For factors a, b, c, what happens for ~a:b + b:c is that a:b gets the full term
expansion, since the marginals a and b are not in the model. But since b is part
of the fully expanded a:b, b:c gets the reduced-form expansion, as it would in
~b + b:c (the c-within-b thing). Swapping the terms gives you a different
result, but at least it is the same model in the sense that the columns span the
same subspace.
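
The all-factor case can be checked numerically. This is only a sketch on
simulated data: the data frame d, the seed, and the sample size are
assumptions made for illustration. If the two column sets span the same
subspace, stacking them side by side should not raise the rank.

```r
# Three simulated factors (hypothetical data, for illustration only)
set.seed(2)
n <- 200
d <- data.frame(
  a = factor(sample(letters[1:2], n, replace = TRUE)),
  b = factor(sample(letters[1:3], n, replace = TRUE)),
  c = factor(sample(letters[1:2], n, replace = TRUE))
)

m1 <- model.matrix(~ a:b + b:c, data = d)
m2 <- model.matrix(~ b:c + a:b, data = d)

# The column names differ between the two orderings ...
colnames(m1)
colnames(m2)

# ... but the ranks agree, and combining the columns adds nothing new
r1  <- qr(m1)$rank
r2  <- qr(m2)$rank
r12 <- qr(cbind(m1, m2))$rank
c(r1 = r1, r2 = r2, combined = r12)
```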

If a and b are vectors, and c is a factor, you get the same logic: expand a:b
fully, then treat b:c as in b + b:c. Unfortunately, a:b is just the product of a
and b, whether or not it is fully expanded, so it doesn't really make sense
to proceed as if b is contained in a preceding term. So the net result is that
you end up with one column less than you probably wanted.
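
The same rank check, applied to the data from the original post, shows the
difference in the numeric case. Again just a sketch; the names m_xf and m_fx
are made up here. With x1 and trt numeric, the two orderings no longer span
the same subspace.

```r
# Same simulated data as in the original post
set.seed(1)
x1  <- rnorm(100)
f1  <- factor(sample(letters[1:3], 100, replace = TRUE))
trt <- sample(c(-1, 1), 100, replace = TRUE)
df  <- data.frame(x1 = x1, f1 = f1, trt = trt)

m_xf <- model.matrix(~ x1:trt + f1:trt, data = df)  # x1:trt first
m_fx <- model.matrix(~ f1:trt + x1:trt, data = df)  # f1:trt first

# Which columns does each ordering produce?
colnames(m_xf)
colnames(m_fx)

# Ranks of each matrix and of the two combined
rk_xf   <- qr(m_xf)$rank
rk_fx   <- qr(m_fx)$rank
rk_both <- qr(cbind(m_xf, m_fx))$rank
c(x1_first = rk_xf, f1_first = rk_fx, combined = rk_both)
```

Here the combined rank matches the larger matrix only, so the ordering with
x1:trt first spans a strictly smaller space: the missing column is a real
loss, not a reparametrization.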

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
