thr3ads.net - R help - [R] How to properly build model matrices [Feb 2012]

Yang Zhang

2012-Feb-09 21:39 UTC

[R] How to properly build model matrices

I always bump into a few (very minor) problems when building model
matrices with e.g.:

train = model.matrix(label~., read.csv('train.csv'))
target = model.matrix(label~., read.csv('target.csv'))

(1) The two may have different factor levels, yielding different
matrices.  I usually first rbind the data frames together to "meld"
the factors, and then split them apart and matrixify them.

(2) The target set that I'm predicting on typically doesn't have
labels.  I usually manually append dummy labels to the target data
frame.

(3) I almost always remove the Intercept from the model matrices,
since it seems to always be redundant (I usually use caret).

None of these is a big deal at all, but I'm just curious if I'm
missing something simple in how I'm doing things.  Thanks.

-- 
Yang Zhang
http://yz.mit.edu/

Uwe Ligges

2012-Feb-11 18:25 UTC

On 09.02.2012 22:39, Yang Zhang wrote:
> I always bump into a few (very minor) problems when building model
> matrices with e.g.:
>
> train = model.matrix(label~., read.csv('train.csv'))
> target = model.matrix(label~., read.csv('target.csv'))
>
> (1) The two may have different factor levels, yielding different
> matrices.  I usually first rbind the data frames together to "meld"
> the factors, and then split them apart and matrixify them.

You can preprocess the data and explicitly define the levels for factor 
variables in your data.frames.
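
A minimal sketch of what defining the levels explicitly could look like. The toy data frames, the column names (color, x) and the level set are made up for illustration and stand in for the contents of train.csv and target.csv:

```r
# Toy data standing in for train.csv / target.csv (hypothetical columns).
train_df  <- data.frame(label = c(1, 0, 1, 0),
                        color = c("red", "blue", "green", "red"),
                        x     = c(0.5, 1.2, 3.1, 2.2),
                        stringsAsFactors = FALSE)
target_df <- data.frame(label = c(0, 0),
                        color = c("blue", "red"),   # "green" never occurs here
                        x     = c(1.0, 2.0),
                        stringsAsFactors = FALSE)

# Fix the level set once and apply it to both data frames.
color_levels <- c("blue", "green", "red")
train_df$color  <- factor(train_df$color,  levels = color_levels)
target_df$color <- factor(target_df$color, levels = color_levels)

train  <- model.matrix(label ~ ., train_df)
target <- model.matrix(label ~ ., target_df)

# Both matrices now have the same columns; the column for the level
# that does not occur in target_df is simply all zeros.
identical(colnames(train), colnames(target))
```

With the levels fixed up front, the rbind-and-split step should no longer be needed.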

> (2) The target set that I'm predicting on typically doesn't have
> labels.  I usually manually append dummy labels to the target data
> frame.

R cannot know labels if you do not provide any.
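
If the dummy labels are only there to satisfy the formula, one possible alternative is to drop the response from the terms before building the target matrix. This is only a sketch on made-up data (the columns color and x are hypothetical); delete.response(), model.frame() with xlev, and .getXlevels() are from the stats package:

```r
# Toy data; target_df deliberately has no label column.
train_df  <- data.frame(label = c(1, 0, 1, 0),
                        color = factor(c("red", "blue", "green", "red")),
                        x     = c(0.5, 1.2, 3.1, 2.2))
target_df <- data.frame(color = c("blue", "red"),
                        x     = c(1.0, 2.0))

# The training model frame records the expanded terms and the factor levels.
mf_train <- model.frame(label ~ ., train_df)
train    <- model.matrix(attr(mf_train, "terms"), mf_train)

# Remove the response from the terms and reuse the training factor levels,
# so that no label column is needed in target_df.
tt_x   <- delete.response(terms(mf_train))
mf_tgt <- model.frame(tt_x, target_df, xlev = .getXlevels(tt_x, mf_train))
target <- model.matrix(tt_x, mf_tgt)

identical(colnames(train), colnames(target))
```

Because the factor levels are taken from the training frame, this also keeps the columns of the two matrices aligned.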

> (3) I almost always remove the Intercept from the model matrices,
> since it seems to always be redundant (I usually use caret).

Then change your model formula to: "label ~ . - 1". But note that the
interpretation of the coefficients changes, and the intercept is *not*
redundant in general.
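
A small illustration on made-up data (the columns color and x are hypothetical) of what "- 1" does to the columns of the model matrix:

```r
df <- data.frame(label = c(1, 0, 1, 0),
                 color = factor(c("red", "blue", "green", "red")),
                 x     = c(0.5, 1.2, 3.1, 2.2))

with_int    <- model.matrix(label ~ ., df)
without_int <- model.matrix(label ~ . - 1, df)

colnames(with_int)     # "(Intercept)" "colorgreen" "colorred" "x"
colnames(without_int)  # "colorblue"   "colorgreen" "colorred" "x"
```

Without the intercept, the first factor gets a dummy column for every level, so the baseline level is no longer absorbed into "(Intercept)". The matrix has the same number of columns either way, which is why the intercept column cannot simply be treated as redundant.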

Uwe Ligges

> None of these is a big deal at all, but I'm just curious if I'm
> missing something simple in how I'm doing things.  Thanks.
>
