Hi,

As far as I am aware, the model.matrix function does not return
perfect metadata on what each column of the model matrix "means".

The columns are named (e.g. age:genderM), but encoding the metadata as
strings can result in ambiguity. For example, the dummy variable for
level 0 of a factor var0 and the dummy variable for level 00 of a
factor var are both named var00. Additionally, if a level of a factor
variable contains a colon, this could be confused for an interaction.

While a human can generally work out the meaning of each column
somewhat manually, I am interested in achieving this programmatically.

My solution is to edit the modelmatrix function in
/src/library/stats/src/model.c to additionally return the following:

intrcept
factors
contr1
contr2
count

With these available in R, it is possible to determine the precise
meaning of each column without error-prone parsing of strings. I have
attached my edit: see lines 753-764.

I am seeking advice on this approach. Am I missing a simpler way of
achieving this (which perhaps avoids rebuilding R)?

Since model.matrix is used in so many modeling functions, this would be
very helpful for the programmatic interpretation of model output. A
search on the Internet suggests there are other R users who would
welcome such functionality.

Many thanks in advance,

Pat O'Reilly

-------------- next part --------------
A non-text attachment was scrubbed...
Name: model.c
Type: text/x-csrc
Size: 56318 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20141017/5ba1e6aa/attachment.bin>
Patrick O'Reilly <patrick.a.oreilly <at> gmail.com> writes:

> Hi,
>
> As far as I am aware, the model.matrix function does not return
> perfect metadata on what each column of the model matrix "means".
>
> The columns are named (e.g. age:genderM), but encoding the metadata as
> strings can result in ambiguity. For example, the dummy variables
> created when the factors var0 = 0 and var = 00 both are named var00.
> Additionally, if a level of a factor variable contains a colon, this
> could be confused for an interaction.
>
> While a human can generally work out the meaning of each column
> somewhat manually, I am interested in achieving this programmatically.
>

Why don't you just retain the terms.object?

i.e.

    my.terms <- terms( my.formula, data=my.data.frame )
    my.model.matrix <- model.matrix( my.terms, data= my.data.frame )
    attributes(my.terms)

See ?terms, ?terms.object, ?model.frame (which contains a terms.object)

HTH,

Chuck
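
A minimal sketch of how the retained terms object might be combined with the "assign" attribute of the model matrix to recover, for each column, the term it came from without parsing column names. The data frame `d`, its values, and the helper name `term_of_col` are hypothetical and chosen only to reproduce the var00 situation from the original post. This maps columns to terms only; it does not by itself say which factor level a dummy column encodes.

```r
# Hypothetical data: factor and level names chosen to reproduce the
# "var00" naming collision described in the original post.
d <- data.frame(
  var0 = factor(c("a", "0", "a", "0", "a", "0"), levels = c("a", "0")),
  var  = factor(c("b", "b", "00", "00", "b", "00"), levels = c("b", "00")),
  age  = c(21, 34, 45, 52, 38, 29)
)

# Keep the terms object and build the model matrix from it.
tt <- terms(~ var0 + var + age, data = d)
mm <- model.matrix(tt, data = d)

# The column names alone may not identify the source variable.
print(colnames(mm))

# "assign" holds, for each column, the index of the term it came from
# (0 for the intercept); "term.labels" gives the label for each index.
term_of_col <- c("(Intercept)", attr(tt, "term.labels"))[attr(mm, "assign") + 1L]
print(data.frame(column = colnames(mm), term = term_of_col))

# Which variables make up each term, and which contrasts were used.
print(attr(tt, "factors"))
print(attr(mm, "contrasts"))
```

For interaction terms, the columns of attr(tt, "factors") show which variables take part in each term, so a colon inside a level name does not need to be disentangled from the column name.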