Ross Boylan
2013-Jul-27 02:23 UTC
[R] matching columns of model matrix to those in original data.frame
What is a reliable way to go from a column of a model matrix back to the column (or columns) of the original data source used to make the model matrix? I can come up with a method that seems to work, but I don't see guarantees in the documentation that it will.

In particular, does the order of the term.labels match the order of columns for factors in a terms object? The documentation says the model.matrix assign attribute uses the ordering of term.labels.

If anyone can tell me whether this approach is reliable, or suggest one that is, I would appreciate it.

Ross Boylan

Proposed function and a little example follow.

# return a vector v such that data[,v[i]] contributed to mm[,i]
# mm = model matrix produced by
# form = formula
# data = data
reverse.map <- function(mm, form, data){
    tt <- terms(form, data=data)
    ttf <- attr(tt, "factors")
    mmi <- attr(mm, "assign")
    # this depends on assign using same order as columns of factors
    # entries in mmi that are 0 (the intercept) are silently dropped
    ttf2 <- ttf[,mmi]
    # take the first row that contributes
    r <- apply(ttf2, 2, function(is) rownames(ttf)[is > 0][1])
    match(r, colnames(data))
}

> ### experiment with mapping model matrix to original columns
> df <- sp2b[sample(nrow(sp2b), 8), c("pEthnic", "ethnic_sg", "rac_gay")]
> form <- ~pEthnic+ethnic_sg*rac_gay
> mm <- model.matrix(form, df)
> tt <- terms(form, data=df)
> ttf <- attr(tt, "factors")
> mmi <- attr(mm, "assign")
> df
      pEthnic ethnic_sg rac_gay
1366 Afr Amer  Afr Amer    3.25
3052 Afr Amer  Afr Amer    1.75
3012   Latino  Afr Amer    2.00
369  Afr Amer  Asian/PI    2.00
529     White  Asian/PI    2.00
194  Asian/PI  Asian/PI    3.25
126     White  Asian/PI    2.25
2147   Latino    Latino    2.75
> colnames(mm)
 [1] "(Intercept)"               "pEthnicAsian/PI"
 [3] "pEthnicLatino"             "pEthnicOther"
 [5] "pEthnicWhite"              "ethnic_sgAsian/PI"
 [7] "ethnic_sgLatino"           "rac_gay"
 [9] "ethnic_sgAsian/PI:rac_gay" "ethnic_sgLatino:rac_gay"
> ttf # term "factors"
          pEthnic ethnic_sg rac_gay ethnic_sg:rac_gay
pEthnic         1         0       0                 0
ethnic_sg       0         1       0                 1
rac_gay         0         0       1                 1
> mmi #model matrix "assign"
 [1] 0 1 1 1 1 2 2 3 4 4
> reverse.map(mm, form, df)
[1] 1 1 1 1 2 2 3 2 2
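
Note that the result above has 9 entries for 10 model matrix columns (the intercept is dropped) and that the interaction columns report only ethnic_sg. A minimal sketch of a variant that keeps one entry per model matrix column and returns every contributing data column follows. The name reverse.map.all and the toy data frame are made up for illustration, and the sketch assumes that the column names of the "factors" matrix are the term labels, which I have not seen stated as a guarantee. It looks up the factors column by label rather than by position, so it relies only on assign following the order of term.labels.

```r
# Sketch only: reverse.map.all and 'toy' are illustrative, not from sp2b.
# Returns a list with one element per column of mm; each element holds the
# indices of the columns of data that contributed to that column.
reverse.map.all <- function(mm, form, data) {
    tt   <- terms(form, data = data)
    ttf  <- attr(tt, "factors")      # rows: variables, columns: terms
    labs <- attr(tt, "term.labels")  # assign indexes into these labels
    mmi  <- attr(mm, "assign")       # 0 marks the intercept
    out <- lapply(mmi, function(i) {
        if (i == 0) return(integer(0))           # intercept has no source column
        # assumption: colnames(ttf) are the term labels
        vars <- rownames(ttf)[ttf[, labs[i]] > 0]
        match(vars, colnames(data))              # NA if a variable is not a plain column
    })
    names(out) <- colnames(mm)
    out
}

toy <- data.frame(
    f = factor(c("a", "b", "c", "a", "b", "c")),
    g = factor(c("x", "y", "x", "y", "x", "y")),
    z = c(1.5, 2.0, 0.5, 3.0, 2.5, 1.0)
)
form <- ~ f + g * z
mm   <- model.matrix(form, toy)
res  <- reverse.map.all(mm, form, toy)
```

With the toy data, the interaction column gy:z maps to both g and z (columns 2 and 3), and the intercept maps to an empty vector. Variables that appear in the formula only inside a call, such as log(z), would not match a column name and would come back as NA.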