Michael Friendly
2009-Apr-16 16:05 UTC
[R] manipulating data via the factors of a term in a lm()
[Env: R 2.8.1, Win XP] For a package I'm working on, I need two small helper functions to manipulate the data used in an lm or mlm object, given the *name* of a term, which will always be a character string representing a factor ("A") or an interaction of two or more factors ("A:B", "A:B:C", ...). I'd prefer not to have to require() additional packages if possible. The functions I need are [1] termMeans(mod, term): Find the (possibly vector) response means for a model term, specified by a string, "A", "A:B", ... [2] dataIndices(mod, term): Find indices of the observations in the data corresponding to the combination of levels in the term Here is a small example to illustrate what I need: a 3-factor mlm factors <- expand.grid(A=factor(1:3),B=factor(1:2),C=factor(1:2)) n <- nrow(factors) responses <-data.frame(Y1=10+round(10*rnorm(n)),Y2=10+round(10*rnorm(n))) test <- data.frame(factors, responses) mod <- lm(cbind(Y1,Y2) ~ A*B, data=test) [1] Find the means for a model term, specified by a string, "A", "A:B", "A:C", ... Similar to: > # main effect means > aggregate(responses,by=list(test$A), mean) Group.1 Y1 Y2 1 1 12.25 9.50 2 2 10.75 6.25 3 3 11.75 11.00 4 4 10.25 4.25 > > #interaction means > aggregate(responses,by=list(test$A,test$B), mean) Group.1 Group.2 Y1 Y2 1 1 1 16.0 5.5 2 2 1 13.5 8.0 3 3 1 16.0 7.5 4 4 1 10.0 10.5 5 1 2 8.5 13.5 6 2 2 8.0 4.5 7 3 2 7.5 14.5 8 4 2 10.5 -2.0 > Here is what I've tried so far: # find means for a model term in an lm or mlm object termMeans <- function(mod, term){ data <- model.frame(mod) Y <- model.response(data) factors <- data[, sapply(data, is.factor), drop=FALSE] term.factors <- strsplit(term, ":") means <- aggregate(Y, by=factors[,term.factors], mean) } but I'm missing something. Maybe there is an easier path using apply() or by()? > termMeans(mod, "A") Error in .subset(x, j) : invalid subscript type 'list' [2] For some plots, I need to be able to assign symbol characteristics (col, pch, etc) by indexing a vector in the form of arguments col=col[index], pch=pch[index], ... where index is a vector of integers 1:# levels in a model term. Thus, for my example, with the data 'test' > test A B C Y1 Y2 1 1 1 1 22 6 2 2 1 1 22 2 3 3 1 1 16 11 4 4 1 1 2 13 5 1 2 1 0 19 6 2 2 1 -2 -7 7 3 2 1 5 8 8 4 2 1 37 -9 9 1 1 2 10 5 10 2 1 2 5 14 11 3 1 2 16 4 12 4 1 2 18 8 13 1 2 2 17 8 14 2 2 2 18 16 15 3 2 2 10 21 16 4 2 2 -16 5 I need a similar function dataIndex(mod, term): dataIndex(mod,"A") --> same as > test$A [1] 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 dataIndex(mod,"A:B") should deliver [1] 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 # find sequential indices for observations in a data frame # corresponding to the unique combinations of the levels # of a given model term dataIndex <- function(mod, term){ data <- model.frame(mod) Y <- model.response(data) factors <- data[, sapply(data, is.factor), drop=FALSE] term.factors <- strsplit(term, ":") ??? } thanks, -Michael -- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street http://www.math.yorku.ca/SCS/friendly.html Toronto, ONT M3J 1P3 CANADA