dhinds@sonic.net
2006-Jan-11 01:53 UTC
[R] hypothesis testing for rank-deficient linear models
Take the following example: a <- rnorm(100) b <- trunc(3*runif(100)) g <- factor(trunc(4*runif(100)),labels=c('A','B','C','D')) y <- rnorm(100) + a + (b+1) * (unclass(g)+2) m <- lm(y~a+b*g) summary(m) Here b is discrete but not treated as a factor. I am interested in computing the effect of b within groups defined by the factor g. One way to do that is with the estimable() function from gmodels: ct <- cbind(1,contr.treatment(4)) dimnames(ct)[[2]] <- c('b','b:gB','b:gC','b:gD') estimable(m, ct, conf.int=0.95) Another way is the contrast() function from the Design library: dd <- datadist(a,b,g) options(datadist='dd') o <- ols(y~a+b*g) contrast(o, list(b=1,g=levels(g)), list(b=0,g=levels(g))) Now take the situation where there are empty cells in the b x g table, so that the model is rank deficient, i.e.: m <- lm(y~a+b*g, subset=(b==0 | g!='B')) summary(m) Now there's trouble. The estimable() function and Design library do not seem to handle rank deficient models well. Here is my current best attempt to get what I want: my.coef <- function(n) { ct <- contr.treatment(levels(g), base=n) u <- update(m, contrasts=list(g=ct)) uc <- coef(summary(u))['b',] if (any(is.na(coef(u))) && any(!is.na(alias(u)$Complete))) uc[1:4] <- NA uc } t(sapply(1:4, my.coef)) This seems adequate for my immediate purposes and I can clean it up to be more general. But I'm wondering if there is an easier/better way using existing R facilities? Here, I'm doing more model fitting than necessary, but (for now) speed is not a factor and I was having trouble writing a concise solution that worked on just the original model object. -- David Hinds
dhinds@sonic.net
2006-Jan-13 01:22 UTC
[R] hypothesis testing for rank-deficient linear models
dhinds at sonic.net wrote:> Take the following example:> a <- rnorm(100) > b <- trunc(3*runif(100)) > g <- factor(trunc(4*runif(100)),labels=c('A','B','C','D')) > y <- rnorm(100) + a + (b+1) * (unclass(g)+2)... Here's a cleaned-up function to compute estimable within-group effects for rank deficient models. For the above data, it could be invoked as: > m <- lm(y~a+b*g, subset=(b==0 | g!='B')) > subgroup.effects(m, 'b', g=c('A','B','C','D')) g Estimate StdError t.value p.value 1 A 2.779167 0.4190213 6.63252 4.722978e-09 2 B NA NA NA NA 3 C 4.572431 0.3074402 14.87258 6.226445e-24 4 D 5.920809 0.3502251 16.90572 3.995266e-27 Again, I'm not sure whether this is a good approach, or whether there is an easier way using existing R functions. One problem is figuring exactly which terms are not estimable from the available data. My hack using alias() is not satisfactory and I've already run into cases where it fails. But I'm having trouble coming up with a more general, correct test? -- David Hinds -------------------- subgroup.effects <- function(model, term, ...) { my.coef <- function(n) { contr <- lapply(names(args), function(i) contr.treatment(args[[i]], unclass(gr[n,i]))) names(contr) <- names(args) u <- update(model, formula=model$formula, data=model$data, contrasts=contr) uc <- coef(summary(u))[term,] if (any(is.na(coef(u))) && any(!is.na(alias(u)$Complete))) uc[1:4] <- NA uc } args <- list(...) gr <- expand.grid(...) d <- data.frame(gr, t(sapply(1:nrow(gr), my.coef))) names(d) <- c(names(gr),'Estimate','StdError','t.value','p.value') d }
Maybe Matching Threads
- [PATCH] daemon: improve debugging for "stdout on stderr" flag
- [PATCH] daemon: Add sentinel attribute to commandf and commandrf
- [PATCH 1/2] daemon: NFC Use symbolic names in commandrvf
- [PATCH 00/13] Fix errors found using Coverity static analyzer.
- How to calculate standard error for a vector?