dhinds@sonic.net
2006-Jan-11  01:53 UTC
[R] hypothesis testing for rank-deficient linear models
Take the following example:
    a <- rnorm(100)
    b <- trunc(3*runif(100))
    g <- factor(trunc(4*runif(100)), labels=c('A','B','C','D'))
    y <- rnorm(100) + a + (b+1) * (unclass(g)+2)
    m <- lm(y~a+b*g)
    summary(m)
Here b is discrete but not treated as a factor.  I am interested in
computing the effect of b within groups defined by the factor g.  One
way to do that is with the estimable() function from gmodels:
    ct <- cbind(1,contr.treatment(4))
    dimnames(ct)[[2]] <- c('b','b:gB','b:gC','b:gD')
    estimable(m, ct, conf.int=0.95)
Another way is the contrast() function from the Design library:
    dd <- datadist(a,b,g)
    options(datadist='dd')
    o <- ols(y~a+b*g)
    contrast(o, list(b=1,g=levels(g)), list(b=0,g=levels(g)))
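For reference, both calls above compute linear functions of the fitted
coefficients, so for the full-rank fit the same estimates and standard
errors can be obtained by hand from coef() and vcov(); a sketch (not
from the original post):
    ## within-group slopes of b, computed directly from the fitted model m;
    ## L has the same rows as the ct matrix passed to estimable() above
    beta <- coef(m)
    V    <- vcov(m)
    idx  <- c('b','b:gB','b:gC','b:gD')
    L    <- cbind(1, contr.treatment(4))
    est  <- drop(L %*% beta[idx])
    se   <- sqrt(diag(L %*% V[idx,idx] %*% t(L)))
    data.frame(g=levels(g), Estimate=est, StdError=se, t.value=est/se)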
Now take the situation where there are empty cells in the b x g table,
so that the model is rank deficient; for example:
    m <- lm(y~a+b*g, subset=(b==0 | g!='B'))
    summary(m)
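A quick check (not from the original post) of where the rank deficiency
comes from: tabulate b against g on the subset and see which
coefficient lm() drops:
    keep <- (b == 0 | g != 'B')
    table(b[keep], g[keep])            # group B now has observations only at b == 0
    names(coef(m))[is.na(coef(m))]     # the aliased coefficient, expected to be b:gB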
Now there's trouble.  The estimable() function and Design library do
not seem to handle rank deficient models well.  Here is my current
best attempt to get what I want:
    my.coef <- function(n)
    {
        ## refit with level n of g as the baseline, so the 'b' row of the
        ## coefficient table is the within-group slope of b for that level
        ct <- contr.treatment(levels(g), base=n)
        u <- update(m, contrasts=list(g=ct))
        uc <- coef(summary(u))['b',]
        ## crude check: flag the slope as non-estimable if the refit has
        ## aliased coefficients
        if (any(is.na(coef(u))) && any(!is.na(alias(u)$Complete)))
            uc[1:4] <- NA
        uc
    }
    t(sapply(1:4, my.coef))
This seems adequate for my immediate purposes, and I can clean it up to
be more general.  But is there an easier or better way to do this using
existing R facilities?  Here I'm doing more model fitting than
necessary, but (for now) speed is not a concern, and I was having
trouble writing a concise solution that worked on just the original
model object.
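A possible direction, sketched below (not from the original post, and
only lightly checked): a linear combination of coefficients is
estimable exactly when its coefficient vector lies in the row space of
the design matrix, and that can be tested from a QR decomposition of
t(model.matrix(m)) without refitting anything:
    ## sketch: is the linear combination cvec of coefficients estimable?
    ## (cvec must lie in the row space of the design matrix X)
    is.estimable <- function(model, cvec, tol=1e-7)
    {
        X <- model.matrix(model)     # retains columns of aliased coefficients
        h <- qr(t(X))                # column space of t(X) is the row space of X
        r <- qr.resid(h, cvec)       # component of cvec outside that space
        sqrt(sum(r^2)) < tol * max(1, sqrt(sum(cvec^2)))
    }
    ## e.g. the slope of b within group B is the combination b + b:gB:
    cvec <- as.numeric(colnames(model.matrix(m)) %in% c('b','b:gB'))
    is.estimable(m, cvec)            # should be FALSE for the rank-deficient fit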
-- David Hinds
dhinds@sonic.net
2006-Jan-13  01:22 UTC
[R] hypothesis testing for rank-deficient linear models
dhinds at sonic.net wrote:
> Take the following example:
>     a <- rnorm(100)
>     b <- trunc(3*runif(100))
>     g <- factor(trunc(4*runif(100)), labels=c('A','B','C','D'))
>     y <- rnorm(100) + a + (b+1) * (unclass(g)+2)
> ...

Here's a cleaned-up function to compute estimable within-group effects
for rank-deficient models.  For the above data, it could be invoked as:

    > m <- lm(y~a+b*g, subset=(b==0 | g!='B'))
    > subgroup.effects(m, 'b', g=c('A','B','C','D'))
      g Estimate  StdError  t.value      p.value
    1 A 2.779167 0.4190213  6.63252 4.722978e-09
    2 B       NA        NA       NA           NA
    3 C 4.572431 0.3074402 14.87258 6.226445e-24
    4 D 5.920809 0.3502251 16.90572 3.995266e-27

Again, I'm not sure whether this is a good approach, or whether there is
an easier way using existing R functions.  One problem is figuring out
exactly which terms are not estimable from the available data.  My hack
using alias() is not satisfactory and I have already run into cases
where it fails, but I'm having trouble coming up with a more general,
correct test.

-- David Hinds

--------------------

subgroup.effects <- function(model, term, ...)
{
    my.coef <- function(n)
    {
        ## re-level each grouping factor so that the combination in row n
        ## of the grid is the baseline; the 'term' row of the coefficient
        ## table is then the within-group effect for that combination
        contr <- lapply(names(args), function(i)
                        contr.treatment(args[[i]], unclass(gr[n,i])))
        names(contr) <- names(args)
        u <- update(model, formula=model$formula, data=model$data,
                    contrasts=contr)
        uc <- coef(summary(u))[term,]
        ## crude non-estimability check via aliased coefficients
        if (any(is.na(coef(u))) && any(!is.na(alias(u)$Complete)))
            uc[1:4] <- NA
        uc
    }
    args <- list(...)
    gr <- expand.grid(...)
    d <- data.frame(gr, t(sapply(1:nrow(gr), my.coef)))
    names(d) <- c(names(gr),'Estimate','StdError','t.value','p.value')
    d
}
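As a point of comparison, the table above (estimates and standard
errors at least) could in principle be built from the single
rank-deficient fit without any refitting, by combining the
is.estimable() helper sketched earlier with the standard result that an
estimable combination c'beta has estimate c'beta-hat (aliased
coefficients taken as zero) and variance c'Vc over the non-aliased
coefficients.  A sketch (not from the original thread):
    ## hypothetical alternative to subgroup.effects(), evaluated against
    ## the original fit m; assumes 'A' is the reference level of g
    cn <- colnames(model.matrix(m))
    ok <- !is.na(coef(m))                       # non-aliased coefficients
    one.group <- function(lev)
    {
        cvec <- as.numeric(cn == 'b')
        if (lev != 'A')                         # add the interaction, e.g. b:gB
            cvec <- cvec + as.numeric(cn == paste('b:g', lev, sep=''))
        if (!is.estimable(m, cvec))
            return(c(Estimate=NA, StdError=NA))
        est <- sum(cvec[ok] * coef(m)[ok])      # aliased coefficients count as zero
        se  <- sqrt(drop(t(cvec[ok]) %*% vcov(m) %*% cvec[ok]))
        c(Estimate=est, StdError=se)
    }
    data.frame(g=levels(g), t(sapply(levels(g), one.group)))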