thr3ads.net - R help - [R] contr.sum, model summaries and `missing' information [Sep 2010]

If this information is useful, please help other people find it:
Share via:

Christophe Rhodes

2010-Sep-15 19:46 UTC

[R] contr.sum, model summaries and `missing' information

Hi,

I have a dataset with a response variable and multiple factors with more
than two levels, which I have been fitting using lm() or glm().  In
these fits, I am generally more interested in deviations from the global
mean than I am in comparing to a "control" group, so I use contr.sum()
as the factor contrasts.  I think I'm happy to interpret the
coefficients in the model summary as the effect of a particular factor
level on the deviation from the overall mean; I'm not after a highly
rigorous treatment of these coefficients and their standard errors, but
rather using them as suggestive of further things to investigate.

I have found dummy.coef() to extract the coefficients for the original
factor levels (rather than the contrasts), though given the simplicity
of my models I hardly need it, as the `missing' coefficient is just that
which makes the whole set sum to zero.  However, what I would really
like is not just the extra coefficient that isn't displayed by
summary(), but also its standard error (and t-statistic and p-value).

The code below is an illustration of why I (think I) want this:

  foo <- 
    data.frame(educ=factor(c(rep(1,10), rep(2,5), rep(3,15)), 
                      labels=c("BA", "MA",
"PhD")),
               cont=factor(c(1,2,3,1,3,2,3,2,1,2,3,1,3,2,3,2,1,2,3,1,2,1,
                 2,3,1,2,3,1,2,3), labels=c("Europe",
"Asia", "America")),
               resp=c(0.224340, -0.064126, -0.001028, 
                 -0.079769, 0.001746, 0.023284, 0.164906, 0.109045,
                 -0.052815, 0.045120, 0.040059, 0.084997, -0.128838,
                 0.114124, 0.178218, -0.037562, -0.068397, -0.051447,
                 -0.020165, -0.043924, -0.038990, -0.048056, -0.064817, 
                 -0.045902, -0.067005, -0.045253, -0.048954, -0.054677,
                 -0.057463, -0.046117))
  contrasts(foo$educ) <- contr.sum(3)
  contrasts(foo$cont) <- contr.sum(3)
  
  ### The point here is that I have two groups with indistinguishable
  ### distributions with wide spread, and a third with narrow spread and
  ### a noticeably distinct mean.  The above initialization of
  ### data.frame(resp=) is one instance of the three steps below.
  ###
  ### foo$resp[foo$educ=="BA"] <-
rnorm(sum(foo$educ=="BA"))*0.1+0.05
  ### foo$resp[foo$educ=="MA"] <-
rnorm(sum(foo$educ=="MA"))*0.1+0.05
  ### foo$resp[foo$educ=="PhD"] <-
rnorm(sum(foo$educ=="PhD"))*0.01-0.05
  
  m1 <- lm(resp~educ+cont, data=foo)
  summary(m1) # does not display any significant effects
  dummy.coef(m1) # gives the three coefficients for educ but no standard 
                 # error estimates
  
  ### if we set the sum contrasts up a different way, though...
  contrasts(foo$educ) <- matrix(c(-1,1,0, -1,0,1), ncol=2)
  m2 <- lm(resp~educ+cont, data=foo)
  summary(m2) # displays a significant effect from the third ("PhD")
              # group relative to the overall mean

As far as I can tell, models m1 and m2 are semantically equivalent.  Is
there a straightforward way of extracting the standard error and
t-statistic for the `redundant' comparison directly from m1?  I'd rather
not have to fit two linear models if I can fit just one.

Thanks,

Christophe

Christophe Rhodes

2010-Sep-19 10:24 UTC

head link

[R] contr.sum, model summaries and `missing' information

Christophe Rhodes <csr21 at cantab.net> writes:
> I have a dataset with a response variable and multiple factors with more
> than two levels, which I have been fitting using lm() or glm().  In
> these fits, I am generally more interested in deviations from the global
> mean than I am in comparing to a "control" group, so I use
contr.sum()
> as the factor contrasts.  I think I'm happy to interpret the
> coefficients in the model summary as the effect of a particular factor
> level on the deviation from the overall mean; I'm not after a highly
> rigorous treatment of these coefficients and their standard errors, but
> rather using them as suggestive of further things to investigate.
>
> [...]
>
> As far as I can tell, models m1 and m2 are semantically equivalent.  Is
> there a straightforward way of extracting the standard error and
> t-statistic for the `redundant' comparison directly from m1?  I'd
rather
> not have to fit two linear models if I can fit just one.
Did I post this to the wrong list?  I'm still very much interested in
any answer, or a redirection to a more appropriate forum...

Thanks,

Christophe

Apparently Analagous Threads

Search for more seemingly similar threads

R help - Sep 2010 - contr.sum, model summaries and `missing' information

[R] contr.sum, model summaries and `missing' information

[R] contr.sum, model summaries and `missing' information

Apparently Analagous Threads