Christophe Rhodes
2010-Sep-15 19:46 UTC
[R] contr.sum, model summaries and `missing' information
Hi, I have a dataset with a response variable and multiple factors with more than two levels, which I have been fitting using lm() or glm(). In these fits, I am generally more interested in deviations from the global mean than I am in comparing to a "control" group, so I use contr.sum() as the factor contrasts. I think I'm happy to interpret the coefficients in the model summary as the effect of a particular factor level on the deviation from the overall mean; I'm not after a highly rigorous treatment of these coefficients and their standard errors, but rather using them as suggestive of further things to investigate. I have found dummy.coef() to extract the coefficients for the original factor levels (rather than the contrasts), though given the simplicity of my models I hardly need it, as the `missing' coefficient is just that which makes the whole set sum to zero. However, what I would really like is not just the extra coefficient that isn't displayed by summary(), but also its standard error (and t-statistic and p-value). The code below is an illustration of why I (think I) want this: foo <- data.frame(educ=factor(c(rep(1,10), rep(2,5), rep(3,15)), labels=c("BA", "MA", "PhD")), cont=factor(c(1,2,3,1,3,2,3,2,1,2,3,1,3,2,3,2,1,2,3,1,2,1, 2,3,1,2,3,1,2,3), labels=c("Europe", "Asia", "America")), resp=c(0.224340, -0.064126, -0.001028, -0.079769, 0.001746, 0.023284, 0.164906, 0.109045, -0.052815, 0.045120, 0.040059, 0.084997, -0.128838, 0.114124, 0.178218, -0.037562, -0.068397, -0.051447, -0.020165, -0.043924, -0.038990, -0.048056, -0.064817, -0.045902, -0.067005, -0.045253, -0.048954, -0.054677, -0.057463, -0.046117)) contrasts(foo$educ) <- contr.sum(3) contrasts(foo$cont) <- contr.sum(3) ### The point here is that I have two groups with indistinguishable ### distributions with wide spread, and a third with narrow spread and ### a noticeably distinct mean. The above initialization of ### data.frame(resp=) is one instance of the three steps below. ### ### foo$resp[foo$educ=="BA"] <- rnorm(sum(foo$educ=="BA"))*0.1+0.05 ### foo$resp[foo$educ=="MA"] <- rnorm(sum(foo$educ=="MA"))*0.1+0.05 ### foo$resp[foo$educ=="PhD"] <- rnorm(sum(foo$educ=="PhD"))*0.01-0.05 m1 <- lm(resp~educ+cont, data=foo) summary(m1) # does not display any significant effects dummy.coef(m1) # gives the three coefficients for educ but no standard # error estimates ### if we set the sum contrasts up a different way, though... contrasts(foo$educ) <- matrix(c(-1,1,0, -1,0,1), ncol=2) m2 <- lm(resp~educ+cont, data=foo) summary(m2) # displays a significant effect from the third ("PhD") # group relative to the overall mean As far as I can tell, models m1 and m2 are semantically equivalent. Is there a straightforward way of extracting the standard error and t-statistic for the `redundant' comparison directly from m1? I'd rather not have to fit two linear models if I can fit just one. Thanks, Christophe
Christophe Rhodes
2010-Sep-19 10:24 UTC
[R] contr.sum, model summaries and `missing' information
Christophe Rhodes <csr21 at cantab.net> writes:> I have a dataset with a response variable and multiple factors with more > than two levels, which I have been fitting using lm() or glm(). In > these fits, I am generally more interested in deviations from the global > mean than I am in comparing to a "control" group, so I use contr.sum() > as the factor contrasts. I think I'm happy to interpret the > coefficients in the model summary as the effect of a particular factor > level on the deviation from the overall mean; I'm not after a highly > rigorous treatment of these coefficients and their standard errors, but > rather using them as suggestive of further things to investigate. > > [...] > > As far as I can tell, models m1 and m2 are semantically equivalent. Is > there a straightforward way of extracting the standard error and > t-statistic for the `redundant' comparison directly from m1? I'd rather > not have to fit two linear models if I can fit just one.Did I post this to the wrong list? I'm still very much interested in any answer, or a redirection to a more appropriate forum... Thanks, Christophe