Christoph Mathys
2008-Feb-03 15:09 UTC
[R] Effect size of comparison of two levels of a factor in multiple linear regression
Dear R users,

I have a linear model of the kind

   outcome ~ treatment + covariate

where 'treatment' is a factor with three levels ("0", "1", and "2"), and
the covariate is continuous. Treatments "1" and "2" both have regression
coefficients significantly different from 0 when using treatment
contrasts with treatment "0" as the baseline. I would now like to
determine effect sizes (akin to Cohen's d in a two-sample comparison)
for the comparison to baseline of treatments "1" and "2". I have
illustrated a way to do this in the reproducible example below and am
grateful for any comments on the soundness of what I'm doing. I have not
yet found a way to determine confidence intervals for the effect sizes
derived below and would appreciate tips on that.

set.seed(123456)  # Make session reproducible

# Set up the treatment factor with three levels and 100 observations each
treatment <- factor(c(rep(0, 100), rep(1, 100), rep(2, 100)))

# Simulate outcomes
outcome <- rep(NA, 300)
outcome[treatment==0] <- rnorm(100, 10, 5)  # baseline: mean=10, sd=5
outcome[treatment==1] <- rnorm(100, 30, 5)  # effect size 4
outcome[treatment==2] <- rnorm(100, 40, 5)  # effect size 6

# Check effect sizes (Cohen's d)
cohens.d <- function(x, y) { (mean(x)-mean(y))/sqrt((var(x)+var(y))/2) }
cohens.d(outcome[treatment==1], outcome[treatment==0])
[1] 3.984774
cohens.d(outcome[treatment==2], outcome[treatment==0])
[1] 6.167798

# Sometimes standardized regression coefficients are recommended
# for determining effect size, but that clearly doesn't work here:
coef(lm(scale(outcome) ~ treatment))
(Intercept)  treatment1  treatment2
  -1.233366    1.453152    2.246946

# The reason it doesn't work is that the difference of outcome
# means is divided by the sd of *all* outcomes:
(mean(outcome[treatment==1])-mean(outcome[treatment==0]))/sd(outcome)
[1] 1.453152
(mean(outcome[treatment==2])-mean(outcome[treatment==0]))/sd(outcome)
[1] 2.246946

# Now, create a situation where Cohen's d is impossible to
# calculate directly by introducing a continuous covariate:
covariate <- 1:300
outcome <- outcome + rnorm(300, covariate, 2)
model <- lm(scale(outcome) ~ treatment + scale(covariate))
coef(model)
     (Intercept)       treatment1       treatment2 scale(covariate)
      -0.1720456        0.1994251        0.3167116        0.8753761

# Proposed way to determine effect size: simulate outcomes for each
# treatment level, assuming the covariate to have a fixed value (here
# its mean value after standardization: zero)
library(MCMCpack)
no.of.sims <- 10000
sims.model <- MCMCregress(model, mcmc=no.of.sims)
sims.model[1:2,]
     (Intercept) treatment1 treatment2 scale(covariate)      sigma2
[1,]  -0.1780735  0.2024111  0.3395233        0.8682119 0.002617449
[2,]  -0.1521623  0.1773623  0.2956053        0.8764573 0.003529013
sims.treat0 <- rnorm(no.of.sims, sims.model[,"(Intercept)"],
                     sqrt(sims.model[,"sigma2"]))
sims.treat1 <- rnorm(no.of.sims, sims.model[,"(Intercept)"] +
                       sims.model[,"treatment1"],
                     sqrt(sims.model[,"sigma2"]))
sims.treat2 <- rnorm(no.of.sims, sims.model[,"(Intercept)"] +
                       sims.model[,"treatment2"],
                     sqrt(sims.model[,"sigma2"]))

# Calculate Cohen's d for simulated values
cohens.d(sims.treat1, sims.treat0)
[1] 3.683093
cohens.d(sims.treat2, sims.treat0)
[1] 5.782622

These values are reasonably close to the ones (4 and 6) I plugged in at
the beginning. It would be even nicer to have a confidence interval for
them, but if I bootstrap one out of the simulated outcomes, its width
depends on the number of simulations and is therefore arbitrary.
If anyone knew a better way to get at the effect sizes I'm looking for,
or how I could also get confidence intervals for them, that would be
greatly appreciated.

Thanks,

Christoph

--
Christoph Mathys, M.S.
Music and Neuroimaging Laboratory
Beth Israel Deaconess Medical Center and Harvard Medical School
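One way around the arbitrary-width problem is to skip the simulated
outcomes and work directly with the posterior draws: for each draw, the
ratio of a treatment coefficient to the residual sd is the
covariate-adjusted analogue of Cohen's d, and a quantile interval over
those ratios converges to a fixed width as the number of draws grows,
unlike an interval bootstrapped from the simulated outcomes. A minimal
sketch, reusing sims.model from the MCMCregress call above (d1.draws and
d2.draws are new names introduced here):

# Covariate-adjusted standardized effect for each posterior draw:
# the treatment coefficient divided by the residual sd of that draw
d1.draws <- sims.model[, "treatment1"] / sqrt(sims.model[, "sigma2"])
d2.draws <- sims.model[, "treatment2"] / sqrt(sims.model[, "sigma2"])

# Point estimates and 95% quantile intervals for the effect sizes
mean(d1.draws)
quantile(d1.draws, c(0.025, 0.975))
mean(d2.draws)
quantile(d2.draws, c(0.025, 0.975))

Because each draw carries its own sigma2, these intervals reflect
uncertainty in both the coefficient and the residual sd. An alternative
that yields classical confidence intervals would be to bootstrap the raw
data and recompute the same ratio on each refit.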
Chuck Cleland
2008-Feb-04 13:49 UTC
[R] Effect size of comparison of two levels of a factor in multiple linear regression
On 2/3/2008 10:09 AM, Christoph Mathys wrote:
> I would now like to determine effect sizes (akin to Cohen's d in a
> two-sample comparison) for the comparison to baseline of treatments
> "1" and "2".
> [...]
> # Sometimes standardized regression coefficients are recommended
> # for determining effect size but that clearly doesn't work here:
> [rest of original message snipped]

How about scaling the outcome by the residual standard error from the
unstandardized model? For example:

library(MASS)

cmat <- diag(3)
diag(cmat) <- c(25, 25, 25)

df <- data.frame(Y  = c(mvrnorm(100, mu = c(10, 30, 40), Sigma = cmat,
                                empirical = TRUE)),
                 TX = factor(rep(c(0, 1, 2), each = 100)))

fm1 <- lm(Y ~ TX, data = df)
fm2 <- lm(scale(Y, scale = summary(fm1)$sigma) ~ TX, data = df)

summary(fm1)

Call:
lm(formula = Y ~ TX, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-12.7260  -3.5215  -0.1982   3.4517  12.1690

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  10.0000     0.5000   20.00   <2e-16 ***
TX1          20.0000     0.7071   28.28   <2e-16 ***
TX2          30.0000     0.7071   42.43   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5 on 297 degrees of freedom
Multiple R-squared: 0.8627,     Adjusted R-squared: 0.8618
F-statistic: 933.3 on 2 and 297 DF,  p-value: < 2.2e-16

summary(fm2)

Call:
lm(formula = scale(Y, scale = summary(fm1)$sigma) ~ TX, data = df)

Residuals:
     Min       1Q   Median       3Q      Max
-2.54521 -0.70431 -0.03965  0.69034  2.43380

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.3333     0.1000  -33.33   <2e-16 ***
TX1           4.0000     0.1414   28.28   <2e-16 ***
TX2           6.0000     0.1414   42.43   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1 on 297 degrees of freedom
Multiple R-squared: 0.8627,     Adjusted R-squared: 0.8618
F-statistic: 933.3 on 2 and 297 DF,  p-value: < 2.2e-16

confint(fm2)
                2.5 %    97.5 %
(Intercept) -3.530132 -3.136535
TX1          3.721685  4.278315
TX2          5.721685  6.278315

I've never seen this approach before, and I'm not sure how it would work
when the variances within groups are heterogeneous or one or more
covariates are added to the model.

hope this helps,

Chuck

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
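Chuck's scaling idea can also be tried on the model with the covariate,
which would give the covariate-adjusted effect sizes and their
confidence intervals in one step. A sketch, assuming the unstandardized
outcome, treatment, and covariate from the original post (fm3 and fm4
are new names introduced here):

# Unstandardized ANCOVA; summary(fm3)$sigma is the residual sd,
# i.e. the within-group sd after adjusting for the covariate
fm3 <- lm(outcome ~ treatment + covariate)

# Rescale the outcome by that residual sd; the treatment coefficients
# are then covariate-adjusted standardized mean differences
fm4 <- lm(scale(outcome, scale = summary(fm3)$sigma) ~ treatment + covariate)

coef(fm4)[c("treatment1", "treatment2")]     # effect sizes
confint(fm4, c("treatment1", "treatment2"))  # 95% confidence intervals

As Chuck cautions, this treats the residual sd as if it were known
exactly, so the intervals will be somewhat too narrow, and it presumes
homogeneous within-group variances.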