Sven Garbade
2007-Aug-07  14:58 UTC
[R] Interaction factor and numeric variable versus separate regressions
Dear list members,
I have trouble interpreting the coefficients from an lm model involving
the interaction of a numeric and a factor variable, compared with separate lm
models for each level of the factor variable.
## data:
y1 <- rnorm(20) + 6.8
y2 <- rnorm(20) + (1:20*1.7 + 1)
y3 <- rnorm(20) + (1:20*6.7 + 3.7)
y <- c(y1,y2,y3)
x <- rep(1:20,3)
f <- gl(3,20, labels=paste("lev", 1:3, sep=""))	
d <- data.frame(x=x,y=y, f=f)
## plot
# xyplot(y~x|f)
## lm model with interaction
summary(lm(y~x:f, data=d))
Call:
lm(formula = y ~ x:f, data = d)
Residuals:
    Min      1Q  Median      3Q     Max 
-2.8109 -0.8302  0.2542  0.6737  3.5383 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.68799    0.41045   8.985 1.91e-12 ***
x:flev1      0.20885    0.04145   5.039 5.21e-06 ***
x:flev2      1.49670    0.04145  36.109  < 2e-16 ***
x:flev3      6.70815    0.04145 161.838  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.53 on 56 degrees of freedom
Multiple R-Squared: 0.9984,	Adjusted R-squared: 0.9984 
F-statistic: 1.191e+04 on 3 and 56 DF,  p-value: < 2.2e-16 
## separate lm fits
lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
$lev1
(Intercept)           x 
 6.77022860 -0.01667528 
$lev2
(Intercept)           x 
   1.019078    1.691982 
$lev3
(Intercept)           x 
   3.274656    6.738396 
Can anybody give me a hint why the coefficients for the slopes
(especially for lev1) are so different and how the coefficients from the
lm model with interaction are related to the separate fits?
Thanks, Sven
Gabor Grothendieck
2007-Aug-07  15:34 UTC
[R] Interaction factor and numeric variable versus separate regressions
In the single model all three levels share the same intercept, which
means that the slopes must change to accommodate it,
whereas in the three separate models each level has its own
intercept.
Try looking at it graphically and note how the black dotted lines
are all forced to go through the same intercept, i.e. the same point
on the y axis, whereas the red dashed lines are each able to
fit their portion of the data using both the intercept and the slope.
y.lm <- lm(y~x:f, data=d)
plot(y ~ x, d, col = as.numeric(d$f), xlim = c(-5, 20))
for(i in 1:3) {
	abline(a = coef(y.lm)[1], b = coef(y.lm)[1+i], lty = "dotted")
	abline(lm(y ~ x, d[as.numeric(d$f) == i,]), col = "red", lty = "dashed")
}
grid()
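The same point can also be checked numerically. The following is only a sketch: the seed (set.seed(1)) and the object names shared, own, sep and comb are assumptions added here, not part of the original post, so the numbers will differ from those quoted in the thread. It suggests that giving each level its own intercept, via y ~ 0 + f + x:f, reproduces the coefficients of the three separate fits, whereas y ~ x:f forces one common intercept.

```r
## Regenerate the data from the original post. The seed is an assumption
## added for reproducibility, so values differ from those in the thread.
set.seed(1)
y1 <- rnorm(20) + 6.8
y2 <- rnorm(20) + (1:20 * 1.7 + 1)
y3 <- rnorm(20) + (1:20 * 6.7 + 3.7)
d <- data.frame(x = rep(1:20, 3), y = c(y1, y2, y3),
                f = gl(3, 20, labels = paste("lev", 1:3, sep = "")))

## One shared intercept, three slopes (the model in the question)
shared <- lm(y ~ x:f, data = d)

## One intercept and one slope per level, fitted as a single model
own <- lm(y ~ 0 + f + x:f, data = d)

## Separate fits per level; rows are levels, columns are (Intercept) and x
sep <- t(sapply(split(d, d$f), function(g) coef(lm(y ~ x, data = g))))

## Coefficients of 'own' come as three intercepts followed by three slopes
comb <- cbind(coef(own)[1:3], coef(own)[4:6])

print(coef(shared))
print(sep)
print(all.equal(unname(comb), unname(sep)))
```

The last line compares the per-level intercepts and slopes of the single model with those of the separate fits, up to numerical tolerance.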
Prof Brian Ripley
2007-Aug-07  16:33 UTC
[R] Interaction factor and numeric variable versus separate regressions
These are not the same model. You want x*f, and then you will find the
differences in intercepts and slopes from group 1 as the coefficients.

Remember too that the combined model pools error variances and the
separate model has separate error variance for each group.

To understand model formulae, study Bill Venables' exposition in chapter 6
of MASS.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
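As a hedged illustration of the reply above, the sketch below fits y ~ x*f and rebuilds each group's intercept and slope from the coefficients, assuming R's default treatment contrasts, where lev1 is the reference group. The seed and the object names (full, rebuilt, sep_fits, sigma_pooled, sigma_sep) are assumptions made for this example and do not appear in the thread. It also compares the pooled residual standard error with those of the separate fits.

```r
## Regenerate the data from the original post; the seed is an assumption.
set.seed(1)
y1 <- rnorm(20) + 6.8
y2 <- rnorm(20) + (1:20 * 1.7 + 1)
y3 <- rnorm(20) + (1:20 * 6.7 + 3.7)
d <- data.frame(x = rep(1:20, 3), y = c(y1, y2, y3),
                f = gl(3, 20, labels = paste("lev", 1:3, sep = "")))

## Full interaction model: intercept and slope for lev1, plus
## differences from lev1 for lev2 and lev3 (treatment contrasts)
full <- lm(y ~ x * f, data = d)
b <- coef(full)

## Rebuild per-group intercepts and slopes from the differences
intercepts <- b["(Intercept)"] + c(0, b["flev2"], b["flev3"])
slopes     <- b["x"]           + c(0, b["x:flev2"], b["x:flev3"])
rebuilt <- cbind(intercept = intercepts, slope = slopes)
rownames(rebuilt) <- levels(d$f)

## Separate fits for comparison
sep_fits <- lapply(split(d, d$f), function(g) lm(y ~ x, data = g))
sep <- t(sapply(sep_fits, coef))

print(rebuilt)
print(sep)

## The combined model has one pooled residual standard error;
## each separate fit has its own
sigma_pooled <- summary(full)$sigma
sigma_sep <- sapply(sep_fits, function(m) summary(m)$sigma)
print(sigma_pooled)
print(sigma_sep)
```

With equal group sizes, as here, the pooled residual variance equals the mean of the three separate residual variances. The point estimates therefore agree with the separate fits, while the standard errors and tests generally do not.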