Sven Garbade
2007-Aug-07 14:58 UTC
[R] Interaction factor and numeric variable versus separate regressions
Dear list members,

I am having trouble interpreting the coefficients from an lm model involving the interaction of a numeric and a factor variable, compared to separate lm models for each level of the factor variable.

## data:
y1 <- rnorm(20) + 6.8
y2 <- rnorm(20) + (1:20*1.7 + 1)
y3 <- rnorm(20) + (1:20*6.7 + 3.7)
y <- c(y1,y2,y3)
x <- rep(1:20,3)
f <- gl(3,20, labels=paste("lev", 1:3, sep=""))
d <- data.frame(x=x,y=y, f=f)

## plot
# xyplot(y~x|f)

## lm model with interaction
summary(lm(y~x:f, data=d))

Call:
lm(formula = y ~ x:f, data = d)

Residuals:
    Min      1Q  Median      3Q     Max
-2.8109 -0.8302  0.2542  0.6737  3.5383

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.68799    0.41045   8.985 1.91e-12 ***
x:flev1      0.20885    0.04145   5.039 5.21e-06 ***
x:flev2      1.49670    0.04145  36.109  < 2e-16 ***
x:flev3      6.70815    0.04145 161.838  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.53 on 56 degrees of freedom
Multiple R-Squared: 0.9984,     Adjusted R-squared: 0.9984
F-statistic: 1.191e+04 on 3 and 56 DF,  p-value: < 2.2e-16

## separate lm fits
lapply(by(d, d$f, function(x) lm(y ~ x, data=x)), coef)
$lev1
(Intercept)           x
 6.77022860 -0.01667528

$lev2
(Intercept)           x
   1.019078    1.691982

$lev3
(Intercept)           x
   3.274656    6.738396

Can anybody give me a hint why the coefficients for the slopes (especially for lev1) are so different, and how the coefficients from the lm model with interaction are related to the separate fits?

Thanks, Sven
Gabor Grothendieck
2007-Aug-07 15:34 UTC
[R] Interaction factor and numeric variable versus separate regressions
In the single model all three levels share the same intercept, which means that the slopes must change to accommodate it, whereas in the three separate models each level has its own intercept.

Try looking at it graphically and note how the black dotted lines are all forced to go through the same intercept, i.e. the same point on the y axis, whereas the red dashed lines are each able to fit their portion of the data using both the intercept and the slope.

y.lm <- lm(y~x:f, data=d)
plot(y ~ x, d, col = as.numeric(d$f), xlim = c(-5, 20))
for(i in 1:3) {
  abline(a = coef(y.lm)[1], b = coef(y.lm)[1+i], lty = "dotted")
  abline(lm(y ~ x, d[as.numeric(d$f) == i,]), col = "red", lty = "dashed")
}
grid()
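The shared-intercept point can also be checked numerically. The following is a minimal sketch, assuming data simulated as in the original post; the call to set.seed() and the names fit_shared and sep are additions for illustration and do not appear in the thread.

```r
# Simulate data as in the original post (the seed is an assumption,
# added only so that the sketch is reproducible).
set.seed(1)
x <- rep(1:20, 3)
f <- gl(3, 20, labels = paste("lev", 1:3, sep = ""))
y <- c(rnorm(20) + 6.8,
       rnorm(20) + (1:20 * 1.7 + 1),
       rnorm(20) + (1:20 * 6.7 + 3.7))
d <- data.frame(x = x, y = y, f = f)

# y ~ x:f estimates ONE intercept and three slopes (4 parameters).
fit_shared <- lm(y ~ x:f, data = d)
print(colnames(model.matrix(fit_shared)))

# Separate fits estimate an intercept AND a slope per level (6 parameters).
sep <- sapply(split(d, d$f), function(g) coef(lm(y ~ x, data = g)))
print(sep)

# Slopes side by side: they differ because the shared intercept
# constrains every line to pass through the same point at x = 0.
print(rbind(shared = unname(coef(fit_shared)[-1]), separate = sep["x", ]))
```

The design matrix has a single intercept column, so the two sets of slopes cannot be expected to agree unless the true intercepts happen to coincide.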
Prof Brian Ripley
2007-Aug-07 16:33 UTC
[R] Interaction factor and numeric variable versus separate regressions
These are not the same model. You want x*f, and then you will find the differences in intercepts and slopes from group 1 as the coefficients.

Remember too that the combined model pools error variances, whereas the separate models have a separate error variance for each group.

To understand model formulae, study Bill Venables' exposition in chapter 6 of MASS.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
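How the x*f coefficients map onto the separate fits can be sketched as follows. This assumes data simulated as in the original post; set.seed() and the names fit_full, fit_nested, sep and b are illustrative additions. The formula y ~ 0 + f + x:f is not mentioned in the thread; it is shown only as an equivalent reparameterization of x*f that reports per-level intercepts and slopes directly.

```r
# Simulate data as in the original post (seed is an assumption).
set.seed(1)
x <- rep(1:20, 3)
f <- gl(3, 20, labels = paste("lev", 1:3, sep = ""))
y <- c(rnorm(20) + 6.8,
       rnorm(20) + (1:20 * 1.7 + 1),
       rnorm(20) + (1:20 * 6.7 + 3.7))
d <- data.frame(x = x, y = y, f = f)

# Separate fits: one row per level, columns = intercept, slope.
sep <- t(sapply(split(d, d$f), function(g) coef(lm(y ~ x, data = g))))

# Full model: coefficients are lev1's intercept and slope, plus
# DIFFERENCES from lev1 for the other levels (treatment contrasts).
fit_full <- lm(y ~ x * f, data = d)
b <- coef(fit_full)
lev2_intercept <- b["(Intercept)"] + b["flev2"]
lev2_slope     <- b["x"] + b["x:flev2"]
print(c(lev2_intercept, lev2_slope))
print(sep["lev2", ])

# Same model, reparameterized: three intercepts, then three slopes.
fit_nested <- lm(y ~ 0 + f + x:f, data = d)
print(matrix(coef(fit_nested), ncol = 2))

# Pooled error variance: with equal group sizes, the combined model's
# residual variance is the mean of the per-group residual variances.
s2_groups <- sapply(split(d, d$f),
                    function(g) summary(lm(y ~ x, data = g))$sigma^2)
print(c(pooled = summary(fit_full)$sigma^2, mean_of_groups = mean(s2_groups)))
```

The point estimates agree with the separate regressions, but standard errors generally will not, because the combined model uses the pooled variance on 54 degrees of freedom rather than each group's own estimate on 18.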