Gustaf Granath
2007-Dec-05 19:31 UTC
[R] Interpretation of 'Intercept' in a 2-way factorial lm
Hi all,
I hope this question is not too trivial. I can't find an explanation
anywhere (stats and R books, R archives), so now I have to turn to the R-list.
Question:
If you have a factorial design with two factors (say A and B, with two
levels each), what does the intercept coefficient represent when
treatment contrasts (R's default, contr.treatment) are used?
Here is an example without interaction, where A has two levels, A1 and
A2, and B has two levels, B1 and B2. So R takes A1 and B1 as the baseline.
coef(summary(lm(fruit ~ A + B, data = test)))
            Estimate Std. Error  t value     Pr(>|t|)
(Intercept) 2.716667  0.5484828 4.953058 7.879890e-04
A2          6.266667  0.6333333 9.894737 3.907437e-06
B2          5.166667  0.6333333 8.157895 1.892846e-05
I understand that the mean for A2 is 6.3 higher than for A1, and
that the mean for B2 is 5.2 higher than for B1.
So the question is: is the intercept the mean of A1 and B1 combined
("the baseline"), or is it something else? Does this number (2.716)
actually tell me anything useful?
What does the model (y = intercept + ??) look like, then? I don't understand
how both factors (A and B) can share the same intercept.
Thanks in advance!!
Gustaf Granath
Dept of Plant Ecology
Uppsala University, Sweden
Peter Dalgaard
2007-Dec-05 19:41 UTC
[R] Interpretation of 'Intercept' in a 2-way factorial lm
Consider an A x B cross-table of (fitted) means. The upper-left corner is the intercept; add A2, B2, or both to get the other three cells.
--
Peter Dalgaard
Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr. B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907
(p.dalgaard at biostat.ku.dk)
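Peter's cross-table can be reproduced directly in R. The question's data frame `test` is not available, so this sketch simulates a hypothetical balanced 2x2 data set (the seed, cell means and 10 replicates per cell are arbitrary choices). It builds the table of fitted cell means with predict() and then rebuilds the same table by hand from the coefficients.

```r
# Hypothetical stand-in for the question's 'test' data: balanced 2x2 design,
# 10 replicates per cell, arbitrary true effects.
set.seed(1)
test <- expand.grid(A = c("A1", "A2"), B = c("B1", "B2"), rep = 1:10)
test$fruit <- rnorm(nrow(test),
                    mean = 3 + 6 * (test$A == "A2") + 5 * (test$B == "B2"))

fit <- lm(fruit ~ A + B, data = test)
cf <- coef(fit)

# Fitted mean for each of the four A x B cells, arranged as a cross-table
cells <- expand.grid(A = c("A1", "A2"), B = c("B1", "B2"))
cells$fitted <- predict(fit, newdata = cells)
tab <- xtabs(fitted ~ A + B, data = cells)
print(tab)

# The same table built from the coefficients: the upper-left cell is the
# intercept; add A2, B2, or both for the other three cells
by_hand <- matrix(c(cf["(Intercept)"],
                    cf["(Intercept)"] + cf["A2"],
                    cf["(Intercept)"] + cf["B2"],
                    cf["(Intercept)"] + cf["A2"] + cf["B2"]),
                  nrow = 2,
                  dimnames = list(A = c("A1", "A2"), B = c("B1", "B2")))
print(by_hand)
```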
Daniel Malter
2007-Dec-05 21:02 UTC
[R] Interpretation of 'Intercept' in a 2-way factorial lm
You estimate a model in which the factors A and B are each either present (1)
or not present (0), with an intercept. Thus you would predict:
For both A and B not present: Intercept
For A only present: Intercept+coef(A)
For B only present: Intercept+coef(B)
For both present: Intercept+coef(A)+coef(B).
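Using the coefficients from the question's output (rounded to two decimals), the additive model and its four predicted cell values work out as follows. Here x_{A2} and x_{B2} are notation introduced for the 0/1 indicator columns that treatment contrasts create; they are not names R uses.

```latex
\hat{y} = \hat\beta_0 + \hat\beta_{A2}\,x_{A2} + \hat\beta_{B2}\,x_{B2},
\qquad x_{A2} = [A = \mathrm{A2}], \quad x_{B2} = [B = \mathrm{B2}]

\begin{array}{l|ll}
            & \mathrm{B1} & \mathrm{B2} \\ \hline
\mathrm{A1} & \hat\beta_0 = 2.72 & \hat\beta_0 + \hat\beta_{B2} = 7.88 \\
\mathrm{A2} & \hat\beta_0 + \hat\beta_{A2} = 8.98 & \hat\beta_0 + \hat\beta_{A2} + \hat\beta_{B2} = 14.15
\end{array}
```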
Again, you would interpret the intercept as the predicted value of "fruit"
when A and B are not present (or inactive). If the intercept is not
meaningful in your setting and you just want to know whether the groups
differ, then I guess you want to use the function aov(). What is your
"fruit" variable? I would also suggest inspecting your data visually; that
always helps :) The code for that is also below.
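On the aov() suggestion: aov() fits the same linear model as lm(); the difference lies mainly in how the result is summarized. This small sketch uses hypothetical simulated data with the same layout as the example below (the seed and means are arbitrary) and shows that summary() of an aov fit reports the same sequential ANOVA table as anova() of the corresponding lm fit.

```r
# Hypothetical simulated data in the same 2x2 layout as Daniel's example
set.seed(123)
d <- data.frame(a = factor(rep(0:1, each = 20)),
                b = factor(rep(rep(0:1, each = 10), times = 2)))
d$y <- rnorm(40, mean = 1 + 2 * (d$a == "1") + 1 * (d$b == "1"))

fit_lm  <- lm(y ~ a + b, data = d)
fit_aov <- aov(y ~ a + b, data = d)

# Both give the same sequential (Type I) ANOVA table and the same coefficients
print(anova(fit_lm))
print(summary(fit_aov))
```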
Look at the following example, in which 4 x 10 values of y are drawn randomly
from normal distributions with equal variance but different means. The first
ten observations have both A and B not present (i.e. 0), as specified in the
vectors "a" and "b". The true mean of these observations, where A and B are
zero, is 1, as set by the second argument of the call that generates y1,
rnorm(10, 1, 1). When I ran this code, the estimated intercept was 1.0512,
which is close to the true mean of 1 (your value will differ, since no random
seed is set). This confirms what was said above: the intercept estimates the
average of the baseline (or reference) group, where both A and B are absent.
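# Four groups of 10 draws each, equal variance (sd = 1), different means
y1 <- rnorm(10, 1, 1)  # a = 0, b = 0 (baseline)
y2 <- rnorm(10, 2, 1)  # a = 0, b = 1
y3 <- rnorm(10, 3, 1)  # a = 1, b = 0
y4 <- rnorm(10, 4, 1)  # a = 1, b = 1
a <- c(rep(0, 20), rep(1, 20))
b <- c(rep(0, 10), rep(1, 10), rep(0, 10), rep(1, 10))
y <- c(y1, y2, y3, y4)
data <- data.frame(y, a, b)
#### Plot ####
interaction.plot(a, b, y)
#### Models ####
summary(lm(y ~ factor(a) + factor(b), data = data))
#### Compare this to ####
summary(aov(y ~ factor(a) + factor(b), data = data))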
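One point worth checking in code: in the additive model y ~ a + b the intercept is the fitted value for the baseline cell, which need not equal the raw mean of that cell; with the interaction included (y ~ a * b, which is saturated for a 2x2 design) it does. In a balanced 2x2 design the additive intercept equals (mean of a = 0) + (mean of b = 0) - (grand mean). A minimal sketch, re-running the simulation above with an arbitrary seed so the comparison is reproducible:

```r
# Re-run of the simulation with a fixed (arbitrary) seed
set.seed(42)
y <- c(rnorm(10, 1, 1), rnorm(10, 2, 1), rnorm(10, 3, 1), rnorm(10, 4, 1))
a <- factor(c(rep(0, 20), rep(1, 20)))
b <- factor(c(rep(0, 10), rep(1, 10), rep(0, 10), rep(1, 10)))

# Raw mean of the baseline cell (a = 0, b = 0)
baseline_mean <- mean(y[a == "0" & b == "0"])

additive  <- lm(y ~ a + b)  # intercept = fitted baseline value
saturated <- lm(y ~ a * b)  # intercept = raw baseline cell mean

# Closed form for the additive intercept in a balanced 2x2 design
additive_formula <- mean(y[a == "0"]) + mean(y[b == "0"]) - mean(y)

print(c(raw_cell  = baseline_mean,
        additive  = unname(coef(additive)[1]),
        saturated = unname(coef(saturated)[1])))
```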
Cheers,
Daniel
-------------------------
cuncta stricte discussurus
-------------------------
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.