michael watson (IAH-C)
2005-Apr-22 09:46 UTC
[R] Anova - interpretation of the interaction term
Hi So carrying on my use of analysis of variance to check for the effects of two factors. It's made simpler by the fact that both my factors have only two levels each, creating four unique groups. I have a highly significant interaction term. In the context of the experiment, this makes sense. I can visualise the data graphically, and sure enough I can see that both factors have different effects on the data DEPENDING on what the value of the other factor is. I explain this all to my colleague - and she asks "but which ones are different?" This is best illustrated with an example. We have either infected | uninfected, and vaccinated | unvaccinated (the two factors). We're measuring expression of a gene. Graphically, in the infected group, vaccination makes expression go up. In the uninfected group, vaccination makes expression go down. In both the vaccinated and unvaccinated groups, infection makes expression go down, but it goes down further in unvaccinated than it does in vaccinated. So from a statistical point of view, I can see exactly why the interaction term is significant, but what my colleage wants to know is that WITHIN the vaccinated group, does infection decrease expression significantly? And within the unvaccinated group, does infection decrease expression significantly? Etc etc etc Can I get this information from the output of the ANOVA, or do I carry out a separate test on e.g. just the vaccinated group? (seems a cop out to me) Many thanks, and sorry, but it's Friday. Mick
On 22-Apr-05 michael watson \(IAH-C\) wrote:> [...] > So from a statistical point of view, I can see exactly why the > interaction term is significant, but what my colleage wants to know is > that WITHIN the vaccinated group, does infection decrease expression > significantly? And within the unvaccinated group, does infection > decrease expression significantly? Etc etc etc Can I get this > information from the output of the ANOVA, or do I carry out a separate > test on e.g. just the vaccinated group? (seems a cop out to me)If I understand right, each of these questions can only be answered in terms of the changes in mean level *within* group. However, you are entitled to use the residual sum of squares (after estimating both effects and interaction) for the estimate of SE to which you compare these within-group changes. Provided, of course, that the variance is homogenous across groups (i.e. treatment and/or infection has no influence on variability). You can get the latter from the original ANOVA (interaction term included) but I think you should get the difference of means (infected vs non-infected) by a separate anlysis of each group.> Many thanks, and sorry, but it's Friday.Don't apologise. Not your fault it's Friday. (Who *can* I blame?) Cheers, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861 Date: 22-Apr-05 Time: 11:52:40 ------------------------------ XFMail ------------------------------
Bill.Venables@csiro.au
2005-Apr-23 02:57 UTC
[R] Anova - interpretation of the interaction term
: -----Original Message----- : From: r-help-bounces at stat.math.ethz.ch : [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of : michael watson (IAH-C) : Sent: Friday, 22 April 2005 7:47 PM : To: r-help at stat.math.ethz.ch : Subject: [R] Anova - interpretation of the interaction term : : : Hi : : So carrying on my use of analysis of variance to check for the effects : of two factors. It's made simpler by the fact that both my : factors have : only two levels each, creating four unique groups. : : I have a highly significant interaction term. In the context of the : experiment, this makes sense. I can visualise the data : graphically, and : sure enough I can see that both factors have different effects on the : data DEPENDING on what the value of the other factor is. : : I explain this all to my colleague - and she asks "but which ones are : different?" This is best illustrated with an example. We have either : infected | uninfected, and vaccinated | unvaccinated (the two : factors). : We're measuring expression of a gene. Graphically, in the infected : group, vaccination makes expression go up. In the uninfected group, : vaccination makes expression go down. In both the vaccinated and : unvaccinated groups, infection makes expression go down, but it goes : down further in unvaccinated than it does in vaccinated. : : So from a statistical point of view, I can see exactly why the : interaction term is significant, but what my colleage wants to know is : that WITHIN the vaccinated group, does infection decrease expression : significantly? And within the unvaccinated group, does infection : decrease expression significantly? Etc etc etc Can I get this : information from the output of the ANOVA, or do I carry out a separate : test on e.g. just the vaccinated group? (seems a cop out to me) No, you can't get this kind of specific information out of the anova table and yes, anova tables *are* a bit of a cop out. (I sometimes think they should only be allowed between consenting adults in private.) What you are asking for is a non-standard, but perfectly reasonable partition of the degrees of freedom between the classes of a single factor with four levels got by pairing up the levels of vaccination and innoculation. Of course you can get this information, but you have to do a bit of work for it. Before I give the example which I don't expect too many people to read entirely, let me issue a little challenge, namely to write tools to automate a generalized version of the procedure below. Here is the example, (drawing from the explanation given in a certain book, to wit chapter 6):> dat <- expand.grid(vac = c("N", "Y"), inf = c("-", "+")) > dat <- rbind(dat, dat) # to get a bit of replicationNow we make a 4-level factor from vaccination and infection and generate a bit of data with an infection effect built into it:> dat <- transform(dat, vac_inf = vac:inf,y = as.numeric(inf) + rnorm(8))> datvac inf vac_inf y 1 N - N:- 0.2285096 2 Y - Y:- 1.3504610 3 N + N:+ 2.5581254 4 Y + Y:+ 2.9208313 11 N - N:- -0.8403039 21 Y - Y:- -0.2440574 31 N + N:+ 2.4844055 41 Y + Y:+ 2.0772671 Now give the joint factor contrasts reflecting the partition we want to effect:> levels(dat$vac_inf)[1] "N:-" "N:+" "Y:-" "Y:+"> m <- matrix(scan(), ncol = 4, byrow = T)1: -1 1 0 0 5: 0 0 -1 1 9: 1 1 -1 -1 13: Read 12 items> fractions(ginv(m)) ## just to see what it looks like[,1] [,2] [,3] [1,] -1/2 0 1/4 [2,] 1/2 0 1/4 [3,] 0 -1/2 -1/4 [4,] 0 1/2 -1/4 Note that we could have simply used t(m), but this is not always possible. Associate these contrasts, fit and analyse:> contrasts(dat$vac_inf) <- ginv(m) > gm <- aov(y ~ vac_inf, dat) > summary(gm)Df Sum Sq Mean Sq F value Pr(>F) vac_inf 3 12.1294 4.0431 7.348 0.04190 Residuals 4 2.2009 0.5502 This doesn't tell us too much other than there are differences, probably. Now to specify the partition:> summary(gm,split = list(vac_inf = list("- vs +|N" = 1, "- vs +|Y" = 2))) Df Sum Sq Mean Sq F value Pr(>F) vac_inf 3 12.1294 4.0431 7.3480 0.04190 vac_inf: - vs +|N 1 7.9928 7.9928 14.5262 0.01892 vac_inf: - vs +|Y 1 3.7863 3.7863 6.8813 0.05860 Residuals 4 2.2009 0.5502 As expected, infection changes the mean for both vaccinated and unvaccinated, as we arranged when we generated the data. : : Many thanks, and sorry, but it's Friday. : : Mick : : ______________________________________________ : R-help at stat.math.ethz.ch mailing list : https://stat.ethz.ch/mailman/listinfo/r-help : PLEASE do read the posting guide! : http://www.R-project.org/posting-guide.html :
> From: Ted.Harding at nessie.mcc.ac.uk > > On 23-Apr-05 Bill.Venables at csiro.au wrote:[snip]> [technical setup snipped] > > >> contrasts(dat$vac_inf) <- ginv(m) > >> gm <- aov(y ~ vac_inf, dat) > >> summary(gm) > > Df Sum Sq Mean Sq F value Pr(>F) > > vac_inf 3 12.1294 4.0431 7.348 0.04190 > > Residuals 4 2.2009 0.5502 > > > > This doesn't tell us too much other than there are differences, > > probably. Now to specify the partition: > > > >> summary(gm, > > split = list(vac_inf = list("- vs +|N" = 1, > > "- vs +|Y" = 2))) > > Df Sum Sq Mean Sq F value Pr(>F) > > vac_inf 3 12.1294 4.0431 7.3480 0.04190 > > vac_inf: - vs +|N 1 7.9928 7.9928 14.5262 0.01892 > > vac_inf: - vs +|Y 1 3.7863 3.7863 6.8813 0.05860 > > Residuals 4 2.2009 0.5502 > > Wow, Bill! Dazzling. This is like watching a rabbit hop > into a hat, and fly out as a dove. I must study this syntax. > But where can I find out about the "split" argument to "summary"? > I've found the *function* split, whose effect is similar, but > I've wandered around the "summary", "summary.lm" etc. forest > for a while without locating the *argument*.gm is fitted with aov(), so wouldn't it make sense to look at ?summary.aov? It says: split an optional named list, with names corresponding to terms in the model. Each component is itself a list with integer components giving contrasts whose contributions are to be summed. and even has an example showing how it's used. Andy> My naive ("cop-out") approach would have been on the lines > of (without setting up the contrast matrix): > > summary(aov(y~vac*inf,data=dat)) > Df Sum Sq Mean Sq F value Pr(>F) > vac 1 0.3502 0.3502 0.6364 0.46968 > inf 1 11.3908 11.3908 20.7017 0.01042 * > vac:inf 1 0.3884 0.3884 0.7058 0.44812 > Residuals 4 2.2009 0.5502 > > so we get the 2.2009 on 4 df SS for redisuals with mean SS 0.5502. > > Then I would do: > > mNp<-mean(y[(vac=="N")&(inf=="+")]) > mNm<-mean(y[(vac=="N")&(inf=="-")]) > mYp<-mean(y[(vac=="Y")&(inf=="+")]) > mYm<-mean(y[(vac=="Y")&(inf=="-")]) > > c( mYp, mYm, mNp, mNm ) > ##[1] 2.4990492 0.5532018 2.5212655 -0.3058972 > > c( mYp-mYm, mNp-mNm ) > ##[1] 1.945847 2.827163 > > > after which: > > 1-pt(((mYp-mYm)/sqrt(0.5502)),4) > ##[1] 0.02929801 > > 1-pt(((mNp-mNm)/sqrt(0.5502)),4) > ##[1] 0.009458266 > > give you 1-sided t-tests, and > > 1-pf(((mYp-mYm)/sqrt(0.5502))^2,1,4) > ##[1] 0.05859602 > > 1-pf(((mNp-mNm)/sqrt(0.5502))^2,1,4) > ##[1] 0.01891653 > > give you F-tests (equivalent to 2-sided t-tests) which agree > with Bill's results above. > > And, in this case, presenting the results as mean differences > shows that the effect of infection appears to differ between > vaccinated and unvaccinated groups; but a simple test shows > this not to be significant: > > 1-pf( (sqrt(1/2)*((mYp-mYm)-(mNp-mNm))/sqrt(0.5502))^2, 1,4) > ##[1] 0.4481097 > > As I said above, this would be my naive approach to this > particular case (and to any like it), and I would expect > to be able to explain all this in simple terms to a colleague > who was asking the sort of questions you have reported. > Or is it the case that offering an anova table is needed, > in order to evoke consent to the results by virtue of the > familiar format? > > > As expected, infection changes the mean for both vaccinated and > > unvaccinated, as we arranged when we generated the data. > > The challenge to generalise is interesting. However, as implied > above, I'm already impressed by Bill's footwork in R for this > simple case, and it might be some time before I'm fluent > enough in that sort of thing to deal with more complicated > cases, let alone the general one. > > For users like myself, a syntax whose terms are closer to > ordinary language would be more approachable. Something > on the lines of > > summary(aov(y ~ vac*inf), by=inf, within=vac) > > which would present a table similar to Bill's above (by inf > within the different levels of vac). > > Intriguing. The challenge is tempting ... ! > > Best wishes, > Ted. > > > -------------------------------------------------------------------- > E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk> > Fax-to-email: +44 (0)870 094 0861 > Date: 23-Apr-05 Time: 14:33:00 > ------------------------------ XFMail ------------------------------ > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > >