dear list,
i just found this post in the archive:
----------------
On 23-Apr-05 Bill.Venables at csiro.au
<https://stat.ethz.ch/mailman/listinfo/r-help>
wrote:>:* -----Original Message-----*>:* From: r-help-bounces at
stat.math.ethz.ch <https://stat.ethz.ch/mailman/listinfo/r-help> *>:*
[mailto:r-help-bounces at stat.math.ethz.ch
<https://stat.ethz.ch/mailman/listinfo/r-help>] On Behalf Of *>:*
michael watson (IAH-C)*>:* [...]*>:* I have a highly significant
interaction term. In the context*>:* of the experiment, this makes sense. I
can visualise the data *>:* graphically, and sure enough I can see that both
factors have*>:* different effects on the data DEPENDING on what the value
of*>:* the other factor is. *>:* *>:* I explain this all to my
colleague - and she asks "but which*>:* ones are different?" This
is best illustrated with an example.*>:* We have either infected |
uninfected, and vaccinated | unvaccinated*>:* (the two factors).*>:*
We're measuring expression of a gene. Graphically, in the*>:* infected
group, vaccination makes expression go up. In the*>:* uninfected group,
vaccination makes expression go down. In*>:* both the vaccinated and
unvaccinated groups, infection makes*>:* expression go down, but it goes down
further in unvaccinated*>:* than it does in vaccinated.*>:* *>:* So
from a statistical point of view, I can see exactly why*>:* the interaction
term is significant, but what my colleage*>:* wants to know is that WITHIN
the vaccinated group, does*>:* infection decrease expression significantly?
And within*>:* the unvaccinated group, does infection decrease
expression*>:* significantly? Etc etc etc Can I get this information
from*>:* the output of the ANOVA, or do I carry out a separate*>:* test on
e.g. just the vaccinated group? (seems a cop out to me)*>* *>* No, you
can't get this kind of specific information out of the anova*>* table and
yes, anova tables *are* a bit of a cop out. (I sometimes *>* think they
should only be allowed between consenting adults in*>* private.)*
I think the "cop out" Michael Watson was referring to means
going back to basics and doing a separate analysis on each group
(though no doubt using the Res SS from the AoV table).
Not that I disagree with your comment: I sometimes think that
anova tables are often passed round between adults in order to
induce consent which might otherwise have been withheld.
>* What you are asking for is a non-standard, but perfectly*>* reasonable
partition of the degrees of freedom between the*>* classes of a single factor
with four levels got by pairing*>* up the levels of vaccination and
innoculation. Of course you*>* can get this information, but you have to do a
bit of work*>* for it. *
It seems to me that this is a wrapper for "separate analysis
on each group"!
>* Before I give the example which I don't expect too many people*>*
to read entirely, let me issue a little challenge, namely to*>* write tools
to automate a generalized version of the procedure*>* below.*
[technical setup snipped]
>>* contrasts(dat$vac_inf) <- ginv(m)*>>* gm <- aov(y ~
vac_inf, dat)*>>* summary(gm)*>* Df Sum Sq Mean Sq F value
Pr(>F)*>* vac_inf 3 12.1294 4.0431 7.348 0.04190*>* Residuals
4 2.2009 0.5502*>* *>* This doesn't tell us too much other than
there are differences,*>* probably. Now to specify the partition:*>*
*>>* summary(gm, *>* split = list(vac_inf = list("- vs
+|N" = 1, *>* "- vs +|Y" =
2)))*>* Df Sum Sq Mean Sq F value Pr(>F)*>*
vac_inf 3 12.1294 4.0431 7.3480 0.04190*>* vac_inf: - vs +|N
1 7.9928 7.9928 14.5262 0.01892*>* vac_inf: - vs +|Y 1 3.7863 3.7863
6.8813 0.05860*>* Residuals 4 2.2009 0.5502*
Wow, Bill! Dazzling. This is like watching a rabbit hop
into a hat, and fly out as a dove. I must study this syntax.
But where can I find out about the "split" argument to
"summary"?
I've found the *function* split, whose effect is similar, but
I've wandered around the "summary", "summary.lm" etc.
forest
for a while without locating the *argument*.
My naive ("cop-out") approach would have been on the lines
of (without setting up the contrast matrix):
summary(aov(y~vac*inf,data=dat))
Df Sum Sq Mean Sq F value Pr(>F)
vac 1 0.3502 0.3502 0.6364 0.46968
inf 1 11.3908 11.3908 20.7017 0.01042 *
vac:inf 1 0.3884 0.3884 0.7058 0.44812
Residuals 4 2.2009 0.5502
so we get the 2.2009 on 4 df SS for redisuals with mean SS 0.5502.
Then I would do:
mNp<-mean(y[(vac=="N")&(inf=="+")])
mNm<-mean(y[(vac=="N")&(inf=="-")])
mYp<-mean(y[(vac=="Y")&(inf=="+")])
mYm<-mean(y[(vac=="Y")&(inf=="-")])
c( mYp, mYm, mNp, mNm )
##[1] 2.4990492 0.5532018 2.5212655 -0.3058972
c( mYp-mYm, mNp-mNm )
##[1] 1.945847 2.827163
after which:
1-pt(((mYp-mYm)/sqrt(0.5502)),4)
##[1] 0.02929801
1-pt(((mNp-mNm)/sqrt(0.5502)),4)
##[1] 0.009458266
give you 1-sided t-tests, and
1-pf(((mYp-mYm)/sqrt(0.5502))^2,1,4)
##[1] 0.05859602
1-pf(((mNp-mNm)/sqrt(0.5502))^2,1,4)
##[1] 0.01891653
give you F-tests (equivalent to 2-sided t-tests) which agree
with Bill's results above.
And, in this case, presenting the results as mean differences
shows that the effect of infection appears to differ between
vaccinated and unvaccinated groups; but a simple test shows
this not to be significant:
1-pf( (sqrt(1/2)*((mYp-mYm)-(mNp-mNm))/sqrt(0.5502))^2, 1,4)
##[1] 0.4481097
As I said above, this would be my naive approach to this
particular case (and to any like it), and I would expect
to be able to explain all this in simple terms to a colleague
who was asking the sort of questions you have reported.
Or is it the case that offering an anova table is needed,
in order to evoke consent to the results by virtue of the
familiar format?
>* As expected, infection changes the mean for both vaccinated and*>*
unvaccinated, as we arranged when we generated the data.*
The challenge to generalise is interesting. However, as implied
above, I'm already impressed by Bill's footwork in R for this
simple case, and it might be some time before I'm fluent
enough in that sort of thing to deal with more complicated
cases, let alone the general one.
For users like myself, a syntax whose terms are closer to
ordinary language would be more approachable. Something
on the lines of
summary(aov(y ~ vac*inf), by=inf, within=vac)
which would present a table similar to Bill's above (by inf
within the different levels of vac).
Intriguing. The challenge is tempting ... !
Best wishes,
Ted.
--------
This way for simple main effects testing looks reasonable to me. I
just wanted to know whether the user should have applied a multiple
comparison correction in this case (bonferroni, FDR, etc)?
cheers
Jake
[[alternative HTML version deleted]]