> On 30 Oct 2015, at 18:46 , Daniel Wagenaar <wagenadl at uc.edu>
wrote:
>
> Dear R users:
>
> All textbook references that I consult say that in a nested ANOVA (e.g.,
A/B), the F statistic for factor A should be calculated as
>
> F_A = MS_A / MS_(B within A).
>
That would depend on which hypothesis you test in which model. If a reference
tells you that you "should" do something without specifying the model,
then you "should" look at a different reference.
In general, having anything other than the residual MS in the denominator
indicates that you think it represents an additional source of random variation.
I don't think that is invariably the case in nested designs (and, by the
way, notice that "nested" is used differently by different books and
software).
If you don't say otherwise, R assumes that there is only one source of
random variation the model - a single error term if you like - and that all
other terms represent systematic variations. In this mode of thinking, an A:B
term represents an effect of B within A (additive and interaction effects
combined), and you can test for its presence by comparing MS_A:B to MS_res. In
its absence, you might choose to reduce the model and next look for an effect of
A; purists would do this by comparing MS_A to the new MS_res obtained by pooling
MS_A:B and MS_res, but lazy statisticians/programmers have found it more
convenient to stick with the original MS_res denominator throughout (to get the
pooling done, just fit the reduced model).
If you want A:B to be a random term, then you need to say so, e.g. using
> summary(aov(Y~A+Error(A:B-1)))
Error: A:B
Df Sum Sq Mean Sq F value Pr(>F)
A 2 0.4735 0.2367 0.403 0.7
Residuals 3 1.7635 0.5878
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 6 4.993 0.8322
(the -1 in the Error() term prevents an error message, which as far as I can
tell is spurious).
Notice that you need aov() for this; lm() doesn't do Error() terms. This
_only_ works in balanced designs.
-pd
> But when I run this simple example:
>
> set.seed(1)
> A <- factor(rep(1:3, each=4))
> B <- factor(rep(1:2, 3, each=2))
> Y <- rnorm(12)
> anova(lm(Y ~ A/B))
>
> I get this result:
>
> Analysis of Variance Table
>
> Response: Y
> Df Sum Sq Mean Sq F value Pr(>F)
> A 2 0.4735 0.23675 0.2845 0.7620
> A:B 3 1.7635 0.58783 0.7064 0.5823
> Residuals 6 4.9931 0.83218
>
> Evidently, R calculates the F value for A as MS_A / MS_Residuals.
>
> While it is straightforward enough to calculate what I think is the correct
result from the table, I am surprised that R doesn't give me that answer
directly. Does anybody know if R's behavior is intentional, and if so, why?
Equally importantly, is there a straightforward way to make R give the answer I
expect, that is:
>
> Df Sum Sq Mean Sq F value Pr(>F)
> A 2 0.4735 0.23675 0.4028 0.6999
>
> The students in my statistics class would be much happier if they
didn't have to type things like
>
> a <- anova(...)
> F <- a$`Sum Sq`[1] / a$`Sum Sq`[2]
> P <- 1 - pf(F, a$Df[1], a$Df[2])
>
> (They are not R programmers (yet).) And to be honest, I would find it
easier to read those results directly from the table as well.
>
> Thanks,
>
> Daniel Wagenaar
>
> --
> Daniel A. Wagenaar, PhD
> Assistant Professor
> Department of Biological Sciences
> McMicken College of Arts and Sciences
> University of Cincinnati
> Cincinnati, OH 45221
> Phone: +1 (513) 556-9757
> Email: daniel.wagenaar at uc.edu
> Web: http://www.danielwagenaar.net
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com