Dear R users:

All textbook references that I consult say that in a nested ANOVA (e.g., A/B), the F statistic for factor A should be calculated as F_A = MS_A / MS_(B within A). But when I run this simple example:

    set.seed(1)
    A = factor(rep(1:3, each=4))
    B = factor(rep(1:2, 3, each=2))
    Y = rnorm(12)
    anova(lm(Y ~ A/B))

I get this result:

    Analysis of Variance Table

    Response: Y
              Df Sum Sq Mean Sq F value Pr(>F)
    A          2 0.4735 0.23675  0.2845 0.7620
    A:B        3 1.7635 0.58783  0.7064 0.5823
    Residuals  6 4.9931 0.83218

Evidently, R calculates the F value for A as MS_A / MS_Residuals. While it is straightforward enough to calculate what I think is the correct result from the table, I am surprised that R doesn't give me that answer directly. Does anybody know if R's behavior is intentional, and if so, why? And, perhaps most importantly, how do I get the "textbook" result in the most straightforward way? (I'd like to be able to give my students a simple procedure...)

Thanks,

Daniel Wagenaar

--
Daniel A. Wagenaar, PhD
Assistant Professor
Department of Biological Sciences
McMicken College of Arts and Sciences
University of Cincinnati
Cincinnati, OH 45221
Phone: +1 (513) 556-9757
Email: daniel.wagenaar at uc.edu
Web: http://www.danielwagenaar.net

Maybe you want

    summary(aov(Y ~ A + Error(A:B)))

Kevin

On Fri, Oct 30, 2015 at 9:32 AM, Wagenaar, Daniel (wagenadl) <wagenadl at ucmail.uc.edu> wrote:

> [...] And, perhaps most importantly, how to get the "textbook" result
> in the most straightforward way? (I'd like to be able to give me
> students a simple procedure...)
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Kevin Wright
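
As a rough check on the suggestion above, the sketch below computes the "textbook" ratio by hand from the lm() table and compares it with the F value that aov() reports in the error stratum. The data frame `d` and the helper factor `AB` (one level per A-by-B cell) are names introduced here for illustration only; they are not part of the thread, and `Error(AB)` is assumed to define the same stratum as `Error(A:B)`.

```r
# Same simulated data as in the original post, collected in a data frame
set.seed(1)
d <- data.frame(
  A = factor(rep(1:3, each = 4)),
  B = factor(rep(1:2, 3, each = 2)),
  Y = rnorm(12)
)

# Fixed-effects table from lm(): every F value uses MS_Residuals
tab <- anova(lm(Y ~ A/B, data = d))

# "Textbook" test for A, computed by hand: MS_A / MS_(B within A)
df_A     <- tab["A",   "Df"]
df_BinA  <- tab["A:B", "Df"]
F_manual <- tab["A", "Mean Sq"] / tab["A:B", "Mean Sq"]
p_manual <- pf(F_manual, df_A, df_BinA, lower.tail = FALSE)

# Same test via aov() with an error stratum for the B-within-A cells.
# AB is a hypothetical helper factor with one level per cell.
d$AB <- interaction(d$A, d$B)
fit  <- aov(Y ~ A + Error(AB), data = d)

# summary() of a multi-stratum fit is a list with one table per stratum
stratum <- summary(fit)[["Error: AB"]][[1]]
F_aov   <- stratum[1, "F value"]
p_aov   <- stratum[1, "Pr(>F)"]

print(c(F_manual = F_manual, F_aov = F_aov))
print(c(p_manual = p_manual, p_aov = p_aov))
```

With the mean squares posted in the original message, the hand-computed ratio is 0.23675 / 0.58783, roughly 0.40 on 2 and 3 degrees of freedom, rather than the 0.2845 on 2 and 6 degrees of freedom shown in the lm() table.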

On 31/10/15 03:32, Wagenaar, Daniel (wagenadl) wrote:

> Evidently, R calculates the F value for A as MS_A / MS_Residuals.
> [...] Does anybody know if R's behavior is intentional, and if so,
> why? And, perhaps most importantly, how to get the "textbook"
> result in the most straightforward way? (I'd like to be able to give me
> students a simple procedure...)

The formula that you quote, F_A = MS_A / MS_(B within A), is based upon factor "B" being a *random* effect. The lm() function handles *fixed* effects only, and thus treats "B" as a fixed effect; whether this makes any sense or not is another story. (IMHO only random effects make sense as nested effects.)

Kevin Wright has already told you how to get what you want/need using aov() and the Error() function. This works, essentially, only for balanced designs. For more complicated designs you will need to dive into the nlme and lme4 packages, for which you will need *lots* of patience, determination, and luck! :-)

cheers,

Rolf Turner

P. S. Please provide a useful *subject line* in your posts to this list.

R. T.

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
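
The fixed-versus-random distinction above can be made concrete with the standard expected mean squares for a balanced nested design. The notation is an assumption introduced here, not taken from the thread: a levels of A with fixed effects \alpha_i, b levels of B within each level of A, n observations per cell, residual variance \sigma^2, and variance \sigma_B^2 of the random B-within-A effects (in the example, a = 3, b = 2, n = 2).

```latex
\begin{align*}
E[\mathrm{MS}_A]      &= \sigma^2 + n\,\sigma_B^2 + \frac{bn}{a-1}\sum_{i=1}^{a}\alpha_i^2 \\
E[\mathrm{MS}_{B(A)}] &= \sigma^2 + n\,\sigma_B^2 \\
E[\mathrm{MS}_{\mathrm{Res}}] &= \sigma^2
\end{align*}
```

Under the null hypothesis that all \alpha_i are zero, E[MS_A] equals E[MS_(B within A)], which is why the textbook ratio uses MS_(B within A) as the denominator when B is random. If B is instead treated as fixed, the n\sigma_B^2 term drops out of E[MS_A], and MS_Residuals becomes the matching denominator; that is the ratio lm() reports.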

Thank you all for your helpful responses.

I apologize for the lack of a subject line. That is certainly not my habit. It happened because my first email was refused: it was sent with an incorrect "From:" line (an aliased email address, daniel.wagenaar at uc.edu) instead of the address I subscribed to the list with. When I resent it, I failed to copy the subject line. My apologies.

- Daniel

On 10/30/2015 03:12 PM, Kevin Wright wrote:

> Maybe you want
>
> summary(aov(Y ~ A + Error(A:B)))
>
> Kevin