Hello all. This may be a trivially simple question to answer, but I'm a little
bit stumped with respect to the calculation of the F statistics in nested
anovas in R. If I understand correctly, the F statistic for the
among-subgroups but within groups hypothesis is calculated as
MS_subgroups/MS_error, while the F statistic for the factor is calculated as
MS_factor/MS_subgroups (I'm getting this from Sokal & Rohlf's _Biometry_).
However, as I understand the output from R, it calculates the F for the
factor as MS_factor/MS_error, which can significantly change the results.

As an example, I took the values from Sokal & Rohlf's example on mosquitos,
which are as follows:

   cage animal length
1     1      a   58.5
2     1      a   59.5
3     1      b   77.8
4     1      b   80.9
5     1      c   84.0
6     1      c   83.6
7     1      d   70.1
8     1      d   68.3
9     2      a   69.8
10    2      a   69.8
11    2      b   56.0
12    2      b   54.5
13    2      c   50.7
14    2      c   49.3
15    2      d   63.8
16    2      d   65.8
17    3      a   56.6
18    3      a   57.5
19    3      b   77.8
20    3      b   79.2
21    3      c   69.9
22    3      c   69.2
23    3      d   62.1
24    3      d   64.5

Using the following R commands, I get this output for a nested anova:

> model<-lm(length~cage/animal)
> anova(model)
Analysis of Variance Table

Response: length
            Df  Sum Sq Mean Sq F value    Pr(>F)
cage         2  665.68  332.84  255.70 1.452e-10 ***
cage:animal  9 1720.68  191.19  146.88 6.981e-11 ***
Residuals   12   15.62    1.30
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

According to the book and my understanding of nested anovas, the F statistic
for the cage:animal component is correct, but the F statistic for 'cage'
should be 332.84/191.19, giving a value of 1.741 which is not significant,
and highly different than 255.70.

Perhaps I've misunderstood, but could someone explain to me what R is doing?

In order to guide you, I'm running linux and my R-version is:
R 1.4.1 (2002-01-30). Copyright (C) 2002 R Development Core Team

Thanks in advance,

--
Matthew Norton
nortonm at magellan.umontreal.ca
Dépt. des Sciences Biologiques, Université de Montréal
C.P. 6128 Succ. centre-ville, Montréal, Qc H3C 3J7
(514) 343-6111 x1233
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Matthew Norton <matthew.norton at umontreal.ca> writes:

> Hello all. This may be a trivially simple question to answer, but I'm a little
> bit stumped with respect to the calculation of the F statistics in nested
> anovas in R. If I understand correctly, the F statistic for the
> among-subgroups but within groups hypothesis is calculated as
> MS_subgroups/MS_error, while the F statistic for the factor is calculated as
> MS_factor/MS_subgroups (I'm getting this from Sokal & Rohlf's _Biometry_).
> However, as I understand the output from R, it calculates the F for the
> factor as MS_factor/MS_error, which can significantly change the results.
>
> As an example, I took the values from Sokal & Rohlf's example on mosquitos,
> which are as follows:
>
>    cage animal length
> 1     1      a   58.5
> 2     1      a   59.5
> 3     1      b   77.8
> 4     1      b   80.9
> 5     1      c   84.0
> 6     1      c   83.6
> 7     1      d   70.1
> 8     1      d   68.3
> 9     2      a   69.8
> 10    2      a   69.8
> 11    2      b   56.0
> 12    2      b   54.5
> 13    2      c   50.7
> 14    2      c   49.3
> 15    2      d   63.8
> 16    2      d   65.8
> 17    3      a   56.6
> 18    3      a   57.5
> 19    3      b   77.8
> 20    3      b   79.2
> 21    3      c   69.9
> 22    3      c   69.2
> 23    3      d   62.1
> 24    3      d   64.5
>
> Using the following R commands, I get this output for a nested anova:
>
> > model<-lm(length~cage/animal)
> > anova(model)
> Analysis of Variance Table
>
> Response: length
>             Df  Sum Sq Mean Sq F value    Pr(>F)
> cage         2  665.68  332.84  255.70 1.452e-10 ***
> cage:animal  9 1720.68  191.19  146.88 6.981e-11 ***
> Residuals   12   15.62    1.30
> ---
> Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
>
> According to the book and my understanding of nested anovas, the F statistic
> for the cage:animal component is correct, but the F statistic for 'cage'
> should be 332.84/191.19, giving a value of 1.741 which is not significant,
> and highly different than 255.70.
>
> Perhaps I've misunderstood, but could someone explain to me what R is doing?

R is doing the same thing as SAS and Genstat and probably others: If
you don't specify that there are multiple error components, it assumes
that there is only one. So you get the decomposition of the sum of
squares with everything compared to the residual. Effectively, this
makes any test for a main effect significant if it appears in a
significant interaction with another factor. Logically, this makes
sense: You cannot talk about an overall cage effect if it differs
between animals, *unless* you interpret differences between animals as
random.

To get a multistratum analysis try

  aov(length~cage+Error(cage:animal))

(Notice that this only works out correctly for balanced designs. In
other cases, you may have to look into using lme().)

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
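
For anyone who wants to reproduce the comparison, here is a minimal sketch in R. It retypes the 24 values from the question into a data frame called `mosq` (the name and the `data =` arguments are my assumptions; the original commands used bare variables), fits the single-stratum model from the question, then the multistratum call suggested in the reply, and finally forms the ratio MS_cage / MS_cage:animal by hand. Depending on the R version, the `aov()` call with `Error(cage:animal)` may warn that the Error() model is singular; the strata should still be reported.

```r
# Mosquito data as posted in the question:
# 3 cages x 4 animals per cage x 2 measurements per animal.
# The data frame name `mosq` is an assumption made for this sketch.
mosq <- data.frame(
  cage   = factor(rep(1:3, each = 8)),
  animal = factor(rep(rep(c("a", "b", "c", "d"), each = 2), times = 3)),
  length = c(58.5, 59.5, 77.8, 80.9, 84.0, 83.6, 70.1, 68.3,
             69.8, 69.8, 56.0, 54.5, 50.7, 49.3, 63.8, 65.8,
             56.6, 57.5, 77.8, 79.2, 69.9, 69.2, 62.1, 64.5)
)

# Single-stratum fit from the question: every term is tested
# against the residual mean square.
fit_lm <- lm(length ~ cage / animal, data = mosq)
tab <- anova(fit_lm)
print(tab)

# Multistratum fit from the reply: cage is tested within the
# animals-within-cages stratum instead of against the residual.
fit_aov <- aov(length ~ cage + Error(cage:animal), data = mosq)
print(summary(fit_aov))

# The same test by hand, using the mean squares of the single-stratum table.
ms <- tab[["Mean Sq"]]
names(ms) <- rownames(tab)
f_cage <- ms[["cage"]] / ms[["cage:animal"]]
p_cage <- pf(f_cage,
             df1 = tab["cage", "Df"],
             df2 = tab["cage:animal", "Df"],
             lower.tail = FALSE)
cat("F for cage against cage:animal:", f_cage, " p-value:", p_cage, "\n")
```

The hand-computed ratio should match the 332.84/191.19 from the question, and the cage line in the `cage:animal` stratum of the `aov()` summary should show the same F on 2 and 9 degrees of freedom. As noted in the reply, this shortcut relies on the design being balanced.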