Hello,

I am getting output from anova() and summary(aov()) that depends on the
order of the factors in the fitted model object, and this has me baffled. I
see this dependency with the data.frame below but not with an example (table
6.4) from Montgomery's DOE book. This is with R 1.3.0 on Debian GNU-Linux.

Where have I gone wrong?

> centerpts
    run sample CH50mg
1  day1 dev126   0.56
2  day1 dev126   0.70
3  day1 dev126   0.82
4  day1 dev126   0.72
5  day2 dev126   0.57
6  day2 dev126   0.60
7  day3 dev126   0.61
8  day3 dev126   0.64
9  day3 dev126   0.68
10 day3 dev126   0.68
11 day1 dev118   0.77
12 day1 dev118   0.80
13 day1 dev118   0.86
14 day2 dev118   0.71
15 day2 dev118   0.70
16 day3 dev118   0.77
17 day3 dev118   0.77
18 day3 dev118   0.77
19 day3 dev118   0.80
20 day1 rgf108   0.77
21 day1 rgf108   0.86
22 day1 rgf108   0.82
23 day2 rgf108   0.62
24 day2 rgf108   0.63
25 day3 rgf108   0.66
26 day3 rgf108   0.71
27 day3 rgf108   0.69
28 day3 rgf108   0.69

> anova(lm(CH50mg~run+sample+run*sample,data=centerpts))
Analysis of Variance Table

Response: CH50mg
           Df   Sum Sq  Mean Sq F value    Pr(>F)
run         2 0.064308 0.032154 12.5597 0.0003343
sample      2 0.068649 0.034324 13.4075 0.0002337
run:sample  4 0.010444 0.002611  1.0199 0.4221699
Residuals  19 0.048642 0.002560

> anova(lm(CH50mg~sample+run+run*sample,data=centerpts))
Analysis of Variance Table

Response: CH50mg
           Df   Sum Sq  Mean Sq F value    Pr(>F)
sample      2 0.061927 0.030964 12.0948 0.0004093
run         2 0.071029 0.035515 13.8725 0.0001931
sample:run  4 0.010444 0.002611  1.0199 0.4221699
Residuals  19 0.048642 0.002560

TIA,

--
Robert Burrows
New England Biometrics
rbb at nebiometrics.com

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
On Fri, 26 Oct 2001, Robert Burrows wrote:

> Hello,
>
> I am getting output from anova() and summary(aov()) that depends on the
> order of the factors in the fitted model object, and this has me baffled. I
> see this dependency with the data.frame below but not with an example (table
> 6.4) from Montgomery's DOE book. This is with R 1.3.0 on Debian GNU-Linux.
>
> Where have I gone wrong?
>

In worrying about it?

In a non-orthogonal design (i.e. most unbalanced designs) the sums of squares
do depend on the order. In an orthogonal design they don't.

This is because R uses sums of squares that are projections involving a
nested sequence of models. Some packages report sums of squares that are
based on comparing the full model to the models with each factor removed one
at a time.

The question of which set of sums of squares is the Right Thing provokes
low-level holy wars on r-help from time to time.

You can compute sums of squares comparing any two models you feel like by
using
   anova(model1,model2)

This probably should be a FAQ

	-thomas

Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
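To make the "nested sequence of models" concrete, here is a minimal sketch in R using the centerpts data from the original post. The object names (fit_run, fit_sample, fit_add and so on) are illustrative and do not come from the thread. It checks that each sequential sum of squares is simply the drop in residual sum of squares between two nested fits, which is what anova(model1, model2) reports.

```r
# Data re-entered from the original post (28 rows).
centerpts <- data.frame(
  run = factor(c(rep("day1", 4), rep("day2", 2), rep("day3", 4),
                 rep("day1", 3), rep("day2", 2), rep("day3", 4),
                 rep("day1", 3), rep("day2", 2), rep("day3", 4))),
  sample = factor(rep(c("dev126", "dev118", "rgf108"), times = c(10, 9, 9))),
  CH50mg = c(0.56, 0.70, 0.82, 0.72, 0.57, 0.60, 0.61, 0.64, 0.68, 0.68,
             0.77, 0.80, 0.86, 0.71, 0.70, 0.77, 0.77, 0.77, 0.80,
             0.77, 0.86, 0.82, 0.62, 0.63, 0.66, 0.71, 0.69, 0.69)
)

# Three fits that can be arranged into two different nested sequences.
fit_run    <- lm(CH50mg ~ run, data = centerpts)
fit_sample <- lm(CH50mg ~ sample, data = centerpts)
fit_add    <- lm(CH50mg ~ run + sample, data = centerpts)

# Sequential table with run entered first.
seq_run_first <- anova(fit_add)

# SS for sample after run: drop in RSS going from fit_run to fit_add.
ss_sample_given_run <- anova(fit_run, fit_add)[2, "Sum of Sq"]

# SS for run after sample: drop in RSS going from fit_sample to fit_add.
ss_run_given_sample <- anova(fit_sample, fit_add)[2, "Sum of Sq"]

print(seq_run_first)
print(c(run_alone        = seq_run_first["run", "Sum Sq"],
        run_given_sample = ss_run_given_sample,
        sample_given_run = ss_sample_given_run))
```

The "run" line of the sequential table is run on its own, while run_given_sample is run adjusted for sample; in this unbalanced data set the two differ, which is exactly the order dependence in the original question.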
Robert Burrows <rbb at nebiometrics.com> writes:

> Hello,
>
> I am getting output from anova() and summary(aov()) that depends on the
> order of the factors in the fitted model object, and this has me baffled. I
> see this dependency with the data.frame below but not with an example (table
> 6.4) from Montgomery's DOE book. This is with R 1.3.0 on Debian GNU-Linux.
>
> Where have I gone wrong?

In assuming that the order should not matter. anova() gives the
incremental SS, and in a non-orthogonal design the order *does*
matter. You might want to try

drop1(lm(CH50mg~run+sample, data=centerpts))

and also

anova(lm(CH50mg~run, data=centerpts))
anova(lm(CH50mg~sample, data=centerpts))

Also, if you remove one of the first four observations, you will get a
balanced design (every sample then has 3, 2 and 4 observations on day1,
day2 and day3) and the order-dependence should disappear.

> > anova(lm(CH50mg~run+sample+run*sample,data=centerpts))

BTW: The interaction operator is ":"

~run*sample expands to ~run+sample+run:sample, so writing the main
effects out as well is redundant.

--
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
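The two suggestions above can be sketched as follows, again in R with the data re-entered from the original post. The names d1, balanced, a_run_first and a_sample_first are made up for this illustration, and dropping row 1 is just one of the four possible choices. drop1() reports each main effect adjusted for the other, and after removing one of the first four rows the two factor orders should give the same table.

```r
# Data re-entered from the original post (28 rows).
centerpts <- data.frame(
  run = factor(c(rep("day1", 4), rep("day2", 2), rep("day3", 4),
                 rep("day1", 3), rep("day2", 2), rep("day3", 4),
                 rep("day1", 3), rep("day2", 2), rep("day3", 4))),
  sample = factor(rep(c("dev126", "dev118", "rgf108"), times = c(10, 9, 9))),
  CH50mg = c(0.56, 0.70, 0.82, 0.72, 0.57, 0.60, 0.61, 0.64, 0.68, 0.68,
             0.77, 0.80, 0.86, 0.71, 0.70, 0.77, 0.77, 0.77, 0.80,
             0.77, 0.86, 0.82, 0.62, 0.63, 0.66, 0.71, 0.69, 0.69)
)

# drop1(): remove each term in turn from the additive model, so every
# main effect is adjusted for the other and the order no longer matters.
d1 <- drop1(lm(CH50mg ~ run + sample, data = centerpts))
print(d1)

# Drop one of the first four rows (row 1 here, an arbitrary choice) so that
# every sample has the same number of observations on each day.
balanced <- centerpts[-1, ]
print(table(balanced$run, balanced$sample))

# Fit with the factors in both orders on the reduced data.
a_run_first    <- anova(lm(CH50mg ~ run * sample, data = balanced))
a_sample_first <- anova(lm(CH50mg ~ sample * run, data = balanced))
print(a_run_first)
print(a_sample_first)
```

Comparing the "run" and "sample" rows of the last two tables shows whether the order dependence has gone away on the reduced data.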
Many thanks to TL and PD for your replies. I clearly need to learn a bit
more about anova().

--
Robert Burrows
New England Biometrics
rbb at nebiometrics.com