I am using R for the first time in one of my classes. My students have alerted me to a problem for which we have not found an answer. We find that some means returned by the model.tables function are not correct when missing data is present in analysis of variance problems. We have duplicated the problem using R 1.2.0, 1.2.1, and 1.2.2 under Windows 98 and several distributions of Linux (Redhat 7.0, Mandrake 7.2, SuSE 7.0, and 7.1). The situation is best illustrated with a small example of a randomized block design having three treatments and four blocks.> blocks<-factor(c(1,2,3,4,1,2,3,4,1,2,3,4)) > trtmnts<-factor(c(1,1,1,1,2,2,2,2,3,3,3,3)) > data<-c(10,12,9,11,13,15,11,16,18,22,17,19) > balanced<-aov(data~blocks+trtmnts) > summary(balanced)Df Sum Sq Mean Sq F value Pr(>F) blocks 3 28.250 9.417 10.273 0.008868 ** trtmnts 2 147.167 73.583 80.273 4.676e-05 *** Residuals 6 5.500 0.917 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1> model.tables(balanced,"means")Tables of means Grand mean 14.41667 blocks 1 2 3 4 13.667 16.333 12.333 15.333 trtmnts 1 2 3 10.50 13.75 19.00 Entering the data again and dropping treatment 2, block3 and treatment 3, block 4, we have:> blocks2<-factor(c(1,2,3,4,1,2,4,1,2,3)) > trtmts2<-factor(c(1,1,1,1,2,2,2,3,3,3,)) > data2<-c(10,12,9,11,13,15,16,18,22,17) > unbalanced<-aov(data2~blocks2+trtmts2) > summary(unbalanced)Df Sum Sq Mean Sq F value Pr(>F) blocks2 3 18.267 6.089 7.4341 0.0410993 * trtmts2 2 126.557 63.279 77.2587 0.0006367 *** Residuals 4 3.276 0.819 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1> model.tables(unbalanced,"means")Tables of means Grand mean 14.3 blocks2 1 2 3 4 13.67 16.33 13 13.5 rep 3.00 3.00 2 2.0 trtmts2 1 2 3 10.68 14.47 18.97 rep 4.00 3.00 3.00 We find that the treatment means (trtmts2) are incorrect although the number of replications indicated are correct. Block means (blocks2) are correct. The treatment means should be: 10.5, 14.67, and 19.0, respectively. Further investigation reveals that we encounter this problem whenever dealing with unequal replications or missing data. For example, with unequal subsamples, or missing data in factorial experiments. We can get the correct means by using regression techniques (lm) to solve the analysis of variance problems and extracting the fitted values from the appropriate lm model. Since I am learning R, perhaps I have missed something? Is this possibly a bug in the model.tables function? ------------------------------------------------------------ Gary Whysong, Associate Professor, Environmental Resources Morrison School of Agribusiness & Resource Management Arizona State University East Phone: (480) 727-1263, E-mail: gwhysong at Cactus.east.asu.edu ------------------------------------------------------------ -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
I believe that these are correct, but you haven't told us why you think they are wrong. In the unbalanced case you need to understand carefully what to expect: your expected answer is definitely wrong, which suggests that R's might well be right. For the record, these R results agree with S-PLUS on your examples. But they are not much documented in the unbalanced case, so perhaps `for experts only'. The FAQ says: 9.1 What is a bug? .... If a command does the wrong thing, that is a bug. But be sure you know for certain what it ought to have done. If you aren't familiar with the command, or don't know for certain how the command is supposed to work, then it might actually be working right. Rather than jumping to conclusions, show the problem to someone who knows for certain. On Sat, 10 Mar 2001, Gary Whysong wrote: [...]> > blocks2<-factor(c(1,2,3,4,1,2,4,1,2,3)) > > trtmts2<-factor(c(1,1,1,1,2,2,2,3,3,3,)) > > data2<-c(10,12,9,11,13,15,16,18,22,17) > > unbalanced<-aov(data2~blocks2+trtmts2) > > summary(unbalanced) > Df Sum Sq Mean Sq F value Pr(>F) > blocks2 3 18.267 6.089 7.4341 0.0410993 * > trtmts2 2 126.557 63.279 77.2587 0.0006367 *** > Residuals 4 3.276 0.819 > --- > Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 > > model.tables(unbalanced,"means") > Tables of means > Grand mean > > 14.3 > > blocks2 > 1 2 3 4 > 13.67 16.33 13 13.5 > rep 3.00 3.00 2 2.0 > > trtmts2 > 1 2 3 > 10.68 14.47 18.97 > rep 4.00 3.00 3.00 > > We find that the treatment means (trtmts2) are incorrect although the > number of replications indicated are correct. Block means (blocks2) are > correct. > > The treatment means should be: 10.5, 14.67, and 19.0, respectively.Why? These are *model*.tables not *data*.tables. You have to adjust for block effects, and they are unbalanced. Blocks 3 and 4 have lower responses than 1 and 2, and they are missing for treatments 2 and 3. Seems to adjust correctly to me. Note the order of terms matters in unbalanced models. -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Here is my assessment: The following seems pertinent to making sense of the result from model.tables(,type="means"). As I see it, users should be warned off using model.tables() with data from designs in which treatments are not balanced over blocking factors. df <- data.frame(bl=factor(c(1:4,1,2,4,1,2,3)), tr <- factor(rep(1:3,c(4,3,3))), y=c(10,12,9,11,13,15,16,18,22,17)) options(digits=4) # First fit the block effect, unadjusted for treatments blocks.aov <- aov(y~bl, data=df) # Take residuals from these block effects res <- residuals(blocks.aov) # Fit treatment effects to these residuals res.aov <- aov(res~df$tr) b <- summary.lm(res.aov)$coef # EITHER (1) # Add overall mean to the fitted treatment effects # NB: The residuals above summed to 1. So the average of # the fitted treatment effects is zero.> mean(df$y) + b[1]+c(0,b[2:3]) # with options(digits=4)df$tr2 df$tr3 10.68 14.47 18.97> model.tables(df.aov, type="means")Tables of means Grand mean 14.3 bl 1 2 3 4 13.67 16.33 13 13.5 rep 3.00 3.00 2 2.0 tr 1 2 3 10.68 14.47 18.97 rep 4.00 3.00 3.00 # OR (2)> b <- summary.lm(res.aov)$coef > b[1]+c(0, b[2:3])df$tr2 df$tr3 -3.6250 0.1667 4.6667> # Equivalent to model.tables(df.aov)---------------------------------------------------------- The objection to the treatment effects from model.tables() is that they are based on deviations from block means that have not been adjusted to allow for the different relative numbers of the different treatments in those blocks. The effect can be severe when different blocks have greatly different relative numbers of different treatments. Thus model.tables() should not be used even for balanced incomplete block designs. # Choices that are defendable include:> #1 Add treatment effects from above least squares fit, > # constrained to sum to zero, to overall mean. > options(contrasts=c("contr.sum","contr.poly")) > b <- summary.lm(aov(y~bl+tr, data=df))$coef > mean(df$y)+ c(b[5:6], -sum(b[5:6]))tr1 tr2 10.16 13.67 19.07> #2 Add treatment effects from least squares fit, > # constrained to sum to zero, to mean of block means. > b[1] + c(b[5:6],-sum(b[5:6]))tr1 tr2 10.50 14.01 19.41> #3 Use estimates from lme, with blocks as random effectsBrian Ripley, responding to Gary Whysong, wrote:> For the record, these R results agree with S-PLUS on your examples. > But they are not much documented in the unbalanced case, so perhaps > `for experts only'.. .> On Sat, 10 Mar 2001, Gary Whysong wrote: > > [...] > > > > blocks2<-factor(c(1,2,3,4,1,2,4,1,2,3)) > > > trtmts2<-factor(c(1,1,1,1,2,2,2,3,3,3,)) > > > data2<-c(10,12,9,11,13,15,16,18,22,17) > > > unbalanced<-aov(data2~blocks2+trtmts2) > > > summary(unbalanced) > > Df Sum Sq Mean Sq F value Pr(>F) > > blocks2 3 18.267 6.089 7.4341 0.0410993 * > > trtmts2 2 126.557 63.279 77.2587 0.0006367 *** > > Residuals 4 3.276 0.819 > > --- > > Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 > > > model.tables(unbalanced,"means") > > Tables of means > > Grand mean > > > > 14.3 > > > > blocks2 > > 1 2 3 4 > > 13.67 16.33 13 13.5 > > rep 3.00 3.00 2 2.0 > > > > trtmts2 > > 1 2 3 > > 10.68 14.47 18.97 > > rep 4.00 3.00 3.00 > > > > We find that the treatment means (trtmts2) are incorrect although the > > number of replications indicated are correct. Block means (blocks2) are > > correct. > > > > The treatment means should be: 10.5, 14.67, and 19.0, respectively. > > Why? These are *model*.tables not *data*.tables. You have to > adjust for block effects, and they are unbalanced. Blocks 3 and 4 have > lower responses than 1 and 2, and they are missing for treatments 2 and 3. > Seems to adjust correctly to me. > > Note the order of terms matters in unbalanced models.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
P.S. I notice that I forgot to define: df.aov <- aov(y ~ bl + tr, data = df) with df <- data.frame(bl=factor(c(1:4,1,2,4,1,2,3)), tr <- factor(rep(1:3,c(4,3,3))), y=c(10,12,9,11,13,15,16,18,22,17)) John Maindonald email : john.maindonald at anu.edu.au Statistical Consulting Unit, phone : (6125)3998 c/o CMA, SMS, fax : (6125)5549 John Dedman Mathematical Sciences Building Australian National University Canberra ACT 0200 Australia -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._