Karl Knoblick
2007-Jun-21 23:05 UTC
[R] Result depends on order of factors in unbalanced designs (lme, anova)?
Dear R-Community! For example I have a study with 4 treatment groups (10 subjects per group) and 4 visits. Additionally, the gender is taken into account. I think - and hope this is a goog idea (!) - this data can be analysed using lme as below. In a balanced design everything is fine, but in an unbalanced design there are differences depending on fitting y~visit*treat*gender or y~gender*visit*treat - at least with anova (see example). Does this make sense? Which ordering might be the correct one? Here the example script: library(nlme) set.seed(123) # Random generation of data: NSubj<-40 # No. of subjects set.seed(1234) id<-factor(rep(c(1:NSubj),4)) # ID of subjects treat<-factor(rep(rep(1:4,each=5),4)) # Treatment 4 Levels gender<-factor(rep(rep(1:2, each=20),4)) visit<-factor(rep(1:4, each=NSubj)) y<-runif(4*NSubj) # Results # Add effects y<-y+0.01*as.integer(visit) y<-y+0.02*as.integer(gender) y<-y+0.024*as.integer(treat) df<-data.frame(id, treat, gender, visit, y) # groupedData object for lme gdat<-groupedData(y ~ visit|id, data=df) # fits - different ordering of factors fit1<-lme(y ~ visit*treat*gender, data=gdat, random = ~visit|id) anova(fit1) fit2<-lme(y ~ gender*treat*visit, data=gdat, random = ~visit|id) anova(fit2) # Result: identical (balanced design so far), ok # Now change gender of subject 1 gdat$gender[c(1,41,81,121)]<-2 # onece more fits with different ordering of factors fit1<-lme(y ~ visit*treat*gender, data=gdat, random = ~visit|id) anova(fit1) fit2<-lme(y ~ gender*treat*visit, data=gdat, random = ~visit|id) anova(fit2) # Result: There are differences!! Hope anybody can help or give me advice how to interpret these results correctly or how to avoid this problem! Is there a better possibility to analyse these data than lme? Thanks! Karl
Prof Brian Ripley
2007-Jun-22 15:26 UTC
[R] Result depends on order of factors in unbalanced designs (lme, anova)?
'anova' is rather a misnomer here. In terms of the description in ?anova.lme, you have When only one fitted model object is present, a data frame with the sums of squares, numerator degrees of freedom, denominator degrees of freedom, F-values, and P-values for Wald tests for the terms in the model ... but there are no 'sums of squares' shown. However, the crucial part of that help page is type: an optional character string specifying the type of sum of squares to be used in F-tests for the terms in the model. If '"sequential"', the sequential sum of squares obtained by including the terms in the order they appear in the model is used; else, if '"marginal"', the marginal sum of squares obtained by deleting a term from the model at a time is used. This argument is only used when a single fitted object is passed to the function. Partial matching of arguments is used, so only the first character needs to be provided. Defaults to '"sequential"'. so these are sequential fits (just like anova for an lm fit), and yes, sequential fits do in general depend on the sequence of terms. The issues of interpretation are exactly those of unbalanced linear models, and you will find advice on that in many places, e.g. in MASS. On Thu, 21 Jun 2007, Karl Knoblick wrote:> Dear R-Community! > > For example I have a study with 4 treatment groups (10 subjects per group) and 4 visits. Additionally, the gender is taken into account. I think - and hope this is a goog idea (!) - this data can be analysed using lme as below. > > In a balanced design everything is fine, but in an unbalanced design there are differences depending on fitting y~visit*treat*gender or y~gender*visit*treat - at least with anova (see example). Does this make sense? Which ordering might be the correct one? > > Here the example script: > library(nlme) > set.seed(123) > # Random generation of data: > NSubj<-40 # No. of subjects > set.seed(1234) > id<-factor(rep(c(1:NSubj),4)) # ID of subjects > treat<-factor(rep(rep(1:4,each=5),4)) # Treatment 4 Levels > gender<-factor(rep(rep(1:2, each=20),4)) > visit<-factor(rep(1:4, each=NSubj)) > y<-runif(4*NSubj) # Results > # Add effects > y<-y+0.01*as.integer(visit) > y<-y+0.02*as.integer(gender) > y<-y+0.024*as.integer(treat) > df<-data.frame(id, treat, gender, visit, y) > # groupedData object for lme > gdat<-groupedData(y ~ visit|id, data=df) > # fits - different ordering of factors > fit1<-lme(y ~ visit*treat*gender, data=gdat, random = ~visit|id) > anova(fit1) > fit2<-lme(y ~ gender*treat*visit, data=gdat, random = ~visit|id) > anova(fit2) > # Result: identical (balanced design so far), ok > # Now change gender of subject 1 > gdat$gender[c(1,41,81,121)]<-2 > # onece more fits with different ordering of factors > fit1<-lme(y ~ visit*treat*gender, data=gdat, random = ~visit|id) > anova(fit1) > fit2<-lme(y ~ gender*treat*visit, data=gdat, random = ~visit|id) > anova(fit2) > # Result: There are differences!! > > Hope anybody can help or give me advice how to interpret these results correctly or how to avoid this problem! Is there a better possibility to analyse these data than lme? > > Thanks! > Karl-- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595